go/src/cmd/compile/internal/ssa/debug.go

1085 lines
30 KiB
Go
Raw Normal View History

[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
// Copyright 2017 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
package ssa
import (
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
"cmd/internal/dwarf"
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
"cmd/internal/obj"
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
"encoding/hex"
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
"fmt"
"math/bits"
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
"sort"
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
"strings"
)
type SlotID int32
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
type VarID int32
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
// A FuncDebug contains all the debug information for the variables in a
// function. Variables are identified by their LocalSlot, which may be the
// result of decomposing a larger variable.
type FuncDebug struct {
// Slots is all the slots used in the debug info, indexed by their SlotID.
Slots []LocalSlot
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
// The user variables, indexed by VarID.
Vars []GCNode
// The slots that make up each variable, indexed by VarID.
VarSlots [][]SlotID
// The location list data, indexed by VarID. Must be processed by PutLocationList.
LocationLists [][]byte
// Filled in by the user. Translates Block and Value ID to PC.
GetPC func(ID, ID) int64
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
type BlockDebug struct {
// Whether the block had any changes to user variables at all.
relevant bool
// State at the end of the block if it's fully processed. Immutable once initialized.
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
endState []liveSlot
}
cmd/compile: cover control flow insns in location lists The information that's used to generate DWARF location lists is very ssa.Value centric; it uses Values as start and end coordinates to define ranges. That mostly works fine, but control flow instructions don't come from Values, so the ranges couldn't cover them. Control flow instructions are generated when the SSA representation is converted to assembly, so that's the best place to extend the ranges to cover them. (Before that, there's nothing to refer to, and afterward the boundaries between blocks have been lost.) That requires block information in the debugInfo type, which then flows down to make everything else awkward. On the plus side, there's a little less copying slices around than there used to be, so it should be a little faster. Previously, the ranges for empty blocks were not very meaningful. That was fine, because they had no Values to cover, so no debug information was generated for them. But they do have control flow instructions (that's why they exist) and so now it's important that the information be correct. Introduce two sentinel values, BlockStart and BlockEnd, that denote the boundary of a block, even if the block is empty. BlockEnd replaces the previous SurvivedBlock flag. There's one more problem: the last instruction in the function will be a control flow instruction, so any live ranges need to be extended past it. But there's no instruction after it to use as the end of the range. Instead, leave the EndProg field of those ranges as nil and fix it up to point to past the end of the assembled text at the very last moment. Change-Id: I81f884020ff36fd6fe8d7888fc57c99412c4245b Reviewed-on: https://go-review.googlesource.com/63010 Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com> Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-09-11 14:28:34 -04:00
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
// A liveSlot is a slot that's live in loc at entry/exit of a block.
type liveSlot struct {
// An inlined VarLoc, so it packs into 16 bytes instead of 20.
Registers RegisterSet
StackOffset
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
slot SlotID
}
func (loc liveSlot) absent() bool {
return loc.Registers == 0 && !loc.onStack()
}
// StackOffset encodes whether a value is on the stack and if so, where. It is
// a 31-bit integer followed by a presence flag at the low-order bit.
type StackOffset int32
func (s StackOffset) onStack() bool {
return s != 0
}
func (s StackOffset) stackOffsetValue() int32 {
return int32(s) >> 1
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
// stateAtPC is the current state of all variables at some point.
type stateAtPC struct {
// The location of each known slot, indexed by SlotID.
slots []VarLoc
// The slots present in each register, indexed by register number.
registers [][]SlotID
cmd/compile: cover control flow insns in location lists The information that's used to generate DWARF location lists is very ssa.Value centric; it uses Values as start and end coordinates to define ranges. That mostly works fine, but control flow instructions don't come from Values, so the ranges couldn't cover them. Control flow instructions are generated when the SSA representation is converted to assembly, so that's the best place to extend the ranges to cover them. (Before that, there's nothing to refer to, and afterward the boundaries between blocks have been lost.) That requires block information in the debugInfo type, which then flows down to make everything else awkward. On the plus side, there's a little less copying slices around than there used to be, so it should be a little faster. Previously, the ranges for empty blocks were not very meaningful. That was fine, because they had no Values to cover, so no debug information was generated for them. But they do have control flow instructions (that's why they exist) and so now it's important that the information be correct. Introduce two sentinel values, BlockStart and BlockEnd, that denote the boundary of a block, even if the block is empty. BlockEnd replaces the previous SurvivedBlock flag. There's one more problem: the last instruction in the function will be a control flow instruction, so any live ranges need to be extended past it. But there's no instruction after it to use as the end of the range. Instead, leave the EndProg field of those ranges as nil and fix it up to point to past the end of the assembled text at the very last moment. Change-Id: I81f884020ff36fd6fe8d7888fc57c99412c4245b Reviewed-on: https://go-review.googlesource.com/63010 Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com> Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-09-11 14:28:34 -04:00
}
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
// reset fills state with the live variables from live.
func (state *stateAtPC) reset(live []liveSlot) {
cmd/compile/internal/ssa: reduce location list memory use Put everything that showed up in the allocation profile into the cache, and reuse it across functions. After this CL, the overhead of enabling location lists is getting pretty close to the desired 5%: compilecmp -all -beforeflags -dwarflocationlists=0 -afterflags -dwarflocationlists=1 -n 30 4ebad42292b6a4090faf37753dd768d2965e38c4 4ebad42292b6a4090faf37753dd768d2965e38c4 compilecmp -dwarflocationlists=0 4ebad42292b6a4090faf37753dd768d2965e38c4 -dwarflocationlists=1 4ebad42292b6a4090faf37753dd768d2965e38c4 benchstat -geomean /tmp/869550129 /tmp/143495132 completed 30 of 30, estimated time remaining 0s (eta 3:24PM) name old time/op new time/op delta Template 199ms ± 4% 209ms ± 6% +5.17% (p=0.000 n=29+30) Unicode 99.2ms ± 8% 100.5ms ± 6% ~ (p=0.112 n=30+30) GoTypes 642ms ± 3% 684ms ± 3% +6.54% (p=0.000 n=29+30) SSA 8.00s ± 1% 8.71s ± 1% +8.78% (p=0.000 n=29+29) Flate 129ms ± 7% 134ms ± 5% +3.77% (p=0.000 n=30+30) GoParser 157ms ± 4% 164ms ± 5% +4.35% (p=0.000 n=29+30) Reflect 428ms ± 3% 450ms ± 4% +5.09% (p=0.000 n=30+30) Tar 195ms ± 5% 204ms ± 8% +4.78% (p=0.000 n=30+30) XML 228ms ± 4% 241ms ± 4% +5.62% (p=0.000 n=30+29) StdCmd 15.4s ± 1% 16.7s ± 1% +8.29% (p=0.000 n=29+29) [Geo mean] 476ms 502ms +5.35% name old user-time/op new user-time/op delta Template 294ms ±18% 304ms ±15% ~ (p=0.242 n=29+29) Unicode 182ms ±27% 172ms ±28% ~ (p=0.104 n=30+30) GoTypes 957ms ±15% 1016ms ±12% +6.16% (p=0.000 n=30+30) SSA 13.3s ± 5% 14.3s ± 3% +7.32% (p=0.000 n=30+28) Flate 188ms ±17% 193ms ±17% ~ (p=0.288 n=28+29) GoParser 232ms ±16% 238ms ±13% ~ (p=0.065 n=30+29) Reflect 585ms ±13% 620ms ±10% +5.88% (p=0.000 n=30+30) Tar 298ms ±21% 332ms ±23% +11.32% (p=0.000 n=30+30) XML 329ms ±17% 343ms ±12% +4.18% (p=0.032 n=30+30) [Geo mean] 492ms 513ms +4.13% name old alloc/op new alloc/op delta Template 38.3MB ± 0% 40.3MB ± 0% +5.29% (p=0.000 n=30+30) Unicode 29.3MB ± 0% 29.6MB ± 0% +1.28% (p=0.000 n=30+29) GoTypes 110MB ± 0% 118MB ± 0% +6.97% (p=0.000 n=29+30) SSA 1.48GB ± 0% 1.61GB ± 0% +9.06% (p=0.000 n=30+30) Flate 24.8MB ± 0% 26.0MB ± 0% +4.99% (p=0.000 n=29+30) GoParser 30.9MB ± 0% 32.2MB ± 0% +4.20% (p=0.000 n=30+30) Reflect 76.8MB ± 0% 80.6MB ± 0% +4.97% (p=0.000 n=30+30) Tar 39.6MB ± 0% 41.7MB ± 0% +5.22% (p=0.000 n=29+30) XML 42.0MB ± 0% 45.4MB ± 0% +8.22% (p=0.000 n=29+30) [Geo mean] 63.9MB 67.5MB +5.56% name old allocs/op new allocs/op delta Template 383k ± 0% 405k ± 0% +5.69% (p=0.000 n=30+30) Unicode 343k ± 0% 346k ± 0% +0.98% (p=0.000 n=30+27) GoTypes 1.15M ± 0% 1.22M ± 0% +6.17% (p=0.000 n=29+29) SSA 12.2M ± 0% 13.2M ± 0% +8.15% (p=0.000 n=30+30) Flate 234k ± 0% 249k ± 0% +6.44% (p=0.000 n=30+30) GoParser 315k ± 0% 332k ± 0% +5.31% (p=0.000 n=30+28) Reflect 972k ± 0% 1010k ± 0% +3.89% (p=0.000 n=30+30) Tar 394k ± 0% 415k ± 0% +5.35% (p=0.000 n=28+30) XML 404k ± 0% 429k ± 0% +6.31% (p=0.000 n=29+29) [Geo mean] 651k 686k +5.35% Change-Id: Ia005a8d6b33ce9f8091322f004376a3d6e5c1a94 Reviewed-on: https://go-review.googlesource.com/89357 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-01-22 17:06:26 -05:00
slots, registers := state.slots, state.registers
for i := range slots {
slots[i] = VarLoc{}
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
cmd/compile/internal/ssa: reduce location list memory use Put everything that showed up in the allocation profile into the cache, and reuse it across functions. After this CL, the overhead of enabling location lists is getting pretty close to the desired 5%: compilecmp -all -beforeflags -dwarflocationlists=0 -afterflags -dwarflocationlists=1 -n 30 4ebad42292b6a4090faf37753dd768d2965e38c4 4ebad42292b6a4090faf37753dd768d2965e38c4 compilecmp -dwarflocationlists=0 4ebad42292b6a4090faf37753dd768d2965e38c4 -dwarflocationlists=1 4ebad42292b6a4090faf37753dd768d2965e38c4 benchstat -geomean /tmp/869550129 /tmp/143495132 completed 30 of 30, estimated time remaining 0s (eta 3:24PM) name old time/op new time/op delta Template 199ms ± 4% 209ms ± 6% +5.17% (p=0.000 n=29+30) Unicode 99.2ms ± 8% 100.5ms ± 6% ~ (p=0.112 n=30+30) GoTypes 642ms ± 3% 684ms ± 3% +6.54% (p=0.000 n=29+30) SSA 8.00s ± 1% 8.71s ± 1% +8.78% (p=0.000 n=29+29) Flate 129ms ± 7% 134ms ± 5% +3.77% (p=0.000 n=30+30) GoParser 157ms ± 4% 164ms ± 5% +4.35% (p=0.000 n=29+30) Reflect 428ms ± 3% 450ms ± 4% +5.09% (p=0.000 n=30+30) Tar 195ms ± 5% 204ms ± 8% +4.78% (p=0.000 n=30+30) XML 228ms ± 4% 241ms ± 4% +5.62% (p=0.000 n=30+29) StdCmd 15.4s ± 1% 16.7s ± 1% +8.29% (p=0.000 n=29+29) [Geo mean] 476ms 502ms +5.35% name old user-time/op new user-time/op delta Template 294ms ±18% 304ms ±15% ~ (p=0.242 n=29+29) Unicode 182ms ±27% 172ms ±28% ~ (p=0.104 n=30+30) GoTypes 957ms ±15% 1016ms ±12% +6.16% (p=0.000 n=30+30) SSA 13.3s ± 5% 14.3s ± 3% +7.32% (p=0.000 n=30+28) Flate 188ms ±17% 193ms ±17% ~ (p=0.288 n=28+29) GoParser 232ms ±16% 238ms ±13% ~ (p=0.065 n=30+29) Reflect 585ms ±13% 620ms ±10% +5.88% (p=0.000 n=30+30) Tar 298ms ±21% 332ms ±23% +11.32% (p=0.000 n=30+30) XML 329ms ±17% 343ms ±12% +4.18% (p=0.032 n=30+30) [Geo mean] 492ms 513ms +4.13% name old alloc/op new alloc/op delta Template 38.3MB ± 0% 40.3MB ± 0% +5.29% (p=0.000 n=30+30) Unicode 29.3MB ± 0% 29.6MB ± 0% +1.28% (p=0.000 n=30+29) GoTypes 110MB ± 0% 118MB ± 0% +6.97% (p=0.000 n=29+30) SSA 1.48GB ± 0% 1.61GB ± 0% +9.06% (p=0.000 n=30+30) Flate 24.8MB ± 0% 26.0MB ± 0% +4.99% (p=0.000 n=29+30) GoParser 30.9MB ± 0% 32.2MB ± 0% +4.20% (p=0.000 n=30+30) Reflect 76.8MB ± 0% 80.6MB ± 0% +4.97% (p=0.000 n=30+30) Tar 39.6MB ± 0% 41.7MB ± 0% +5.22% (p=0.000 n=29+30) XML 42.0MB ± 0% 45.4MB ± 0% +8.22% (p=0.000 n=29+30) [Geo mean] 63.9MB 67.5MB +5.56% name old allocs/op new allocs/op delta Template 383k ± 0% 405k ± 0% +5.69% (p=0.000 n=30+30) Unicode 343k ± 0% 346k ± 0% +0.98% (p=0.000 n=30+27) GoTypes 1.15M ± 0% 1.22M ± 0% +6.17% (p=0.000 n=29+29) SSA 12.2M ± 0% 13.2M ± 0% +8.15% (p=0.000 n=30+30) Flate 234k ± 0% 249k ± 0% +6.44% (p=0.000 n=30+30) GoParser 315k ± 0% 332k ± 0% +5.31% (p=0.000 n=30+28) Reflect 972k ± 0% 1010k ± 0% +3.89% (p=0.000 n=30+30) Tar 394k ± 0% 415k ± 0% +5.35% (p=0.000 n=28+30) XML 404k ± 0% 429k ± 0% +6.31% (p=0.000 n=29+29) [Geo mean] 651k 686k +5.35% Change-Id: Ia005a8d6b33ce9f8091322f004376a3d6e5c1a94 Reviewed-on: https://go-review.googlesource.com/89357 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-01-22 17:06:26 -05:00
for i := range registers {
registers[i] = registers[i][:0]
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
for _, live := range live {
slots[live.slot] = VarLoc{live.Registers, live.StackOffset}
if live.Registers == 0 {
cmd/compile/internal/ssa: reduce location list memory use Put everything that showed up in the allocation profile into the cache, and reuse it across functions. After this CL, the overhead of enabling location lists is getting pretty close to the desired 5%: compilecmp -all -beforeflags -dwarflocationlists=0 -afterflags -dwarflocationlists=1 -n 30 4ebad42292b6a4090faf37753dd768d2965e38c4 4ebad42292b6a4090faf37753dd768d2965e38c4 compilecmp -dwarflocationlists=0 4ebad42292b6a4090faf37753dd768d2965e38c4 -dwarflocationlists=1 4ebad42292b6a4090faf37753dd768d2965e38c4 benchstat -geomean /tmp/869550129 /tmp/143495132 completed 30 of 30, estimated time remaining 0s (eta 3:24PM) name old time/op new time/op delta Template 199ms ± 4% 209ms ± 6% +5.17% (p=0.000 n=29+30) Unicode 99.2ms ± 8% 100.5ms ± 6% ~ (p=0.112 n=30+30) GoTypes 642ms ± 3% 684ms ± 3% +6.54% (p=0.000 n=29+30) SSA 8.00s ± 1% 8.71s ± 1% +8.78% (p=0.000 n=29+29) Flate 129ms ± 7% 134ms ± 5% +3.77% (p=0.000 n=30+30) GoParser 157ms ± 4% 164ms ± 5% +4.35% (p=0.000 n=29+30) Reflect 428ms ± 3% 450ms ± 4% +5.09% (p=0.000 n=30+30) Tar 195ms ± 5% 204ms ± 8% +4.78% (p=0.000 n=30+30) XML 228ms ± 4% 241ms ± 4% +5.62% (p=0.000 n=30+29) StdCmd 15.4s ± 1% 16.7s ± 1% +8.29% (p=0.000 n=29+29) [Geo mean] 476ms 502ms +5.35% name old user-time/op new user-time/op delta Template 294ms ±18% 304ms ±15% ~ (p=0.242 n=29+29) Unicode 182ms ±27% 172ms ±28% ~ (p=0.104 n=30+30) GoTypes 957ms ±15% 1016ms ±12% +6.16% (p=0.000 n=30+30) SSA 13.3s ± 5% 14.3s ± 3% +7.32% (p=0.000 n=30+28) Flate 188ms ±17% 193ms ±17% ~ (p=0.288 n=28+29) GoParser 232ms ±16% 238ms ±13% ~ (p=0.065 n=30+29) Reflect 585ms ±13% 620ms ±10% +5.88% (p=0.000 n=30+30) Tar 298ms ±21% 332ms ±23% +11.32% (p=0.000 n=30+30) XML 329ms ±17% 343ms ±12% +4.18% (p=0.032 n=30+30) [Geo mean] 492ms 513ms +4.13% name old alloc/op new alloc/op delta Template 38.3MB ± 0% 40.3MB ± 0% +5.29% (p=0.000 n=30+30) Unicode 29.3MB ± 0% 29.6MB ± 0% +1.28% (p=0.000 n=30+29) GoTypes 110MB ± 0% 118MB ± 0% +6.97% (p=0.000 n=29+30) SSA 1.48GB ± 0% 1.61GB ± 0% +9.06% (p=0.000 n=30+30) Flate 24.8MB ± 0% 26.0MB ± 0% +4.99% (p=0.000 n=29+30) GoParser 30.9MB ± 0% 32.2MB ± 0% +4.20% (p=0.000 n=30+30) Reflect 76.8MB ± 0% 80.6MB ± 0% +4.97% (p=0.000 n=30+30) Tar 39.6MB ± 0% 41.7MB ± 0% +5.22% (p=0.000 n=29+30) XML 42.0MB ± 0% 45.4MB ± 0% +8.22% (p=0.000 n=29+30) [Geo mean] 63.9MB 67.5MB +5.56% name old allocs/op new allocs/op delta Template 383k ± 0% 405k ± 0% +5.69% (p=0.000 n=30+30) Unicode 343k ± 0% 346k ± 0% +0.98% (p=0.000 n=30+27) GoTypes 1.15M ± 0% 1.22M ± 0% +6.17% (p=0.000 n=29+29) SSA 12.2M ± 0% 13.2M ± 0% +8.15% (p=0.000 n=30+30) Flate 234k ± 0% 249k ± 0% +6.44% (p=0.000 n=30+30) GoParser 315k ± 0% 332k ± 0% +5.31% (p=0.000 n=30+28) Reflect 972k ± 0% 1010k ± 0% +3.89% (p=0.000 n=30+30) Tar 394k ± 0% 415k ± 0% +5.35% (p=0.000 n=28+30) XML 404k ± 0% 429k ± 0% +6.31% (p=0.000 n=29+29) [Geo mean] 651k 686k +5.35% Change-Id: Ia005a8d6b33ce9f8091322f004376a3d6e5c1a94 Reviewed-on: https://go-review.googlesource.com/89357 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-01-22 17:06:26 -05:00
continue
}
mask := uint64(live.Registers)
for {
if mask == 0 {
break
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
reg := uint8(bits.TrailingZeros64(mask))
mask &^= 1 << reg
registers[reg] = append(registers[reg], live.slot)
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
}
cmd/compile/internal/ssa: reduce location list memory use Put everything that showed up in the allocation profile into the cache, and reuse it across functions. After this CL, the overhead of enabling location lists is getting pretty close to the desired 5%: compilecmp -all -beforeflags -dwarflocationlists=0 -afterflags -dwarflocationlists=1 -n 30 4ebad42292b6a4090faf37753dd768d2965e38c4 4ebad42292b6a4090faf37753dd768d2965e38c4 compilecmp -dwarflocationlists=0 4ebad42292b6a4090faf37753dd768d2965e38c4 -dwarflocationlists=1 4ebad42292b6a4090faf37753dd768d2965e38c4 benchstat -geomean /tmp/869550129 /tmp/143495132 completed 30 of 30, estimated time remaining 0s (eta 3:24PM) name old time/op new time/op delta Template 199ms ± 4% 209ms ± 6% +5.17% (p=0.000 n=29+30) Unicode 99.2ms ± 8% 100.5ms ± 6% ~ (p=0.112 n=30+30) GoTypes 642ms ± 3% 684ms ± 3% +6.54% (p=0.000 n=29+30) SSA 8.00s ± 1% 8.71s ± 1% +8.78% (p=0.000 n=29+29) Flate 129ms ± 7% 134ms ± 5% +3.77% (p=0.000 n=30+30) GoParser 157ms ± 4% 164ms ± 5% +4.35% (p=0.000 n=29+30) Reflect 428ms ± 3% 450ms ± 4% +5.09% (p=0.000 n=30+30) Tar 195ms ± 5% 204ms ± 8% +4.78% (p=0.000 n=30+30) XML 228ms ± 4% 241ms ± 4% +5.62% (p=0.000 n=30+29) StdCmd 15.4s ± 1% 16.7s ± 1% +8.29% (p=0.000 n=29+29) [Geo mean] 476ms 502ms +5.35% name old user-time/op new user-time/op delta Template 294ms ±18% 304ms ±15% ~ (p=0.242 n=29+29) Unicode 182ms ±27% 172ms ±28% ~ (p=0.104 n=30+30) GoTypes 957ms ±15% 1016ms ±12% +6.16% (p=0.000 n=30+30) SSA 13.3s ± 5% 14.3s ± 3% +7.32% (p=0.000 n=30+28) Flate 188ms ±17% 193ms ±17% ~ (p=0.288 n=28+29) GoParser 232ms ±16% 238ms ±13% ~ (p=0.065 n=30+29) Reflect 585ms ±13% 620ms ±10% +5.88% (p=0.000 n=30+30) Tar 298ms ±21% 332ms ±23% +11.32% (p=0.000 n=30+30) XML 329ms ±17% 343ms ±12% +4.18% (p=0.032 n=30+30) [Geo mean] 492ms 513ms +4.13% name old alloc/op new alloc/op delta Template 38.3MB ± 0% 40.3MB ± 0% +5.29% (p=0.000 n=30+30) Unicode 29.3MB ± 0% 29.6MB ± 0% +1.28% (p=0.000 n=30+29) GoTypes 110MB ± 0% 118MB ± 0% +6.97% (p=0.000 n=29+30) SSA 1.48GB ± 0% 1.61GB ± 0% +9.06% (p=0.000 n=30+30) Flate 24.8MB ± 0% 26.0MB ± 0% +4.99% (p=0.000 n=29+30) GoParser 30.9MB ± 0% 32.2MB ± 0% +4.20% (p=0.000 n=30+30) Reflect 76.8MB ± 0% 80.6MB ± 0% +4.97% (p=0.000 n=30+30) Tar 39.6MB ± 0% 41.7MB ± 0% +5.22% (p=0.000 n=29+30) XML 42.0MB ± 0% 45.4MB ± 0% +8.22% (p=0.000 n=29+30) [Geo mean] 63.9MB 67.5MB +5.56% name old allocs/op new allocs/op delta Template 383k ± 0% 405k ± 0% +5.69% (p=0.000 n=30+30) Unicode 343k ± 0% 346k ± 0% +0.98% (p=0.000 n=30+27) GoTypes 1.15M ± 0% 1.22M ± 0% +6.17% (p=0.000 n=29+29) SSA 12.2M ± 0% 13.2M ± 0% +8.15% (p=0.000 n=30+30) Flate 234k ± 0% 249k ± 0% +6.44% (p=0.000 n=30+30) GoParser 315k ± 0% 332k ± 0% +5.31% (p=0.000 n=30+28) Reflect 972k ± 0% 1010k ± 0% +3.89% (p=0.000 n=30+30) Tar 394k ± 0% 415k ± 0% +5.35% (p=0.000 n=28+30) XML 404k ± 0% 429k ± 0% +6.31% (p=0.000 n=29+29) [Geo mean] 651k 686k +5.35% Change-Id: Ia005a8d6b33ce9f8091322f004376a3d6e5c1a94 Reviewed-on: https://go-review.googlesource.com/89357 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-01-22 17:06:26 -05:00
state.slots, state.registers = slots, registers
cmd/compile: cover control flow insns in location lists The information that's used to generate DWARF location lists is very ssa.Value centric; it uses Values as start and end coordinates to define ranges. That mostly works fine, but control flow instructions don't come from Values, so the ranges couldn't cover them. Control flow instructions are generated when the SSA representation is converted to assembly, so that's the best place to extend the ranges to cover them. (Before that, there's nothing to refer to, and afterward the boundaries between blocks have been lost.) That requires block information in the debugInfo type, which then flows down to make everything else awkward. On the plus side, there's a little less copying slices around than there used to be, so it should be a little faster. Previously, the ranges for empty blocks were not very meaningful. That was fine, because they had no Values to cover, so no debug information was generated for them. But they do have control flow instructions (that's why they exist) and so now it's important that the information be correct. Introduce two sentinel values, BlockStart and BlockEnd, that denote the boundary of a block, even if the block is empty. BlockEnd replaces the previous SurvivedBlock flag. There's one more problem: the last instruction in the function will be a control flow instruction, so any live ranges need to be extended past it. But there's no instruction after it to use as the end of the range. Instead, leave the EndProg field of those ranges as nil and fix it up to point to past the end of the assembled text at the very last moment. Change-Id: I81f884020ff36fd6fe8d7888fc57c99412c4245b Reviewed-on: https://go-review.googlesource.com/63010 Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com> Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-09-11 14:28:34 -04:00
}
func (s *debugState) LocString(loc VarLoc) string {
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
if loc.absent() {
return "<nil>"
}
cmd/compile: cover control flow insns in location lists The information that's used to generate DWARF location lists is very ssa.Value centric; it uses Values as start and end coordinates to define ranges. That mostly works fine, but control flow instructions don't come from Values, so the ranges couldn't cover them. Control flow instructions are generated when the SSA representation is converted to assembly, so that's the best place to extend the ranges to cover them. (Before that, there's nothing to refer to, and afterward the boundaries between blocks have been lost.) That requires block information in the debugInfo type, which then flows down to make everything else awkward. On the plus side, there's a little less copying slices around than there used to be, so it should be a little faster. Previously, the ranges for empty blocks were not very meaningful. That was fine, because they had no Values to cover, so no debug information was generated for them. But they do have control flow instructions (that's why they exist) and so now it's important that the information be correct. Introduce two sentinel values, BlockStart and BlockEnd, that denote the boundary of a block, even if the block is empty. BlockEnd replaces the previous SurvivedBlock flag. There's one more problem: the last instruction in the function will be a control flow instruction, so any live ranges need to be extended past it. But there's no instruction after it to use as the end of the range. Instead, leave the EndProg field of those ranges as nil and fix it up to point to past the end of the assembled text at the very last moment. Change-Id: I81f884020ff36fd6fe8d7888fc57c99412c4245b Reviewed-on: https://go-review.googlesource.com/63010 Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com> Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-09-11 14:28:34 -04:00
var storage []string
if loc.onStack() {
cmd/compile: cover control flow insns in location lists The information that's used to generate DWARF location lists is very ssa.Value centric; it uses Values as start and end coordinates to define ranges. That mostly works fine, but control flow instructions don't come from Values, so the ranges couldn't cover them. Control flow instructions are generated when the SSA representation is converted to assembly, so that's the best place to extend the ranges to cover them. (Before that, there's nothing to refer to, and afterward the boundaries between blocks have been lost.) That requires block information in the debugInfo type, which then flows down to make everything else awkward. On the plus side, there's a little less copying slices around than there used to be, so it should be a little faster. Previously, the ranges for empty blocks were not very meaningful. That was fine, because they had no Values to cover, so no debug information was generated for them. But they do have control flow instructions (that's why they exist) and so now it's important that the information be correct. Introduce two sentinel values, BlockStart and BlockEnd, that denote the boundary of a block, even if the block is empty. BlockEnd replaces the previous SurvivedBlock flag. There's one more problem: the last instruction in the function will be a control flow instruction, so any live ranges need to be extended past it. But there's no instruction after it to use as the end of the range. Instead, leave the EndProg field of those ranges as nil and fix it up to point to past the end of the assembled text at the very last moment. Change-Id: I81f884020ff36fd6fe8d7888fc57c99412c4245b Reviewed-on: https://go-review.googlesource.com/63010 Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com> Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-09-11 14:28:34 -04:00
storage = append(storage, "stack")
}
mask := uint64(loc.Registers)
for {
if mask == 0 {
break
cmd/compile: cover control flow insns in location lists The information that's used to generate DWARF location lists is very ssa.Value centric; it uses Values as start and end coordinates to define ranges. That mostly works fine, but control flow instructions don't come from Values, so the ranges couldn't cover them. Control flow instructions are generated when the SSA representation is converted to assembly, so that's the best place to extend the ranges to cover them. (Before that, there's nothing to refer to, and afterward the boundaries between blocks have been lost.) That requires block information in the debugInfo type, which then flows down to make everything else awkward. On the plus side, there's a little less copying slices around than there used to be, so it should be a little faster. Previously, the ranges for empty blocks were not very meaningful. That was fine, because they had no Values to cover, so no debug information was generated for them. But they do have control flow instructions (that's why they exist) and so now it's important that the information be correct. Introduce two sentinel values, BlockStart and BlockEnd, that denote the boundary of a block, even if the block is empty. BlockEnd replaces the previous SurvivedBlock flag. There's one more problem: the last instruction in the function will be a control flow instruction, so any live ranges need to be extended past it. But there's no instruction after it to use as the end of the range. Instead, leave the EndProg field of those ranges as nil and fix it up to point to past the end of the assembled text at the very last moment. Change-Id: I81f884020ff36fd6fe8d7888fc57c99412c4245b Reviewed-on: https://go-review.googlesource.com/63010 Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com> Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-09-11 14:28:34 -04:00
}
reg := uint8(bits.TrailingZeros64(mask))
mask &^= 1 << reg
storage = append(storage, s.registers[reg].String())
cmd/compile: cover control flow insns in location lists The information that's used to generate DWARF location lists is very ssa.Value centric; it uses Values as start and end coordinates to define ranges. That mostly works fine, but control flow instructions don't come from Values, so the ranges couldn't cover them. Control flow instructions are generated when the SSA representation is converted to assembly, so that's the best place to extend the ranges to cover them. (Before that, there's nothing to refer to, and afterward the boundaries between blocks have been lost.) That requires block information in the debugInfo type, which then flows down to make everything else awkward. On the plus side, there's a little less copying slices around than there used to be, so it should be a little faster. Previously, the ranges for empty blocks were not very meaningful. That was fine, because they had no Values to cover, so no debug information was generated for them. But they do have control flow instructions (that's why they exist) and so now it's important that the information be correct. Introduce two sentinel values, BlockStart and BlockEnd, that denote the boundary of a block, even if the block is empty. BlockEnd replaces the previous SurvivedBlock flag. There's one more problem: the last instruction in the function will be a control flow instruction, so any live ranges need to be extended past it. But there's no instruction after it to use as the end of the range. Instead, leave the EndProg field of those ranges as nil and fix it up to point to past the end of the assembled text at the very last moment. Change-Id: I81f884020ff36fd6fe8d7888fc57c99412c4245b Reviewed-on: https://go-review.googlesource.com/63010 Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com> Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-09-11 14:28:34 -04:00
}
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
return strings.Join(storage, ",")
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
// A VarLoc describes the storage for part of a user variable.
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
type VarLoc struct {
// The registers this variable is available in. There can be more than
// one in various situations, e.g. it's being moved between registers.
Registers RegisterSet
StackOffset
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
func (loc VarLoc) absent() bool {
return loc.Registers == 0 && !loc.onStack()
cmd/compile: cover control flow insns in location lists The information that's used to generate DWARF location lists is very ssa.Value centric; it uses Values as start and end coordinates to define ranges. That mostly works fine, but control flow instructions don't come from Values, so the ranges couldn't cover them. Control flow instructions are generated when the SSA representation is converted to assembly, so that's the best place to extend the ranges to cover them. (Before that, there's nothing to refer to, and afterward the boundaries between blocks have been lost.) That requires block information in the debugInfo type, which then flows down to make everything else awkward. On the plus side, there's a little less copying slices around than there used to be, so it should be a little faster. Previously, the ranges for empty blocks were not very meaningful. That was fine, because they had no Values to cover, so no debug information was generated for them. But they do have control flow instructions (that's why they exist) and so now it's important that the information be correct. Introduce two sentinel values, BlockStart and BlockEnd, that denote the boundary of a block, even if the block is empty. BlockEnd replaces the previous SurvivedBlock flag. There's one more problem: the last instruction in the function will be a control flow instruction, so any live ranges need to be extended past it. But there's no instruction after it to use as the end of the range. Instead, leave the EndProg field of those ranges as nil and fix it up to point to past the end of the assembled text at the very last moment. Change-Id: I81f884020ff36fd6fe8d7888fc57c99412c4245b Reviewed-on: https://go-review.googlesource.com/63010 Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com> Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-09-11 14:28:34 -04:00
}
var BlockStart = &Value{
ID: -10000,
Op: OpInvalid,
Aux: "BlockStart",
}
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
cmd/compile: cover control flow insns in location lists The information that's used to generate DWARF location lists is very ssa.Value centric; it uses Values as start and end coordinates to define ranges. That mostly works fine, but control flow instructions don't come from Values, so the ranges couldn't cover them. Control flow instructions are generated when the SSA representation is converted to assembly, so that's the best place to extend the ranges to cover them. (Before that, there's nothing to refer to, and afterward the boundaries between blocks have been lost.) That requires block information in the debugInfo type, which then flows down to make everything else awkward. On the plus side, there's a little less copying slices around than there used to be, so it should be a little faster. Previously, the ranges for empty blocks were not very meaningful. That was fine, because they had no Values to cover, so no debug information was generated for them. But they do have control flow instructions (that's why they exist) and so now it's important that the information be correct. Introduce two sentinel values, BlockStart and BlockEnd, that denote the boundary of a block, even if the block is empty. BlockEnd replaces the previous SurvivedBlock flag. There's one more problem: the last instruction in the function will be a control flow instruction, so any live ranges need to be extended past it. But there's no instruction after it to use as the end of the range. Instead, leave the EndProg field of those ranges as nil and fix it up to point to past the end of the assembled text at the very last moment. Change-Id: I81f884020ff36fd6fe8d7888fc57c99412c4245b Reviewed-on: https://go-review.googlesource.com/63010 Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com> Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-09-11 14:28:34 -04:00
var BlockEnd = &Value{
ID: -20000,
Op: OpInvalid,
Aux: "BlockEnd",
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
// RegisterSet is a bitmap of registers, indexed by Register.num.
type RegisterSet uint64
// logf prints debug-specific logging to stdout (always stdout) if the current
// function is tagged by GOSSAFUNC (for ssa output directed either to stdout or html).
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
func (s *debugState) logf(msg string, args ...interface{}) {
if s.f.PrintOrHtmlSSA {
fmt.Printf(msg, args...)
}
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
type debugState struct {
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
// See FuncDebug.
slots []LocalSlot
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
vars []GCNode
varSlots [][]SlotID
lists [][]byte
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
// The user variable that each slot rolls up to, indexed by SlotID.
slotVars []VarID
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
f *Func
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
loggingEnabled bool
registers []Register
stackOffset func(LocalSlot) int32
ctxt *obj.Link
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
// The names (slots) associated with each value, indexed by Value ID.
valueNames [][]SlotID
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
// The current state of whatever analysis is running.
currentState stateAtPC
cmd/compile/internal/ssa: reduce location list memory use Put everything that showed up in the allocation profile into the cache, and reuse it across functions. After this CL, the overhead of enabling location lists is getting pretty close to the desired 5%: compilecmp -all -beforeflags -dwarflocationlists=0 -afterflags -dwarflocationlists=1 -n 30 4ebad42292b6a4090faf37753dd768d2965e38c4 4ebad42292b6a4090faf37753dd768d2965e38c4 compilecmp -dwarflocationlists=0 4ebad42292b6a4090faf37753dd768d2965e38c4 -dwarflocationlists=1 4ebad42292b6a4090faf37753dd768d2965e38c4 benchstat -geomean /tmp/869550129 /tmp/143495132 completed 30 of 30, estimated time remaining 0s (eta 3:24PM) name old time/op new time/op delta Template 199ms ± 4% 209ms ± 6% +5.17% (p=0.000 n=29+30) Unicode 99.2ms ± 8% 100.5ms ± 6% ~ (p=0.112 n=30+30) GoTypes 642ms ± 3% 684ms ± 3% +6.54% (p=0.000 n=29+30) SSA 8.00s ± 1% 8.71s ± 1% +8.78% (p=0.000 n=29+29) Flate 129ms ± 7% 134ms ± 5% +3.77% (p=0.000 n=30+30) GoParser 157ms ± 4% 164ms ± 5% +4.35% (p=0.000 n=29+30) Reflect 428ms ± 3% 450ms ± 4% +5.09% (p=0.000 n=30+30) Tar 195ms ± 5% 204ms ± 8% +4.78% (p=0.000 n=30+30) XML 228ms ± 4% 241ms ± 4% +5.62% (p=0.000 n=30+29) StdCmd 15.4s ± 1% 16.7s ± 1% +8.29% (p=0.000 n=29+29) [Geo mean] 476ms 502ms +5.35% name old user-time/op new user-time/op delta Template 294ms ±18% 304ms ±15% ~ (p=0.242 n=29+29) Unicode 182ms ±27% 172ms ±28% ~ (p=0.104 n=30+30) GoTypes 957ms ±15% 1016ms ±12% +6.16% (p=0.000 n=30+30) SSA 13.3s ± 5% 14.3s ± 3% +7.32% (p=0.000 n=30+28) Flate 188ms ±17% 193ms ±17% ~ (p=0.288 n=28+29) GoParser 232ms ±16% 238ms ±13% ~ (p=0.065 n=30+29) Reflect 585ms ±13% 620ms ±10% +5.88% (p=0.000 n=30+30) Tar 298ms ±21% 332ms ±23% +11.32% (p=0.000 n=30+30) XML 329ms ±17% 343ms ±12% +4.18% (p=0.032 n=30+30) [Geo mean] 492ms 513ms +4.13% name old alloc/op new alloc/op delta Template 38.3MB ± 0% 40.3MB ± 0% +5.29% (p=0.000 n=30+30) Unicode 29.3MB ± 0% 29.6MB ± 0% +1.28% (p=0.000 n=30+29) GoTypes 110MB ± 0% 118MB ± 0% +6.97% (p=0.000 n=29+30) SSA 1.48GB ± 0% 1.61GB ± 0% +9.06% (p=0.000 n=30+30) Flate 24.8MB ± 0% 26.0MB ± 0% +4.99% (p=0.000 n=29+30) GoParser 30.9MB ± 0% 32.2MB ± 0% +4.20% (p=0.000 n=30+30) Reflect 76.8MB ± 0% 80.6MB ± 0% +4.97% (p=0.000 n=30+30) Tar 39.6MB ± 0% 41.7MB ± 0% +5.22% (p=0.000 n=29+30) XML 42.0MB ± 0% 45.4MB ± 0% +8.22% (p=0.000 n=29+30) [Geo mean] 63.9MB 67.5MB +5.56% name old allocs/op new allocs/op delta Template 383k ± 0% 405k ± 0% +5.69% (p=0.000 n=30+30) Unicode 343k ± 0% 346k ± 0% +0.98% (p=0.000 n=30+27) GoTypes 1.15M ± 0% 1.22M ± 0% +6.17% (p=0.000 n=29+29) SSA 12.2M ± 0% 13.2M ± 0% +8.15% (p=0.000 n=30+30) Flate 234k ± 0% 249k ± 0% +6.44% (p=0.000 n=30+30) GoParser 315k ± 0% 332k ± 0% +5.31% (p=0.000 n=30+28) Reflect 972k ± 0% 1010k ± 0% +3.89% (p=0.000 n=30+30) Tar 394k ± 0% 415k ± 0% +5.35% (p=0.000 n=28+30) XML 404k ± 0% 429k ± 0% +6.31% (p=0.000 n=29+29) [Geo mean] 651k 686k +5.35% Change-Id: Ia005a8d6b33ce9f8091322f004376a3d6e5c1a94 Reviewed-on: https://go-review.googlesource.com/89357 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-01-22 17:06:26 -05:00
liveCount []int
changedVars *sparseSet
// The pending location list entry for each user variable, indexed by VarID.
pendingEntries []pendingEntry
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
varParts map[GCNode][]SlotID
blockDebug []BlockDebug
pendingSlotLocs []VarLoc
liveSlots []liveSlot
liveSlotSliceBegin int
partsByVarOffset sort.Interface
}
cmd/compile/internal/ssa: reduce location list memory use Put everything that showed up in the allocation profile into the cache, and reuse it across functions. After this CL, the overhead of enabling location lists is getting pretty close to the desired 5%: compilecmp -all -beforeflags -dwarflocationlists=0 -afterflags -dwarflocationlists=1 -n 30 4ebad42292b6a4090faf37753dd768d2965e38c4 4ebad42292b6a4090faf37753dd768d2965e38c4 compilecmp -dwarflocationlists=0 4ebad42292b6a4090faf37753dd768d2965e38c4 -dwarflocationlists=1 4ebad42292b6a4090faf37753dd768d2965e38c4 benchstat -geomean /tmp/869550129 /tmp/143495132 completed 30 of 30, estimated time remaining 0s (eta 3:24PM) name old time/op new time/op delta Template 199ms ± 4% 209ms ± 6% +5.17% (p=0.000 n=29+30) Unicode 99.2ms ± 8% 100.5ms ± 6% ~ (p=0.112 n=30+30) GoTypes 642ms ± 3% 684ms ± 3% +6.54% (p=0.000 n=29+30) SSA 8.00s ± 1% 8.71s ± 1% +8.78% (p=0.000 n=29+29) Flate 129ms ± 7% 134ms ± 5% +3.77% (p=0.000 n=30+30) GoParser 157ms ± 4% 164ms ± 5% +4.35% (p=0.000 n=29+30) Reflect 428ms ± 3% 450ms ± 4% +5.09% (p=0.000 n=30+30) Tar 195ms ± 5% 204ms ± 8% +4.78% (p=0.000 n=30+30) XML 228ms ± 4% 241ms ± 4% +5.62% (p=0.000 n=30+29) StdCmd 15.4s ± 1% 16.7s ± 1% +8.29% (p=0.000 n=29+29) [Geo mean] 476ms 502ms +5.35% name old user-time/op new user-time/op delta Template 294ms ±18% 304ms ±15% ~ (p=0.242 n=29+29) Unicode 182ms ±27% 172ms ±28% ~ (p=0.104 n=30+30) GoTypes 957ms ±15% 1016ms ±12% +6.16% (p=0.000 n=30+30) SSA 13.3s ± 5% 14.3s ± 3% +7.32% (p=0.000 n=30+28) Flate 188ms ±17% 193ms ±17% ~ (p=0.288 n=28+29) GoParser 232ms ±16% 238ms ±13% ~ (p=0.065 n=30+29) Reflect 585ms ±13% 620ms ±10% +5.88% (p=0.000 n=30+30) Tar 298ms ±21% 332ms ±23% +11.32% (p=0.000 n=30+30) XML 329ms ±17% 343ms ±12% +4.18% (p=0.032 n=30+30) [Geo mean] 492ms 513ms +4.13% name old alloc/op new alloc/op delta Template 38.3MB ± 0% 40.3MB ± 0% +5.29% (p=0.000 n=30+30) Unicode 29.3MB ± 0% 29.6MB ± 0% +1.28% (p=0.000 n=30+29) GoTypes 110MB ± 0% 118MB ± 0% +6.97% (p=0.000 n=29+30) SSA 1.48GB ± 0% 1.61GB ± 0% +9.06% (p=0.000 n=30+30) Flate 24.8MB ± 0% 26.0MB ± 0% +4.99% (p=0.000 n=29+30) GoParser 30.9MB ± 0% 32.2MB ± 0% +4.20% (p=0.000 n=30+30) Reflect 76.8MB ± 0% 80.6MB ± 0% +4.97% (p=0.000 n=30+30) Tar 39.6MB ± 0% 41.7MB ± 0% +5.22% (p=0.000 n=29+30) XML 42.0MB ± 0% 45.4MB ± 0% +8.22% (p=0.000 n=29+30) [Geo mean] 63.9MB 67.5MB +5.56% name old allocs/op new allocs/op delta Template 383k ± 0% 405k ± 0% +5.69% (p=0.000 n=30+30) Unicode 343k ± 0% 346k ± 0% +0.98% (p=0.000 n=30+27) GoTypes 1.15M ± 0% 1.22M ± 0% +6.17% (p=0.000 n=29+29) SSA 12.2M ± 0% 13.2M ± 0% +8.15% (p=0.000 n=30+30) Flate 234k ± 0% 249k ± 0% +6.44% (p=0.000 n=30+30) GoParser 315k ± 0% 332k ± 0% +5.31% (p=0.000 n=30+28) Reflect 972k ± 0% 1010k ± 0% +3.89% (p=0.000 n=30+30) Tar 394k ± 0% 415k ± 0% +5.35% (p=0.000 n=28+30) XML 404k ± 0% 429k ± 0% +6.31% (p=0.000 n=29+29) [Geo mean] 651k 686k +5.35% Change-Id: Ia005a8d6b33ce9f8091322f004376a3d6e5c1a94 Reviewed-on: https://go-review.googlesource.com/89357 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-01-22 17:06:26 -05:00
func (state *debugState) initializeCache(f *Func, numVars, numSlots int) {
cmd/compile/internal/ssa: reduce location list memory use Put everything that showed up in the allocation profile into the cache, and reuse it across functions. After this CL, the overhead of enabling location lists is getting pretty close to the desired 5%: compilecmp -all -beforeflags -dwarflocationlists=0 -afterflags -dwarflocationlists=1 -n 30 4ebad42292b6a4090faf37753dd768d2965e38c4 4ebad42292b6a4090faf37753dd768d2965e38c4 compilecmp -dwarflocationlists=0 4ebad42292b6a4090faf37753dd768d2965e38c4 -dwarflocationlists=1 4ebad42292b6a4090faf37753dd768d2965e38c4 benchstat -geomean /tmp/869550129 /tmp/143495132 completed 30 of 30, estimated time remaining 0s (eta 3:24PM) name old time/op new time/op delta Template 199ms ± 4% 209ms ± 6% +5.17% (p=0.000 n=29+30) Unicode 99.2ms ± 8% 100.5ms ± 6% ~ (p=0.112 n=30+30) GoTypes 642ms ± 3% 684ms ± 3% +6.54% (p=0.000 n=29+30) SSA 8.00s ± 1% 8.71s ± 1% +8.78% (p=0.000 n=29+29) Flate 129ms ± 7% 134ms ± 5% +3.77% (p=0.000 n=30+30) GoParser 157ms ± 4% 164ms ± 5% +4.35% (p=0.000 n=29+30) Reflect 428ms ± 3% 450ms ± 4% +5.09% (p=0.000 n=30+30) Tar 195ms ± 5% 204ms ± 8% +4.78% (p=0.000 n=30+30) XML 228ms ± 4% 241ms ± 4% +5.62% (p=0.000 n=30+29) StdCmd 15.4s ± 1% 16.7s ± 1% +8.29% (p=0.000 n=29+29) [Geo mean] 476ms 502ms +5.35% name old user-time/op new user-time/op delta Template 294ms ±18% 304ms ±15% ~ (p=0.242 n=29+29) Unicode 182ms ±27% 172ms ±28% ~ (p=0.104 n=30+30) GoTypes 957ms ±15% 1016ms ±12% +6.16% (p=0.000 n=30+30) SSA 13.3s ± 5% 14.3s ± 3% +7.32% (p=0.000 n=30+28) Flate 188ms ±17% 193ms ±17% ~ (p=0.288 n=28+29) GoParser 232ms ±16% 238ms ±13% ~ (p=0.065 n=30+29) Reflect 585ms ±13% 620ms ±10% +5.88% (p=0.000 n=30+30) Tar 298ms ±21% 332ms ±23% +11.32% (p=0.000 n=30+30) XML 329ms ±17% 343ms ±12% +4.18% (p=0.032 n=30+30) [Geo mean] 492ms 513ms +4.13% name old alloc/op new alloc/op delta Template 38.3MB ± 0% 40.3MB ± 0% +5.29% (p=0.000 n=30+30) Unicode 29.3MB ± 0% 29.6MB ± 0% +1.28% (p=0.000 n=30+29) GoTypes 110MB ± 0% 118MB ± 0% +6.97% (p=0.000 n=29+30) SSA 1.48GB ± 0% 1.61GB ± 0% +9.06% (p=0.000 n=30+30) Flate 24.8MB ± 0% 26.0MB ± 0% +4.99% (p=0.000 n=29+30) GoParser 30.9MB ± 0% 32.2MB ± 0% +4.20% (p=0.000 n=30+30) Reflect 76.8MB ± 0% 80.6MB ± 0% +4.97% (p=0.000 n=30+30) Tar 39.6MB ± 0% 41.7MB ± 0% +5.22% (p=0.000 n=29+30) XML 42.0MB ± 0% 45.4MB ± 0% +8.22% (p=0.000 n=29+30) [Geo mean] 63.9MB 67.5MB +5.56% name old allocs/op new allocs/op delta Template 383k ± 0% 405k ± 0% +5.69% (p=0.000 n=30+30) Unicode 343k ± 0% 346k ± 0% +0.98% (p=0.000 n=30+27) GoTypes 1.15M ± 0% 1.22M ± 0% +6.17% (p=0.000 n=29+29) SSA 12.2M ± 0% 13.2M ± 0% +8.15% (p=0.000 n=30+30) Flate 234k ± 0% 249k ± 0% +6.44% (p=0.000 n=30+30) GoParser 315k ± 0% 332k ± 0% +5.31% (p=0.000 n=30+28) Reflect 972k ± 0% 1010k ± 0% +3.89% (p=0.000 n=30+30) Tar 394k ± 0% 415k ± 0% +5.35% (p=0.000 n=28+30) XML 404k ± 0% 429k ± 0% +6.31% (p=0.000 n=29+29) [Geo mean] 651k 686k +5.35% Change-Id: Ia005a8d6b33ce9f8091322f004376a3d6e5c1a94 Reviewed-on: https://go-review.googlesource.com/89357 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-01-22 17:06:26 -05:00
// One blockDebug per block. Initialized in allocBlock.
if cap(state.blockDebug) < f.NumBlocks() {
state.blockDebug = make([]BlockDebug, f.NumBlocks())
} else {
// This local variable, and the ones like it below, enable compiler
// optimizations. Don't inline them.
b := state.blockDebug[:f.NumBlocks()]
for i := range b {
b[i] = BlockDebug{}
}
cmd/compile/internal/ssa: reduce location list memory use Put everything that showed up in the allocation profile into the cache, and reuse it across functions. After this CL, the overhead of enabling location lists is getting pretty close to the desired 5%: compilecmp -all -beforeflags -dwarflocationlists=0 -afterflags -dwarflocationlists=1 -n 30 4ebad42292b6a4090faf37753dd768d2965e38c4 4ebad42292b6a4090faf37753dd768d2965e38c4 compilecmp -dwarflocationlists=0 4ebad42292b6a4090faf37753dd768d2965e38c4 -dwarflocationlists=1 4ebad42292b6a4090faf37753dd768d2965e38c4 benchstat -geomean /tmp/869550129 /tmp/143495132 completed 30 of 30, estimated time remaining 0s (eta 3:24PM) name old time/op new time/op delta Template 199ms ± 4% 209ms ± 6% +5.17% (p=0.000 n=29+30) Unicode 99.2ms ± 8% 100.5ms ± 6% ~ (p=0.112 n=30+30) GoTypes 642ms ± 3% 684ms ± 3% +6.54% (p=0.000 n=29+30) SSA 8.00s ± 1% 8.71s ± 1% +8.78% (p=0.000 n=29+29) Flate 129ms ± 7% 134ms ± 5% +3.77% (p=0.000 n=30+30) GoParser 157ms ± 4% 164ms ± 5% +4.35% (p=0.000 n=29+30) Reflect 428ms ± 3% 450ms ± 4% +5.09% (p=0.000 n=30+30) Tar 195ms ± 5% 204ms ± 8% +4.78% (p=0.000 n=30+30) XML 228ms ± 4% 241ms ± 4% +5.62% (p=0.000 n=30+29) StdCmd 15.4s ± 1% 16.7s ± 1% +8.29% (p=0.000 n=29+29) [Geo mean] 476ms 502ms +5.35% name old user-time/op new user-time/op delta Template 294ms ±18% 304ms ±15% ~ (p=0.242 n=29+29) Unicode 182ms ±27% 172ms ±28% ~ (p=0.104 n=30+30) GoTypes 957ms ±15% 1016ms ±12% +6.16% (p=0.000 n=30+30) SSA 13.3s ± 5% 14.3s ± 3% +7.32% (p=0.000 n=30+28) Flate 188ms ±17% 193ms ±17% ~ (p=0.288 n=28+29) GoParser 232ms ±16% 238ms ±13% ~ (p=0.065 n=30+29) Reflect 585ms ±13% 620ms ±10% +5.88% (p=0.000 n=30+30) Tar 298ms ±21% 332ms ±23% +11.32% (p=0.000 n=30+30) XML 329ms ±17% 343ms ±12% +4.18% (p=0.032 n=30+30) [Geo mean] 492ms 513ms +4.13% name old alloc/op new alloc/op delta Template 38.3MB ± 0% 40.3MB ± 0% +5.29% (p=0.000 n=30+30) Unicode 29.3MB ± 0% 29.6MB ± 0% +1.28% (p=0.000 n=30+29) GoTypes 110MB ± 0% 118MB ± 0% +6.97% (p=0.000 n=29+30) SSA 1.48GB ± 0% 1.61GB ± 0% +9.06% (p=0.000 n=30+30) Flate 24.8MB ± 0% 26.0MB ± 0% +4.99% (p=0.000 n=29+30) GoParser 30.9MB ± 0% 32.2MB ± 0% +4.20% (p=0.000 n=30+30) Reflect 76.8MB ± 0% 80.6MB ± 0% +4.97% (p=0.000 n=30+30) Tar 39.6MB ± 0% 41.7MB ± 0% +5.22% (p=0.000 n=29+30) XML 42.0MB ± 0% 45.4MB ± 0% +8.22% (p=0.000 n=29+30) [Geo mean] 63.9MB 67.5MB +5.56% name old allocs/op new allocs/op delta Template 383k ± 0% 405k ± 0% +5.69% (p=0.000 n=30+30) Unicode 343k ± 0% 346k ± 0% +0.98% (p=0.000 n=30+27) GoTypes 1.15M ± 0% 1.22M ± 0% +6.17% (p=0.000 n=29+29) SSA 12.2M ± 0% 13.2M ± 0% +8.15% (p=0.000 n=30+30) Flate 234k ± 0% 249k ± 0% +6.44% (p=0.000 n=30+30) GoParser 315k ± 0% 332k ± 0% +5.31% (p=0.000 n=30+28) Reflect 972k ± 0% 1010k ± 0% +3.89% (p=0.000 n=30+30) Tar 394k ± 0% 415k ± 0% +5.35% (p=0.000 n=28+30) XML 404k ± 0% 429k ± 0% +6.31% (p=0.000 n=29+29) [Geo mean] 651k 686k +5.35% Change-Id: Ia005a8d6b33ce9f8091322f004376a3d6e5c1a94 Reviewed-on: https://go-review.googlesource.com/89357 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-01-22 17:06:26 -05:00
}
// A list of slots per Value. Reuse the previous child slices.
if cap(state.valueNames) < f.NumValues() {
old := state.valueNames
state.valueNames = make([][]SlotID, f.NumValues())
copy(state.valueNames, old)
cmd/compile/internal/ssa: reduce location list memory use Put everything that showed up in the allocation profile into the cache, and reuse it across functions. After this CL, the overhead of enabling location lists is getting pretty close to the desired 5%: compilecmp -all -beforeflags -dwarflocationlists=0 -afterflags -dwarflocationlists=1 -n 30 4ebad42292b6a4090faf37753dd768d2965e38c4 4ebad42292b6a4090faf37753dd768d2965e38c4 compilecmp -dwarflocationlists=0 4ebad42292b6a4090faf37753dd768d2965e38c4 -dwarflocationlists=1 4ebad42292b6a4090faf37753dd768d2965e38c4 benchstat -geomean /tmp/869550129 /tmp/143495132 completed 30 of 30, estimated time remaining 0s (eta 3:24PM) name old time/op new time/op delta Template 199ms ± 4% 209ms ± 6% +5.17% (p=0.000 n=29+30) Unicode 99.2ms ± 8% 100.5ms ± 6% ~ (p=0.112 n=30+30) GoTypes 642ms ± 3% 684ms ± 3% +6.54% (p=0.000 n=29+30) SSA 8.00s ± 1% 8.71s ± 1% +8.78% (p=0.000 n=29+29) Flate 129ms ± 7% 134ms ± 5% +3.77% (p=0.000 n=30+30) GoParser 157ms ± 4% 164ms ± 5% +4.35% (p=0.000 n=29+30) Reflect 428ms ± 3% 450ms ± 4% +5.09% (p=0.000 n=30+30) Tar 195ms ± 5% 204ms ± 8% +4.78% (p=0.000 n=30+30) XML 228ms ± 4% 241ms ± 4% +5.62% (p=0.000 n=30+29) StdCmd 15.4s ± 1% 16.7s ± 1% +8.29% (p=0.000 n=29+29) [Geo mean] 476ms 502ms +5.35% name old user-time/op new user-time/op delta Template 294ms ±18% 304ms ±15% ~ (p=0.242 n=29+29) Unicode 182ms ±27% 172ms ±28% ~ (p=0.104 n=30+30) GoTypes 957ms ±15% 1016ms ±12% +6.16% (p=0.000 n=30+30) SSA 13.3s ± 5% 14.3s ± 3% +7.32% (p=0.000 n=30+28) Flate 188ms ±17% 193ms ±17% ~ (p=0.288 n=28+29) GoParser 232ms ±16% 238ms ±13% ~ (p=0.065 n=30+29) Reflect 585ms ±13% 620ms ±10% +5.88% (p=0.000 n=30+30) Tar 298ms ±21% 332ms ±23% +11.32% (p=0.000 n=30+30) XML 329ms ±17% 343ms ±12% +4.18% (p=0.032 n=30+30) [Geo mean] 492ms 513ms +4.13% name old alloc/op new alloc/op delta Template 38.3MB ± 0% 40.3MB ± 0% +5.29% (p=0.000 n=30+30) Unicode 29.3MB ± 0% 29.6MB ± 0% +1.28% (p=0.000 n=30+29) GoTypes 110MB ± 0% 118MB ± 0% +6.97% (p=0.000 n=29+30) SSA 1.48GB ± 0% 1.61GB ± 0% +9.06% (p=0.000 n=30+30) Flate 24.8MB ± 0% 26.0MB ± 0% +4.99% (p=0.000 n=29+30) GoParser 30.9MB ± 0% 32.2MB ± 0% +4.20% (p=0.000 n=30+30) Reflect 76.8MB ± 0% 80.6MB ± 0% +4.97% (p=0.000 n=30+30) Tar 39.6MB ± 0% 41.7MB ± 0% +5.22% (p=0.000 n=29+30) XML 42.0MB ± 0% 45.4MB ± 0% +8.22% (p=0.000 n=29+30) [Geo mean] 63.9MB 67.5MB +5.56% name old allocs/op new allocs/op delta Template 383k ± 0% 405k ± 0% +5.69% (p=0.000 n=30+30) Unicode 343k ± 0% 346k ± 0% +0.98% (p=0.000 n=30+27) GoTypes 1.15M ± 0% 1.22M ± 0% +6.17% (p=0.000 n=29+29) SSA 12.2M ± 0% 13.2M ± 0% +8.15% (p=0.000 n=30+30) Flate 234k ± 0% 249k ± 0% +6.44% (p=0.000 n=30+30) GoParser 315k ± 0% 332k ± 0% +5.31% (p=0.000 n=30+28) Reflect 972k ± 0% 1010k ± 0% +3.89% (p=0.000 n=30+30) Tar 394k ± 0% 415k ± 0% +5.35% (p=0.000 n=28+30) XML 404k ± 0% 429k ± 0% +6.31% (p=0.000 n=29+29) [Geo mean] 651k 686k +5.35% Change-Id: Ia005a8d6b33ce9f8091322f004376a3d6e5c1a94 Reviewed-on: https://go-review.googlesource.com/89357 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-01-22 17:06:26 -05:00
}
vn := state.valueNames[:f.NumValues()]
cmd/compile/internal/ssa: reduce location list memory use Put everything that showed up in the allocation profile into the cache, and reuse it across functions. After this CL, the overhead of enabling location lists is getting pretty close to the desired 5%: compilecmp -all -beforeflags -dwarflocationlists=0 -afterflags -dwarflocationlists=1 -n 30 4ebad42292b6a4090faf37753dd768d2965e38c4 4ebad42292b6a4090faf37753dd768d2965e38c4 compilecmp -dwarflocationlists=0 4ebad42292b6a4090faf37753dd768d2965e38c4 -dwarflocationlists=1 4ebad42292b6a4090faf37753dd768d2965e38c4 benchstat -geomean /tmp/869550129 /tmp/143495132 completed 30 of 30, estimated time remaining 0s (eta 3:24PM) name old time/op new time/op delta Template 199ms ± 4% 209ms ± 6% +5.17% (p=0.000 n=29+30) Unicode 99.2ms ± 8% 100.5ms ± 6% ~ (p=0.112 n=30+30) GoTypes 642ms ± 3% 684ms ± 3% +6.54% (p=0.000 n=29+30) SSA 8.00s ± 1% 8.71s ± 1% +8.78% (p=0.000 n=29+29) Flate 129ms ± 7% 134ms ± 5% +3.77% (p=0.000 n=30+30) GoParser 157ms ± 4% 164ms ± 5% +4.35% (p=0.000 n=29+30) Reflect 428ms ± 3% 450ms ± 4% +5.09% (p=0.000 n=30+30) Tar 195ms ± 5% 204ms ± 8% +4.78% (p=0.000 n=30+30) XML 228ms ± 4% 241ms ± 4% +5.62% (p=0.000 n=30+29) StdCmd 15.4s ± 1% 16.7s ± 1% +8.29% (p=0.000 n=29+29) [Geo mean] 476ms 502ms +5.35% name old user-time/op new user-time/op delta Template 294ms ±18% 304ms ±15% ~ (p=0.242 n=29+29) Unicode 182ms ±27% 172ms ±28% ~ (p=0.104 n=30+30) GoTypes 957ms ±15% 1016ms ±12% +6.16% (p=0.000 n=30+30) SSA 13.3s ± 5% 14.3s ± 3% +7.32% (p=0.000 n=30+28) Flate 188ms ±17% 193ms ±17% ~ (p=0.288 n=28+29) GoParser 232ms ±16% 238ms ±13% ~ (p=0.065 n=30+29) Reflect 585ms ±13% 620ms ±10% +5.88% (p=0.000 n=30+30) Tar 298ms ±21% 332ms ±23% +11.32% (p=0.000 n=30+30) XML 329ms ±17% 343ms ±12% +4.18% (p=0.032 n=30+30) [Geo mean] 492ms 513ms +4.13% name old alloc/op new alloc/op delta Template 38.3MB ± 0% 40.3MB ± 0% +5.29% (p=0.000 n=30+30) Unicode 29.3MB ± 0% 29.6MB ± 0% +1.28% (p=0.000 n=30+29) GoTypes 110MB ± 0% 118MB ± 0% +6.97% (p=0.000 n=29+30) SSA 1.48GB ± 0% 1.61GB ± 0% +9.06% (p=0.000 n=30+30) Flate 24.8MB ± 0% 26.0MB ± 0% +4.99% (p=0.000 n=29+30) GoParser 30.9MB ± 0% 32.2MB ± 0% +4.20% (p=0.000 n=30+30) Reflect 76.8MB ± 0% 80.6MB ± 0% +4.97% (p=0.000 n=30+30) Tar 39.6MB ± 0% 41.7MB ± 0% +5.22% (p=0.000 n=29+30) XML 42.0MB ± 0% 45.4MB ± 0% +8.22% (p=0.000 n=29+30) [Geo mean] 63.9MB 67.5MB +5.56% name old allocs/op new allocs/op delta Template 383k ± 0% 405k ± 0% +5.69% (p=0.000 n=30+30) Unicode 343k ± 0% 346k ± 0% +0.98% (p=0.000 n=30+27) GoTypes 1.15M ± 0% 1.22M ± 0% +6.17% (p=0.000 n=29+29) SSA 12.2M ± 0% 13.2M ± 0% +8.15% (p=0.000 n=30+30) Flate 234k ± 0% 249k ± 0% +6.44% (p=0.000 n=30+30) GoParser 315k ± 0% 332k ± 0% +5.31% (p=0.000 n=30+28) Reflect 972k ± 0% 1010k ± 0% +3.89% (p=0.000 n=30+30) Tar 394k ± 0% 415k ± 0% +5.35% (p=0.000 n=28+30) XML 404k ± 0% 429k ± 0% +6.31% (p=0.000 n=29+29) [Geo mean] 651k 686k +5.35% Change-Id: Ia005a8d6b33ce9f8091322f004376a3d6e5c1a94 Reviewed-on: https://go-review.googlesource.com/89357 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-01-22 17:06:26 -05:00
for i := range vn {
vn[i] = vn[i][:0]
}
// Slot and register contents for currentState. Cleared by reset().
if cap(state.currentState.slots) < numSlots {
state.currentState.slots = make([]VarLoc, numSlots)
} else {
state.currentState.slots = state.currentState.slots[:numSlots]
cmd/compile/internal/ssa: reduce location list memory use Put everything that showed up in the allocation profile into the cache, and reuse it across functions. After this CL, the overhead of enabling location lists is getting pretty close to the desired 5%: compilecmp -all -beforeflags -dwarflocationlists=0 -afterflags -dwarflocationlists=1 -n 30 4ebad42292b6a4090faf37753dd768d2965e38c4 4ebad42292b6a4090faf37753dd768d2965e38c4 compilecmp -dwarflocationlists=0 4ebad42292b6a4090faf37753dd768d2965e38c4 -dwarflocationlists=1 4ebad42292b6a4090faf37753dd768d2965e38c4 benchstat -geomean /tmp/869550129 /tmp/143495132 completed 30 of 30, estimated time remaining 0s (eta 3:24PM) name old time/op new time/op delta Template 199ms ± 4% 209ms ± 6% +5.17% (p=0.000 n=29+30) Unicode 99.2ms ± 8% 100.5ms ± 6% ~ (p=0.112 n=30+30) GoTypes 642ms ± 3% 684ms ± 3% +6.54% (p=0.000 n=29+30) SSA 8.00s ± 1% 8.71s ± 1% +8.78% (p=0.000 n=29+29) Flate 129ms ± 7% 134ms ± 5% +3.77% (p=0.000 n=30+30) GoParser 157ms ± 4% 164ms ± 5% +4.35% (p=0.000 n=29+30) Reflect 428ms ± 3% 450ms ± 4% +5.09% (p=0.000 n=30+30) Tar 195ms ± 5% 204ms ± 8% +4.78% (p=0.000 n=30+30) XML 228ms ± 4% 241ms ± 4% +5.62% (p=0.000 n=30+29) StdCmd 15.4s ± 1% 16.7s ± 1% +8.29% (p=0.000 n=29+29) [Geo mean] 476ms 502ms +5.35% name old user-time/op new user-time/op delta Template 294ms ±18% 304ms ±15% ~ (p=0.242 n=29+29) Unicode 182ms ±27% 172ms ±28% ~ (p=0.104 n=30+30) GoTypes 957ms ±15% 1016ms ±12% +6.16% (p=0.000 n=30+30) SSA 13.3s ± 5% 14.3s ± 3% +7.32% (p=0.000 n=30+28) Flate 188ms ±17% 193ms ±17% ~ (p=0.288 n=28+29) GoParser 232ms ±16% 238ms ±13% ~ (p=0.065 n=30+29) Reflect 585ms ±13% 620ms ±10% +5.88% (p=0.000 n=30+30) Tar 298ms ±21% 332ms ±23% +11.32% (p=0.000 n=30+30) XML 329ms ±17% 343ms ±12% +4.18% (p=0.032 n=30+30) [Geo mean] 492ms 513ms +4.13% name old alloc/op new alloc/op delta Template 38.3MB ± 0% 40.3MB ± 0% +5.29% (p=0.000 n=30+30) Unicode 29.3MB ± 0% 29.6MB ± 0% +1.28% (p=0.000 n=30+29) GoTypes 110MB ± 0% 118MB ± 0% +6.97% (p=0.000 n=29+30) SSA 1.48GB ± 0% 1.61GB ± 0% +9.06% (p=0.000 n=30+30) Flate 24.8MB ± 0% 26.0MB ± 0% +4.99% (p=0.000 n=29+30) GoParser 30.9MB ± 0% 32.2MB ± 0% +4.20% (p=0.000 n=30+30) Reflect 76.8MB ± 0% 80.6MB ± 0% +4.97% (p=0.000 n=30+30) Tar 39.6MB ± 0% 41.7MB ± 0% +5.22% (p=0.000 n=29+30) XML 42.0MB ± 0% 45.4MB ± 0% +8.22% (p=0.000 n=29+30) [Geo mean] 63.9MB 67.5MB +5.56% name old allocs/op new allocs/op delta Template 383k ± 0% 405k ± 0% +5.69% (p=0.000 n=30+30) Unicode 343k ± 0% 346k ± 0% +0.98% (p=0.000 n=30+27) GoTypes 1.15M ± 0% 1.22M ± 0% +6.17% (p=0.000 n=29+29) SSA 12.2M ± 0% 13.2M ± 0% +8.15% (p=0.000 n=30+30) Flate 234k ± 0% 249k ± 0% +6.44% (p=0.000 n=30+30) GoParser 315k ± 0% 332k ± 0% +5.31% (p=0.000 n=30+28) Reflect 972k ± 0% 1010k ± 0% +3.89% (p=0.000 n=30+30) Tar 394k ± 0% 415k ± 0% +5.35% (p=0.000 n=28+30) XML 404k ± 0% 429k ± 0% +6.31% (p=0.000 n=29+29) [Geo mean] 651k 686k +5.35% Change-Id: Ia005a8d6b33ce9f8091322f004376a3d6e5c1a94 Reviewed-on: https://go-review.googlesource.com/89357 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-01-22 17:06:26 -05:00
}
if cap(state.currentState.registers) < len(state.registers) {
state.currentState.registers = make([][]SlotID, len(state.registers))
} else {
state.currentState.registers = state.currentState.registers[:len(state.registers)]
cmd/compile/internal/ssa: reduce location list memory use Put everything that showed up in the allocation profile into the cache, and reuse it across functions. After this CL, the overhead of enabling location lists is getting pretty close to the desired 5%: compilecmp -all -beforeflags -dwarflocationlists=0 -afterflags -dwarflocationlists=1 -n 30 4ebad42292b6a4090faf37753dd768d2965e38c4 4ebad42292b6a4090faf37753dd768d2965e38c4 compilecmp -dwarflocationlists=0 4ebad42292b6a4090faf37753dd768d2965e38c4 -dwarflocationlists=1 4ebad42292b6a4090faf37753dd768d2965e38c4 benchstat -geomean /tmp/869550129 /tmp/143495132 completed 30 of 30, estimated time remaining 0s (eta 3:24PM) name old time/op new time/op delta Template 199ms ± 4% 209ms ± 6% +5.17% (p=0.000 n=29+30) Unicode 99.2ms ± 8% 100.5ms ± 6% ~ (p=0.112 n=30+30) GoTypes 642ms ± 3% 684ms ± 3% +6.54% (p=0.000 n=29+30) SSA 8.00s ± 1% 8.71s ± 1% +8.78% (p=0.000 n=29+29) Flate 129ms ± 7% 134ms ± 5% +3.77% (p=0.000 n=30+30) GoParser 157ms ± 4% 164ms ± 5% +4.35% (p=0.000 n=29+30) Reflect 428ms ± 3% 450ms ± 4% +5.09% (p=0.000 n=30+30) Tar 195ms ± 5% 204ms ± 8% +4.78% (p=0.000 n=30+30) XML 228ms ± 4% 241ms ± 4% +5.62% (p=0.000 n=30+29) StdCmd 15.4s ± 1% 16.7s ± 1% +8.29% (p=0.000 n=29+29) [Geo mean] 476ms 502ms +5.35% name old user-time/op new user-time/op delta Template 294ms ±18% 304ms ±15% ~ (p=0.242 n=29+29) Unicode 182ms ±27% 172ms ±28% ~ (p=0.104 n=30+30) GoTypes 957ms ±15% 1016ms ±12% +6.16% (p=0.000 n=30+30) SSA 13.3s ± 5% 14.3s ± 3% +7.32% (p=0.000 n=30+28) Flate 188ms ±17% 193ms ±17% ~ (p=0.288 n=28+29) GoParser 232ms ±16% 238ms ±13% ~ (p=0.065 n=30+29) Reflect 585ms ±13% 620ms ±10% +5.88% (p=0.000 n=30+30) Tar 298ms ±21% 332ms ±23% +11.32% (p=0.000 n=30+30) XML 329ms ±17% 343ms ±12% +4.18% (p=0.032 n=30+30) [Geo mean] 492ms 513ms +4.13% name old alloc/op new alloc/op delta Template 38.3MB ± 0% 40.3MB ± 0% +5.29% (p=0.000 n=30+30) Unicode 29.3MB ± 0% 29.6MB ± 0% +1.28% (p=0.000 n=30+29) GoTypes 110MB ± 0% 118MB ± 0% +6.97% (p=0.000 n=29+30) SSA 1.48GB ± 0% 1.61GB ± 0% +9.06% (p=0.000 n=30+30) Flate 24.8MB ± 0% 26.0MB ± 0% +4.99% (p=0.000 n=29+30) GoParser 30.9MB ± 0% 32.2MB ± 0% +4.20% (p=0.000 n=30+30) Reflect 76.8MB ± 0% 80.6MB ± 0% +4.97% (p=0.000 n=30+30) Tar 39.6MB ± 0% 41.7MB ± 0% +5.22% (p=0.000 n=29+30) XML 42.0MB ± 0% 45.4MB ± 0% +8.22% (p=0.000 n=29+30) [Geo mean] 63.9MB 67.5MB +5.56% name old allocs/op new allocs/op delta Template 383k ± 0% 405k ± 0% +5.69% (p=0.000 n=30+30) Unicode 343k ± 0% 346k ± 0% +0.98% (p=0.000 n=30+27) GoTypes 1.15M ± 0% 1.22M ± 0% +6.17% (p=0.000 n=29+29) SSA 12.2M ± 0% 13.2M ± 0% +8.15% (p=0.000 n=30+30) Flate 234k ± 0% 249k ± 0% +6.44% (p=0.000 n=30+30) GoParser 315k ± 0% 332k ± 0% +5.31% (p=0.000 n=30+28) Reflect 972k ± 0% 1010k ± 0% +3.89% (p=0.000 n=30+30) Tar 394k ± 0% 415k ± 0% +5.35% (p=0.000 n=28+30) XML 404k ± 0% 429k ± 0% +6.31% (p=0.000 n=29+29) [Geo mean] 651k 686k +5.35% Change-Id: Ia005a8d6b33ce9f8091322f004376a3d6e5c1a94 Reviewed-on: https://go-review.googlesource.com/89357 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-01-22 17:06:26 -05:00
}
// Used many times by mergePredecessors.
if cap(state.liveCount) < numSlots {
state.liveCount = make([]int, numSlots)
} else {
state.liveCount = state.liveCount[:numSlots]
}
cmd/compile/internal/ssa: reduce location list memory use Put everything that showed up in the allocation profile into the cache, and reuse it across functions. After this CL, the overhead of enabling location lists is getting pretty close to the desired 5%: compilecmp -all -beforeflags -dwarflocationlists=0 -afterflags -dwarflocationlists=1 -n 30 4ebad42292b6a4090faf37753dd768d2965e38c4 4ebad42292b6a4090faf37753dd768d2965e38c4 compilecmp -dwarflocationlists=0 4ebad42292b6a4090faf37753dd768d2965e38c4 -dwarflocationlists=1 4ebad42292b6a4090faf37753dd768d2965e38c4 benchstat -geomean /tmp/869550129 /tmp/143495132 completed 30 of 30, estimated time remaining 0s (eta 3:24PM) name old time/op new time/op delta Template 199ms ± 4% 209ms ± 6% +5.17% (p=0.000 n=29+30) Unicode 99.2ms ± 8% 100.5ms ± 6% ~ (p=0.112 n=30+30) GoTypes 642ms ± 3% 684ms ± 3% +6.54% (p=0.000 n=29+30) SSA 8.00s ± 1% 8.71s ± 1% +8.78% (p=0.000 n=29+29) Flate 129ms ± 7% 134ms ± 5% +3.77% (p=0.000 n=30+30) GoParser 157ms ± 4% 164ms ± 5% +4.35% (p=0.000 n=29+30) Reflect 428ms ± 3% 450ms ± 4% +5.09% (p=0.000 n=30+30) Tar 195ms ± 5% 204ms ± 8% +4.78% (p=0.000 n=30+30) XML 228ms ± 4% 241ms ± 4% +5.62% (p=0.000 n=30+29) StdCmd 15.4s ± 1% 16.7s ± 1% +8.29% (p=0.000 n=29+29) [Geo mean] 476ms 502ms +5.35% name old user-time/op new user-time/op delta Template 294ms ±18% 304ms ±15% ~ (p=0.242 n=29+29) Unicode 182ms ±27% 172ms ±28% ~ (p=0.104 n=30+30) GoTypes 957ms ±15% 1016ms ±12% +6.16% (p=0.000 n=30+30) SSA 13.3s ± 5% 14.3s ± 3% +7.32% (p=0.000 n=30+28) Flate 188ms ±17% 193ms ±17% ~ (p=0.288 n=28+29) GoParser 232ms ±16% 238ms ±13% ~ (p=0.065 n=30+29) Reflect 585ms ±13% 620ms ±10% +5.88% (p=0.000 n=30+30) Tar 298ms ±21% 332ms ±23% +11.32% (p=0.000 n=30+30) XML 329ms ±17% 343ms ±12% +4.18% (p=0.032 n=30+30) [Geo mean] 492ms 513ms +4.13% name old alloc/op new alloc/op delta Template 38.3MB ± 0% 40.3MB ± 0% +5.29% (p=0.000 n=30+30) Unicode 29.3MB ± 0% 29.6MB ± 0% +1.28% (p=0.000 n=30+29) GoTypes 110MB ± 0% 118MB ± 0% +6.97% (p=0.000 n=29+30) SSA 1.48GB ± 0% 1.61GB ± 0% +9.06% (p=0.000 n=30+30) Flate 24.8MB ± 0% 26.0MB ± 0% +4.99% (p=0.000 n=29+30) GoParser 30.9MB ± 0% 32.2MB ± 0% +4.20% (p=0.000 n=30+30) Reflect 76.8MB ± 0% 80.6MB ± 0% +4.97% (p=0.000 n=30+30) Tar 39.6MB ± 0% 41.7MB ± 0% +5.22% (p=0.000 n=29+30) XML 42.0MB ± 0% 45.4MB ± 0% +8.22% (p=0.000 n=29+30) [Geo mean] 63.9MB 67.5MB +5.56% name old allocs/op new allocs/op delta Template 383k ± 0% 405k ± 0% +5.69% (p=0.000 n=30+30) Unicode 343k ± 0% 346k ± 0% +0.98% (p=0.000 n=30+27) GoTypes 1.15M ± 0% 1.22M ± 0% +6.17% (p=0.000 n=29+29) SSA 12.2M ± 0% 13.2M ± 0% +8.15% (p=0.000 n=30+30) Flate 234k ± 0% 249k ± 0% +6.44% (p=0.000 n=30+30) GoParser 315k ± 0% 332k ± 0% +5.31% (p=0.000 n=30+28) Reflect 972k ± 0% 1010k ± 0% +3.89% (p=0.000 n=30+30) Tar 394k ± 0% 415k ± 0% +5.35% (p=0.000 n=28+30) XML 404k ± 0% 429k ± 0% +6.31% (p=0.000 n=29+29) [Geo mean] 651k 686k +5.35% Change-Id: Ia005a8d6b33ce9f8091322f004376a3d6e5c1a94 Reviewed-on: https://go-review.googlesource.com/89357 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-01-22 17:06:26 -05:00
// A relatively small slice, but used many times as the return from processValue.
state.changedVars = newSparseSet(numVars)
cmd/compile/internal/ssa: reduce location list memory use Put everything that showed up in the allocation profile into the cache, and reuse it across functions. After this CL, the overhead of enabling location lists is getting pretty close to the desired 5%: compilecmp -all -beforeflags -dwarflocationlists=0 -afterflags -dwarflocationlists=1 -n 30 4ebad42292b6a4090faf37753dd768d2965e38c4 4ebad42292b6a4090faf37753dd768d2965e38c4 compilecmp -dwarflocationlists=0 4ebad42292b6a4090faf37753dd768d2965e38c4 -dwarflocationlists=1 4ebad42292b6a4090faf37753dd768d2965e38c4 benchstat -geomean /tmp/869550129 /tmp/143495132 completed 30 of 30, estimated time remaining 0s (eta 3:24PM) name old time/op new time/op delta Template 199ms ± 4% 209ms ± 6% +5.17% (p=0.000 n=29+30) Unicode 99.2ms ± 8% 100.5ms ± 6% ~ (p=0.112 n=30+30) GoTypes 642ms ± 3% 684ms ± 3% +6.54% (p=0.000 n=29+30) SSA 8.00s ± 1% 8.71s ± 1% +8.78% (p=0.000 n=29+29) Flate 129ms ± 7% 134ms ± 5% +3.77% (p=0.000 n=30+30) GoParser 157ms ± 4% 164ms ± 5% +4.35% (p=0.000 n=29+30) Reflect 428ms ± 3% 450ms ± 4% +5.09% (p=0.000 n=30+30) Tar 195ms ± 5% 204ms ± 8% +4.78% (p=0.000 n=30+30) XML 228ms ± 4% 241ms ± 4% +5.62% (p=0.000 n=30+29) StdCmd 15.4s ± 1% 16.7s ± 1% +8.29% (p=0.000 n=29+29) [Geo mean] 476ms 502ms +5.35% name old user-time/op new user-time/op delta Template 294ms ±18% 304ms ±15% ~ (p=0.242 n=29+29) Unicode 182ms ±27% 172ms ±28% ~ (p=0.104 n=30+30) GoTypes 957ms ±15% 1016ms ±12% +6.16% (p=0.000 n=30+30) SSA 13.3s ± 5% 14.3s ± 3% +7.32% (p=0.000 n=30+28) Flate 188ms ±17% 193ms ±17% ~ (p=0.288 n=28+29) GoParser 232ms ±16% 238ms ±13% ~ (p=0.065 n=30+29) Reflect 585ms ±13% 620ms ±10% +5.88% (p=0.000 n=30+30) Tar 298ms ±21% 332ms ±23% +11.32% (p=0.000 n=30+30) XML 329ms ±17% 343ms ±12% +4.18% (p=0.032 n=30+30) [Geo mean] 492ms 513ms +4.13% name old alloc/op new alloc/op delta Template 38.3MB ± 0% 40.3MB ± 0% +5.29% (p=0.000 n=30+30) Unicode 29.3MB ± 0% 29.6MB ± 0% +1.28% (p=0.000 n=30+29) GoTypes 110MB ± 0% 118MB ± 0% +6.97% (p=0.000 n=29+30) SSA 1.48GB ± 0% 1.61GB ± 0% +9.06% (p=0.000 n=30+30) Flate 24.8MB ± 0% 26.0MB ± 0% +4.99% (p=0.000 n=29+30) GoParser 30.9MB ± 0% 32.2MB ± 0% +4.20% (p=0.000 n=30+30) Reflect 76.8MB ± 0% 80.6MB ± 0% +4.97% (p=0.000 n=30+30) Tar 39.6MB ± 0% 41.7MB ± 0% +5.22% (p=0.000 n=29+30) XML 42.0MB ± 0% 45.4MB ± 0% +8.22% (p=0.000 n=29+30) [Geo mean] 63.9MB 67.5MB +5.56% name old allocs/op new allocs/op delta Template 383k ± 0% 405k ± 0% +5.69% (p=0.000 n=30+30) Unicode 343k ± 0% 346k ± 0% +0.98% (p=0.000 n=30+27) GoTypes 1.15M ± 0% 1.22M ± 0% +6.17% (p=0.000 n=29+29) SSA 12.2M ± 0% 13.2M ± 0% +8.15% (p=0.000 n=30+30) Flate 234k ± 0% 249k ± 0% +6.44% (p=0.000 n=30+30) GoParser 315k ± 0% 332k ± 0% +5.31% (p=0.000 n=30+28) Reflect 972k ± 0% 1010k ± 0% +3.89% (p=0.000 n=30+30) Tar 394k ± 0% 415k ± 0% +5.35% (p=0.000 n=28+30) XML 404k ± 0% 429k ± 0% +6.31% (p=0.000 n=29+29) [Geo mean] 651k 686k +5.35% Change-Id: Ia005a8d6b33ce9f8091322f004376a3d6e5c1a94 Reviewed-on: https://go-review.googlesource.com/89357 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-01-22 17:06:26 -05:00
// A pending entry per user variable, with space to track each of its pieces.
numPieces := 0
for i := range state.varSlots {
numPieces += len(state.varSlots[i])
}
if cap(state.pendingSlotLocs) < numPieces {
state.pendingSlotLocs = make([]VarLoc, numPieces)
} else {
psl := state.pendingSlotLocs[:numPieces]
for i := range psl {
psl[i] = VarLoc{}
}
cmd/compile/internal/ssa: reduce location list memory use Put everything that showed up in the allocation profile into the cache, and reuse it across functions. After this CL, the overhead of enabling location lists is getting pretty close to the desired 5%: compilecmp -all -beforeflags -dwarflocationlists=0 -afterflags -dwarflocationlists=1 -n 30 4ebad42292b6a4090faf37753dd768d2965e38c4 4ebad42292b6a4090faf37753dd768d2965e38c4 compilecmp -dwarflocationlists=0 4ebad42292b6a4090faf37753dd768d2965e38c4 -dwarflocationlists=1 4ebad42292b6a4090faf37753dd768d2965e38c4 benchstat -geomean /tmp/869550129 /tmp/143495132 completed 30 of 30, estimated time remaining 0s (eta 3:24PM) name old time/op new time/op delta Template 199ms ± 4% 209ms ± 6% +5.17% (p=0.000 n=29+30) Unicode 99.2ms ± 8% 100.5ms ± 6% ~ (p=0.112 n=30+30) GoTypes 642ms ± 3% 684ms ± 3% +6.54% (p=0.000 n=29+30) SSA 8.00s ± 1% 8.71s ± 1% +8.78% (p=0.000 n=29+29) Flate 129ms ± 7% 134ms ± 5% +3.77% (p=0.000 n=30+30) GoParser 157ms ± 4% 164ms ± 5% +4.35% (p=0.000 n=29+30) Reflect 428ms ± 3% 450ms ± 4% +5.09% (p=0.000 n=30+30) Tar 195ms ± 5% 204ms ± 8% +4.78% (p=0.000 n=30+30) XML 228ms ± 4% 241ms ± 4% +5.62% (p=0.000 n=30+29) StdCmd 15.4s ± 1% 16.7s ± 1% +8.29% (p=0.000 n=29+29) [Geo mean] 476ms 502ms +5.35% name old user-time/op new user-time/op delta Template 294ms ±18% 304ms ±15% ~ (p=0.242 n=29+29) Unicode 182ms ±27% 172ms ±28% ~ (p=0.104 n=30+30) GoTypes 957ms ±15% 1016ms ±12% +6.16% (p=0.000 n=30+30) SSA 13.3s ± 5% 14.3s ± 3% +7.32% (p=0.000 n=30+28) Flate 188ms ±17% 193ms ±17% ~ (p=0.288 n=28+29) GoParser 232ms ±16% 238ms ±13% ~ (p=0.065 n=30+29) Reflect 585ms ±13% 620ms ±10% +5.88% (p=0.000 n=30+30) Tar 298ms ±21% 332ms ±23% +11.32% (p=0.000 n=30+30) XML 329ms ±17% 343ms ±12% +4.18% (p=0.032 n=30+30) [Geo mean] 492ms 513ms +4.13% name old alloc/op new alloc/op delta Template 38.3MB ± 0% 40.3MB ± 0% +5.29% (p=0.000 n=30+30) Unicode 29.3MB ± 0% 29.6MB ± 0% +1.28% (p=0.000 n=30+29) GoTypes 110MB ± 0% 118MB ± 0% +6.97% (p=0.000 n=29+30) SSA 1.48GB ± 0% 1.61GB ± 0% +9.06% (p=0.000 n=30+30) Flate 24.8MB ± 0% 26.0MB ± 0% +4.99% (p=0.000 n=29+30) GoParser 30.9MB ± 0% 32.2MB ± 0% +4.20% (p=0.000 n=30+30) Reflect 76.8MB ± 0% 80.6MB ± 0% +4.97% (p=0.000 n=30+30) Tar 39.6MB ± 0% 41.7MB ± 0% +5.22% (p=0.000 n=29+30) XML 42.0MB ± 0% 45.4MB ± 0% +8.22% (p=0.000 n=29+30) [Geo mean] 63.9MB 67.5MB +5.56% name old allocs/op new allocs/op delta Template 383k ± 0% 405k ± 0% +5.69% (p=0.000 n=30+30) Unicode 343k ± 0% 346k ± 0% +0.98% (p=0.000 n=30+27) GoTypes 1.15M ± 0% 1.22M ± 0% +6.17% (p=0.000 n=29+29) SSA 12.2M ± 0% 13.2M ± 0% +8.15% (p=0.000 n=30+30) Flate 234k ± 0% 249k ± 0% +6.44% (p=0.000 n=30+30) GoParser 315k ± 0% 332k ± 0% +5.31% (p=0.000 n=30+28) Reflect 972k ± 0% 1010k ± 0% +3.89% (p=0.000 n=30+30) Tar 394k ± 0% 415k ± 0% +5.35% (p=0.000 n=28+30) XML 404k ± 0% 429k ± 0% +6.31% (p=0.000 n=29+29) [Geo mean] 651k 686k +5.35% Change-Id: Ia005a8d6b33ce9f8091322f004376a3d6e5c1a94 Reviewed-on: https://go-review.googlesource.com/89357 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-01-22 17:06:26 -05:00
}
if cap(state.pendingEntries) < numVars {
state.pendingEntries = make([]pendingEntry, numVars)
cmd/compile/internal/ssa: reduce location list memory use Put everything that showed up in the allocation profile into the cache, and reuse it across functions. After this CL, the overhead of enabling location lists is getting pretty close to the desired 5%: compilecmp -all -beforeflags -dwarflocationlists=0 -afterflags -dwarflocationlists=1 -n 30 4ebad42292b6a4090faf37753dd768d2965e38c4 4ebad42292b6a4090faf37753dd768d2965e38c4 compilecmp -dwarflocationlists=0 4ebad42292b6a4090faf37753dd768d2965e38c4 -dwarflocationlists=1 4ebad42292b6a4090faf37753dd768d2965e38c4 benchstat -geomean /tmp/869550129 /tmp/143495132 completed 30 of 30, estimated time remaining 0s (eta 3:24PM) name old time/op new time/op delta Template 199ms ± 4% 209ms ± 6% +5.17% (p=0.000 n=29+30) Unicode 99.2ms ± 8% 100.5ms ± 6% ~ (p=0.112 n=30+30) GoTypes 642ms ± 3% 684ms ± 3% +6.54% (p=0.000 n=29+30) SSA 8.00s ± 1% 8.71s ± 1% +8.78% (p=0.000 n=29+29) Flate 129ms ± 7% 134ms ± 5% +3.77% (p=0.000 n=30+30) GoParser 157ms ± 4% 164ms ± 5% +4.35% (p=0.000 n=29+30) Reflect 428ms ± 3% 450ms ± 4% +5.09% (p=0.000 n=30+30) Tar 195ms ± 5% 204ms ± 8% +4.78% (p=0.000 n=30+30) XML 228ms ± 4% 241ms ± 4% +5.62% (p=0.000 n=30+29) StdCmd 15.4s ± 1% 16.7s ± 1% +8.29% (p=0.000 n=29+29) [Geo mean] 476ms 502ms +5.35% name old user-time/op new user-time/op delta Template 294ms ±18% 304ms ±15% ~ (p=0.242 n=29+29) Unicode 182ms ±27% 172ms ±28% ~ (p=0.104 n=30+30) GoTypes 957ms ±15% 1016ms ±12% +6.16% (p=0.000 n=30+30) SSA 13.3s ± 5% 14.3s ± 3% +7.32% (p=0.000 n=30+28) Flate 188ms ±17% 193ms ±17% ~ (p=0.288 n=28+29) GoParser 232ms ±16% 238ms ±13% ~ (p=0.065 n=30+29) Reflect 585ms ±13% 620ms ±10% +5.88% (p=0.000 n=30+30) Tar 298ms ±21% 332ms ±23% +11.32% (p=0.000 n=30+30) XML 329ms ±17% 343ms ±12% +4.18% (p=0.032 n=30+30) [Geo mean] 492ms 513ms +4.13% name old alloc/op new alloc/op delta Template 38.3MB ± 0% 40.3MB ± 0% +5.29% (p=0.000 n=30+30) Unicode 29.3MB ± 0% 29.6MB ± 0% +1.28% (p=0.000 n=30+29) GoTypes 110MB ± 0% 118MB ± 0% +6.97% (p=0.000 n=29+30) SSA 1.48GB ± 0% 1.61GB ± 0% +9.06% (p=0.000 n=30+30) Flate 24.8MB ± 0% 26.0MB ± 0% +4.99% (p=0.000 n=29+30) GoParser 30.9MB ± 0% 32.2MB ± 0% +4.20% (p=0.000 n=30+30) Reflect 76.8MB ± 0% 80.6MB ± 0% +4.97% (p=0.000 n=30+30) Tar 39.6MB ± 0% 41.7MB ± 0% +5.22% (p=0.000 n=29+30) XML 42.0MB ± 0% 45.4MB ± 0% +8.22% (p=0.000 n=29+30) [Geo mean] 63.9MB 67.5MB +5.56% name old allocs/op new allocs/op delta Template 383k ± 0% 405k ± 0% +5.69% (p=0.000 n=30+30) Unicode 343k ± 0% 346k ± 0% +0.98% (p=0.000 n=30+27) GoTypes 1.15M ± 0% 1.22M ± 0% +6.17% (p=0.000 n=29+29) SSA 12.2M ± 0% 13.2M ± 0% +8.15% (p=0.000 n=30+30) Flate 234k ± 0% 249k ± 0% +6.44% (p=0.000 n=30+30) GoParser 315k ± 0% 332k ± 0% +5.31% (p=0.000 n=30+28) Reflect 972k ± 0% 1010k ± 0% +3.89% (p=0.000 n=30+30) Tar 394k ± 0% 415k ± 0% +5.35% (p=0.000 n=28+30) XML 404k ± 0% 429k ± 0% +6.31% (p=0.000 n=29+29) [Geo mean] 651k 686k +5.35% Change-Id: Ia005a8d6b33ce9f8091322f004376a3d6e5c1a94 Reviewed-on: https://go-review.googlesource.com/89357 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-01-22 17:06:26 -05:00
}
pe := state.pendingEntries[:numVars]
freePieceIdx := 0
for varID, slots := range state.varSlots {
cmd/compile/internal/ssa: reduce location list memory use Put everything that showed up in the allocation profile into the cache, and reuse it across functions. After this CL, the overhead of enabling location lists is getting pretty close to the desired 5%: compilecmp -all -beforeflags -dwarflocationlists=0 -afterflags -dwarflocationlists=1 -n 30 4ebad42292b6a4090faf37753dd768d2965e38c4 4ebad42292b6a4090faf37753dd768d2965e38c4 compilecmp -dwarflocationlists=0 4ebad42292b6a4090faf37753dd768d2965e38c4 -dwarflocationlists=1 4ebad42292b6a4090faf37753dd768d2965e38c4 benchstat -geomean /tmp/869550129 /tmp/143495132 completed 30 of 30, estimated time remaining 0s (eta 3:24PM) name old time/op new time/op delta Template 199ms ± 4% 209ms ± 6% +5.17% (p=0.000 n=29+30) Unicode 99.2ms ± 8% 100.5ms ± 6% ~ (p=0.112 n=30+30) GoTypes 642ms ± 3% 684ms ± 3% +6.54% (p=0.000 n=29+30) SSA 8.00s ± 1% 8.71s ± 1% +8.78% (p=0.000 n=29+29) Flate 129ms ± 7% 134ms ± 5% +3.77% (p=0.000 n=30+30) GoParser 157ms ± 4% 164ms ± 5% +4.35% (p=0.000 n=29+30) Reflect 428ms ± 3% 450ms ± 4% +5.09% (p=0.000 n=30+30) Tar 195ms ± 5% 204ms ± 8% +4.78% (p=0.000 n=30+30) XML 228ms ± 4% 241ms ± 4% +5.62% (p=0.000 n=30+29) StdCmd 15.4s ± 1% 16.7s ± 1% +8.29% (p=0.000 n=29+29) [Geo mean] 476ms 502ms +5.35% name old user-time/op new user-time/op delta Template 294ms ±18% 304ms ±15% ~ (p=0.242 n=29+29) Unicode 182ms ±27% 172ms ±28% ~ (p=0.104 n=30+30) GoTypes 957ms ±15% 1016ms ±12% +6.16% (p=0.000 n=30+30) SSA 13.3s ± 5% 14.3s ± 3% +7.32% (p=0.000 n=30+28) Flate 188ms ±17% 193ms ±17% ~ (p=0.288 n=28+29) GoParser 232ms ±16% 238ms ±13% ~ (p=0.065 n=30+29) Reflect 585ms ±13% 620ms ±10% +5.88% (p=0.000 n=30+30) Tar 298ms ±21% 332ms ±23% +11.32% (p=0.000 n=30+30) XML 329ms ±17% 343ms ±12% +4.18% (p=0.032 n=30+30) [Geo mean] 492ms 513ms +4.13% name old alloc/op new alloc/op delta Template 38.3MB ± 0% 40.3MB ± 0% +5.29% (p=0.000 n=30+30) Unicode 29.3MB ± 0% 29.6MB ± 0% +1.28% (p=0.000 n=30+29) GoTypes 110MB ± 0% 118MB ± 0% +6.97% (p=0.000 n=29+30) SSA 1.48GB ± 0% 1.61GB ± 0% +9.06% (p=0.000 n=30+30) Flate 24.8MB ± 0% 26.0MB ± 0% +4.99% (p=0.000 n=29+30) GoParser 30.9MB ± 0% 32.2MB ± 0% +4.20% (p=0.000 n=30+30) Reflect 76.8MB ± 0% 80.6MB ± 0% +4.97% (p=0.000 n=30+30) Tar 39.6MB ± 0% 41.7MB ± 0% +5.22% (p=0.000 n=29+30) XML 42.0MB ± 0% 45.4MB ± 0% +8.22% (p=0.000 n=29+30) [Geo mean] 63.9MB 67.5MB +5.56% name old allocs/op new allocs/op delta Template 383k ± 0% 405k ± 0% +5.69% (p=0.000 n=30+30) Unicode 343k ± 0% 346k ± 0% +0.98% (p=0.000 n=30+27) GoTypes 1.15M ± 0% 1.22M ± 0% +6.17% (p=0.000 n=29+29) SSA 12.2M ± 0% 13.2M ± 0% +8.15% (p=0.000 n=30+30) Flate 234k ± 0% 249k ± 0% +6.44% (p=0.000 n=30+30) GoParser 315k ± 0% 332k ± 0% +5.31% (p=0.000 n=30+28) Reflect 972k ± 0% 1010k ± 0% +3.89% (p=0.000 n=30+30) Tar 394k ± 0% 415k ± 0% +5.35% (p=0.000 n=28+30) XML 404k ± 0% 429k ± 0% +6.31% (p=0.000 n=29+29) [Geo mean] 651k 686k +5.35% Change-Id: Ia005a8d6b33ce9f8091322f004376a3d6e5c1a94 Reviewed-on: https://go-review.googlesource.com/89357 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-01-22 17:06:26 -05:00
pe[varID] = pendingEntry{
pieces: state.pendingSlotLocs[freePieceIdx : freePieceIdx+len(slots)],
cmd/compile/internal/ssa: reduce location list memory use Put everything that showed up in the allocation profile into the cache, and reuse it across functions. After this CL, the overhead of enabling location lists is getting pretty close to the desired 5%: compilecmp -all -beforeflags -dwarflocationlists=0 -afterflags -dwarflocationlists=1 -n 30 4ebad42292b6a4090faf37753dd768d2965e38c4 4ebad42292b6a4090faf37753dd768d2965e38c4 compilecmp -dwarflocationlists=0 4ebad42292b6a4090faf37753dd768d2965e38c4 -dwarflocationlists=1 4ebad42292b6a4090faf37753dd768d2965e38c4 benchstat -geomean /tmp/869550129 /tmp/143495132 completed 30 of 30, estimated time remaining 0s (eta 3:24PM) name old time/op new time/op delta Template 199ms ± 4% 209ms ± 6% +5.17% (p=0.000 n=29+30) Unicode 99.2ms ± 8% 100.5ms ± 6% ~ (p=0.112 n=30+30) GoTypes 642ms ± 3% 684ms ± 3% +6.54% (p=0.000 n=29+30) SSA 8.00s ± 1% 8.71s ± 1% +8.78% (p=0.000 n=29+29) Flate 129ms ± 7% 134ms ± 5% +3.77% (p=0.000 n=30+30) GoParser 157ms ± 4% 164ms ± 5% +4.35% (p=0.000 n=29+30) Reflect 428ms ± 3% 450ms ± 4% +5.09% (p=0.000 n=30+30) Tar 195ms ± 5% 204ms ± 8% +4.78% (p=0.000 n=30+30) XML 228ms ± 4% 241ms ± 4% +5.62% (p=0.000 n=30+29) StdCmd 15.4s ± 1% 16.7s ± 1% +8.29% (p=0.000 n=29+29) [Geo mean] 476ms 502ms +5.35% name old user-time/op new user-time/op delta Template 294ms ±18% 304ms ±15% ~ (p=0.242 n=29+29) Unicode 182ms ±27% 172ms ±28% ~ (p=0.104 n=30+30) GoTypes 957ms ±15% 1016ms ±12% +6.16% (p=0.000 n=30+30) SSA 13.3s ± 5% 14.3s ± 3% +7.32% (p=0.000 n=30+28) Flate 188ms ±17% 193ms ±17% ~ (p=0.288 n=28+29) GoParser 232ms ±16% 238ms ±13% ~ (p=0.065 n=30+29) Reflect 585ms ±13% 620ms ±10% +5.88% (p=0.000 n=30+30) Tar 298ms ±21% 332ms ±23% +11.32% (p=0.000 n=30+30) XML 329ms ±17% 343ms ±12% +4.18% (p=0.032 n=30+30) [Geo mean] 492ms 513ms +4.13% name old alloc/op new alloc/op delta Template 38.3MB ± 0% 40.3MB ± 0% +5.29% (p=0.000 n=30+30) Unicode 29.3MB ± 0% 29.6MB ± 0% +1.28% (p=0.000 n=30+29) GoTypes 110MB ± 0% 118MB ± 0% +6.97% (p=0.000 n=29+30) SSA 1.48GB ± 0% 1.61GB ± 0% +9.06% (p=0.000 n=30+30) Flate 24.8MB ± 0% 26.0MB ± 0% +4.99% (p=0.000 n=29+30) GoParser 30.9MB ± 0% 32.2MB ± 0% +4.20% (p=0.000 n=30+30) Reflect 76.8MB ± 0% 80.6MB ± 0% +4.97% (p=0.000 n=30+30) Tar 39.6MB ± 0% 41.7MB ± 0% +5.22% (p=0.000 n=29+30) XML 42.0MB ± 0% 45.4MB ± 0% +8.22% (p=0.000 n=29+30) [Geo mean] 63.9MB 67.5MB +5.56% name old allocs/op new allocs/op delta Template 383k ± 0% 405k ± 0% +5.69% (p=0.000 n=30+30) Unicode 343k ± 0% 346k ± 0% +0.98% (p=0.000 n=30+27) GoTypes 1.15M ± 0% 1.22M ± 0% +6.17% (p=0.000 n=29+29) SSA 12.2M ± 0% 13.2M ± 0% +8.15% (p=0.000 n=30+30) Flate 234k ± 0% 249k ± 0% +6.44% (p=0.000 n=30+30) GoParser 315k ± 0% 332k ± 0% +5.31% (p=0.000 n=30+28) Reflect 972k ± 0% 1010k ± 0% +3.89% (p=0.000 n=30+30) Tar 394k ± 0% 415k ± 0% +5.35% (p=0.000 n=28+30) XML 404k ± 0% 429k ± 0% +6.31% (p=0.000 n=29+29) [Geo mean] 651k 686k +5.35% Change-Id: Ia005a8d6b33ce9f8091322f004376a3d6e5c1a94 Reviewed-on: https://go-review.googlesource.com/89357 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-01-22 17:06:26 -05:00
}
freePieceIdx += len(slots)
cmd/compile/internal/ssa: reduce location list memory use Put everything that showed up in the allocation profile into the cache, and reuse it across functions. After this CL, the overhead of enabling location lists is getting pretty close to the desired 5%: compilecmp -all -beforeflags -dwarflocationlists=0 -afterflags -dwarflocationlists=1 -n 30 4ebad42292b6a4090faf37753dd768d2965e38c4 4ebad42292b6a4090faf37753dd768d2965e38c4 compilecmp -dwarflocationlists=0 4ebad42292b6a4090faf37753dd768d2965e38c4 -dwarflocationlists=1 4ebad42292b6a4090faf37753dd768d2965e38c4 benchstat -geomean /tmp/869550129 /tmp/143495132 completed 30 of 30, estimated time remaining 0s (eta 3:24PM) name old time/op new time/op delta Template 199ms ± 4% 209ms ± 6% +5.17% (p=0.000 n=29+30) Unicode 99.2ms ± 8% 100.5ms ± 6% ~ (p=0.112 n=30+30) GoTypes 642ms ± 3% 684ms ± 3% +6.54% (p=0.000 n=29+30) SSA 8.00s ± 1% 8.71s ± 1% +8.78% (p=0.000 n=29+29) Flate 129ms ± 7% 134ms ± 5% +3.77% (p=0.000 n=30+30) GoParser 157ms ± 4% 164ms ± 5% +4.35% (p=0.000 n=29+30) Reflect 428ms ± 3% 450ms ± 4% +5.09% (p=0.000 n=30+30) Tar 195ms ± 5% 204ms ± 8% +4.78% (p=0.000 n=30+30) XML 228ms ± 4% 241ms ± 4% +5.62% (p=0.000 n=30+29) StdCmd 15.4s ± 1% 16.7s ± 1% +8.29% (p=0.000 n=29+29) [Geo mean] 476ms 502ms +5.35% name old user-time/op new user-time/op delta Template 294ms ±18% 304ms ±15% ~ (p=0.242 n=29+29) Unicode 182ms ±27% 172ms ±28% ~ (p=0.104 n=30+30) GoTypes 957ms ±15% 1016ms ±12% +6.16% (p=0.000 n=30+30) SSA 13.3s ± 5% 14.3s ± 3% +7.32% (p=0.000 n=30+28) Flate 188ms ±17% 193ms ±17% ~ (p=0.288 n=28+29) GoParser 232ms ±16% 238ms ±13% ~ (p=0.065 n=30+29) Reflect 585ms ±13% 620ms ±10% +5.88% (p=0.000 n=30+30) Tar 298ms ±21% 332ms ±23% +11.32% (p=0.000 n=30+30) XML 329ms ±17% 343ms ±12% +4.18% (p=0.032 n=30+30) [Geo mean] 492ms 513ms +4.13% name old alloc/op new alloc/op delta Template 38.3MB ± 0% 40.3MB ± 0% +5.29% (p=0.000 n=30+30) Unicode 29.3MB ± 0% 29.6MB ± 0% +1.28% (p=0.000 n=30+29) GoTypes 110MB ± 0% 118MB ± 0% +6.97% (p=0.000 n=29+30) SSA 1.48GB ± 0% 1.61GB ± 0% +9.06% (p=0.000 n=30+30) Flate 24.8MB ± 0% 26.0MB ± 0% +4.99% (p=0.000 n=29+30) GoParser 30.9MB ± 0% 32.2MB ± 0% +4.20% (p=0.000 n=30+30) Reflect 76.8MB ± 0% 80.6MB ± 0% +4.97% (p=0.000 n=30+30) Tar 39.6MB ± 0% 41.7MB ± 0% +5.22% (p=0.000 n=29+30) XML 42.0MB ± 0% 45.4MB ± 0% +8.22% (p=0.000 n=29+30) [Geo mean] 63.9MB 67.5MB +5.56% name old allocs/op new allocs/op delta Template 383k ± 0% 405k ± 0% +5.69% (p=0.000 n=30+30) Unicode 343k ± 0% 346k ± 0% +0.98% (p=0.000 n=30+27) GoTypes 1.15M ± 0% 1.22M ± 0% +6.17% (p=0.000 n=29+29) SSA 12.2M ± 0% 13.2M ± 0% +8.15% (p=0.000 n=30+30) Flate 234k ± 0% 249k ± 0% +6.44% (p=0.000 n=30+30) GoParser 315k ± 0% 332k ± 0% +5.31% (p=0.000 n=30+28) Reflect 972k ± 0% 1010k ± 0% +3.89% (p=0.000 n=30+30) Tar 394k ± 0% 415k ± 0% +5.35% (p=0.000 n=28+30) XML 404k ± 0% 429k ± 0% +6.31% (p=0.000 n=29+29) [Geo mean] 651k 686k +5.35% Change-Id: Ia005a8d6b33ce9f8091322f004376a3d6e5c1a94 Reviewed-on: https://go-review.googlesource.com/89357 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-01-22 17:06:26 -05:00
}
state.pendingEntries = pe
if cap(state.lists) < numVars {
state.lists = make([][]byte, numVars)
} else {
state.lists = state.lists[:numVars]
for i := range state.lists {
state.lists[i] = nil
}
}
state.liveSlots = state.liveSlots[:0]
state.liveSlotSliceBegin = 0
cmd/compile/internal/ssa: reduce location list memory use Put everything that showed up in the allocation profile into the cache, and reuse it across functions. After this CL, the overhead of enabling location lists is getting pretty close to the desired 5%: compilecmp -all -beforeflags -dwarflocationlists=0 -afterflags -dwarflocationlists=1 -n 30 4ebad42292b6a4090faf37753dd768d2965e38c4 4ebad42292b6a4090faf37753dd768d2965e38c4 compilecmp -dwarflocationlists=0 4ebad42292b6a4090faf37753dd768d2965e38c4 -dwarflocationlists=1 4ebad42292b6a4090faf37753dd768d2965e38c4 benchstat -geomean /tmp/869550129 /tmp/143495132 completed 30 of 30, estimated time remaining 0s (eta 3:24PM) name old time/op new time/op delta Template 199ms ± 4% 209ms ± 6% +5.17% (p=0.000 n=29+30) Unicode 99.2ms ± 8% 100.5ms ± 6% ~ (p=0.112 n=30+30) GoTypes 642ms ± 3% 684ms ± 3% +6.54% (p=0.000 n=29+30) SSA 8.00s ± 1% 8.71s ± 1% +8.78% (p=0.000 n=29+29) Flate 129ms ± 7% 134ms ± 5% +3.77% (p=0.000 n=30+30) GoParser 157ms ± 4% 164ms ± 5% +4.35% (p=0.000 n=29+30) Reflect 428ms ± 3% 450ms ± 4% +5.09% (p=0.000 n=30+30) Tar 195ms ± 5% 204ms ± 8% +4.78% (p=0.000 n=30+30) XML 228ms ± 4% 241ms ± 4% +5.62% (p=0.000 n=30+29) StdCmd 15.4s ± 1% 16.7s ± 1% +8.29% (p=0.000 n=29+29) [Geo mean] 476ms 502ms +5.35% name old user-time/op new user-time/op delta Template 294ms ±18% 304ms ±15% ~ (p=0.242 n=29+29) Unicode 182ms ±27% 172ms ±28% ~ (p=0.104 n=30+30) GoTypes 957ms ±15% 1016ms ±12% +6.16% (p=0.000 n=30+30) SSA 13.3s ± 5% 14.3s ± 3% +7.32% (p=0.000 n=30+28) Flate 188ms ±17% 193ms ±17% ~ (p=0.288 n=28+29) GoParser 232ms ±16% 238ms ±13% ~ (p=0.065 n=30+29) Reflect 585ms ±13% 620ms ±10% +5.88% (p=0.000 n=30+30) Tar 298ms ±21% 332ms ±23% +11.32% (p=0.000 n=30+30) XML 329ms ±17% 343ms ±12% +4.18% (p=0.032 n=30+30) [Geo mean] 492ms 513ms +4.13% name old alloc/op new alloc/op delta Template 38.3MB ± 0% 40.3MB ± 0% +5.29% (p=0.000 n=30+30) Unicode 29.3MB ± 0% 29.6MB ± 0% +1.28% (p=0.000 n=30+29) GoTypes 110MB ± 0% 118MB ± 0% +6.97% (p=0.000 n=29+30) SSA 1.48GB ± 0% 1.61GB ± 0% +9.06% (p=0.000 n=30+30) Flate 24.8MB ± 0% 26.0MB ± 0% +4.99% (p=0.000 n=29+30) GoParser 30.9MB ± 0% 32.2MB ± 0% +4.20% (p=0.000 n=30+30) Reflect 76.8MB ± 0% 80.6MB ± 0% +4.97% (p=0.000 n=30+30) Tar 39.6MB ± 0% 41.7MB ± 0% +5.22% (p=0.000 n=29+30) XML 42.0MB ± 0% 45.4MB ± 0% +8.22% (p=0.000 n=29+30) [Geo mean] 63.9MB 67.5MB +5.56% name old allocs/op new allocs/op delta Template 383k ± 0% 405k ± 0% +5.69% (p=0.000 n=30+30) Unicode 343k ± 0% 346k ± 0% +0.98% (p=0.000 n=30+27) GoTypes 1.15M ± 0% 1.22M ± 0% +6.17% (p=0.000 n=29+29) SSA 12.2M ± 0% 13.2M ± 0% +8.15% (p=0.000 n=30+30) Flate 234k ± 0% 249k ± 0% +6.44% (p=0.000 n=30+30) GoParser 315k ± 0% 332k ± 0% +5.31% (p=0.000 n=30+28) Reflect 972k ± 0% 1010k ± 0% +3.89% (p=0.000 n=30+30) Tar 394k ± 0% 415k ± 0% +5.35% (p=0.000 n=28+30) XML 404k ± 0% 429k ± 0% +6.31% (p=0.000 n=29+29) [Geo mean] 651k 686k +5.35% Change-Id: Ia005a8d6b33ce9f8091322f004376a3d6e5c1a94 Reviewed-on: https://go-review.googlesource.com/89357 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-01-22 17:06:26 -05:00
}
func (state *debugState) allocBlock(b *Block) *BlockDebug {
return &state.blockDebug[b.ID]
}
func (state *debugState) appendLiveSlot(ls liveSlot) {
state.liveSlots = append(state.liveSlots, ls)
}
func (state *debugState) getLiveSlotSlice() []liveSlot {
s := state.liveSlots[state.liveSlotSliceBegin:]
state.liveSlotSliceBegin = len(state.liveSlots)
return s
cmd/compile/internal/ssa: reduce location list memory use Put everything that showed up in the allocation profile into the cache, and reuse it across functions. After this CL, the overhead of enabling location lists is getting pretty close to the desired 5%: compilecmp -all -beforeflags -dwarflocationlists=0 -afterflags -dwarflocationlists=1 -n 30 4ebad42292b6a4090faf37753dd768d2965e38c4 4ebad42292b6a4090faf37753dd768d2965e38c4 compilecmp -dwarflocationlists=0 4ebad42292b6a4090faf37753dd768d2965e38c4 -dwarflocationlists=1 4ebad42292b6a4090faf37753dd768d2965e38c4 benchstat -geomean /tmp/869550129 /tmp/143495132 completed 30 of 30, estimated time remaining 0s (eta 3:24PM) name old time/op new time/op delta Template 199ms ± 4% 209ms ± 6% +5.17% (p=0.000 n=29+30) Unicode 99.2ms ± 8% 100.5ms ± 6% ~ (p=0.112 n=30+30) GoTypes 642ms ± 3% 684ms ± 3% +6.54% (p=0.000 n=29+30) SSA 8.00s ± 1% 8.71s ± 1% +8.78% (p=0.000 n=29+29) Flate 129ms ± 7% 134ms ± 5% +3.77% (p=0.000 n=30+30) GoParser 157ms ± 4% 164ms ± 5% +4.35% (p=0.000 n=29+30) Reflect 428ms ± 3% 450ms ± 4% +5.09% (p=0.000 n=30+30) Tar 195ms ± 5% 204ms ± 8% +4.78% (p=0.000 n=30+30) XML 228ms ± 4% 241ms ± 4% +5.62% (p=0.000 n=30+29) StdCmd 15.4s ± 1% 16.7s ± 1% +8.29% (p=0.000 n=29+29) [Geo mean] 476ms 502ms +5.35% name old user-time/op new user-time/op delta Template 294ms ±18% 304ms ±15% ~ (p=0.242 n=29+29) Unicode 182ms ±27% 172ms ±28% ~ (p=0.104 n=30+30) GoTypes 957ms ±15% 1016ms ±12% +6.16% (p=0.000 n=30+30) SSA 13.3s ± 5% 14.3s ± 3% +7.32% (p=0.000 n=30+28) Flate 188ms ±17% 193ms ±17% ~ (p=0.288 n=28+29) GoParser 232ms ±16% 238ms ±13% ~ (p=0.065 n=30+29) Reflect 585ms ±13% 620ms ±10% +5.88% (p=0.000 n=30+30) Tar 298ms ±21% 332ms ±23% +11.32% (p=0.000 n=30+30) XML 329ms ±17% 343ms ±12% +4.18% (p=0.032 n=30+30) [Geo mean] 492ms 513ms +4.13% name old alloc/op new alloc/op delta Template 38.3MB ± 0% 40.3MB ± 0% +5.29% (p=0.000 n=30+30) Unicode 29.3MB ± 0% 29.6MB ± 0% +1.28% (p=0.000 n=30+29) GoTypes 110MB ± 0% 118MB ± 0% +6.97% (p=0.000 n=29+30) SSA 1.48GB ± 0% 1.61GB ± 0% +9.06% (p=0.000 n=30+30) Flate 24.8MB ± 0% 26.0MB ± 0% +4.99% (p=0.000 n=29+30) GoParser 30.9MB ± 0% 32.2MB ± 0% +4.20% (p=0.000 n=30+30) Reflect 76.8MB ± 0% 80.6MB ± 0% +4.97% (p=0.000 n=30+30) Tar 39.6MB ± 0% 41.7MB ± 0% +5.22% (p=0.000 n=29+30) XML 42.0MB ± 0% 45.4MB ± 0% +8.22% (p=0.000 n=29+30) [Geo mean] 63.9MB 67.5MB +5.56% name old allocs/op new allocs/op delta Template 383k ± 0% 405k ± 0% +5.69% (p=0.000 n=30+30) Unicode 343k ± 0% 346k ± 0% +0.98% (p=0.000 n=30+27) GoTypes 1.15M ± 0% 1.22M ± 0% +6.17% (p=0.000 n=29+29) SSA 12.2M ± 0% 13.2M ± 0% +8.15% (p=0.000 n=30+30) Flate 234k ± 0% 249k ± 0% +6.44% (p=0.000 n=30+30) GoParser 315k ± 0% 332k ± 0% +5.31% (p=0.000 n=30+28) Reflect 972k ± 0% 1010k ± 0% +3.89% (p=0.000 n=30+30) Tar 394k ± 0% 415k ± 0% +5.35% (p=0.000 n=28+30) XML 404k ± 0% 429k ± 0% +6.31% (p=0.000 n=29+29) [Geo mean] 651k 686k +5.35% Change-Id: Ia005a8d6b33ce9f8091322f004376a3d6e5c1a94 Reviewed-on: https://go-review.googlesource.com/89357 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-01-22 17:06:26 -05:00
}
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
func (s *debugState) blockEndStateString(b *BlockDebug) string {
endState := stateAtPC{slots: make([]VarLoc, len(s.slots)), registers: make([][]SlotID, len(s.registers))}
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
endState.reset(b.endState)
return s.stateString(endState)
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
func (s *debugState) stateString(state stateAtPC) string {
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
var strs []string
for slotID, loc := range state.slots {
if !loc.absent() {
strs = append(strs, fmt.Sprintf("\t%v = %v\n", s.slots[slotID], s.LocString(loc)))
}
}
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
strs = append(strs, "\n")
for reg, slots := range state.registers {
if len(slots) != 0 {
var slotStrs []string
for _, slot := range slots {
slotStrs = append(slotStrs, s.slots[slot].String())
}
strs = append(strs, fmt.Sprintf("\t%v = %v\n", &s.registers[reg], slotStrs))
}
}
if len(strs) == 1 {
return "(no vars)\n"
cmd/compile: cover control flow insns in location lists The information that's used to generate DWARF location lists is very ssa.Value centric; it uses Values as start and end coordinates to define ranges. That mostly works fine, but control flow instructions don't come from Values, so the ranges couldn't cover them. Control flow instructions are generated when the SSA representation is converted to assembly, so that's the best place to extend the ranges to cover them. (Before that, there's nothing to refer to, and afterward the boundaries between blocks have been lost.) That requires block information in the debugInfo type, which then flows down to make everything else awkward. On the plus side, there's a little less copying slices around than there used to be, so it should be a little faster. Previously, the ranges for empty blocks were not very meaningful. That was fine, because they had no Values to cover, so no debug information was generated for them. But they do have control flow instructions (that's why they exist) and so now it's important that the information be correct. Introduce two sentinel values, BlockStart and BlockEnd, that denote the boundary of a block, even if the block is empty. BlockEnd replaces the previous SurvivedBlock flag. There's one more problem: the last instruction in the function will be a control flow instruction, so any live ranges need to be extended past it. But there's no instruction after it to use as the end of the range. Instead, leave the EndProg field of those ranges as nil and fix it up to point to past the end of the assembled text at the very last moment. Change-Id: I81f884020ff36fd6fe8d7888fc57c99412c4245b Reviewed-on: https://go-review.googlesource.com/63010 Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com> Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-09-11 14:28:34 -04:00
}
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
return strings.Join(strs, "")
cmd/compile: cover control flow insns in location lists The information that's used to generate DWARF location lists is very ssa.Value centric; it uses Values as start and end coordinates to define ranges. That mostly works fine, but control flow instructions don't come from Values, so the ranges couldn't cover them. Control flow instructions are generated when the SSA representation is converted to assembly, so that's the best place to extend the ranges to cover them. (Before that, there's nothing to refer to, and afterward the boundaries between blocks have been lost.) That requires block information in the debugInfo type, which then flows down to make everything else awkward. On the plus side, there's a little less copying slices around than there used to be, so it should be a little faster. Previously, the ranges for empty blocks were not very meaningful. That was fine, because they had no Values to cover, so no debug information was generated for them. But they do have control flow instructions (that's why they exist) and so now it's important that the information be correct. Introduce two sentinel values, BlockStart and BlockEnd, that denote the boundary of a block, even if the block is empty. BlockEnd replaces the previous SurvivedBlock flag. There's one more problem: the last instruction in the function will be a control flow instruction, so any live ranges need to be extended past it. But there's no instruction after it to use as the end of the range. Instead, leave the EndProg field of those ranges as nil and fix it up to point to past the end of the assembled text at the very last moment. Change-Id: I81f884020ff36fd6fe8d7888fc57c99412c4245b Reviewed-on: https://go-review.googlesource.com/63010 Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com> Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-09-11 14:28:34 -04:00
}
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
// BuildFuncDebug returns debug information for f.
// f must be fully processed, so that each Value is where it will be when
// machine code is emitted.
func BuildFuncDebug(ctxt *obj.Link, f *Func, loggingEnabled bool, stackOffset func(LocalSlot) int32) *FuncDebug {
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
if f.RegAlloc == nil {
f.Fatalf("BuildFuncDebug on func %v that has not been fully processed", f)
}
state := &f.Cache.debugState
state.loggingEnabled = loggingEnabled
state.f = f
state.registers = f.Config.registers
state.stackOffset = stackOffset
state.ctxt = ctxt
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
if state.loggingEnabled {
state.logf("Generating location lists for function %q\n", f.Name)
}
if state.varParts == nil {
state.varParts = make(map[GCNode][]SlotID)
} else {
for n := range state.varParts {
delete(state.varParts, n)
}
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
// Recompose any decomposed variables, and establish the canonical
// IDs for each var and slot by filling out state.vars and state.slots.
state.slots = state.slots[:0]
state.vars = state.vars[:0]
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
for i, slot := range f.Names {
state.slots = append(state.slots, slot)
if slot.N.IsSynthetic() {
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
continue
}
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
topSlot := &slot
for topSlot.SplitOf != nil {
topSlot = topSlot.SplitOf
}
if _, ok := state.varParts[topSlot.N]; !ok {
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
state.vars = append(state.vars, topSlot.N)
}
state.varParts[topSlot.N] = append(state.varParts[topSlot.N], SlotID(i))
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
// Recreate the LocalSlot for each stack-only variable.
// This would probably be better as an output from stackframe.
for _, b := range f.Blocks {
for _, v := range b.Values {
if v.Op == OpVarDef || v.Op == OpVarKill {
n := v.Aux.(GCNode)
if n.IsSynthetic() {
continue
}
if _, ok := state.varParts[n]; !ok {
slot := LocalSlot{N: n, Type: v.Type, Off: 0}
state.slots = append(state.slots, slot)
state.varParts[n] = []SlotID{SlotID(len(state.slots) - 1)}
state.vars = append(state.vars, n)
}
}
}
}
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
// Fill in the var<->slot mappings.
if cap(state.varSlots) < len(state.vars) {
state.varSlots = make([][]SlotID, len(state.vars))
} else {
state.varSlots = state.varSlots[:len(state.vars)]
for i := range state.varSlots {
state.varSlots[i] = state.varSlots[i][:0]
}
}
if cap(state.slotVars) < len(state.slots) {
state.slotVars = make([]VarID, len(state.slots))
} else {
state.slotVars = state.slotVars[:len(state.slots)]
}
if state.partsByVarOffset == nil {
state.partsByVarOffset = &partsByVarOffset{}
}
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
for varID, n := range state.vars {
parts := state.varParts[n]
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
state.varSlots[varID] = parts
for _, slotID := range parts {
state.slotVars[slotID] = VarID(varID)
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
*state.partsByVarOffset.(*partsByVarOffset) = partsByVarOffset{parts, state.slots}
sort.Sort(state.partsByVarOffset)
cmd/compile/internal/ssa: reduce location list memory use Put everything that showed up in the allocation profile into the cache, and reuse it across functions. After this CL, the overhead of enabling location lists is getting pretty close to the desired 5%: compilecmp -all -beforeflags -dwarflocationlists=0 -afterflags -dwarflocationlists=1 -n 30 4ebad42292b6a4090faf37753dd768d2965e38c4 4ebad42292b6a4090faf37753dd768d2965e38c4 compilecmp -dwarflocationlists=0 4ebad42292b6a4090faf37753dd768d2965e38c4 -dwarflocationlists=1 4ebad42292b6a4090faf37753dd768d2965e38c4 benchstat -geomean /tmp/869550129 /tmp/143495132 completed 30 of 30, estimated time remaining 0s (eta 3:24PM) name old time/op new time/op delta Template 199ms ± 4% 209ms ± 6% +5.17% (p=0.000 n=29+30) Unicode 99.2ms ± 8% 100.5ms ± 6% ~ (p=0.112 n=30+30) GoTypes 642ms ± 3% 684ms ± 3% +6.54% (p=0.000 n=29+30) SSA 8.00s ± 1% 8.71s ± 1% +8.78% (p=0.000 n=29+29) Flate 129ms ± 7% 134ms ± 5% +3.77% (p=0.000 n=30+30) GoParser 157ms ± 4% 164ms ± 5% +4.35% (p=0.000 n=29+30) Reflect 428ms ± 3% 450ms ± 4% +5.09% (p=0.000 n=30+30) Tar 195ms ± 5% 204ms ± 8% +4.78% (p=0.000 n=30+30) XML 228ms ± 4% 241ms ± 4% +5.62% (p=0.000 n=30+29) StdCmd 15.4s ± 1% 16.7s ± 1% +8.29% (p=0.000 n=29+29) [Geo mean] 476ms 502ms +5.35% name old user-time/op new user-time/op delta Template 294ms ±18% 304ms ±15% ~ (p=0.242 n=29+29) Unicode 182ms ±27% 172ms ±28% ~ (p=0.104 n=30+30) GoTypes 957ms ±15% 1016ms ±12% +6.16% (p=0.000 n=30+30) SSA 13.3s ± 5% 14.3s ± 3% +7.32% (p=0.000 n=30+28) Flate 188ms ±17% 193ms ±17% ~ (p=0.288 n=28+29) GoParser 232ms ±16% 238ms ±13% ~ (p=0.065 n=30+29) Reflect 585ms ±13% 620ms ±10% +5.88% (p=0.000 n=30+30) Tar 298ms ±21% 332ms ±23% +11.32% (p=0.000 n=30+30) XML 329ms ±17% 343ms ±12% +4.18% (p=0.032 n=30+30) [Geo mean] 492ms 513ms +4.13% name old alloc/op new alloc/op delta Template 38.3MB ± 0% 40.3MB ± 0% +5.29% (p=0.000 n=30+30) Unicode 29.3MB ± 0% 29.6MB ± 0% +1.28% (p=0.000 n=30+29) GoTypes 110MB ± 0% 118MB ± 0% +6.97% (p=0.000 n=29+30) SSA 1.48GB ± 0% 1.61GB ± 0% +9.06% (p=0.000 n=30+30) Flate 24.8MB ± 0% 26.0MB ± 0% +4.99% (p=0.000 n=29+30) GoParser 30.9MB ± 0% 32.2MB ± 0% +4.20% (p=0.000 n=30+30) Reflect 76.8MB ± 0% 80.6MB ± 0% +4.97% (p=0.000 n=30+30) Tar 39.6MB ± 0% 41.7MB ± 0% +5.22% (p=0.000 n=29+30) XML 42.0MB ± 0% 45.4MB ± 0% +8.22% (p=0.000 n=29+30) [Geo mean] 63.9MB 67.5MB +5.56% name old allocs/op new allocs/op delta Template 383k ± 0% 405k ± 0% +5.69% (p=0.000 n=30+30) Unicode 343k ± 0% 346k ± 0% +0.98% (p=0.000 n=30+27) GoTypes 1.15M ± 0% 1.22M ± 0% +6.17% (p=0.000 n=29+29) SSA 12.2M ± 0% 13.2M ± 0% +8.15% (p=0.000 n=30+30) Flate 234k ± 0% 249k ± 0% +6.44% (p=0.000 n=30+30) GoParser 315k ± 0% 332k ± 0% +5.31% (p=0.000 n=30+28) Reflect 972k ± 0% 1010k ± 0% +3.89% (p=0.000 n=30+30) Tar 394k ± 0% 415k ± 0% +5.35% (p=0.000 n=28+30) XML 404k ± 0% 429k ± 0% +6.31% (p=0.000 n=29+29) [Geo mean] 651k 686k +5.35% Change-Id: Ia005a8d6b33ce9f8091322f004376a3d6e5c1a94 Reviewed-on: https://go-review.googlesource.com/89357 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-01-22 17:06:26 -05:00
}
state.initializeCache(f, len(state.varParts), len(state.slots))
cmd/compile/internal/ssa: reduce location list memory use Put everything that showed up in the allocation profile into the cache, and reuse it across functions. After this CL, the overhead of enabling location lists is getting pretty close to the desired 5%: compilecmp -all -beforeflags -dwarflocationlists=0 -afterflags -dwarflocationlists=1 -n 30 4ebad42292b6a4090faf37753dd768d2965e38c4 4ebad42292b6a4090faf37753dd768d2965e38c4 compilecmp -dwarflocationlists=0 4ebad42292b6a4090faf37753dd768d2965e38c4 -dwarflocationlists=1 4ebad42292b6a4090faf37753dd768d2965e38c4 benchstat -geomean /tmp/869550129 /tmp/143495132 completed 30 of 30, estimated time remaining 0s (eta 3:24PM) name old time/op new time/op delta Template 199ms ± 4% 209ms ± 6% +5.17% (p=0.000 n=29+30) Unicode 99.2ms ± 8% 100.5ms ± 6% ~ (p=0.112 n=30+30) GoTypes 642ms ± 3% 684ms ± 3% +6.54% (p=0.000 n=29+30) SSA 8.00s ± 1% 8.71s ± 1% +8.78% (p=0.000 n=29+29) Flate 129ms ± 7% 134ms ± 5% +3.77% (p=0.000 n=30+30) GoParser 157ms ± 4% 164ms ± 5% +4.35% (p=0.000 n=29+30) Reflect 428ms ± 3% 450ms ± 4% +5.09% (p=0.000 n=30+30) Tar 195ms ± 5% 204ms ± 8% +4.78% (p=0.000 n=30+30) XML 228ms ± 4% 241ms ± 4% +5.62% (p=0.000 n=30+29) StdCmd 15.4s ± 1% 16.7s ± 1% +8.29% (p=0.000 n=29+29) [Geo mean] 476ms 502ms +5.35% name old user-time/op new user-time/op delta Template 294ms ±18% 304ms ±15% ~ (p=0.242 n=29+29) Unicode 182ms ±27% 172ms ±28% ~ (p=0.104 n=30+30) GoTypes 957ms ±15% 1016ms ±12% +6.16% (p=0.000 n=30+30) SSA 13.3s ± 5% 14.3s ± 3% +7.32% (p=0.000 n=30+28) Flate 188ms ±17% 193ms ±17% ~ (p=0.288 n=28+29) GoParser 232ms ±16% 238ms ±13% ~ (p=0.065 n=30+29) Reflect 585ms ±13% 620ms ±10% +5.88% (p=0.000 n=30+30) Tar 298ms ±21% 332ms ±23% +11.32% (p=0.000 n=30+30) XML 329ms ±17% 343ms ±12% +4.18% (p=0.032 n=30+30) [Geo mean] 492ms 513ms +4.13% name old alloc/op new alloc/op delta Template 38.3MB ± 0% 40.3MB ± 0% +5.29% (p=0.000 n=30+30) Unicode 29.3MB ± 0% 29.6MB ± 0% +1.28% (p=0.000 n=30+29) GoTypes 110MB ± 0% 118MB ± 0% +6.97% (p=0.000 n=29+30) SSA 1.48GB ± 0% 1.61GB ± 0% +9.06% (p=0.000 n=30+30) Flate 24.8MB ± 0% 26.0MB ± 0% +4.99% (p=0.000 n=29+30) GoParser 30.9MB ± 0% 32.2MB ± 0% +4.20% (p=0.000 n=30+30) Reflect 76.8MB ± 0% 80.6MB ± 0% +4.97% (p=0.000 n=30+30) Tar 39.6MB ± 0% 41.7MB ± 0% +5.22% (p=0.000 n=29+30) XML 42.0MB ± 0% 45.4MB ± 0% +8.22% (p=0.000 n=29+30) [Geo mean] 63.9MB 67.5MB +5.56% name old allocs/op new allocs/op delta Template 383k ± 0% 405k ± 0% +5.69% (p=0.000 n=30+30) Unicode 343k ± 0% 346k ± 0% +0.98% (p=0.000 n=30+27) GoTypes 1.15M ± 0% 1.22M ± 0% +6.17% (p=0.000 n=29+29) SSA 12.2M ± 0% 13.2M ± 0% +8.15% (p=0.000 n=30+30) Flate 234k ± 0% 249k ± 0% +6.44% (p=0.000 n=30+30) GoParser 315k ± 0% 332k ± 0% +5.31% (p=0.000 n=30+28) Reflect 972k ± 0% 1010k ± 0% +3.89% (p=0.000 n=30+30) Tar 394k ± 0% 415k ± 0% +5.35% (p=0.000 n=28+30) XML 404k ± 0% 429k ± 0% +6.31% (p=0.000 n=29+29) [Geo mean] 651k 686k +5.35% Change-Id: Ia005a8d6b33ce9f8091322f004376a3d6e5c1a94 Reviewed-on: https://go-review.googlesource.com/89357 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-01-22 17:06:26 -05:00
for i, slot := range f.Names {
if slot.N.IsSynthetic() {
cmd/compile/internal/ssa: reduce location list memory use Put everything that showed up in the allocation profile into the cache, and reuse it across functions. After this CL, the overhead of enabling location lists is getting pretty close to the desired 5%: compilecmp -all -beforeflags -dwarflocationlists=0 -afterflags -dwarflocationlists=1 -n 30 4ebad42292b6a4090faf37753dd768d2965e38c4 4ebad42292b6a4090faf37753dd768d2965e38c4 compilecmp -dwarflocationlists=0 4ebad42292b6a4090faf37753dd768d2965e38c4 -dwarflocationlists=1 4ebad42292b6a4090faf37753dd768d2965e38c4 benchstat -geomean /tmp/869550129 /tmp/143495132 completed 30 of 30, estimated time remaining 0s (eta 3:24PM) name old time/op new time/op delta Template 199ms ± 4% 209ms ± 6% +5.17% (p=0.000 n=29+30) Unicode 99.2ms ± 8% 100.5ms ± 6% ~ (p=0.112 n=30+30) GoTypes 642ms ± 3% 684ms ± 3% +6.54% (p=0.000 n=29+30) SSA 8.00s ± 1% 8.71s ± 1% +8.78% (p=0.000 n=29+29) Flate 129ms ± 7% 134ms ± 5% +3.77% (p=0.000 n=30+30) GoParser 157ms ± 4% 164ms ± 5% +4.35% (p=0.000 n=29+30) Reflect 428ms ± 3% 450ms ± 4% +5.09% (p=0.000 n=30+30) Tar 195ms ± 5% 204ms ± 8% +4.78% (p=0.000 n=30+30) XML 228ms ± 4% 241ms ± 4% +5.62% (p=0.000 n=30+29) StdCmd 15.4s ± 1% 16.7s ± 1% +8.29% (p=0.000 n=29+29) [Geo mean] 476ms 502ms +5.35% name old user-time/op new user-time/op delta Template 294ms ±18% 304ms ±15% ~ (p=0.242 n=29+29) Unicode 182ms ±27% 172ms ±28% ~ (p=0.104 n=30+30) GoTypes 957ms ±15% 1016ms ±12% +6.16% (p=0.000 n=30+30) SSA 13.3s ± 5% 14.3s ± 3% +7.32% (p=0.000 n=30+28) Flate 188ms ±17% 193ms ±17% ~ (p=0.288 n=28+29) GoParser 232ms ±16% 238ms ±13% ~ (p=0.065 n=30+29) Reflect 585ms ±13% 620ms ±10% +5.88% (p=0.000 n=30+30) Tar 298ms ±21% 332ms ±23% +11.32% (p=0.000 n=30+30) XML 329ms ±17% 343ms ±12% +4.18% (p=0.032 n=30+30) [Geo mean] 492ms 513ms +4.13% name old alloc/op new alloc/op delta Template 38.3MB ± 0% 40.3MB ± 0% +5.29% (p=0.000 n=30+30) Unicode 29.3MB ± 0% 29.6MB ± 0% +1.28% (p=0.000 n=30+29) GoTypes 110MB ± 0% 118MB ± 0% +6.97% (p=0.000 n=29+30) SSA 1.48GB ± 0% 1.61GB ± 0% +9.06% (p=0.000 n=30+30) Flate 24.8MB ± 0% 26.0MB ± 0% +4.99% (p=0.000 n=29+30) GoParser 30.9MB ± 0% 32.2MB ± 0% +4.20% (p=0.000 n=30+30) Reflect 76.8MB ± 0% 80.6MB ± 0% +4.97% (p=0.000 n=30+30) Tar 39.6MB ± 0% 41.7MB ± 0% +5.22% (p=0.000 n=29+30) XML 42.0MB ± 0% 45.4MB ± 0% +8.22% (p=0.000 n=29+30) [Geo mean] 63.9MB 67.5MB +5.56% name old allocs/op new allocs/op delta Template 383k ± 0% 405k ± 0% +5.69% (p=0.000 n=30+30) Unicode 343k ± 0% 346k ± 0% +0.98% (p=0.000 n=30+27) GoTypes 1.15M ± 0% 1.22M ± 0% +6.17% (p=0.000 n=29+29) SSA 12.2M ± 0% 13.2M ± 0% +8.15% (p=0.000 n=30+30) Flate 234k ± 0% 249k ± 0% +6.44% (p=0.000 n=30+30) GoParser 315k ± 0% 332k ± 0% +5.31% (p=0.000 n=30+28) Reflect 972k ± 0% 1010k ± 0% +3.89% (p=0.000 n=30+30) Tar 394k ± 0% 415k ± 0% +5.35% (p=0.000 n=28+30) XML 404k ± 0% 429k ± 0% +6.31% (p=0.000 n=29+29) [Geo mean] 651k 686k +5.35% Change-Id: Ia005a8d6b33ce9f8091322f004376a3d6e5c1a94 Reviewed-on: https://go-review.googlesource.com/89357 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-01-22 17:06:26 -05:00
continue
}
for _, value := range f.NamedValues[slot] {
state.valueNames[value.ID] = append(state.valueNames[value.ID], SlotID(i))
}
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
blockLocs := state.liveness()
state.buildLocationLists(blockLocs)
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
return &FuncDebug{
Slots: state.slots,
VarSlots: state.varSlots,
Vars: state.vars,
LocationLists: state.lists,
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
}
// liveness walks the function in control flow order, calculating the start
// and end state of each block.
func (state *debugState) liveness() []*BlockDebug {
blockLocs := make([]*BlockDebug, state.f.NumBlocks())
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
// Reverse postorder: visit a block after as many as possible of its
// predecessors have been visited.
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
po := state.f.Postorder()
for i := len(po) - 1; i >= 0; i-- {
b := po[i]
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
// Build the starting state for the block from the final
// state of its predecessors.
startState, startValid := state.mergePredecessors(b, blockLocs)
cmd/compile/internal/ssa: reduce location list memory use Put everything that showed up in the allocation profile into the cache, and reuse it across functions. After this CL, the overhead of enabling location lists is getting pretty close to the desired 5%: compilecmp -all -beforeflags -dwarflocationlists=0 -afterflags -dwarflocationlists=1 -n 30 4ebad42292b6a4090faf37753dd768d2965e38c4 4ebad42292b6a4090faf37753dd768d2965e38c4 compilecmp -dwarflocationlists=0 4ebad42292b6a4090faf37753dd768d2965e38c4 -dwarflocationlists=1 4ebad42292b6a4090faf37753dd768d2965e38c4 benchstat -geomean /tmp/869550129 /tmp/143495132 completed 30 of 30, estimated time remaining 0s (eta 3:24PM) name old time/op new time/op delta Template 199ms ± 4% 209ms ± 6% +5.17% (p=0.000 n=29+30) Unicode 99.2ms ± 8% 100.5ms ± 6% ~ (p=0.112 n=30+30) GoTypes 642ms ± 3% 684ms ± 3% +6.54% (p=0.000 n=29+30) SSA 8.00s ± 1% 8.71s ± 1% +8.78% (p=0.000 n=29+29) Flate 129ms ± 7% 134ms ± 5% +3.77% (p=0.000 n=30+30) GoParser 157ms ± 4% 164ms ± 5% +4.35% (p=0.000 n=29+30) Reflect 428ms ± 3% 450ms ± 4% +5.09% (p=0.000 n=30+30) Tar 195ms ± 5% 204ms ± 8% +4.78% (p=0.000 n=30+30) XML 228ms ± 4% 241ms ± 4% +5.62% (p=0.000 n=30+29) StdCmd 15.4s ± 1% 16.7s ± 1% +8.29% (p=0.000 n=29+29) [Geo mean] 476ms 502ms +5.35% name old user-time/op new user-time/op delta Template 294ms ±18% 304ms ±15% ~ (p=0.242 n=29+29) Unicode 182ms ±27% 172ms ±28% ~ (p=0.104 n=30+30) GoTypes 957ms ±15% 1016ms ±12% +6.16% (p=0.000 n=30+30) SSA 13.3s ± 5% 14.3s ± 3% +7.32% (p=0.000 n=30+28) Flate 188ms ±17% 193ms ±17% ~ (p=0.288 n=28+29) GoParser 232ms ±16% 238ms ±13% ~ (p=0.065 n=30+29) Reflect 585ms ±13% 620ms ±10% +5.88% (p=0.000 n=30+30) Tar 298ms ±21% 332ms ±23% +11.32% (p=0.000 n=30+30) XML 329ms ±17% 343ms ±12% +4.18% (p=0.032 n=30+30) [Geo mean] 492ms 513ms +4.13% name old alloc/op new alloc/op delta Template 38.3MB ± 0% 40.3MB ± 0% +5.29% (p=0.000 n=30+30) Unicode 29.3MB ± 0% 29.6MB ± 0% +1.28% (p=0.000 n=30+29) GoTypes 110MB ± 0% 118MB ± 0% +6.97% (p=0.000 n=29+30) SSA 1.48GB ± 0% 1.61GB ± 0% +9.06% (p=0.000 n=30+30) Flate 24.8MB ± 0% 26.0MB ± 0% +4.99% (p=0.000 n=29+30) GoParser 30.9MB ± 0% 32.2MB ± 0% +4.20% (p=0.000 n=30+30) Reflect 76.8MB ± 0% 80.6MB ± 0% +4.97% (p=0.000 n=30+30) Tar 39.6MB ± 0% 41.7MB ± 0% +5.22% (p=0.000 n=29+30) XML 42.0MB ± 0% 45.4MB ± 0% +8.22% (p=0.000 n=29+30) [Geo mean] 63.9MB 67.5MB +5.56% name old allocs/op new allocs/op delta Template 383k ± 0% 405k ± 0% +5.69% (p=0.000 n=30+30) Unicode 343k ± 0% 346k ± 0% +0.98% (p=0.000 n=30+27) GoTypes 1.15M ± 0% 1.22M ± 0% +6.17% (p=0.000 n=29+29) SSA 12.2M ± 0% 13.2M ± 0% +8.15% (p=0.000 n=30+30) Flate 234k ± 0% 249k ± 0% +6.44% (p=0.000 n=30+30) GoParser 315k ± 0% 332k ± 0% +5.31% (p=0.000 n=30+28) Reflect 972k ± 0% 1010k ± 0% +3.89% (p=0.000 n=30+30) Tar 394k ± 0% 415k ± 0% +5.35% (p=0.000 n=28+30) XML 404k ± 0% 429k ± 0% +6.31% (p=0.000 n=29+29) [Geo mean] 651k 686k +5.35% Change-Id: Ia005a8d6b33ce9f8091322f004376a3d6e5c1a94 Reviewed-on: https://go-review.googlesource.com/89357 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-01-22 17:06:26 -05:00
changed := false
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
if state.loggingEnabled {
state.logf("Processing %v, initial state:\n%v", b, state.stateString(state.currentState))
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
// Update locs/registers with the effects of each Value.
for _, v := range b.Values {
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
slots := state.valueNames[v.ID]
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
// Loads and stores inherit the names of their sources.
var source *Value
switch v.Op {
case OpStoreReg:
source = v.Args[0]
case OpLoadReg:
switch a := v.Args[0]; a.Op {
case OpArg, OpPhi:
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
source = a
case OpStoreReg:
source = a.Args[0]
default:
if state.loggingEnabled {
state.logf("at %v: load with unexpected source op: %v (%v)\n", v, a.Op, a)
}
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
}
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
// Update valueNames with the source so that later steps
// don't need special handling.
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
if source != nil {
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
slots = append(slots, state.valueNames[source.ID]...)
state.valueNames[v.ID] = slots
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
reg, _ := state.f.getHome(v.ID).(*Register)
cmd/compile/internal/ssa: reduce location list memory use Put everything that showed up in the allocation profile into the cache, and reuse it across functions. After this CL, the overhead of enabling location lists is getting pretty close to the desired 5%: compilecmp -all -beforeflags -dwarflocationlists=0 -afterflags -dwarflocationlists=1 -n 30 4ebad42292b6a4090faf37753dd768d2965e38c4 4ebad42292b6a4090faf37753dd768d2965e38c4 compilecmp -dwarflocationlists=0 4ebad42292b6a4090faf37753dd768d2965e38c4 -dwarflocationlists=1 4ebad42292b6a4090faf37753dd768d2965e38c4 benchstat -geomean /tmp/869550129 /tmp/143495132 completed 30 of 30, estimated time remaining 0s (eta 3:24PM) name old time/op new time/op delta Template 199ms ± 4% 209ms ± 6% +5.17% (p=0.000 n=29+30) Unicode 99.2ms ± 8% 100.5ms ± 6% ~ (p=0.112 n=30+30) GoTypes 642ms ± 3% 684ms ± 3% +6.54% (p=0.000 n=29+30) SSA 8.00s ± 1% 8.71s ± 1% +8.78% (p=0.000 n=29+29) Flate 129ms ± 7% 134ms ± 5% +3.77% (p=0.000 n=30+30) GoParser 157ms ± 4% 164ms ± 5% +4.35% (p=0.000 n=29+30) Reflect 428ms ± 3% 450ms ± 4% +5.09% (p=0.000 n=30+30) Tar 195ms ± 5% 204ms ± 8% +4.78% (p=0.000 n=30+30) XML 228ms ± 4% 241ms ± 4% +5.62% (p=0.000 n=30+29) StdCmd 15.4s ± 1% 16.7s ± 1% +8.29% (p=0.000 n=29+29) [Geo mean] 476ms 502ms +5.35% name old user-time/op new user-time/op delta Template 294ms ±18% 304ms ±15% ~ (p=0.242 n=29+29) Unicode 182ms ±27% 172ms ±28% ~ (p=0.104 n=30+30) GoTypes 957ms ±15% 1016ms ±12% +6.16% (p=0.000 n=30+30) SSA 13.3s ± 5% 14.3s ± 3% +7.32% (p=0.000 n=30+28) Flate 188ms ±17% 193ms ±17% ~ (p=0.288 n=28+29) GoParser 232ms ±16% 238ms ±13% ~ (p=0.065 n=30+29) Reflect 585ms ±13% 620ms ±10% +5.88% (p=0.000 n=30+30) Tar 298ms ±21% 332ms ±23% +11.32% (p=0.000 n=30+30) XML 329ms ±17% 343ms ±12% +4.18% (p=0.032 n=30+30) [Geo mean] 492ms 513ms +4.13% name old alloc/op new alloc/op delta Template 38.3MB ± 0% 40.3MB ± 0% +5.29% (p=0.000 n=30+30) Unicode 29.3MB ± 0% 29.6MB ± 0% +1.28% (p=0.000 n=30+29) GoTypes 110MB ± 0% 118MB ± 0% +6.97% (p=0.000 n=29+30) SSA 1.48GB ± 0% 1.61GB ± 0% +9.06% (p=0.000 n=30+30) Flate 24.8MB ± 0% 26.0MB ± 0% +4.99% (p=0.000 n=29+30) GoParser 30.9MB ± 0% 32.2MB ± 0% +4.20% (p=0.000 n=30+30) Reflect 76.8MB ± 0% 80.6MB ± 0% +4.97% (p=0.000 n=30+30) Tar 39.6MB ± 0% 41.7MB ± 0% +5.22% (p=0.000 n=29+30) XML 42.0MB ± 0% 45.4MB ± 0% +8.22% (p=0.000 n=29+30) [Geo mean] 63.9MB 67.5MB +5.56% name old allocs/op new allocs/op delta Template 383k ± 0% 405k ± 0% +5.69% (p=0.000 n=30+30) Unicode 343k ± 0% 346k ± 0% +0.98% (p=0.000 n=30+27) GoTypes 1.15M ± 0% 1.22M ± 0% +6.17% (p=0.000 n=29+29) SSA 12.2M ± 0% 13.2M ± 0% +8.15% (p=0.000 n=30+30) Flate 234k ± 0% 249k ± 0% +6.44% (p=0.000 n=30+30) GoParser 315k ± 0% 332k ± 0% +5.31% (p=0.000 n=30+28) Reflect 972k ± 0% 1010k ± 0% +3.89% (p=0.000 n=30+30) Tar 394k ± 0% 415k ± 0% +5.35% (p=0.000 n=28+30) XML 404k ± 0% 429k ± 0% +6.31% (p=0.000 n=29+29) [Geo mean] 651k 686k +5.35% Change-Id: Ia005a8d6b33ce9f8091322f004376a3d6e5c1a94 Reviewed-on: https://go-review.googlesource.com/89357 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-01-22 17:06:26 -05:00
c := state.processValue(v, slots, reg)
changed = changed || c
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
if state.loggingEnabled {
state.f.Logf("Block %v done, locs:\n%v", b, state.stateString(state.currentState))
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
locs := state.allocBlock(b)
locs.relevant = changed
if !changed && startValid {
locs.endState = startState
cmd/compile/internal/ssa: reduce location list memory use Put everything that showed up in the allocation profile into the cache, and reuse it across functions. After this CL, the overhead of enabling location lists is getting pretty close to the desired 5%: compilecmp -all -beforeflags -dwarflocationlists=0 -afterflags -dwarflocationlists=1 -n 30 4ebad42292b6a4090faf37753dd768d2965e38c4 4ebad42292b6a4090faf37753dd768d2965e38c4 compilecmp -dwarflocationlists=0 4ebad42292b6a4090faf37753dd768d2965e38c4 -dwarflocationlists=1 4ebad42292b6a4090faf37753dd768d2965e38c4 benchstat -geomean /tmp/869550129 /tmp/143495132 completed 30 of 30, estimated time remaining 0s (eta 3:24PM) name old time/op new time/op delta Template 199ms ± 4% 209ms ± 6% +5.17% (p=0.000 n=29+30) Unicode 99.2ms ± 8% 100.5ms ± 6% ~ (p=0.112 n=30+30) GoTypes 642ms ± 3% 684ms ± 3% +6.54% (p=0.000 n=29+30) SSA 8.00s ± 1% 8.71s ± 1% +8.78% (p=0.000 n=29+29) Flate 129ms ± 7% 134ms ± 5% +3.77% (p=0.000 n=30+30) GoParser 157ms ± 4% 164ms ± 5% +4.35% (p=0.000 n=29+30) Reflect 428ms ± 3% 450ms ± 4% +5.09% (p=0.000 n=30+30) Tar 195ms ± 5% 204ms ± 8% +4.78% (p=0.000 n=30+30) XML 228ms ± 4% 241ms ± 4% +5.62% (p=0.000 n=30+29) StdCmd 15.4s ± 1% 16.7s ± 1% +8.29% (p=0.000 n=29+29) [Geo mean] 476ms 502ms +5.35% name old user-time/op new user-time/op delta Template 294ms ±18% 304ms ±15% ~ (p=0.242 n=29+29) Unicode 182ms ±27% 172ms ±28% ~ (p=0.104 n=30+30) GoTypes 957ms ±15% 1016ms ±12% +6.16% (p=0.000 n=30+30) SSA 13.3s ± 5% 14.3s ± 3% +7.32% (p=0.000 n=30+28) Flate 188ms ±17% 193ms ±17% ~ (p=0.288 n=28+29) GoParser 232ms ±16% 238ms ±13% ~ (p=0.065 n=30+29) Reflect 585ms ±13% 620ms ±10% +5.88% (p=0.000 n=30+30) Tar 298ms ±21% 332ms ±23% +11.32% (p=0.000 n=30+30) XML 329ms ±17% 343ms ±12% +4.18% (p=0.032 n=30+30) [Geo mean] 492ms 513ms +4.13% name old alloc/op new alloc/op delta Template 38.3MB ± 0% 40.3MB ± 0% +5.29% (p=0.000 n=30+30) Unicode 29.3MB ± 0% 29.6MB ± 0% +1.28% (p=0.000 n=30+29) GoTypes 110MB ± 0% 118MB ± 0% +6.97% (p=0.000 n=29+30) SSA 1.48GB ± 0% 1.61GB ± 0% +9.06% (p=0.000 n=30+30) Flate 24.8MB ± 0% 26.0MB ± 0% +4.99% (p=0.000 n=29+30) GoParser 30.9MB ± 0% 32.2MB ± 0% +4.20% (p=0.000 n=30+30) Reflect 76.8MB ± 0% 80.6MB ± 0% +4.97% (p=0.000 n=30+30) Tar 39.6MB ± 0% 41.7MB ± 0% +5.22% (p=0.000 n=29+30) XML 42.0MB ± 0% 45.4MB ± 0% +8.22% (p=0.000 n=29+30) [Geo mean] 63.9MB 67.5MB +5.56% name old allocs/op new allocs/op delta Template 383k ± 0% 405k ± 0% +5.69% (p=0.000 n=30+30) Unicode 343k ± 0% 346k ± 0% +0.98% (p=0.000 n=30+27) GoTypes 1.15M ± 0% 1.22M ± 0% +6.17% (p=0.000 n=29+29) SSA 12.2M ± 0% 13.2M ± 0% +8.15% (p=0.000 n=30+30) Flate 234k ± 0% 249k ± 0% +6.44% (p=0.000 n=30+30) GoParser 315k ± 0% 332k ± 0% +5.31% (p=0.000 n=30+28) Reflect 972k ± 0% 1010k ± 0% +3.89% (p=0.000 n=30+30) Tar 394k ± 0% 415k ± 0% +5.35% (p=0.000 n=28+30) XML 404k ± 0% 429k ± 0% +6.31% (p=0.000 n=29+29) [Geo mean] 651k 686k +5.35% Change-Id: Ia005a8d6b33ce9f8091322f004376a3d6e5c1a94 Reviewed-on: https://go-review.googlesource.com/89357 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-01-22 17:06:26 -05:00
} else {
for slotID, slotLoc := range state.currentState.slots {
if slotLoc.absent() {
continue
}
state.appendLiveSlot(liveSlot{slot: SlotID(slotID), Registers: slotLoc.Registers, StackOffset: slotLoc.StackOffset})
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
locs.endState = state.getLiveSlotSlice()
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
blockLocs[b.ID] = locs
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
return blockLocs
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
// mergePredecessors takes the end state of each of b's predecessors and
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
// intersects them to form the starting state for b. It returns that state in
// the BlockDebug, and fills state.currentState with it.
func (state *debugState) mergePredecessors(b *Block, blockLocs []*BlockDebug) ([]liveSlot, bool) {
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
// Filter out back branches.
var predsBuf [10]*Block
preds := predsBuf[:0]
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
for _, pred := range b.Preds {
if blockLocs[pred.b.ID] != nil {
preds = append(preds, pred.b)
}
}
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
if state.loggingEnabled {
// The logf below would cause preds to be heap-allocated if
// it were passed directly.
preds2 := make([]*Block, len(preds))
copy(preds2, preds)
state.logf("Merging %v into %v\n", preds2, b)
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
if len(preds) == 0 {
state.currentState.reset(nil)
return nil, true
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
cmd/compile/internal/ssa: reduce location list memory use Put everything that showed up in the allocation profile into the cache, and reuse it across functions. After this CL, the overhead of enabling location lists is getting pretty close to the desired 5%: compilecmp -all -beforeflags -dwarflocationlists=0 -afterflags -dwarflocationlists=1 -n 30 4ebad42292b6a4090faf37753dd768d2965e38c4 4ebad42292b6a4090faf37753dd768d2965e38c4 compilecmp -dwarflocationlists=0 4ebad42292b6a4090faf37753dd768d2965e38c4 -dwarflocationlists=1 4ebad42292b6a4090faf37753dd768d2965e38c4 benchstat -geomean /tmp/869550129 /tmp/143495132 completed 30 of 30, estimated time remaining 0s (eta 3:24PM) name old time/op new time/op delta Template 199ms ± 4% 209ms ± 6% +5.17% (p=0.000 n=29+30) Unicode 99.2ms ± 8% 100.5ms ± 6% ~ (p=0.112 n=30+30) GoTypes 642ms ± 3% 684ms ± 3% +6.54% (p=0.000 n=29+30) SSA 8.00s ± 1% 8.71s ± 1% +8.78% (p=0.000 n=29+29) Flate 129ms ± 7% 134ms ± 5% +3.77% (p=0.000 n=30+30) GoParser 157ms ± 4% 164ms ± 5% +4.35% (p=0.000 n=29+30) Reflect 428ms ± 3% 450ms ± 4% +5.09% (p=0.000 n=30+30) Tar 195ms ± 5% 204ms ± 8% +4.78% (p=0.000 n=30+30) XML 228ms ± 4% 241ms ± 4% +5.62% (p=0.000 n=30+29) StdCmd 15.4s ± 1% 16.7s ± 1% +8.29% (p=0.000 n=29+29) [Geo mean] 476ms 502ms +5.35% name old user-time/op new user-time/op delta Template 294ms ±18% 304ms ±15% ~ (p=0.242 n=29+29) Unicode 182ms ±27% 172ms ±28% ~ (p=0.104 n=30+30) GoTypes 957ms ±15% 1016ms ±12% +6.16% (p=0.000 n=30+30) SSA 13.3s ± 5% 14.3s ± 3% +7.32% (p=0.000 n=30+28) Flate 188ms ±17% 193ms ±17% ~ (p=0.288 n=28+29) GoParser 232ms ±16% 238ms ±13% ~ (p=0.065 n=30+29) Reflect 585ms ±13% 620ms ±10% +5.88% (p=0.000 n=30+30) Tar 298ms ±21% 332ms ±23% +11.32% (p=0.000 n=30+30) XML 329ms ±17% 343ms ±12% +4.18% (p=0.032 n=30+30) [Geo mean] 492ms 513ms +4.13% name old alloc/op new alloc/op delta Template 38.3MB ± 0% 40.3MB ± 0% +5.29% (p=0.000 n=30+30) Unicode 29.3MB ± 0% 29.6MB ± 0% +1.28% (p=0.000 n=30+29) GoTypes 110MB ± 0% 118MB ± 0% +6.97% (p=0.000 n=29+30) SSA 1.48GB ± 0% 1.61GB ± 0% +9.06% (p=0.000 n=30+30) Flate 24.8MB ± 0% 26.0MB ± 0% +4.99% (p=0.000 n=29+30) GoParser 30.9MB ± 0% 32.2MB ± 0% +4.20% (p=0.000 n=30+30) Reflect 76.8MB ± 0% 80.6MB ± 0% +4.97% (p=0.000 n=30+30) Tar 39.6MB ± 0% 41.7MB ± 0% +5.22% (p=0.000 n=29+30) XML 42.0MB ± 0% 45.4MB ± 0% +8.22% (p=0.000 n=29+30) [Geo mean] 63.9MB 67.5MB +5.56% name old allocs/op new allocs/op delta Template 383k ± 0% 405k ± 0% +5.69% (p=0.000 n=30+30) Unicode 343k ± 0% 346k ± 0% +0.98% (p=0.000 n=30+27) GoTypes 1.15M ± 0% 1.22M ± 0% +6.17% (p=0.000 n=29+29) SSA 12.2M ± 0% 13.2M ± 0% +8.15% (p=0.000 n=30+30) Flate 234k ± 0% 249k ± 0% +6.44% (p=0.000 n=30+30) GoParser 315k ± 0% 332k ± 0% +5.31% (p=0.000 n=30+28) Reflect 972k ± 0% 1010k ± 0% +3.89% (p=0.000 n=30+30) Tar 394k ± 0% 415k ± 0% +5.35% (p=0.000 n=28+30) XML 404k ± 0% 429k ± 0% +6.31% (p=0.000 n=29+29) [Geo mean] 651k 686k +5.35% Change-Id: Ia005a8d6b33ce9f8091322f004376a3d6e5c1a94 Reviewed-on: https://go-review.googlesource.com/89357 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-01-22 17:06:26 -05:00
p0 := blockLocs[preds[0].ID].endState
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
if len(preds) == 1 {
cmd/compile/internal/ssa: reduce location list memory use Put everything that showed up in the allocation profile into the cache, and reuse it across functions. After this CL, the overhead of enabling location lists is getting pretty close to the desired 5%: compilecmp -all -beforeflags -dwarflocationlists=0 -afterflags -dwarflocationlists=1 -n 30 4ebad42292b6a4090faf37753dd768d2965e38c4 4ebad42292b6a4090faf37753dd768d2965e38c4 compilecmp -dwarflocationlists=0 4ebad42292b6a4090faf37753dd768d2965e38c4 -dwarflocationlists=1 4ebad42292b6a4090faf37753dd768d2965e38c4 benchstat -geomean /tmp/869550129 /tmp/143495132 completed 30 of 30, estimated time remaining 0s (eta 3:24PM) name old time/op new time/op delta Template 199ms ± 4% 209ms ± 6% +5.17% (p=0.000 n=29+30) Unicode 99.2ms ± 8% 100.5ms ± 6% ~ (p=0.112 n=30+30) GoTypes 642ms ± 3% 684ms ± 3% +6.54% (p=0.000 n=29+30) SSA 8.00s ± 1% 8.71s ± 1% +8.78% (p=0.000 n=29+29) Flate 129ms ± 7% 134ms ± 5% +3.77% (p=0.000 n=30+30) GoParser 157ms ± 4% 164ms ± 5% +4.35% (p=0.000 n=29+30) Reflect 428ms ± 3% 450ms ± 4% +5.09% (p=0.000 n=30+30) Tar 195ms ± 5% 204ms ± 8% +4.78% (p=0.000 n=30+30) XML 228ms ± 4% 241ms ± 4% +5.62% (p=0.000 n=30+29) StdCmd 15.4s ± 1% 16.7s ± 1% +8.29% (p=0.000 n=29+29) [Geo mean] 476ms 502ms +5.35% name old user-time/op new user-time/op delta Template 294ms ±18% 304ms ±15% ~ (p=0.242 n=29+29) Unicode 182ms ±27% 172ms ±28% ~ (p=0.104 n=30+30) GoTypes 957ms ±15% 1016ms ±12% +6.16% (p=0.000 n=30+30) SSA 13.3s ± 5% 14.3s ± 3% +7.32% (p=0.000 n=30+28) Flate 188ms ±17% 193ms ±17% ~ (p=0.288 n=28+29) GoParser 232ms ±16% 238ms ±13% ~ (p=0.065 n=30+29) Reflect 585ms ±13% 620ms ±10% +5.88% (p=0.000 n=30+30) Tar 298ms ±21% 332ms ±23% +11.32% (p=0.000 n=30+30) XML 329ms ±17% 343ms ±12% +4.18% (p=0.032 n=30+30) [Geo mean] 492ms 513ms +4.13% name old alloc/op new alloc/op delta Template 38.3MB ± 0% 40.3MB ± 0% +5.29% (p=0.000 n=30+30) Unicode 29.3MB ± 0% 29.6MB ± 0% +1.28% (p=0.000 n=30+29) GoTypes 110MB ± 0% 118MB ± 0% +6.97% (p=0.000 n=29+30) SSA 1.48GB ± 0% 1.61GB ± 0% +9.06% (p=0.000 n=30+30) Flate 24.8MB ± 0% 26.0MB ± 0% +4.99% (p=0.000 n=29+30) GoParser 30.9MB ± 0% 32.2MB ± 0% +4.20% (p=0.000 n=30+30) Reflect 76.8MB ± 0% 80.6MB ± 0% +4.97% (p=0.000 n=30+30) Tar 39.6MB ± 0% 41.7MB ± 0% +5.22% (p=0.000 n=29+30) XML 42.0MB ± 0% 45.4MB ± 0% +8.22% (p=0.000 n=29+30) [Geo mean] 63.9MB 67.5MB +5.56% name old allocs/op new allocs/op delta Template 383k ± 0% 405k ± 0% +5.69% (p=0.000 n=30+30) Unicode 343k ± 0% 346k ± 0% +0.98% (p=0.000 n=30+27) GoTypes 1.15M ± 0% 1.22M ± 0% +6.17% (p=0.000 n=29+29) SSA 12.2M ± 0% 13.2M ± 0% +8.15% (p=0.000 n=30+30) Flate 234k ± 0% 249k ± 0% +6.44% (p=0.000 n=30+30) GoParser 315k ± 0% 332k ± 0% +5.31% (p=0.000 n=30+28) Reflect 972k ± 0% 1010k ± 0% +3.89% (p=0.000 n=30+30) Tar 394k ± 0% 415k ± 0% +5.35% (p=0.000 n=28+30) XML 404k ± 0% 429k ± 0% +6.31% (p=0.000 n=29+29) [Geo mean] 651k 686k +5.35% Change-Id: Ia005a8d6b33ce9f8091322f004376a3d6e5c1a94 Reviewed-on: https://go-review.googlesource.com/89357 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-01-22 17:06:26 -05:00
state.currentState.reset(p0)
return p0, true
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
if state.loggingEnabled {
state.logf("Starting %v with state from %v:\n%v", b, preds[0], state.blockEndStateString(blockLocs[preds[0].ID]))
}
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
slotLocs := state.currentState.slots
cmd/compile/internal/ssa: reduce location list memory use Put everything that showed up in the allocation profile into the cache, and reuse it across functions. After this CL, the overhead of enabling location lists is getting pretty close to the desired 5%: compilecmp -all -beforeflags -dwarflocationlists=0 -afterflags -dwarflocationlists=1 -n 30 4ebad42292b6a4090faf37753dd768d2965e38c4 4ebad42292b6a4090faf37753dd768d2965e38c4 compilecmp -dwarflocationlists=0 4ebad42292b6a4090faf37753dd768d2965e38c4 -dwarflocationlists=1 4ebad42292b6a4090faf37753dd768d2965e38c4 benchstat -geomean /tmp/869550129 /tmp/143495132 completed 30 of 30, estimated time remaining 0s (eta 3:24PM) name old time/op new time/op delta Template 199ms ± 4% 209ms ± 6% +5.17% (p=0.000 n=29+30) Unicode 99.2ms ± 8% 100.5ms ± 6% ~ (p=0.112 n=30+30) GoTypes 642ms ± 3% 684ms ± 3% +6.54% (p=0.000 n=29+30) SSA 8.00s ± 1% 8.71s ± 1% +8.78% (p=0.000 n=29+29) Flate 129ms ± 7% 134ms ± 5% +3.77% (p=0.000 n=30+30) GoParser 157ms ± 4% 164ms ± 5% +4.35% (p=0.000 n=29+30) Reflect 428ms ± 3% 450ms ± 4% +5.09% (p=0.000 n=30+30) Tar 195ms ± 5% 204ms ± 8% +4.78% (p=0.000 n=30+30) XML 228ms ± 4% 241ms ± 4% +5.62% (p=0.000 n=30+29) StdCmd 15.4s ± 1% 16.7s ± 1% +8.29% (p=0.000 n=29+29) [Geo mean] 476ms 502ms +5.35% name old user-time/op new user-time/op delta Template 294ms ±18% 304ms ±15% ~ (p=0.242 n=29+29) Unicode 182ms ±27% 172ms ±28% ~ (p=0.104 n=30+30) GoTypes 957ms ±15% 1016ms ±12% +6.16% (p=0.000 n=30+30) SSA 13.3s ± 5% 14.3s ± 3% +7.32% (p=0.000 n=30+28) Flate 188ms ±17% 193ms ±17% ~ (p=0.288 n=28+29) GoParser 232ms ±16% 238ms ±13% ~ (p=0.065 n=30+29) Reflect 585ms ±13% 620ms ±10% +5.88% (p=0.000 n=30+30) Tar 298ms ±21% 332ms ±23% +11.32% (p=0.000 n=30+30) XML 329ms ±17% 343ms ±12% +4.18% (p=0.032 n=30+30) [Geo mean] 492ms 513ms +4.13% name old alloc/op new alloc/op delta Template 38.3MB ± 0% 40.3MB ± 0% +5.29% (p=0.000 n=30+30) Unicode 29.3MB ± 0% 29.6MB ± 0% +1.28% (p=0.000 n=30+29) GoTypes 110MB ± 0% 118MB ± 0% +6.97% (p=0.000 n=29+30) SSA 1.48GB ± 0% 1.61GB ± 0% +9.06% (p=0.000 n=30+30) Flate 24.8MB ± 0% 26.0MB ± 0% +4.99% (p=0.000 n=29+30) GoParser 30.9MB ± 0% 32.2MB ± 0% +4.20% (p=0.000 n=30+30) Reflect 76.8MB ± 0% 80.6MB ± 0% +4.97% (p=0.000 n=30+30) Tar 39.6MB ± 0% 41.7MB ± 0% +5.22% (p=0.000 n=29+30) XML 42.0MB ± 0% 45.4MB ± 0% +8.22% (p=0.000 n=29+30) [Geo mean] 63.9MB 67.5MB +5.56% name old allocs/op new allocs/op delta Template 383k ± 0% 405k ± 0% +5.69% (p=0.000 n=30+30) Unicode 343k ± 0% 346k ± 0% +0.98% (p=0.000 n=30+27) GoTypes 1.15M ± 0% 1.22M ± 0% +6.17% (p=0.000 n=29+29) SSA 12.2M ± 0% 13.2M ± 0% +8.15% (p=0.000 n=30+30) Flate 234k ± 0% 249k ± 0% +6.44% (p=0.000 n=30+30) GoParser 315k ± 0% 332k ± 0% +5.31% (p=0.000 n=30+28) Reflect 972k ± 0% 1010k ± 0% +3.89% (p=0.000 n=30+30) Tar 394k ± 0% 415k ± 0% +5.35% (p=0.000 n=28+30) XML 404k ± 0% 429k ± 0% +6.31% (p=0.000 n=29+29) [Geo mean] 651k 686k +5.35% Change-Id: Ia005a8d6b33ce9f8091322f004376a3d6e5c1a94 Reviewed-on: https://go-review.googlesource.com/89357 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-01-22 17:06:26 -05:00
for _, predSlot := range p0 {
slotLocs[predSlot.slot] = VarLoc{predSlot.Registers, predSlot.StackOffset}
cmd/compile/internal/ssa: reduce location list memory use Put everything that showed up in the allocation profile into the cache, and reuse it across functions. After this CL, the overhead of enabling location lists is getting pretty close to the desired 5%: compilecmp -all -beforeflags -dwarflocationlists=0 -afterflags -dwarflocationlists=1 -n 30 4ebad42292b6a4090faf37753dd768d2965e38c4 4ebad42292b6a4090faf37753dd768d2965e38c4 compilecmp -dwarflocationlists=0 4ebad42292b6a4090faf37753dd768d2965e38c4 -dwarflocationlists=1 4ebad42292b6a4090faf37753dd768d2965e38c4 benchstat -geomean /tmp/869550129 /tmp/143495132 completed 30 of 30, estimated time remaining 0s (eta 3:24PM) name old time/op new time/op delta Template 199ms ± 4% 209ms ± 6% +5.17% (p=0.000 n=29+30) Unicode 99.2ms ± 8% 100.5ms ± 6% ~ (p=0.112 n=30+30) GoTypes 642ms ± 3% 684ms ± 3% +6.54% (p=0.000 n=29+30) SSA 8.00s ± 1% 8.71s ± 1% +8.78% (p=0.000 n=29+29) Flate 129ms ± 7% 134ms ± 5% +3.77% (p=0.000 n=30+30) GoParser 157ms ± 4% 164ms ± 5% +4.35% (p=0.000 n=29+30) Reflect 428ms ± 3% 450ms ± 4% +5.09% (p=0.000 n=30+30) Tar 195ms ± 5% 204ms ± 8% +4.78% (p=0.000 n=30+30) XML 228ms ± 4% 241ms ± 4% +5.62% (p=0.000 n=30+29) StdCmd 15.4s ± 1% 16.7s ± 1% +8.29% (p=0.000 n=29+29) [Geo mean] 476ms 502ms +5.35% name old user-time/op new user-time/op delta Template 294ms ±18% 304ms ±15% ~ (p=0.242 n=29+29) Unicode 182ms ±27% 172ms ±28% ~ (p=0.104 n=30+30) GoTypes 957ms ±15% 1016ms ±12% +6.16% (p=0.000 n=30+30) SSA 13.3s ± 5% 14.3s ± 3% +7.32% (p=0.000 n=30+28) Flate 188ms ±17% 193ms ±17% ~ (p=0.288 n=28+29) GoParser 232ms ±16% 238ms ±13% ~ (p=0.065 n=30+29) Reflect 585ms ±13% 620ms ±10% +5.88% (p=0.000 n=30+30) Tar 298ms ±21% 332ms ±23% +11.32% (p=0.000 n=30+30) XML 329ms ±17% 343ms ±12% +4.18% (p=0.032 n=30+30) [Geo mean] 492ms 513ms +4.13% name old alloc/op new alloc/op delta Template 38.3MB ± 0% 40.3MB ± 0% +5.29% (p=0.000 n=30+30) Unicode 29.3MB ± 0% 29.6MB ± 0% +1.28% (p=0.000 n=30+29) GoTypes 110MB ± 0% 118MB ± 0% +6.97% (p=0.000 n=29+30) SSA 1.48GB ± 0% 1.61GB ± 0% +9.06% (p=0.000 n=30+30) Flate 24.8MB ± 0% 26.0MB ± 0% +4.99% (p=0.000 n=29+30) GoParser 30.9MB ± 0% 32.2MB ± 0% +4.20% (p=0.000 n=30+30) Reflect 76.8MB ± 0% 80.6MB ± 0% +4.97% (p=0.000 n=30+30) Tar 39.6MB ± 0% 41.7MB ± 0% +5.22% (p=0.000 n=29+30) XML 42.0MB ± 0% 45.4MB ± 0% +8.22% (p=0.000 n=29+30) [Geo mean] 63.9MB 67.5MB +5.56% name old allocs/op new allocs/op delta Template 383k ± 0% 405k ± 0% +5.69% (p=0.000 n=30+30) Unicode 343k ± 0% 346k ± 0% +0.98% (p=0.000 n=30+27) GoTypes 1.15M ± 0% 1.22M ± 0% +6.17% (p=0.000 n=29+29) SSA 12.2M ± 0% 13.2M ± 0% +8.15% (p=0.000 n=30+30) Flate 234k ± 0% 249k ± 0% +6.44% (p=0.000 n=30+30) GoParser 315k ± 0% 332k ± 0% +5.31% (p=0.000 n=30+28) Reflect 972k ± 0% 1010k ± 0% +3.89% (p=0.000 n=30+30) Tar 394k ± 0% 415k ± 0% +5.35% (p=0.000 n=28+30) XML 404k ± 0% 429k ± 0% +6.31% (p=0.000 n=29+29) [Geo mean] 651k 686k +5.35% Change-Id: Ia005a8d6b33ce9f8091322f004376a3d6e5c1a94 Reviewed-on: https://go-review.googlesource.com/89357 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-01-22 17:06:26 -05:00
state.liveCount[predSlot.slot] = 1
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
for i := 1; i < len(preds); i++ {
if state.loggingEnabled {
state.logf("Merging in state from %v:\n%v", preds[i], state.blockEndStateString(blockLocs[preds[i].ID]))
}
for _, predSlot := range blockLocs[preds[i].ID].endState {
cmd/compile/internal/ssa: reduce location list memory use Put everything that showed up in the allocation profile into the cache, and reuse it across functions. After this CL, the overhead of enabling location lists is getting pretty close to the desired 5%: compilecmp -all -beforeflags -dwarflocationlists=0 -afterflags -dwarflocationlists=1 -n 30 4ebad42292b6a4090faf37753dd768d2965e38c4 4ebad42292b6a4090faf37753dd768d2965e38c4 compilecmp -dwarflocationlists=0 4ebad42292b6a4090faf37753dd768d2965e38c4 -dwarflocationlists=1 4ebad42292b6a4090faf37753dd768d2965e38c4 benchstat -geomean /tmp/869550129 /tmp/143495132 completed 30 of 30, estimated time remaining 0s (eta 3:24PM) name old time/op new time/op delta Template 199ms ± 4% 209ms ± 6% +5.17% (p=0.000 n=29+30) Unicode 99.2ms ± 8% 100.5ms ± 6% ~ (p=0.112 n=30+30) GoTypes 642ms ± 3% 684ms ± 3% +6.54% (p=0.000 n=29+30) SSA 8.00s ± 1% 8.71s ± 1% +8.78% (p=0.000 n=29+29) Flate 129ms ± 7% 134ms ± 5% +3.77% (p=0.000 n=30+30) GoParser 157ms ± 4% 164ms ± 5% +4.35% (p=0.000 n=29+30) Reflect 428ms ± 3% 450ms ± 4% +5.09% (p=0.000 n=30+30) Tar 195ms ± 5% 204ms ± 8% +4.78% (p=0.000 n=30+30) XML 228ms ± 4% 241ms ± 4% +5.62% (p=0.000 n=30+29) StdCmd 15.4s ± 1% 16.7s ± 1% +8.29% (p=0.000 n=29+29) [Geo mean] 476ms 502ms +5.35% name old user-time/op new user-time/op delta Template 294ms ±18% 304ms ±15% ~ (p=0.242 n=29+29) Unicode 182ms ±27% 172ms ±28% ~ (p=0.104 n=30+30) GoTypes 957ms ±15% 1016ms ±12% +6.16% (p=0.000 n=30+30) SSA 13.3s ± 5% 14.3s ± 3% +7.32% (p=0.000 n=30+28) Flate 188ms ±17% 193ms ±17% ~ (p=0.288 n=28+29) GoParser 232ms ±16% 238ms ±13% ~ (p=0.065 n=30+29) Reflect 585ms ±13% 620ms ±10% +5.88% (p=0.000 n=30+30) Tar 298ms ±21% 332ms ±23% +11.32% (p=0.000 n=30+30) XML 329ms ±17% 343ms ±12% +4.18% (p=0.032 n=30+30) [Geo mean] 492ms 513ms +4.13% name old alloc/op new alloc/op delta Template 38.3MB ± 0% 40.3MB ± 0% +5.29% (p=0.000 n=30+30) Unicode 29.3MB ± 0% 29.6MB ± 0% +1.28% (p=0.000 n=30+29) GoTypes 110MB ± 0% 118MB ± 0% +6.97% (p=0.000 n=29+30) SSA 1.48GB ± 0% 1.61GB ± 0% +9.06% (p=0.000 n=30+30) Flate 24.8MB ± 0% 26.0MB ± 0% +4.99% (p=0.000 n=29+30) GoParser 30.9MB ± 0% 32.2MB ± 0% +4.20% (p=0.000 n=30+30) Reflect 76.8MB ± 0% 80.6MB ± 0% +4.97% (p=0.000 n=30+30) Tar 39.6MB ± 0% 41.7MB ± 0% +5.22% (p=0.000 n=29+30) XML 42.0MB ± 0% 45.4MB ± 0% +8.22% (p=0.000 n=29+30) [Geo mean] 63.9MB 67.5MB +5.56% name old allocs/op new allocs/op delta Template 383k ± 0% 405k ± 0% +5.69% (p=0.000 n=30+30) Unicode 343k ± 0% 346k ± 0% +0.98% (p=0.000 n=30+27) GoTypes 1.15M ± 0% 1.22M ± 0% +6.17% (p=0.000 n=29+29) SSA 12.2M ± 0% 13.2M ± 0% +8.15% (p=0.000 n=30+30) Flate 234k ± 0% 249k ± 0% +6.44% (p=0.000 n=30+30) GoParser 315k ± 0% 332k ± 0% +5.31% (p=0.000 n=30+28) Reflect 972k ± 0% 1010k ± 0% +3.89% (p=0.000 n=30+30) Tar 394k ± 0% 415k ± 0% +5.35% (p=0.000 n=28+30) XML 404k ± 0% 429k ± 0% +6.31% (p=0.000 n=29+29) [Geo mean] 651k 686k +5.35% Change-Id: Ia005a8d6b33ce9f8091322f004376a3d6e5c1a94 Reviewed-on: https://go-review.googlesource.com/89357 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-01-22 17:06:26 -05:00
state.liveCount[predSlot.slot]++
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
liveLoc := slotLocs[predSlot.slot]
if !liveLoc.onStack() || !predSlot.onStack() || liveLoc.StackOffset != predSlot.StackOffset {
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
liveLoc.StackOffset = 0
}
liveLoc.Registers &= predSlot.Registers
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
slotLocs[predSlot.slot] = liveLoc
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
}
cmd/compile/internal/ssa: reduce location list memory use Put everything that showed up in the allocation profile into the cache, and reuse it across functions. After this CL, the overhead of enabling location lists is getting pretty close to the desired 5%: compilecmp -all -beforeflags -dwarflocationlists=0 -afterflags -dwarflocationlists=1 -n 30 4ebad42292b6a4090faf37753dd768d2965e38c4 4ebad42292b6a4090faf37753dd768d2965e38c4 compilecmp -dwarflocationlists=0 4ebad42292b6a4090faf37753dd768d2965e38c4 -dwarflocationlists=1 4ebad42292b6a4090faf37753dd768d2965e38c4 benchstat -geomean /tmp/869550129 /tmp/143495132 completed 30 of 30, estimated time remaining 0s (eta 3:24PM) name old time/op new time/op delta Template 199ms ± 4% 209ms ± 6% +5.17% (p=0.000 n=29+30) Unicode 99.2ms ± 8% 100.5ms ± 6% ~ (p=0.112 n=30+30) GoTypes 642ms ± 3% 684ms ± 3% +6.54% (p=0.000 n=29+30) SSA 8.00s ± 1% 8.71s ± 1% +8.78% (p=0.000 n=29+29) Flate 129ms ± 7% 134ms ± 5% +3.77% (p=0.000 n=30+30) GoParser 157ms ± 4% 164ms ± 5% +4.35% (p=0.000 n=29+30) Reflect 428ms ± 3% 450ms ± 4% +5.09% (p=0.000 n=30+30) Tar 195ms ± 5% 204ms ± 8% +4.78% (p=0.000 n=30+30) XML 228ms ± 4% 241ms ± 4% +5.62% (p=0.000 n=30+29) StdCmd 15.4s ± 1% 16.7s ± 1% +8.29% (p=0.000 n=29+29) [Geo mean] 476ms 502ms +5.35% name old user-time/op new user-time/op delta Template 294ms ±18% 304ms ±15% ~ (p=0.242 n=29+29) Unicode 182ms ±27% 172ms ±28% ~ (p=0.104 n=30+30) GoTypes 957ms ±15% 1016ms ±12% +6.16% (p=0.000 n=30+30) SSA 13.3s ± 5% 14.3s ± 3% +7.32% (p=0.000 n=30+28) Flate 188ms ±17% 193ms ±17% ~ (p=0.288 n=28+29) GoParser 232ms ±16% 238ms ±13% ~ (p=0.065 n=30+29) Reflect 585ms ±13% 620ms ±10% +5.88% (p=0.000 n=30+30) Tar 298ms ±21% 332ms ±23% +11.32% (p=0.000 n=30+30) XML 329ms ±17% 343ms ±12% +4.18% (p=0.032 n=30+30) [Geo mean] 492ms 513ms +4.13% name old alloc/op new alloc/op delta Template 38.3MB ± 0% 40.3MB ± 0% +5.29% (p=0.000 n=30+30) Unicode 29.3MB ± 0% 29.6MB ± 0% +1.28% (p=0.000 n=30+29) GoTypes 110MB ± 0% 118MB ± 0% +6.97% (p=0.000 n=29+30) SSA 1.48GB ± 0% 1.61GB ± 0% +9.06% (p=0.000 n=30+30) Flate 24.8MB ± 0% 26.0MB ± 0% +4.99% (p=0.000 n=29+30) GoParser 30.9MB ± 0% 32.2MB ± 0% +4.20% (p=0.000 n=30+30) Reflect 76.8MB ± 0% 80.6MB ± 0% +4.97% (p=0.000 n=30+30) Tar 39.6MB ± 0% 41.7MB ± 0% +5.22% (p=0.000 n=29+30) XML 42.0MB ± 0% 45.4MB ± 0% +8.22% (p=0.000 n=29+30) [Geo mean] 63.9MB 67.5MB +5.56% name old allocs/op new allocs/op delta Template 383k ± 0% 405k ± 0% +5.69% (p=0.000 n=30+30) Unicode 343k ± 0% 346k ± 0% +0.98% (p=0.000 n=30+27) GoTypes 1.15M ± 0% 1.22M ± 0% +6.17% (p=0.000 n=29+29) SSA 12.2M ± 0% 13.2M ± 0% +8.15% (p=0.000 n=30+30) Flate 234k ± 0% 249k ± 0% +6.44% (p=0.000 n=30+30) GoParser 315k ± 0% 332k ± 0% +5.31% (p=0.000 n=30+28) Reflect 972k ± 0% 1010k ± 0% +3.89% (p=0.000 n=30+30) Tar 394k ± 0% 415k ± 0% +5.35% (p=0.000 n=28+30) XML 404k ± 0% 429k ± 0% +6.31% (p=0.000 n=29+29) [Geo mean] 651k 686k +5.35% Change-Id: Ia005a8d6b33ce9f8091322f004376a3d6e5c1a94 Reviewed-on: https://go-review.googlesource.com/89357 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-01-22 17:06:26 -05:00
// Check if the final state is the same as the first predecessor's
// final state, and reuse it if so. In principle it could match any,
// but it's probably not worth checking more than the first.
unchanged := true
for _, predSlot := range p0 {
if state.liveCount[predSlot.slot] != len(preds) ||
slotLocs[predSlot.slot].Registers != predSlot.Registers ||
slotLocs[predSlot.slot].StackOffset != predSlot.StackOffset {
cmd/compile/internal/ssa: reduce location list memory use Put everything that showed up in the allocation profile into the cache, and reuse it across functions. After this CL, the overhead of enabling location lists is getting pretty close to the desired 5%: compilecmp -all -beforeflags -dwarflocationlists=0 -afterflags -dwarflocationlists=1 -n 30 4ebad42292b6a4090faf37753dd768d2965e38c4 4ebad42292b6a4090faf37753dd768d2965e38c4 compilecmp -dwarflocationlists=0 4ebad42292b6a4090faf37753dd768d2965e38c4 -dwarflocationlists=1 4ebad42292b6a4090faf37753dd768d2965e38c4 benchstat -geomean /tmp/869550129 /tmp/143495132 completed 30 of 30, estimated time remaining 0s (eta 3:24PM) name old time/op new time/op delta Template 199ms ± 4% 209ms ± 6% +5.17% (p=0.000 n=29+30) Unicode 99.2ms ± 8% 100.5ms ± 6% ~ (p=0.112 n=30+30) GoTypes 642ms ± 3% 684ms ± 3% +6.54% (p=0.000 n=29+30) SSA 8.00s ± 1% 8.71s ± 1% +8.78% (p=0.000 n=29+29) Flate 129ms ± 7% 134ms ± 5% +3.77% (p=0.000 n=30+30) GoParser 157ms ± 4% 164ms ± 5% +4.35% (p=0.000 n=29+30) Reflect 428ms ± 3% 450ms ± 4% +5.09% (p=0.000 n=30+30) Tar 195ms ± 5% 204ms ± 8% +4.78% (p=0.000 n=30+30) XML 228ms ± 4% 241ms ± 4% +5.62% (p=0.000 n=30+29) StdCmd 15.4s ± 1% 16.7s ± 1% +8.29% (p=0.000 n=29+29) [Geo mean] 476ms 502ms +5.35% name old user-time/op new user-time/op delta Template 294ms ±18% 304ms ±15% ~ (p=0.242 n=29+29) Unicode 182ms ±27% 172ms ±28% ~ (p=0.104 n=30+30) GoTypes 957ms ±15% 1016ms ±12% +6.16% (p=0.000 n=30+30) SSA 13.3s ± 5% 14.3s ± 3% +7.32% (p=0.000 n=30+28) Flate 188ms ±17% 193ms ±17% ~ (p=0.288 n=28+29) GoParser 232ms ±16% 238ms ±13% ~ (p=0.065 n=30+29) Reflect 585ms ±13% 620ms ±10% +5.88% (p=0.000 n=30+30) Tar 298ms ±21% 332ms ±23% +11.32% (p=0.000 n=30+30) XML 329ms ±17% 343ms ±12% +4.18% (p=0.032 n=30+30) [Geo mean] 492ms 513ms +4.13% name old alloc/op new alloc/op delta Template 38.3MB ± 0% 40.3MB ± 0% +5.29% (p=0.000 n=30+30) Unicode 29.3MB ± 0% 29.6MB ± 0% +1.28% (p=0.000 n=30+29) GoTypes 110MB ± 0% 118MB ± 0% +6.97% (p=0.000 n=29+30) SSA 1.48GB ± 0% 1.61GB ± 0% +9.06% (p=0.000 n=30+30) Flate 24.8MB ± 0% 26.0MB ± 0% +4.99% (p=0.000 n=29+30) GoParser 30.9MB ± 0% 32.2MB ± 0% +4.20% (p=0.000 n=30+30) Reflect 76.8MB ± 0% 80.6MB ± 0% +4.97% (p=0.000 n=30+30) Tar 39.6MB ± 0% 41.7MB ± 0% +5.22% (p=0.000 n=29+30) XML 42.0MB ± 0% 45.4MB ± 0% +8.22% (p=0.000 n=29+30) [Geo mean] 63.9MB 67.5MB +5.56% name old allocs/op new allocs/op delta Template 383k ± 0% 405k ± 0% +5.69% (p=0.000 n=30+30) Unicode 343k ± 0% 346k ± 0% +0.98% (p=0.000 n=30+27) GoTypes 1.15M ± 0% 1.22M ± 0% +6.17% (p=0.000 n=29+29) SSA 12.2M ± 0% 13.2M ± 0% +8.15% (p=0.000 n=30+30) Flate 234k ± 0% 249k ± 0% +6.44% (p=0.000 n=30+30) GoParser 315k ± 0% 332k ± 0% +5.31% (p=0.000 n=30+28) Reflect 972k ± 0% 1010k ± 0% +3.89% (p=0.000 n=30+30) Tar 394k ± 0% 415k ± 0% +5.35% (p=0.000 n=28+30) XML 404k ± 0% 429k ± 0% +6.31% (p=0.000 n=29+29) [Geo mean] 651k 686k +5.35% Change-Id: Ia005a8d6b33ce9f8091322f004376a3d6e5c1a94 Reviewed-on: https://go-review.googlesource.com/89357 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-01-22 17:06:26 -05:00
unchanged = false
break
}
}
if unchanged {
if state.loggingEnabled {
state.logf("After merge, %v matches %v exactly.\n", b, preds[0])
}
state.currentState.reset(p0)
return p0, true
cmd/compile/internal/ssa: reduce location list memory use Put everything that showed up in the allocation profile into the cache, and reuse it across functions. After this CL, the overhead of enabling location lists is getting pretty close to the desired 5%: compilecmp -all -beforeflags -dwarflocationlists=0 -afterflags -dwarflocationlists=1 -n 30 4ebad42292b6a4090faf37753dd768d2965e38c4 4ebad42292b6a4090faf37753dd768d2965e38c4 compilecmp -dwarflocationlists=0 4ebad42292b6a4090faf37753dd768d2965e38c4 -dwarflocationlists=1 4ebad42292b6a4090faf37753dd768d2965e38c4 benchstat -geomean /tmp/869550129 /tmp/143495132 completed 30 of 30, estimated time remaining 0s (eta 3:24PM) name old time/op new time/op delta Template 199ms ± 4% 209ms ± 6% +5.17% (p=0.000 n=29+30) Unicode 99.2ms ± 8% 100.5ms ± 6% ~ (p=0.112 n=30+30) GoTypes 642ms ± 3% 684ms ± 3% +6.54% (p=0.000 n=29+30) SSA 8.00s ± 1% 8.71s ± 1% +8.78% (p=0.000 n=29+29) Flate 129ms ± 7% 134ms ± 5% +3.77% (p=0.000 n=30+30) GoParser 157ms ± 4% 164ms ± 5% +4.35% (p=0.000 n=29+30) Reflect 428ms ± 3% 450ms ± 4% +5.09% (p=0.000 n=30+30) Tar 195ms ± 5% 204ms ± 8% +4.78% (p=0.000 n=30+30) XML 228ms ± 4% 241ms ± 4% +5.62% (p=0.000 n=30+29) StdCmd 15.4s ± 1% 16.7s ± 1% +8.29% (p=0.000 n=29+29) [Geo mean] 476ms 502ms +5.35% name old user-time/op new user-time/op delta Template 294ms ±18% 304ms ±15% ~ (p=0.242 n=29+29) Unicode 182ms ±27% 172ms ±28% ~ (p=0.104 n=30+30) GoTypes 957ms ±15% 1016ms ±12% +6.16% (p=0.000 n=30+30) SSA 13.3s ± 5% 14.3s ± 3% +7.32% (p=0.000 n=30+28) Flate 188ms ±17% 193ms ±17% ~ (p=0.288 n=28+29) GoParser 232ms ±16% 238ms ±13% ~ (p=0.065 n=30+29) Reflect 585ms ±13% 620ms ±10% +5.88% (p=0.000 n=30+30) Tar 298ms ±21% 332ms ±23% +11.32% (p=0.000 n=30+30) XML 329ms ±17% 343ms ±12% +4.18% (p=0.032 n=30+30) [Geo mean] 492ms 513ms +4.13% name old alloc/op new alloc/op delta Template 38.3MB ± 0% 40.3MB ± 0% +5.29% (p=0.000 n=30+30) Unicode 29.3MB ± 0% 29.6MB ± 0% +1.28% (p=0.000 n=30+29) GoTypes 110MB ± 0% 118MB ± 0% +6.97% (p=0.000 n=29+30) SSA 1.48GB ± 0% 1.61GB ± 0% +9.06% (p=0.000 n=30+30) Flate 24.8MB ± 0% 26.0MB ± 0% +4.99% (p=0.000 n=29+30) GoParser 30.9MB ± 0% 32.2MB ± 0% +4.20% (p=0.000 n=30+30) Reflect 76.8MB ± 0% 80.6MB ± 0% +4.97% (p=0.000 n=30+30) Tar 39.6MB ± 0% 41.7MB ± 0% +5.22% (p=0.000 n=29+30) XML 42.0MB ± 0% 45.4MB ± 0% +8.22% (p=0.000 n=29+30) [Geo mean] 63.9MB 67.5MB +5.56% name old allocs/op new allocs/op delta Template 383k ± 0% 405k ± 0% +5.69% (p=0.000 n=30+30) Unicode 343k ± 0% 346k ± 0% +0.98% (p=0.000 n=30+27) GoTypes 1.15M ± 0% 1.22M ± 0% +6.17% (p=0.000 n=29+29) SSA 12.2M ± 0% 13.2M ± 0% +8.15% (p=0.000 n=30+30) Flate 234k ± 0% 249k ± 0% +6.44% (p=0.000 n=30+30) GoParser 315k ± 0% 332k ± 0% +5.31% (p=0.000 n=30+28) Reflect 972k ± 0% 1010k ± 0% +3.89% (p=0.000 n=30+30) Tar 394k ± 0% 415k ± 0% +5.35% (p=0.000 n=28+30) XML 404k ± 0% 429k ± 0% +6.31% (p=0.000 n=29+29) [Geo mean] 651k 686k +5.35% Change-Id: Ia005a8d6b33ce9f8091322f004376a3d6e5c1a94 Reviewed-on: https://go-review.googlesource.com/89357 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-01-22 17:06:26 -05:00
}
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
for reg := range state.currentState.registers {
state.currentState.registers[reg] = state.currentState.registers[reg][:0]
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
// A slot is live if it was seen in all predecessors, and they all had
// some storage in common.
for _, predSlot := range p0 {
slotLoc := slotLocs[predSlot.slot]
if state.liveCount[predSlot.slot] != len(preds) {
// Seen in only some predecessors. Clear it out.
slotLocs[predSlot.slot] = VarLoc{}
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
continue
}
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
// Present in all predecessors.
mask := uint64(slotLoc.Registers)
for {
if mask == 0 {
break
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
reg := uint8(bits.TrailingZeros64(mask))
mask &^= 1 << reg
state.currentState.registers[reg] = append(state.currentState.registers[reg], predSlot.slot)
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
}
return nil, false
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
// processValue updates locs and state.registerContents to reflect v, a value with
cmd/compile: adjust locationlist lifetimes A statement like foo = bar + qux might compile to AX := AX + BX resulting in a regkill for AX before this instruction. The buggy behavior is to kill AX "at" this instruction, before it has executed. (Code generation of no-instruction values like RegKills applies their effects at the next actual instruction emitted). However, bar is still associated with AX until after the instruction executes, so the effect of the regkill must occur at the boundary between this instruction and the next. Similarly, the new value bound to AX is not visible until this instruction executes (and in the case of values that require multiple instructions in code generation, until all of them have executed). The ranges are adjusted so that a value's start occurs at the next following instruction after its evaluation, and the end occurs after (execution of) the first instruction following the end of the lifetime as a value. (Notice the asymmetry; the entire value must be finished before it is visible, but execution of a single instruction invalidates. However, the value *is* visible before that next instruction executes). The test was adjusted to make it insensitive to the result numbering for variables printed by gdb, since that is not relevant to the test and makes the differences introduced by small changes larger than necessary/useful. The test was also improved to present variable probes more intuitively, and also to allow explicit indication of "this variable was optimized out" Change-Id: I39453eead8399e6bb05ebd957289b112d1100c0e Reviewed-on: https://go-review.googlesource.com/74090 Run-TryBot: David Chase <drchase@google.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-10-27 17:02:11 -04:00
// the names in vSlots and homed in vReg. "v" becomes visible after execution of
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
// the instructions evaluating it. It returns which VarIDs were modified by the
// Value's execution.
cmd/compile/internal/ssa: reduce location list memory use Put everything that showed up in the allocation profile into the cache, and reuse it across functions. After this CL, the overhead of enabling location lists is getting pretty close to the desired 5%: compilecmp -all -beforeflags -dwarflocationlists=0 -afterflags -dwarflocationlists=1 -n 30 4ebad42292b6a4090faf37753dd768d2965e38c4 4ebad42292b6a4090faf37753dd768d2965e38c4 compilecmp -dwarflocationlists=0 4ebad42292b6a4090faf37753dd768d2965e38c4 -dwarflocationlists=1 4ebad42292b6a4090faf37753dd768d2965e38c4 benchstat -geomean /tmp/869550129 /tmp/143495132 completed 30 of 30, estimated time remaining 0s (eta 3:24PM) name old time/op new time/op delta Template 199ms ± 4% 209ms ± 6% +5.17% (p=0.000 n=29+30) Unicode 99.2ms ± 8% 100.5ms ± 6% ~ (p=0.112 n=30+30) GoTypes 642ms ± 3% 684ms ± 3% +6.54% (p=0.000 n=29+30) SSA 8.00s ± 1% 8.71s ± 1% +8.78% (p=0.000 n=29+29) Flate 129ms ± 7% 134ms ± 5% +3.77% (p=0.000 n=30+30) GoParser 157ms ± 4% 164ms ± 5% +4.35% (p=0.000 n=29+30) Reflect 428ms ± 3% 450ms ± 4% +5.09% (p=0.000 n=30+30) Tar 195ms ± 5% 204ms ± 8% +4.78% (p=0.000 n=30+30) XML 228ms ± 4% 241ms ± 4% +5.62% (p=0.000 n=30+29) StdCmd 15.4s ± 1% 16.7s ± 1% +8.29% (p=0.000 n=29+29) [Geo mean] 476ms 502ms +5.35% name old user-time/op new user-time/op delta Template 294ms ±18% 304ms ±15% ~ (p=0.242 n=29+29) Unicode 182ms ±27% 172ms ±28% ~ (p=0.104 n=30+30) GoTypes 957ms ±15% 1016ms ±12% +6.16% (p=0.000 n=30+30) SSA 13.3s ± 5% 14.3s ± 3% +7.32% (p=0.000 n=30+28) Flate 188ms ±17% 193ms ±17% ~ (p=0.288 n=28+29) GoParser 232ms ±16% 238ms ±13% ~ (p=0.065 n=30+29) Reflect 585ms ±13% 620ms ±10% +5.88% (p=0.000 n=30+30) Tar 298ms ±21% 332ms ±23% +11.32% (p=0.000 n=30+30) XML 329ms ±17% 343ms ±12% +4.18% (p=0.032 n=30+30) [Geo mean] 492ms 513ms +4.13% name old alloc/op new alloc/op delta Template 38.3MB ± 0% 40.3MB ± 0% +5.29% (p=0.000 n=30+30) Unicode 29.3MB ± 0% 29.6MB ± 0% +1.28% (p=0.000 n=30+29) GoTypes 110MB ± 0% 118MB ± 0% +6.97% (p=0.000 n=29+30) SSA 1.48GB ± 0% 1.61GB ± 0% +9.06% (p=0.000 n=30+30) Flate 24.8MB ± 0% 26.0MB ± 0% +4.99% (p=0.000 n=29+30) GoParser 30.9MB ± 0% 32.2MB ± 0% +4.20% (p=0.000 n=30+30) Reflect 76.8MB ± 0% 80.6MB ± 0% +4.97% (p=0.000 n=30+30) Tar 39.6MB ± 0% 41.7MB ± 0% +5.22% (p=0.000 n=29+30) XML 42.0MB ± 0% 45.4MB ± 0% +8.22% (p=0.000 n=29+30) [Geo mean] 63.9MB 67.5MB +5.56% name old allocs/op new allocs/op delta Template 383k ± 0% 405k ± 0% +5.69% (p=0.000 n=30+30) Unicode 343k ± 0% 346k ± 0% +0.98% (p=0.000 n=30+27) GoTypes 1.15M ± 0% 1.22M ± 0% +6.17% (p=0.000 n=29+29) SSA 12.2M ± 0% 13.2M ± 0% +8.15% (p=0.000 n=30+30) Flate 234k ± 0% 249k ± 0% +6.44% (p=0.000 n=30+30) GoParser 315k ± 0% 332k ± 0% +5.31% (p=0.000 n=30+28) Reflect 972k ± 0% 1010k ± 0% +3.89% (p=0.000 n=30+30) Tar 394k ± 0% 415k ± 0% +5.35% (p=0.000 n=28+30) XML 404k ± 0% 429k ± 0% +6.31% (p=0.000 n=29+29) [Geo mean] 651k 686k +5.35% Change-Id: Ia005a8d6b33ce9f8091322f004376a3d6e5c1a94 Reviewed-on: https://go-review.googlesource.com/89357 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-01-22 17:06:26 -05:00
func (state *debugState) processValue(v *Value, vSlots []SlotID, vReg *Register) bool {
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
locs := state.currentState
cmd/compile/internal/ssa: reduce location list memory use Put everything that showed up in the allocation profile into the cache, and reuse it across functions. After this CL, the overhead of enabling location lists is getting pretty close to the desired 5%: compilecmp -all -beforeflags -dwarflocationlists=0 -afterflags -dwarflocationlists=1 -n 30 4ebad42292b6a4090faf37753dd768d2965e38c4 4ebad42292b6a4090faf37753dd768d2965e38c4 compilecmp -dwarflocationlists=0 4ebad42292b6a4090faf37753dd768d2965e38c4 -dwarflocationlists=1 4ebad42292b6a4090faf37753dd768d2965e38c4 benchstat -geomean /tmp/869550129 /tmp/143495132 completed 30 of 30, estimated time remaining 0s (eta 3:24PM) name old time/op new time/op delta Template 199ms ± 4% 209ms ± 6% +5.17% (p=0.000 n=29+30) Unicode 99.2ms ± 8% 100.5ms ± 6% ~ (p=0.112 n=30+30) GoTypes 642ms ± 3% 684ms ± 3% +6.54% (p=0.000 n=29+30) SSA 8.00s ± 1% 8.71s ± 1% +8.78% (p=0.000 n=29+29) Flate 129ms ± 7% 134ms ± 5% +3.77% (p=0.000 n=30+30) GoParser 157ms ± 4% 164ms ± 5% +4.35% (p=0.000 n=29+30) Reflect 428ms ± 3% 450ms ± 4% +5.09% (p=0.000 n=30+30) Tar 195ms ± 5% 204ms ± 8% +4.78% (p=0.000 n=30+30) XML 228ms ± 4% 241ms ± 4% +5.62% (p=0.000 n=30+29) StdCmd 15.4s ± 1% 16.7s ± 1% +8.29% (p=0.000 n=29+29) [Geo mean] 476ms 502ms +5.35% name old user-time/op new user-time/op delta Template 294ms ±18% 304ms ±15% ~ (p=0.242 n=29+29) Unicode 182ms ±27% 172ms ±28% ~ (p=0.104 n=30+30) GoTypes 957ms ±15% 1016ms ±12% +6.16% (p=0.000 n=30+30) SSA 13.3s ± 5% 14.3s ± 3% +7.32% (p=0.000 n=30+28) Flate 188ms ±17% 193ms ±17% ~ (p=0.288 n=28+29) GoParser 232ms ±16% 238ms ±13% ~ (p=0.065 n=30+29) Reflect 585ms ±13% 620ms ±10% +5.88% (p=0.000 n=30+30) Tar 298ms ±21% 332ms ±23% +11.32% (p=0.000 n=30+30) XML 329ms ±17% 343ms ±12% +4.18% (p=0.032 n=30+30) [Geo mean] 492ms 513ms +4.13% name old alloc/op new alloc/op delta Template 38.3MB ± 0% 40.3MB ± 0% +5.29% (p=0.000 n=30+30) Unicode 29.3MB ± 0% 29.6MB ± 0% +1.28% (p=0.000 n=30+29) GoTypes 110MB ± 0% 118MB ± 0% +6.97% (p=0.000 n=29+30) SSA 1.48GB ± 0% 1.61GB ± 0% +9.06% (p=0.000 n=30+30) Flate 24.8MB ± 0% 26.0MB ± 0% +4.99% (p=0.000 n=29+30) GoParser 30.9MB ± 0% 32.2MB ± 0% +4.20% (p=0.000 n=30+30) Reflect 76.8MB ± 0% 80.6MB ± 0% +4.97% (p=0.000 n=30+30) Tar 39.6MB ± 0% 41.7MB ± 0% +5.22% (p=0.000 n=29+30) XML 42.0MB ± 0% 45.4MB ± 0% +8.22% (p=0.000 n=29+30) [Geo mean] 63.9MB 67.5MB +5.56% name old allocs/op new allocs/op delta Template 383k ± 0% 405k ± 0% +5.69% (p=0.000 n=30+30) Unicode 343k ± 0% 346k ± 0% +0.98% (p=0.000 n=30+27) GoTypes 1.15M ± 0% 1.22M ± 0% +6.17% (p=0.000 n=29+29) SSA 12.2M ± 0% 13.2M ± 0% +8.15% (p=0.000 n=30+30) Flate 234k ± 0% 249k ± 0% +6.44% (p=0.000 n=30+30) GoParser 315k ± 0% 332k ± 0% +5.31% (p=0.000 n=30+28) Reflect 972k ± 0% 1010k ± 0% +3.89% (p=0.000 n=30+30) Tar 394k ± 0% 415k ± 0% +5.35% (p=0.000 n=28+30) XML 404k ± 0% 429k ± 0% +6.31% (p=0.000 n=29+29) [Geo mean] 651k 686k +5.35% Change-Id: Ia005a8d6b33ce9f8091322f004376a3d6e5c1a94 Reviewed-on: https://go-review.googlesource.com/89357 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-01-22 17:06:26 -05:00
changed := false
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
setSlot := func(slot SlotID, loc VarLoc) {
cmd/compile/internal/ssa: reduce location list memory use Put everything that showed up in the allocation profile into the cache, and reuse it across functions. After this CL, the overhead of enabling location lists is getting pretty close to the desired 5%: compilecmp -all -beforeflags -dwarflocationlists=0 -afterflags -dwarflocationlists=1 -n 30 4ebad42292b6a4090faf37753dd768d2965e38c4 4ebad42292b6a4090faf37753dd768d2965e38c4 compilecmp -dwarflocationlists=0 4ebad42292b6a4090faf37753dd768d2965e38c4 -dwarflocationlists=1 4ebad42292b6a4090faf37753dd768d2965e38c4 benchstat -geomean /tmp/869550129 /tmp/143495132 completed 30 of 30, estimated time remaining 0s (eta 3:24PM) name old time/op new time/op delta Template 199ms ± 4% 209ms ± 6% +5.17% (p=0.000 n=29+30) Unicode 99.2ms ± 8% 100.5ms ± 6% ~ (p=0.112 n=30+30) GoTypes 642ms ± 3% 684ms ± 3% +6.54% (p=0.000 n=29+30) SSA 8.00s ± 1% 8.71s ± 1% +8.78% (p=0.000 n=29+29) Flate 129ms ± 7% 134ms ± 5% +3.77% (p=0.000 n=30+30) GoParser 157ms ± 4% 164ms ± 5% +4.35% (p=0.000 n=29+30) Reflect 428ms ± 3% 450ms ± 4% +5.09% (p=0.000 n=30+30) Tar 195ms ± 5% 204ms ± 8% +4.78% (p=0.000 n=30+30) XML 228ms ± 4% 241ms ± 4% +5.62% (p=0.000 n=30+29) StdCmd 15.4s ± 1% 16.7s ± 1% +8.29% (p=0.000 n=29+29) [Geo mean] 476ms 502ms +5.35% name old user-time/op new user-time/op delta Template 294ms ±18% 304ms ±15% ~ (p=0.242 n=29+29) Unicode 182ms ±27% 172ms ±28% ~ (p=0.104 n=30+30) GoTypes 957ms ±15% 1016ms ±12% +6.16% (p=0.000 n=30+30) SSA 13.3s ± 5% 14.3s ± 3% +7.32% (p=0.000 n=30+28) Flate 188ms ±17% 193ms ±17% ~ (p=0.288 n=28+29) GoParser 232ms ±16% 238ms ±13% ~ (p=0.065 n=30+29) Reflect 585ms ±13% 620ms ±10% +5.88% (p=0.000 n=30+30) Tar 298ms ±21% 332ms ±23% +11.32% (p=0.000 n=30+30) XML 329ms ±17% 343ms ±12% +4.18% (p=0.032 n=30+30) [Geo mean] 492ms 513ms +4.13% name old alloc/op new alloc/op delta Template 38.3MB ± 0% 40.3MB ± 0% +5.29% (p=0.000 n=30+30) Unicode 29.3MB ± 0% 29.6MB ± 0% +1.28% (p=0.000 n=30+29) GoTypes 110MB ± 0% 118MB ± 0% +6.97% (p=0.000 n=29+30) SSA 1.48GB ± 0% 1.61GB ± 0% +9.06% (p=0.000 n=30+30) Flate 24.8MB ± 0% 26.0MB ± 0% +4.99% (p=0.000 n=29+30) GoParser 30.9MB ± 0% 32.2MB ± 0% +4.20% (p=0.000 n=30+30) Reflect 76.8MB ± 0% 80.6MB ± 0% +4.97% (p=0.000 n=30+30) Tar 39.6MB ± 0% 41.7MB ± 0% +5.22% (p=0.000 n=29+30) XML 42.0MB ± 0% 45.4MB ± 0% +8.22% (p=0.000 n=29+30) [Geo mean] 63.9MB 67.5MB +5.56% name old allocs/op new allocs/op delta Template 383k ± 0% 405k ± 0% +5.69% (p=0.000 n=30+30) Unicode 343k ± 0% 346k ± 0% +0.98% (p=0.000 n=30+27) GoTypes 1.15M ± 0% 1.22M ± 0% +6.17% (p=0.000 n=29+29) SSA 12.2M ± 0% 13.2M ± 0% +8.15% (p=0.000 n=30+30) Flate 234k ± 0% 249k ± 0% +6.44% (p=0.000 n=30+30) GoParser 315k ± 0% 332k ± 0% +5.31% (p=0.000 n=30+28) Reflect 972k ± 0% 1010k ± 0% +3.89% (p=0.000 n=30+30) Tar 394k ± 0% 415k ± 0% +5.35% (p=0.000 n=28+30) XML 404k ± 0% 429k ± 0% +6.31% (p=0.000 n=29+29) [Geo mean] 651k 686k +5.35% Change-Id: Ia005a8d6b33ce9f8091322f004376a3d6e5c1a94 Reviewed-on: https://go-review.googlesource.com/89357 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-01-22 17:06:26 -05:00
changed = true
state.changedVars.add(ID(state.slotVars[slot]))
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
state.currentState.slots[slot] = loc
}
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
// Handle any register clobbering. Call operations, for example,
// clobber all registers even though they don't explicitly write to
// them.
clobbers := uint64(opcodeTable[v.Op].reg.clobbers)
for {
if clobbers == 0 {
break
}
reg := uint8(bits.TrailingZeros64(clobbers))
clobbers &^= 1 << reg
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
for _, slot := range locs.registers[reg] {
if state.loggingEnabled {
state.logf("at %v: %v clobbered out of %v\n", v, state.slots[slot], &state.registers[reg])
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
last := locs.slots[slot]
if last.absent() {
state.f.Fatalf("at %v: slot %v in register %v with no location entry", v, state.slots[slot], &state.registers[reg])
continue
}
regs := last.Registers &^ (1 << reg)
setSlot(slot, VarLoc{regs, last.StackOffset})
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
locs.registers[reg] = locs.registers[reg][:0]
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
switch {
case v.Op == OpVarDef, v.Op == OpVarKill:
n := v.Aux.(GCNode)
if n.IsSynthetic() {
break
}
slotID := state.varParts[n][0]
var stackOffset StackOffset
if v.Op == OpVarDef {
stackOffset = StackOffset(state.stackOffset(state.slots[slotID])<<1 | 1)
}
setSlot(slotID, VarLoc{0, stackOffset})
if state.loggingEnabled {
if v.Op == OpVarDef {
state.logf("at %v: stack-only var %v now live\n", v, state.slots[slotID])
} else {
state.logf("at %v: stack-only var %v now dead\n", v, state.slots[slotID])
}
}
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
case v.Op == OpArg:
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
home := state.f.getHome(v.ID).(LocalSlot)
stackOffset := state.stackOffset(home)<<1 | 1
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
for _, slot := range vSlots {
if state.loggingEnabled {
state.logf("at %v: arg %v now on stack in location %v\n", v, state.slots[slot], home)
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
if last := locs.slots[slot]; !last.absent() {
state.logf("at %v: unexpected arg op on already-live slot %v\n", v, state.slots[slot])
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
}
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
setSlot(slot, VarLoc{0, StackOffset(stackOffset)})
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
case v.Op == OpStoreReg:
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
home := state.f.getHome(v.ID).(LocalSlot)
stackOffset := state.stackOffset(home)<<1 | 1
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
for _, slot := range vSlots {
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
last := locs.slots[slot]
if last.absent() {
if state.loggingEnabled {
state.logf("at %v: unexpected spill of unnamed register %s\n", v, vReg)
}
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
break
}
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
setSlot(slot, VarLoc{last.Registers, StackOffset(stackOffset)})
if state.loggingEnabled {
state.logf("at %v: %v spilled to stack location %v\n", v, state.slots[slot], home)
}
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
case vReg != nil:
if state.loggingEnabled {
newSlots := make([]bool, len(state.slots))
for _, slot := range vSlots {
newSlots[slot] = true
}
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
for _, slot := range locs.registers[vReg.num] {
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
if !newSlots[slot] {
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
state.logf("at %v: overwrote %v in register %v\n", v, state.slots[slot], vReg)
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
}
}
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
for _, slot := range locs.registers[vReg.num] {
last := locs.slots[slot]
setSlot(slot, VarLoc{last.Registers &^ (1 << uint8(vReg.num)), last.StackOffset})
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
locs.registers[vReg.num] = locs.registers[vReg.num][:0]
locs.registers[vReg.num] = append(locs.registers[vReg.num], vSlots...)
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
for _, slot := range vSlots {
if state.loggingEnabled {
state.logf("at %v: %v now in %s\n", v, state.slots[slot], vReg)
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
last := locs.slots[slot]
setSlot(slot, VarLoc{1<<uint8(vReg.num) | last.Registers, last.StackOffset})
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
}
cmd/compile/internal/ssa: reduce location list memory use Put everything that showed up in the allocation profile into the cache, and reuse it across functions. After this CL, the overhead of enabling location lists is getting pretty close to the desired 5%: compilecmp -all -beforeflags -dwarflocationlists=0 -afterflags -dwarflocationlists=1 -n 30 4ebad42292b6a4090faf37753dd768d2965e38c4 4ebad42292b6a4090faf37753dd768d2965e38c4 compilecmp -dwarflocationlists=0 4ebad42292b6a4090faf37753dd768d2965e38c4 -dwarflocationlists=1 4ebad42292b6a4090faf37753dd768d2965e38c4 benchstat -geomean /tmp/869550129 /tmp/143495132 completed 30 of 30, estimated time remaining 0s (eta 3:24PM) name old time/op new time/op delta Template 199ms ± 4% 209ms ± 6% +5.17% (p=0.000 n=29+30) Unicode 99.2ms ± 8% 100.5ms ± 6% ~ (p=0.112 n=30+30) GoTypes 642ms ± 3% 684ms ± 3% +6.54% (p=0.000 n=29+30) SSA 8.00s ± 1% 8.71s ± 1% +8.78% (p=0.000 n=29+29) Flate 129ms ± 7% 134ms ± 5% +3.77% (p=0.000 n=30+30) GoParser 157ms ± 4% 164ms ± 5% +4.35% (p=0.000 n=29+30) Reflect 428ms ± 3% 450ms ± 4% +5.09% (p=0.000 n=30+30) Tar 195ms ± 5% 204ms ± 8% +4.78% (p=0.000 n=30+30) XML 228ms ± 4% 241ms ± 4% +5.62% (p=0.000 n=30+29) StdCmd 15.4s ± 1% 16.7s ± 1% +8.29% (p=0.000 n=29+29) [Geo mean] 476ms 502ms +5.35% name old user-time/op new user-time/op delta Template 294ms ±18% 304ms ±15% ~ (p=0.242 n=29+29) Unicode 182ms ±27% 172ms ±28% ~ (p=0.104 n=30+30) GoTypes 957ms ±15% 1016ms ±12% +6.16% (p=0.000 n=30+30) SSA 13.3s ± 5% 14.3s ± 3% +7.32% (p=0.000 n=30+28) Flate 188ms ±17% 193ms ±17% ~ (p=0.288 n=28+29) GoParser 232ms ±16% 238ms ±13% ~ (p=0.065 n=30+29) Reflect 585ms ±13% 620ms ±10% +5.88% (p=0.000 n=30+30) Tar 298ms ±21% 332ms ±23% +11.32% (p=0.000 n=30+30) XML 329ms ±17% 343ms ±12% +4.18% (p=0.032 n=30+30) [Geo mean] 492ms 513ms +4.13% name old alloc/op new alloc/op delta Template 38.3MB ± 0% 40.3MB ± 0% +5.29% (p=0.000 n=30+30) Unicode 29.3MB ± 0% 29.6MB ± 0% +1.28% (p=0.000 n=30+29) GoTypes 110MB ± 0% 118MB ± 0% +6.97% (p=0.000 n=29+30) SSA 1.48GB ± 0% 1.61GB ± 0% +9.06% (p=0.000 n=30+30) Flate 24.8MB ± 0% 26.0MB ± 0% +4.99% (p=0.000 n=29+30) GoParser 30.9MB ± 0% 32.2MB ± 0% +4.20% (p=0.000 n=30+30) Reflect 76.8MB ± 0% 80.6MB ± 0% +4.97% (p=0.000 n=30+30) Tar 39.6MB ± 0% 41.7MB ± 0% +5.22% (p=0.000 n=29+30) XML 42.0MB ± 0% 45.4MB ± 0% +8.22% (p=0.000 n=29+30) [Geo mean] 63.9MB 67.5MB +5.56% name old allocs/op new allocs/op delta Template 383k ± 0% 405k ± 0% +5.69% (p=0.000 n=30+30) Unicode 343k ± 0% 346k ± 0% +0.98% (p=0.000 n=30+27) GoTypes 1.15M ± 0% 1.22M ± 0% +6.17% (p=0.000 n=29+29) SSA 12.2M ± 0% 13.2M ± 0% +8.15% (p=0.000 n=30+30) Flate 234k ± 0% 249k ± 0% +6.44% (p=0.000 n=30+30) GoParser 315k ± 0% 332k ± 0% +5.31% (p=0.000 n=30+28) Reflect 972k ± 0% 1010k ± 0% +3.89% (p=0.000 n=30+30) Tar 394k ± 0% 415k ± 0% +5.35% (p=0.000 n=28+30) XML 404k ± 0% 429k ± 0% +6.31% (p=0.000 n=29+29) [Geo mean] 651k 686k +5.35% Change-Id: Ia005a8d6b33ce9f8091322f004376a3d6e5c1a94 Reviewed-on: https://go-review.googlesource.com/89357 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-01-22 17:06:26 -05:00
return changed
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
// varOffset returns the offset of slot within the user variable it was
// decomposed from. This has nothing to do with its stack offset.
func varOffset(slot LocalSlot) int64 {
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
offset := slot.Off
s := &slot
for ; s.SplitOf != nil; s = s.SplitOf {
offset += s.SplitOffset
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
return offset
}
cmd/compile/internal/ssa: reduce location list memory use Put everything that showed up in the allocation profile into the cache, and reuse it across functions. After this CL, the overhead of enabling location lists is getting pretty close to the desired 5%: compilecmp -all -beforeflags -dwarflocationlists=0 -afterflags -dwarflocationlists=1 -n 30 4ebad42292b6a4090faf37753dd768d2965e38c4 4ebad42292b6a4090faf37753dd768d2965e38c4 compilecmp -dwarflocationlists=0 4ebad42292b6a4090faf37753dd768d2965e38c4 -dwarflocationlists=1 4ebad42292b6a4090faf37753dd768d2965e38c4 benchstat -geomean /tmp/869550129 /tmp/143495132 completed 30 of 30, estimated time remaining 0s (eta 3:24PM) name old time/op new time/op delta Template 199ms ± 4% 209ms ± 6% +5.17% (p=0.000 n=29+30) Unicode 99.2ms ± 8% 100.5ms ± 6% ~ (p=0.112 n=30+30) GoTypes 642ms ± 3% 684ms ± 3% +6.54% (p=0.000 n=29+30) SSA 8.00s ± 1% 8.71s ± 1% +8.78% (p=0.000 n=29+29) Flate 129ms ± 7% 134ms ± 5% +3.77% (p=0.000 n=30+30) GoParser 157ms ± 4% 164ms ± 5% +4.35% (p=0.000 n=29+30) Reflect 428ms ± 3% 450ms ± 4% +5.09% (p=0.000 n=30+30) Tar 195ms ± 5% 204ms ± 8% +4.78% (p=0.000 n=30+30) XML 228ms ± 4% 241ms ± 4% +5.62% (p=0.000 n=30+29) StdCmd 15.4s ± 1% 16.7s ± 1% +8.29% (p=0.000 n=29+29) [Geo mean] 476ms 502ms +5.35% name old user-time/op new user-time/op delta Template 294ms ±18% 304ms ±15% ~ (p=0.242 n=29+29) Unicode 182ms ±27% 172ms ±28% ~ (p=0.104 n=30+30) GoTypes 957ms ±15% 1016ms ±12% +6.16% (p=0.000 n=30+30) SSA 13.3s ± 5% 14.3s ± 3% +7.32% (p=0.000 n=30+28) Flate 188ms ±17% 193ms ±17% ~ (p=0.288 n=28+29) GoParser 232ms ±16% 238ms ±13% ~ (p=0.065 n=30+29) Reflect 585ms ±13% 620ms ±10% +5.88% (p=0.000 n=30+30) Tar 298ms ±21% 332ms ±23% +11.32% (p=0.000 n=30+30) XML 329ms ±17% 343ms ±12% +4.18% (p=0.032 n=30+30) [Geo mean] 492ms 513ms +4.13% name old alloc/op new alloc/op delta Template 38.3MB ± 0% 40.3MB ± 0% +5.29% (p=0.000 n=30+30) Unicode 29.3MB ± 0% 29.6MB ± 0% +1.28% (p=0.000 n=30+29) GoTypes 110MB ± 0% 118MB ± 0% +6.97% (p=0.000 n=29+30) SSA 1.48GB ± 0% 1.61GB ± 0% +9.06% (p=0.000 n=30+30) Flate 24.8MB ± 0% 26.0MB ± 0% +4.99% (p=0.000 n=29+30) GoParser 30.9MB ± 0% 32.2MB ± 0% +4.20% (p=0.000 n=30+30) Reflect 76.8MB ± 0% 80.6MB ± 0% +4.97% (p=0.000 n=30+30) Tar 39.6MB ± 0% 41.7MB ± 0% +5.22% (p=0.000 n=29+30) XML 42.0MB ± 0% 45.4MB ± 0% +8.22% (p=0.000 n=29+30) [Geo mean] 63.9MB 67.5MB +5.56% name old allocs/op new allocs/op delta Template 383k ± 0% 405k ± 0% +5.69% (p=0.000 n=30+30) Unicode 343k ± 0% 346k ± 0% +0.98% (p=0.000 n=30+27) GoTypes 1.15M ± 0% 1.22M ± 0% +6.17% (p=0.000 n=29+29) SSA 12.2M ± 0% 13.2M ± 0% +8.15% (p=0.000 n=30+30) Flate 234k ± 0% 249k ± 0% +6.44% (p=0.000 n=30+30) GoParser 315k ± 0% 332k ± 0% +5.31% (p=0.000 n=30+28) Reflect 972k ± 0% 1010k ± 0% +3.89% (p=0.000 n=30+30) Tar 394k ± 0% 415k ± 0% +5.35% (p=0.000 n=28+30) XML 404k ± 0% 429k ± 0% +6.31% (p=0.000 n=29+29) [Geo mean] 651k 686k +5.35% Change-Id: Ia005a8d6b33ce9f8091322f004376a3d6e5c1a94 Reviewed-on: https://go-review.googlesource.com/89357 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-01-22 17:06:26 -05:00
type partsByVarOffset struct {
slotIDs []SlotID
slots []LocalSlot
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
cmd/compile/internal/ssa: reduce location list memory use Put everything that showed up in the allocation profile into the cache, and reuse it across functions. After this CL, the overhead of enabling location lists is getting pretty close to the desired 5%: compilecmp -all -beforeflags -dwarflocationlists=0 -afterflags -dwarflocationlists=1 -n 30 4ebad42292b6a4090faf37753dd768d2965e38c4 4ebad42292b6a4090faf37753dd768d2965e38c4 compilecmp -dwarflocationlists=0 4ebad42292b6a4090faf37753dd768d2965e38c4 -dwarflocationlists=1 4ebad42292b6a4090faf37753dd768d2965e38c4 benchstat -geomean /tmp/869550129 /tmp/143495132 completed 30 of 30, estimated time remaining 0s (eta 3:24PM) name old time/op new time/op delta Template 199ms ± 4% 209ms ± 6% +5.17% (p=0.000 n=29+30) Unicode 99.2ms ± 8% 100.5ms ± 6% ~ (p=0.112 n=30+30) GoTypes 642ms ± 3% 684ms ± 3% +6.54% (p=0.000 n=29+30) SSA 8.00s ± 1% 8.71s ± 1% +8.78% (p=0.000 n=29+29) Flate 129ms ± 7% 134ms ± 5% +3.77% (p=0.000 n=30+30) GoParser 157ms ± 4% 164ms ± 5% +4.35% (p=0.000 n=29+30) Reflect 428ms ± 3% 450ms ± 4% +5.09% (p=0.000 n=30+30) Tar 195ms ± 5% 204ms ± 8% +4.78% (p=0.000 n=30+30) XML 228ms ± 4% 241ms ± 4% +5.62% (p=0.000 n=30+29) StdCmd 15.4s ± 1% 16.7s ± 1% +8.29% (p=0.000 n=29+29) [Geo mean] 476ms 502ms +5.35% name old user-time/op new user-time/op delta Template 294ms ±18% 304ms ±15% ~ (p=0.242 n=29+29) Unicode 182ms ±27% 172ms ±28% ~ (p=0.104 n=30+30) GoTypes 957ms ±15% 1016ms ±12% +6.16% (p=0.000 n=30+30) SSA 13.3s ± 5% 14.3s ± 3% +7.32% (p=0.000 n=30+28) Flate 188ms ±17% 193ms ±17% ~ (p=0.288 n=28+29) GoParser 232ms ±16% 238ms ±13% ~ (p=0.065 n=30+29) Reflect 585ms ±13% 620ms ±10% +5.88% (p=0.000 n=30+30) Tar 298ms ±21% 332ms ±23% +11.32% (p=0.000 n=30+30) XML 329ms ±17% 343ms ±12% +4.18% (p=0.032 n=30+30) [Geo mean] 492ms 513ms +4.13% name old alloc/op new alloc/op delta Template 38.3MB ± 0% 40.3MB ± 0% +5.29% (p=0.000 n=30+30) Unicode 29.3MB ± 0% 29.6MB ± 0% +1.28% (p=0.000 n=30+29) GoTypes 110MB ± 0% 118MB ± 0% +6.97% (p=0.000 n=29+30) SSA 1.48GB ± 0% 1.61GB ± 0% +9.06% (p=0.000 n=30+30) Flate 24.8MB ± 0% 26.0MB ± 0% +4.99% (p=0.000 n=29+30) GoParser 30.9MB ± 0% 32.2MB ± 0% +4.20% (p=0.000 n=30+30) Reflect 76.8MB ± 0% 80.6MB ± 0% +4.97% (p=0.000 n=30+30) Tar 39.6MB ± 0% 41.7MB ± 0% +5.22% (p=0.000 n=29+30) XML 42.0MB ± 0% 45.4MB ± 0% +8.22% (p=0.000 n=29+30) [Geo mean] 63.9MB 67.5MB +5.56% name old allocs/op new allocs/op delta Template 383k ± 0% 405k ± 0% +5.69% (p=0.000 n=30+30) Unicode 343k ± 0% 346k ± 0% +0.98% (p=0.000 n=30+27) GoTypes 1.15M ± 0% 1.22M ± 0% +6.17% (p=0.000 n=29+29) SSA 12.2M ± 0% 13.2M ± 0% +8.15% (p=0.000 n=30+30) Flate 234k ± 0% 249k ± 0% +6.44% (p=0.000 n=30+30) GoParser 315k ± 0% 332k ± 0% +5.31% (p=0.000 n=30+28) Reflect 972k ± 0% 1010k ± 0% +3.89% (p=0.000 n=30+30) Tar 394k ± 0% 415k ± 0% +5.35% (p=0.000 n=28+30) XML 404k ± 0% 429k ± 0% +6.31% (p=0.000 n=29+29) [Geo mean] 651k 686k +5.35% Change-Id: Ia005a8d6b33ce9f8091322f004376a3d6e5c1a94 Reviewed-on: https://go-review.googlesource.com/89357 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-01-22 17:06:26 -05:00
func (a partsByVarOffset) Len() int { return len(a.slotIDs) }
func (a partsByVarOffset) Less(i, j int) bool {
return varOffset(a.slots[a.slotIDs[i]]) < varOffset(a.slots[a.slotIDs[j]])
cmd/compile/internal/ssa: reduce location list memory use Put everything that showed up in the allocation profile into the cache, and reuse it across functions. After this CL, the overhead of enabling location lists is getting pretty close to the desired 5%: compilecmp -all -beforeflags -dwarflocationlists=0 -afterflags -dwarflocationlists=1 -n 30 4ebad42292b6a4090faf37753dd768d2965e38c4 4ebad42292b6a4090faf37753dd768d2965e38c4 compilecmp -dwarflocationlists=0 4ebad42292b6a4090faf37753dd768d2965e38c4 -dwarflocationlists=1 4ebad42292b6a4090faf37753dd768d2965e38c4 benchstat -geomean /tmp/869550129 /tmp/143495132 completed 30 of 30, estimated time remaining 0s (eta 3:24PM) name old time/op new time/op delta Template 199ms ± 4% 209ms ± 6% +5.17% (p=0.000 n=29+30) Unicode 99.2ms ± 8% 100.5ms ± 6% ~ (p=0.112 n=30+30) GoTypes 642ms ± 3% 684ms ± 3% +6.54% (p=0.000 n=29+30) SSA 8.00s ± 1% 8.71s ± 1% +8.78% (p=0.000 n=29+29) Flate 129ms ± 7% 134ms ± 5% +3.77% (p=0.000 n=30+30) GoParser 157ms ± 4% 164ms ± 5% +4.35% (p=0.000 n=29+30) Reflect 428ms ± 3% 450ms ± 4% +5.09% (p=0.000 n=30+30) Tar 195ms ± 5% 204ms ± 8% +4.78% (p=0.000 n=30+30) XML 228ms ± 4% 241ms ± 4% +5.62% (p=0.000 n=30+29) StdCmd 15.4s ± 1% 16.7s ± 1% +8.29% (p=0.000 n=29+29) [Geo mean] 476ms 502ms +5.35% name old user-time/op new user-time/op delta Template 294ms ±18% 304ms ±15% ~ (p=0.242 n=29+29) Unicode 182ms ±27% 172ms ±28% ~ (p=0.104 n=30+30) GoTypes 957ms ±15% 1016ms ±12% +6.16% (p=0.000 n=30+30) SSA 13.3s ± 5% 14.3s ± 3% +7.32% (p=0.000 n=30+28) Flate 188ms ±17% 193ms ±17% ~ (p=0.288 n=28+29) GoParser 232ms ±16% 238ms ±13% ~ (p=0.065 n=30+29) Reflect 585ms ±13% 620ms ±10% +5.88% (p=0.000 n=30+30) Tar 298ms ±21% 332ms ±23% +11.32% (p=0.000 n=30+30) XML 329ms ±17% 343ms ±12% +4.18% (p=0.032 n=30+30) [Geo mean] 492ms 513ms +4.13% name old alloc/op new alloc/op delta Template 38.3MB ± 0% 40.3MB ± 0% +5.29% (p=0.000 n=30+30) Unicode 29.3MB ± 0% 29.6MB ± 0% +1.28% (p=0.000 n=30+29) GoTypes 110MB ± 0% 118MB ± 0% +6.97% (p=0.000 n=29+30) SSA 1.48GB ± 0% 1.61GB ± 0% +9.06% (p=0.000 n=30+30) Flate 24.8MB ± 0% 26.0MB ± 0% +4.99% (p=0.000 n=29+30) GoParser 30.9MB ± 0% 32.2MB ± 0% +4.20% (p=0.000 n=30+30) Reflect 76.8MB ± 0% 80.6MB ± 0% +4.97% (p=0.000 n=30+30) Tar 39.6MB ± 0% 41.7MB ± 0% +5.22% (p=0.000 n=29+30) XML 42.0MB ± 0% 45.4MB ± 0% +8.22% (p=0.000 n=29+30) [Geo mean] 63.9MB 67.5MB +5.56% name old allocs/op new allocs/op delta Template 383k ± 0% 405k ± 0% +5.69% (p=0.000 n=30+30) Unicode 343k ± 0% 346k ± 0% +0.98% (p=0.000 n=30+27) GoTypes 1.15M ± 0% 1.22M ± 0% +6.17% (p=0.000 n=29+29) SSA 12.2M ± 0% 13.2M ± 0% +8.15% (p=0.000 n=30+30) Flate 234k ± 0% 249k ± 0% +6.44% (p=0.000 n=30+30) GoParser 315k ± 0% 332k ± 0% +5.31% (p=0.000 n=30+28) Reflect 972k ± 0% 1010k ± 0% +3.89% (p=0.000 n=30+30) Tar 394k ± 0% 415k ± 0% +5.35% (p=0.000 n=28+30) XML 404k ± 0% 429k ± 0% +6.31% (p=0.000 n=29+29) [Geo mean] 651k 686k +5.35% Change-Id: Ia005a8d6b33ce9f8091322f004376a3d6e5c1a94 Reviewed-on: https://go-review.googlesource.com/89357 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-01-22 17:06:26 -05:00
}
func (a partsByVarOffset) Swap(i, j int) { a.slotIDs[i], a.slotIDs[j] = a.slotIDs[j], a.slotIDs[i] }
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
// A pendingEntry represents the beginning of a location list entry, missing
// only its end coordinate.
type pendingEntry struct {
present bool
startBlock, startValue ID
// The location of each piece of the variable, in the same order as the
// SlotIDs in varParts.
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
pieces []VarLoc
}
func (e *pendingEntry) clear() {
e.present = false
e.startBlock = 0
e.startValue = 0
for i := range e.pieces {
e.pieces[i] = VarLoc{}
}
}
// canMerge returns true if the location description for new is the same as
// pending.
func canMerge(pending, new VarLoc) bool {
if pending.absent() && new.absent() {
return true
}
if pending.absent() || new.absent() {
return false
}
if pending.onStack() {
return pending.StackOffset == new.StackOffset
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
if pending.Registers != 0 && new.Registers != 0 {
return firstReg(pending.Registers) == firstReg(new.Registers)
}
return false
}
// firstReg returns the first register in set that is present.
func firstReg(set RegisterSet) uint8 {
if set == 0 {
// This is wrong, but there seem to be some situations where we
// produce locations with no storage.
return 0
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
return uint8(bits.TrailingZeros64(uint64(set)))
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
// buildLocationLists builds location lists for all the user variables in
// state.f, using the information about block state in blockLocs.
// The returned location lists are not fully complete. They are in terms of
// SSA values rather than PCs, and have no base address/end entries. They will
// be finished by PutLocationList.
func (state *debugState) buildLocationLists(blockLocs []*BlockDebug) {
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
// Run through the function in program text order, building up location
// lists as we go. The heavy lifting has mostly already been done.
for _, b := range state.f.Blocks {
if !blockLocs[b.ID].relevant {
continue
}
state.mergePredecessors(b, blockLocs)
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
zeroWidthPending := false
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
for _, v := range b.Values {
slots := state.valueNames[v.ID]
reg, _ := state.f.getHome(v.ID).(*Register)
cmd/compile/internal/ssa: reduce location list memory use Put everything that showed up in the allocation profile into the cache, and reuse it across functions. After this CL, the overhead of enabling location lists is getting pretty close to the desired 5%: compilecmp -all -beforeflags -dwarflocationlists=0 -afterflags -dwarflocationlists=1 -n 30 4ebad42292b6a4090faf37753dd768d2965e38c4 4ebad42292b6a4090faf37753dd768d2965e38c4 compilecmp -dwarflocationlists=0 4ebad42292b6a4090faf37753dd768d2965e38c4 -dwarflocationlists=1 4ebad42292b6a4090faf37753dd768d2965e38c4 benchstat -geomean /tmp/869550129 /tmp/143495132 completed 30 of 30, estimated time remaining 0s (eta 3:24PM) name old time/op new time/op delta Template 199ms ± 4% 209ms ± 6% +5.17% (p=0.000 n=29+30) Unicode 99.2ms ± 8% 100.5ms ± 6% ~ (p=0.112 n=30+30) GoTypes 642ms ± 3% 684ms ± 3% +6.54% (p=0.000 n=29+30) SSA 8.00s ± 1% 8.71s ± 1% +8.78% (p=0.000 n=29+29) Flate 129ms ± 7% 134ms ± 5% +3.77% (p=0.000 n=30+30) GoParser 157ms ± 4% 164ms ± 5% +4.35% (p=0.000 n=29+30) Reflect 428ms ± 3% 450ms ± 4% +5.09% (p=0.000 n=30+30) Tar 195ms ± 5% 204ms ± 8% +4.78% (p=0.000 n=30+30) XML 228ms ± 4% 241ms ± 4% +5.62% (p=0.000 n=30+29) StdCmd 15.4s ± 1% 16.7s ± 1% +8.29% (p=0.000 n=29+29) [Geo mean] 476ms 502ms +5.35% name old user-time/op new user-time/op delta Template 294ms ±18% 304ms ±15% ~ (p=0.242 n=29+29) Unicode 182ms ±27% 172ms ±28% ~ (p=0.104 n=30+30) GoTypes 957ms ±15% 1016ms ±12% +6.16% (p=0.000 n=30+30) SSA 13.3s ± 5% 14.3s ± 3% +7.32% (p=0.000 n=30+28) Flate 188ms ±17% 193ms ±17% ~ (p=0.288 n=28+29) GoParser 232ms ±16% 238ms ±13% ~ (p=0.065 n=30+29) Reflect 585ms ±13% 620ms ±10% +5.88% (p=0.000 n=30+30) Tar 298ms ±21% 332ms ±23% +11.32% (p=0.000 n=30+30) XML 329ms ±17% 343ms ±12% +4.18% (p=0.032 n=30+30) [Geo mean] 492ms 513ms +4.13% name old alloc/op new alloc/op delta Template 38.3MB ± 0% 40.3MB ± 0% +5.29% (p=0.000 n=30+30) Unicode 29.3MB ± 0% 29.6MB ± 0% +1.28% (p=0.000 n=30+29) GoTypes 110MB ± 0% 118MB ± 0% +6.97% (p=0.000 n=29+30) SSA 1.48GB ± 0% 1.61GB ± 0% +9.06% (p=0.000 n=30+30) Flate 24.8MB ± 0% 26.0MB ± 0% +4.99% (p=0.000 n=29+30) GoParser 30.9MB ± 0% 32.2MB ± 0% +4.20% (p=0.000 n=30+30) Reflect 76.8MB ± 0% 80.6MB ± 0% +4.97% (p=0.000 n=30+30) Tar 39.6MB ± 0% 41.7MB ± 0% +5.22% (p=0.000 n=29+30) XML 42.0MB ± 0% 45.4MB ± 0% +8.22% (p=0.000 n=29+30) [Geo mean] 63.9MB 67.5MB +5.56% name old allocs/op new allocs/op delta Template 383k ± 0% 405k ± 0% +5.69% (p=0.000 n=30+30) Unicode 343k ± 0% 346k ± 0% +0.98% (p=0.000 n=30+27) GoTypes 1.15M ± 0% 1.22M ± 0% +6.17% (p=0.000 n=29+29) SSA 12.2M ± 0% 13.2M ± 0% +8.15% (p=0.000 n=30+30) Flate 234k ± 0% 249k ± 0% +6.44% (p=0.000 n=30+30) GoParser 315k ± 0% 332k ± 0% +5.31% (p=0.000 n=30+28) Reflect 972k ± 0% 1010k ± 0% +3.89% (p=0.000 n=30+30) Tar 394k ± 0% 415k ± 0% +5.35% (p=0.000 n=28+30) XML 404k ± 0% 429k ± 0% +6.31% (p=0.000 n=29+29) [Geo mean] 651k 686k +5.35% Change-Id: Ia005a8d6b33ce9f8091322f004376a3d6e5c1a94 Reviewed-on: https://go-review.googlesource.com/89357 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-01-22 17:06:26 -05:00
changed := state.processValue(v, slots, reg)
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
if opcodeTable[v.Op].zeroWidth {
cmd/compile/internal/ssa: reduce location list memory use Put everything that showed up in the allocation profile into the cache, and reuse it across functions. After this CL, the overhead of enabling location lists is getting pretty close to the desired 5%: compilecmp -all -beforeflags -dwarflocationlists=0 -afterflags -dwarflocationlists=1 -n 30 4ebad42292b6a4090faf37753dd768d2965e38c4 4ebad42292b6a4090faf37753dd768d2965e38c4 compilecmp -dwarflocationlists=0 4ebad42292b6a4090faf37753dd768d2965e38c4 -dwarflocationlists=1 4ebad42292b6a4090faf37753dd768d2965e38c4 benchstat -geomean /tmp/869550129 /tmp/143495132 completed 30 of 30, estimated time remaining 0s (eta 3:24PM) name old time/op new time/op delta Template 199ms ± 4% 209ms ± 6% +5.17% (p=0.000 n=29+30) Unicode 99.2ms ± 8% 100.5ms ± 6% ~ (p=0.112 n=30+30) GoTypes 642ms ± 3% 684ms ± 3% +6.54% (p=0.000 n=29+30) SSA 8.00s ± 1% 8.71s ± 1% +8.78% (p=0.000 n=29+29) Flate 129ms ± 7% 134ms ± 5% +3.77% (p=0.000 n=30+30) GoParser 157ms ± 4% 164ms ± 5% +4.35% (p=0.000 n=29+30) Reflect 428ms ± 3% 450ms ± 4% +5.09% (p=0.000 n=30+30) Tar 195ms ± 5% 204ms ± 8% +4.78% (p=0.000 n=30+30) XML 228ms ± 4% 241ms ± 4% +5.62% (p=0.000 n=30+29) StdCmd 15.4s ± 1% 16.7s ± 1% +8.29% (p=0.000 n=29+29) [Geo mean] 476ms 502ms +5.35% name old user-time/op new user-time/op delta Template 294ms ±18% 304ms ±15% ~ (p=0.242 n=29+29) Unicode 182ms ±27% 172ms ±28% ~ (p=0.104 n=30+30) GoTypes 957ms ±15% 1016ms ±12% +6.16% (p=0.000 n=30+30) SSA 13.3s ± 5% 14.3s ± 3% +7.32% (p=0.000 n=30+28) Flate 188ms ±17% 193ms ±17% ~ (p=0.288 n=28+29) GoParser 232ms ±16% 238ms ±13% ~ (p=0.065 n=30+29) Reflect 585ms ±13% 620ms ±10% +5.88% (p=0.000 n=30+30) Tar 298ms ±21% 332ms ±23% +11.32% (p=0.000 n=30+30) XML 329ms ±17% 343ms ±12% +4.18% (p=0.032 n=30+30) [Geo mean] 492ms 513ms +4.13% name old alloc/op new alloc/op delta Template 38.3MB ± 0% 40.3MB ± 0% +5.29% (p=0.000 n=30+30) Unicode 29.3MB ± 0% 29.6MB ± 0% +1.28% (p=0.000 n=30+29) GoTypes 110MB ± 0% 118MB ± 0% +6.97% (p=0.000 n=29+30) SSA 1.48GB ± 0% 1.61GB ± 0% +9.06% (p=0.000 n=30+30) Flate 24.8MB ± 0% 26.0MB ± 0% +4.99% (p=0.000 n=29+30) GoParser 30.9MB ± 0% 32.2MB ± 0% +4.20% (p=0.000 n=30+30) Reflect 76.8MB ± 0% 80.6MB ± 0% +4.97% (p=0.000 n=30+30) Tar 39.6MB ± 0% 41.7MB ± 0% +5.22% (p=0.000 n=29+30) XML 42.0MB ± 0% 45.4MB ± 0% +8.22% (p=0.000 n=29+30) [Geo mean] 63.9MB 67.5MB +5.56% name old allocs/op new allocs/op delta Template 383k ± 0% 405k ± 0% +5.69% (p=0.000 n=30+30) Unicode 343k ± 0% 346k ± 0% +0.98% (p=0.000 n=30+27) GoTypes 1.15M ± 0% 1.22M ± 0% +6.17% (p=0.000 n=29+29) SSA 12.2M ± 0% 13.2M ± 0% +8.15% (p=0.000 n=30+30) Flate 234k ± 0% 249k ± 0% +6.44% (p=0.000 n=30+30) GoParser 315k ± 0% 332k ± 0% +5.31% (p=0.000 n=30+28) Reflect 972k ± 0% 1010k ± 0% +3.89% (p=0.000 n=30+30) Tar 394k ± 0% 415k ± 0% +5.35% (p=0.000 n=28+30) XML 404k ± 0% 429k ± 0% +6.31% (p=0.000 n=29+29) [Geo mean] 651k 686k +5.35% Change-Id: Ia005a8d6b33ce9f8091322f004376a3d6e5c1a94 Reviewed-on: https://go-review.googlesource.com/89357 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-01-22 17:06:26 -05:00
if changed {
zeroWidthPending = true
cmd/compile/internal/ssa: reduce location list memory use Put everything that showed up in the allocation profile into the cache, and reuse it across functions. After this CL, the overhead of enabling location lists is getting pretty close to the desired 5%: compilecmp -all -beforeflags -dwarflocationlists=0 -afterflags -dwarflocationlists=1 -n 30 4ebad42292b6a4090faf37753dd768d2965e38c4 4ebad42292b6a4090faf37753dd768d2965e38c4 compilecmp -dwarflocationlists=0 4ebad42292b6a4090faf37753dd768d2965e38c4 -dwarflocationlists=1 4ebad42292b6a4090faf37753dd768d2965e38c4 benchstat -geomean /tmp/869550129 /tmp/143495132 completed 30 of 30, estimated time remaining 0s (eta 3:24PM) name old time/op new time/op delta Template 199ms ± 4% 209ms ± 6% +5.17% (p=0.000 n=29+30) Unicode 99.2ms ± 8% 100.5ms ± 6% ~ (p=0.112 n=30+30) GoTypes 642ms ± 3% 684ms ± 3% +6.54% (p=0.000 n=29+30) SSA 8.00s ± 1% 8.71s ± 1% +8.78% (p=0.000 n=29+29) Flate 129ms ± 7% 134ms ± 5% +3.77% (p=0.000 n=30+30) GoParser 157ms ± 4% 164ms ± 5% +4.35% (p=0.000 n=29+30) Reflect 428ms ± 3% 450ms ± 4% +5.09% (p=0.000 n=30+30) Tar 195ms ± 5% 204ms ± 8% +4.78% (p=0.000 n=30+30) XML 228ms ± 4% 241ms ± 4% +5.62% (p=0.000 n=30+29) StdCmd 15.4s ± 1% 16.7s ± 1% +8.29% (p=0.000 n=29+29) [Geo mean] 476ms 502ms +5.35% name old user-time/op new user-time/op delta Template 294ms ±18% 304ms ±15% ~ (p=0.242 n=29+29) Unicode 182ms ±27% 172ms ±28% ~ (p=0.104 n=30+30) GoTypes 957ms ±15% 1016ms ±12% +6.16% (p=0.000 n=30+30) SSA 13.3s ± 5% 14.3s ± 3% +7.32% (p=0.000 n=30+28) Flate 188ms ±17% 193ms ±17% ~ (p=0.288 n=28+29) GoParser 232ms ±16% 238ms ±13% ~ (p=0.065 n=30+29) Reflect 585ms ±13% 620ms ±10% +5.88% (p=0.000 n=30+30) Tar 298ms ±21% 332ms ±23% +11.32% (p=0.000 n=30+30) XML 329ms ±17% 343ms ±12% +4.18% (p=0.032 n=30+30) [Geo mean] 492ms 513ms +4.13% name old alloc/op new alloc/op delta Template 38.3MB ± 0% 40.3MB ± 0% +5.29% (p=0.000 n=30+30) Unicode 29.3MB ± 0% 29.6MB ± 0% +1.28% (p=0.000 n=30+29) GoTypes 110MB ± 0% 118MB ± 0% +6.97% (p=0.000 n=29+30) SSA 1.48GB ± 0% 1.61GB ± 0% +9.06% (p=0.000 n=30+30) Flate 24.8MB ± 0% 26.0MB ± 0% +4.99% (p=0.000 n=29+30) GoParser 30.9MB ± 0% 32.2MB ± 0% +4.20% (p=0.000 n=30+30) Reflect 76.8MB ± 0% 80.6MB ± 0% +4.97% (p=0.000 n=30+30) Tar 39.6MB ± 0% 41.7MB ± 0% +5.22% (p=0.000 n=29+30) XML 42.0MB ± 0% 45.4MB ± 0% +8.22% (p=0.000 n=29+30) [Geo mean] 63.9MB 67.5MB +5.56% name old allocs/op new allocs/op delta Template 383k ± 0% 405k ± 0% +5.69% (p=0.000 n=30+30) Unicode 343k ± 0% 346k ± 0% +0.98% (p=0.000 n=30+27) GoTypes 1.15M ± 0% 1.22M ± 0% +6.17% (p=0.000 n=29+29) SSA 12.2M ± 0% 13.2M ± 0% +8.15% (p=0.000 n=30+30) Flate 234k ± 0% 249k ± 0% +6.44% (p=0.000 n=30+30) GoParser 315k ± 0% 332k ± 0% +5.31% (p=0.000 n=30+28) Reflect 972k ± 0% 1010k ± 0% +3.89% (p=0.000 n=30+30) Tar 394k ± 0% 415k ± 0% +5.35% (p=0.000 n=28+30) XML 404k ± 0% 429k ± 0% +6.31% (p=0.000 n=29+29) [Geo mean] 651k 686k +5.35% Change-Id: Ia005a8d6b33ce9f8091322f004376a3d6e5c1a94 Reviewed-on: https://go-review.googlesource.com/89357 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-01-22 17:06:26 -05:00
}
continue
}
if !changed && !zeroWidthPending {
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
continue
}
zeroWidthPending = false
for _, varID := range state.changedVars.contents() {
state.updateVar(VarID(varID), v, state.currentState.slots)
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
state.changedVars.clear()
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
}
if state.loggingEnabled {
state.logf("location lists:\n")
}
// Flush any leftover entries live at the end of the last block.
for varID := range state.lists {
state.writePendingEntry(VarID(varID), state.f.Blocks[len(state.f.Blocks)-1].ID, BlockEnd.ID)
list := state.lists[varID]
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
if state.loggingEnabled {
if len(list) == 0 {
state.logf("\t%v : empty list\n", state.vars[varID])
} else {
state.logf("\t%v : %q\n", state.vars[varID], hex.EncodeToString(state.lists[varID]))
}
}
}
}
// updateVar updates the pending location list entry for varID to
// reflect the new locations in curLoc, caused by v.
func (state *debugState) updateVar(varID VarID, v *Value, curLoc []VarLoc) {
// Assemble the location list entry with whatever's live.
empty := true
for _, slotID := range state.varSlots[varID] {
if !curLoc[slotID].absent() {
empty = false
break
}
}
pending := &state.pendingEntries[varID]
if empty {
state.writePendingEntry(varID, v.Block.ID, v.ID)
pending.clear()
return
}
// Extend the previous entry if possible.
if pending.present {
merge := true
for i, slotID := range state.varSlots[varID] {
if !canMerge(pending.pieces[i], curLoc[slotID]) {
merge = false
break
}
}
if merge {
return
}
}
state.writePendingEntry(varID, v.Block.ID, v.ID)
pending.present = true
pending.startBlock = v.Block.ID
pending.startValue = v.ID
for i, slot := range state.varSlots[varID] {
pending.pieces[i] = curLoc[slot]
}
}
// writePendingEntry writes out the pending entry for varID, if any,
// terminated at endBlock/Value.
func (state *debugState) writePendingEntry(varID VarID, endBlock, endValue ID) {
pending := state.pendingEntries[varID]
if !pending.present {
return
}
// Pack the start/end coordinates into the start/end addresses
// of the entry, for decoding by PutLocationList.
start, startOK := encodeValue(state.ctxt, pending.startBlock, pending.startValue)
end, endOK := encodeValue(state.ctxt, endBlock, endValue)
if !startOK || !endOK {
// If someone writes a function that uses >65K values,
// they get incomplete debug info on 32-bit platforms.
return
}
list := state.lists[varID]
list = appendPtr(state.ctxt, list, start)
list = appendPtr(state.ctxt, list, end)
// Where to write the length of the location description once
// we know how big it is.
sizeIdx := len(list)
list = list[:len(list)+2]
if state.loggingEnabled {
var partStrs []string
for i, slot := range state.varSlots[varID] {
partStrs = append(partStrs, fmt.Sprintf("%v@%v", state.slots[slot], state.LocString(pending.pieces[i])))
}
state.logf("Add entry for %v: \tb%vv%v-b%vv%v = \t%v\n", state.vars[varID], pending.startBlock, pending.startValue, endBlock, endValue, strings.Join(partStrs, " "))
}
for i, slotID := range state.varSlots[varID] {
loc := pending.pieces[i]
slot := state.slots[slotID]
if !loc.absent() {
if loc.onStack() {
if loc.stackOffsetValue() == 0 {
list = append(list, dwarf.DW_OP_call_frame_cfa)
} else {
list = append(list, dwarf.DW_OP_fbreg)
list = dwarf.AppendSleb128(list, int64(loc.stackOffsetValue()))
}
} else {
regnum := state.ctxt.Arch.DWARFRegisters[state.registers[firstReg(loc.Registers)].ObjNum()]
if regnum < 32 {
list = append(list, dwarf.DW_OP_reg0+byte(regnum))
} else {
list = append(list, dwarf.DW_OP_regx)
list = dwarf.AppendUleb128(list, uint64(regnum))
}
}
}
if len(state.varSlots[varID]) > 1 {
list = append(list, dwarf.DW_OP_piece)
list = dwarf.AppendUleb128(list, uint64(slot.Type.Size()))
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
}
state.ctxt.Arch.ByteOrder.PutUint16(list[sizeIdx:], uint16(len(list)-sizeIdx-2))
state.lists[varID] = list
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
// PutLocationList adds list (a location list in its intermediate representation) to listSym.
func (debugInfo *FuncDebug) PutLocationList(list []byte, ctxt *obj.Link, listSym, startPC *obj.LSym) {
getPC := debugInfo.GetPC
// Re-read list, translating its address from block/value ID to PC.
for i := 0; i < len(list); {
begin := getPC(decodeValue(ctxt, readPtr(ctxt, list[i:])))
end := getPC(decodeValue(ctxt, readPtr(ctxt, list[i+ctxt.Arch.PtrSize:])))
// Horrible hack. If a range contains only zero-width
// instructions, e.g. an Arg, and it's at the beginning of the
// function, this would be indistinguishable from an
// end entry. Fudge it.
if begin == 0 && end == 0 {
end = 1
}
writePtr(ctxt, list[i:], uint64(begin))
writePtr(ctxt, list[i+ctxt.Arch.PtrSize:], uint64(end))
i += 2 * ctxt.Arch.PtrSize
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
i += 2 + int(ctxt.Arch.ByteOrder.Uint16(list[i:]))
}
// Base address entry.
listSym.WriteInt(ctxt, listSym.Size, ctxt.Arch.PtrSize, ^0)
listSym.WriteAddr(ctxt, listSym.Size, ctxt.Arch.PtrSize, startPC, 0)
// Location list contents, now with real PCs.
listSym.WriteBytes(ctxt, listSym.Size, list)
// End entry.
listSym.WriteInt(ctxt, listSym.Size, ctxt.Arch.PtrSize, 0)
listSym.WriteInt(ctxt, listSym.Size, ctxt.Arch.PtrSize, 0)
}
// Pack a value and block ID into an address-sized uint, returning ~0 if they
// don't fit.
func encodeValue(ctxt *obj.Link, b, v ID) (uint64, bool) {
if ctxt.Arch.PtrSize == 8 {
result := uint64(b)<<32 | uint64(uint32(v))
//ctxt.Logf("b %#x (%d) v %#x (%d) -> %#x\n", b, b, v, v, result)
return result, true
}
if ctxt.Arch.PtrSize != 4 {
panic("unexpected pointer size")
}
if ID(int16(b)) != b || ID(int16(v)) != v {
return 0, false
}
return uint64(b)<<16 | uint64(uint16(v)), true
}
// Unpack a value and block ID encoded by encodeValue.
func decodeValue(ctxt *obj.Link, word uint64) (ID, ID) {
if ctxt.Arch.PtrSize == 8 {
b, v := ID(word>>32), ID(word)
//ctxt.Logf("%#x -> b %#x (%d) v %#x (%d)\n", word, b, b, v, v)
return b, v
}
if ctxt.Arch.PtrSize != 4 {
panic("unexpected pointer size")
}
return ID(word >> 16), ID(int16(word))
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
// Append a pointer-sized uint to buf.
func appendPtr(ctxt *obj.Link, buf []byte, word uint64) []byte {
if cap(buf) < len(buf)+20 {
b := make([]byte, len(buf), 20+cap(buf)*2)
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
copy(b, buf)
buf = b
}
writeAt := len(buf)
buf = buf[0 : len(buf)+ctxt.Arch.PtrSize]
writePtr(ctxt, buf[writeAt:], word)
return buf
}
// Write a pointer-sized uint to the beginning of buf.
func writePtr(ctxt *obj.Link, buf []byte, word uint64) {
switch ctxt.Arch.PtrSize {
case 4:
ctxt.Arch.ByteOrder.PutUint32(buf, uint32(word))
case 8:
ctxt.Arch.ByteOrder.PutUint64(buf, word)
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
default:
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
panic("unexpected pointer size")
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
// Read a pointer-sized uint from the beginning of buf.
func readPtr(ctxt *obj.Link, buf []byte) uint64 {
switch ctxt.Arch.PtrSize {
case 4:
return uint64(ctxt.Arch.ByteOrder.Uint32(buf))
case 8:
return ctxt.Arch.ByteOrder.Uint64(buf)
default:
panic("unexpected pointer size")
}
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}