mirror of https://github.com/golang/go.git
synced 2025-12-08 06:10:04 +00:00
8 commits

39eea62340 cmd/compile/internal/ssa: reduce location list memory use

Put everything that showed up in the allocation profile into the cache, and reuse it across functions. After this CL, the overhead of enabling location lists is getting pretty close to the desired 5%:

`compilecmp -all -beforeflags -dwarflocationlists=0 -afterflags -dwarflocationlists=1 -n 30 4ebad42292b6a4090faf37753dd768d2965e38c4 4ebad42292b6a4090faf37753dd768d2965e38c4`

`compilecmp -dwarflocationlists=0 4ebad42292b6a4090faf37753dd768d2965e38c4 -dwarflocationlists=1 4ebad42292b6a4090faf37753dd768d2965e38c4`

`benchstat -geomean /tmp/869550129 /tmp/143495132`

completed 30 of 30, estimated time remaining 0s (eta 3:24PM)

| name | old time/op | new time/op | delta |
|---|---|---|---|
| Template | 199ms ± 4% | 209ms ± 6% | +5.17% (p=0.000 n=29+30) |
| Unicode | 99.2ms ± 8% | 100.5ms ± 6% | ~ (p=0.112 n=30+30) |
| GoTypes | 642ms ± 3% | 684ms ± 3% | +6.54% (p=0.000 n=29+30) |
| SSA | 8.00s ± 1% | 8.71s ± 1% | +8.78% (p=0.000 n=29+29) |
| Flate | 129ms ± 7% | 134ms ± 5% | +3.77% (p=0.000 n=30+30) |
| GoParser | 157ms ± 4% | 164ms ± 5% | +4.35% (p=0.000 n=29+30) |
| Reflect | 428ms ± 3% | 450ms ± 4% | +5.09% (p=0.000 n=30+30) |
| Tar | 195ms ± 5% | 204ms ± 8% | +4.78% (p=0.000 n=30+30) |
| XML | 228ms ± 4% | 241ms ± 4% | +5.62% (p=0.000 n=30+29) |
| StdCmd | 15.4s ± 1% | 16.7s ± 1% | +8.29% (p=0.000 n=29+29) |
| [Geo mean] | 476ms | 502ms | +5.35% |

| name | old user-time/op | new user-time/op | delta |
|---|---|---|---|
| Template | 294ms ±18% | 304ms ±15% | ~ (p=0.242 n=29+29) |
| Unicode | 182ms ±27% | 172ms ±28% | ~ (p=0.104 n=30+30) |
| GoTypes | 957ms ±15% | 1016ms ±12% | +6.16% (p=0.000 n=30+30) |
| SSA | 13.3s ± 5% | 14.3s ± 3% | +7.32% (p=0.000 n=30+28) |
| Flate | 188ms ±17% | 193ms ±17% | ~ (p=0.288 n=28+29) |
| GoParser | 232ms ±16% | 238ms ±13% | ~ (p=0.065 n=30+29) |
| Reflect | 585ms ±13% | 620ms ±10% | +5.88% (p=0.000 n=30+30) |
| Tar | 298ms ±21% | 332ms ±23% | +11.32% (p=0.000 n=30+30) |
| XML | 329ms ±17% | 343ms ±12% | +4.18% (p=0.032 n=30+30) |
| [Geo mean] | 492ms | 513ms | +4.13% |

| name | old alloc/op | new alloc/op | delta |
|---|---|---|---|
| Template | 38.3MB ± 0% | 40.3MB ± 0% | +5.29% (p=0.000 n=30+30) |
| Unicode | 29.3MB ± 0% | 29.6MB ± 0% | +1.28% (p=0.000 n=30+29) |
| GoTypes | 110MB ± 0% | 118MB ± 0% | +6.97% (p=0.000 n=29+30) |
| SSA | 1.48GB ± 0% | 1.61GB ± 0% | +9.06% (p=0.000 n=30+30) |
| Flate | 24.8MB ± 0% | 26.0MB ± 0% | +4.99% (p=0.000 n=29+30) |
| GoParser | 30.9MB ± 0% | 32.2MB ± 0% | +4.20% (p=0.000 n=30+30) |
| Reflect | 76.8MB ± 0% | 80.6MB ± 0% | +4.97% (p=0.000 n=30+30) |
| Tar | 39.6MB ± 0% | 41.7MB ± 0% | +5.22% (p=0.000 n=29+30) |
| XML | 42.0MB ± 0% | 45.4MB ± 0% | +8.22% (p=0.000 n=29+30) |
| [Geo mean] | 63.9MB | 67.5MB | +5.56% |

| name | old allocs/op | new allocs/op | delta |
|---|---|---|---|
| Template | 383k ± 0% | 405k ± 0% | +5.69% (p=0.000 n=30+30) |
| Unicode | 343k ± 0% | 346k ± 0% | +0.98% (p=0.000 n=30+27) |
| GoTypes | 1.15M ± 0% | 1.22M ± 0% | +6.17% (p=0.000 n=29+29) |
| SSA | 12.2M ± 0% | 13.2M ± 0% | +8.15% (p=0.000 n=30+30) |
| Flate | 234k ± 0% | 249k ± 0% | +6.44% (p=0.000 n=30+30) |
| GoParser | 315k ± 0% | 332k ± 0% | +5.31% (p=0.000 n=30+28) |
| Reflect | 972k ± 0% | 1010k ± 0% | +3.89% (p=0.000 n=30+30) |
| Tar | 394k ± 0% | 415k ± 0% | +5.35% (p=0.000 n=28+30) |
| XML | 404k ± 0% | 429k ± 0% | +6.31% (p=0.000 n=29+29) |
| [Geo mean] | 651k | 686k | +5.35% |

Change-Id: Ia005a8d6b33ce9f8091322f004376a3d6e5c1a94
Reviewed-on: https://go-review.googlesource.com/89357
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
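The caching idea in this commit (keep allocation-heavy scratch data alive across functions and reset it instead of reallocating it) can be sketched in a few lines of Go. This is a hypothetical illustration, not the compiler's code: `debugCache`, `processFunc`, and the growth counter are invented names, and the real cache holds many more kinds of state.

```go
package main

import "fmt"

// debugCache is a hypothetical scratch area that outlives a single
// function, so its slices can be reused instead of reallocated.
type debugCache struct {
	liveSlots []int // per-function scratch data; only the capacity is kept
	grows     int   // how many times liveSlots had to grow
}

// reset truncates the scratch slice but keeps its backing array, so the
// next function can reuse the capacity allocated by earlier ones.
func (c *debugCache) reset() {
	c.liveSlots = c.liveSlots[:0]
}

// processFunc is a hypothetical stand-in for per-function work that needs
// n scratch entries.
func processFunc(c *debugCache, n int) {
	c.reset()
	for i := 0; i < n; i++ {
		if len(c.liveSlots) == cap(c.liveSlots) {
			c.grows++ // the append below must allocate a larger array
		}
		c.liveSlots = append(c.liveSlots, i)
	}
}

func main() {
	c := &debugCache{}
	for _, n := range []int{100, 50, 100} {
		processFunc(c, n)
	}
	// Only the first function pays for growth; later ones reuse capacity.
	fmt.Println("slice growths across 3 functions:", c.grows)
}
```

The trade-off is that the cache's memory stays live between functions; in exchange, the per-function allocation cost largely disappears once the scratch buffers have reached the size of the largest function seen so far.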

2075a9323d cmd/compile: reimplement location list generation

Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand.

RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would otherwise be empty, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases.

The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that.

Generation proceeds in two phases. First, it traverses the function in CFG order, storing the state of each block at its beginning and end; end states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried.

Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance: it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD.

TestNexting needs updating: with more optimizations working, the debugger no longer stops on the continue (line 88), and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values.

Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
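The finalization step described in this commit (translate encoded block/value IDs to PCs, prepend a base address selection entry, append the terminating zero entry) can be sketched under some stated assumptions. `encodeID`, `pending`, and `finalize` are hypothetical names, and packing the block ID into the high 32 bits is an assumption that may not match the compiler's encoding. The byte layout follows the DWARF 2 to 4 `.debug_loc` format with 8-byte little-endian addresses: a base address selection entry (all-ones, then the base), one entry per range (start offset, end offset, 2-byte expression length, expression), and a closing pair of zero addresses. The `Append*` methods on `binary.LittleEndian` need Go 1.19 or later.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeID packs a block ID and a value ID into one 64-bit placeholder that
// stands in for a PC until assembly. The packing scheme is an assumption.
func encodeID(block, value uint32) uint64 {
	return uint64(block)<<32 | uint64(value)
}

// pending is a hypothetical location list entry whose bounds are still
// encoded IDs rather than PCs.
type pending struct {
	start, end uint64 // encoded block/value IDs, not yet PCs
	expr       []byte // DWARF location expression bytes
}

// finalize emits a DWARF 2-4 style .debug_loc list with 8-byte little-endian
// addresses: a base address selection entry, each entry with its IDs
// translated to PC offsets relative to base, then the end-of-list entry.
func finalize(base uint64, entries []pending, pcOf map[uint64]uint64) ([]byte, error) {
	le := binary.LittleEndian
	var out []byte
	// Base address selection entry: the largest address, then the base.
	out = le.AppendUint64(out, ^uint64(0))
	out = le.AppendUint64(out, base)
	for _, e := range entries {
		start, ok1 := pcOf[e.start]
		end, ok2 := pcOf[e.end]
		if !ok1 || !ok2 {
			return nil, fmt.Errorf("no PC recorded for entry %#x-%#x", e.start, e.end)
		}
		out = le.AppendUint64(out, start)
		out = le.AppendUint64(out, end)
		out = le.AppendUint16(out, uint16(len(e.expr))) // 2-byte expression length
		out = append(out, e.expr...)
	}
	// End-of-list entry: two zero addresses.
	out = le.AppendUint64(out, 0)
	out = le.AppendUint64(out, 0)
	return out, nil
}

func main() {
	start, end := encodeID(1, 5), encodeID(2, 9)
	pcs := map[uint64]uint64{start: 4, end: 12} // known only after assembly
	// 0x50 is DW_OP_reg0: the variable is in DWARF register 0 for [4, 12).
	list, err := finalize(0x401000, []pending{{start, end, []byte{0x50}}}, pcs)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("% x\n", list)
}
```

Using a base address selection entry keeps every per-range address a small function-relative offset, which is why the translated PCs can be written without one relocation per address.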

d58d90152b cmd/compile: adjust locationlist lifetimes

A statement like foo = bar + qux might compile to AX := AX + BX, resulting in a regkill for AX before this instruction. The buggy behavior was to kill AX "at" this instruction, before it has executed. (Code generation of no-instruction values like RegKills applies their effects at the next actual instruction emitted.) However, bar is still associated with AX until after the instruction executes, so the effect of the regkill must occur at the boundary between this instruction and the next. Similarly, the new value bound to AX is not visible until this instruction executes (and, for values that require multiple instructions in code generation, until all of them have executed).

The ranges are adjusted so that a value's start occurs at the next instruction following its evaluation, and the end occurs after (execution of) the first instruction following the end of the lifetime as a value. (Notice the asymmetry: the entire value must be finished before it is visible, but execution of a single instruction invalidates it. However, the value *is* visible before that next instruction executes.)

The test was adjusted to make it insensitive to the result numbering for variables printed by gdb, since that is not relevant to the test and makes the differences introduced by small changes larger than necessary or useful. The test was also improved to present variable probes more intuitively, and to allow explicit indication of "this variable was optimized out".

Change-Id: I39453eead8399e6bb05ebd957289b112d1100c0e
Reviewed-on: https://go-review.googlesource.com/74090
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
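The asymmetric rule in this commit can be expressed as a small function over instruction indices. This is a minimal sketch assuming a simplified model in which each instruction's address is known and a value's definition and clobber points are given as indices; `liveRange` and the instruction addresses in `main` are made up for illustration.

```go
package main

import "fmt"

// liveRange is a hypothetical helper returning the half-open PC range
// [start, end) in which a value should be reported in its location.
//
//	pcs[i]  is the address of instruction i; the final element is the end of text.
//	defLast is the index of the last instruction that computes the value.
//	killAt  is the index of the first instruction that clobbers the location.
func liveRange(pcs []uint64, defLast, killAt int) (start, end uint64) {
	// The value is visible only once all of its instructions have executed.
	start = pcs[defLast+1]
	// The clobbering instruction still reads the old value, so the range
	// covers it and ends at the boundary with the next instruction.
	end = pcs[killAt+1]
	return start, end
}

func main() {
	// Made-up addresses for:
	//   0: MOVQ bar, AX
	//   1: MOVQ qux, BX
	//   2: ADDQ BX, AX   (foo = bar + qux)
	//   3: RET
	// The final element is the address just past RET.
	pcs := []uint64{0x00, 0x03, 0x06, 0x09, 0x0a}
	s, e := liveRange(pcs, 0, 2) // bar stays in AX through the ADDQ
	fmt.Printf("bar in AX: [%#x, %#x)\n", s, e)
	s, e = liveRange(pcs, 2, 3) // foo appears in AX only after the ADDQ
	fmt.Printf("foo in AX: [%#x, %#x)\n", s, e)
}
```

Under the buggy rule, bar's range would have ended at the ADDQ's own address (0x06), hiding bar exactly while the instruction that reads it is about to execute.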

73f1a1a1a7 cmd/compile/internal/ssa: use reverse postorder traversal

Instead of the hand-written control flow analysis in debug info generation, use a reverse postorder traversal, which is basically the same thing. It should be slightly faster.

More importantly, the previous version simply gave up in the case of non-reducible functions, and produced output that caused a later stage to crash. It turns out that there's a non-reducible function in compress/flate, so that wasn't a theoretical issue. With this change, all blocks will be visited, even for non-reducible functions.

Change-Id: Id47536764ee93203c6b4105a1a3013fe3265aa12
Reviewed-on: https://go-review.googlesource.com/73110
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
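A reverse postorder traversal that reaches every block, including the blocks of an irreducible loop, might look like the following. The CFG representation (`succ[b]` listing the successors of block b) and `reversePostorder` are assumptions for the sketch; it uses recursion for brevity, where a production compiler might prefer an explicit stack.

```go
package main

import "fmt"

// reversePostorder is a hypothetical helper returning the blocks reachable
// from entry in reverse postorder. succ[b] lists the successors of block b.
// Each reachable block is visited exactly once, even when the CFG is
// irreducible (for example, a loop with two entry points).
func reversePostorder(succ [][]int, entry int) []int {
	seen := make([]bool, len(succ))
	var post []int
	var visit func(b int)
	visit = func(b int) {
		seen[b] = true
		for _, s := range succ[b] {
			if !seen[s] {
				visit(s)
			}
		}
		post = append(post, b) // b is finished after all its successors
	}
	visit(entry)
	// Reverse the postorder in place.
	for i, j := 0, len(post)-1; i < j; i, j = i+1, j-1 {
		post[i], post[j] = post[j], post[i]
	}
	return post
}

func main() {
	// Blocks 1 and 2 form a loop that block 0 can enter through either
	// block, so this CFG is irreducible.
	succ := [][]int{
		0: {1, 2},
		1: {2, 3},
		2: {1},
		3: {},
	}
	fmt.Println(reversePostorder(succ, 0)) // [0 1 3 2]
}
```

Unlike an analysis that waits for a loop header to dominate its body, a depth-first traversal only needs a `seen` set, so it cannot get stuck on a loop with multiple entries.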

0b6b5641d7 cmd/compile: use correct stack slots in location lists

When variables need to be spilled to the stack, they usually get their own stack slot. Local variables have a slot allocated if they need one, and arguments start out on the stack. Before this CL, the debug information made the assumption that this was always the case, and so didn't bother storing an actual stack offset during SSA analysis.

There's at least one case where this isn't true: variables that alias arguments. Since the argument is the source of the variable, the variable will begin its life on the stack in the argument's stack slot, not its own. Therefore the debug info needs to track the actual stack slot for each location entry.

No detectable performance change, despite the O(N) loop in getHomeSlot.

Change-Id: I2701adb7eddee17d4524336cb7aa6786e8f32b46
Reviewed-on: https://go-review.googlesource.com/67231
Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
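The shift from assuming each variable's own slot to recording the actual slot can be illustrated as follows. Everything here is hypothetical (`slot`, `locEntry`, `homeSlot`, `stackOffset`, and the offsets). `homeSlot` only mimics the kind of linear lookup the message attributes to getHomeSlot, not its actual logic.

```go
package main

import "fmt"

// slot is a hypothetical stack slot: a frame offset owned by one variable.
type slot struct {
	owner  string
	offset int64
}

// locEntry is a hypothetical record that a variable is on the stack in a
// specific slot. Storing the slot index, instead of assuming the variable's
// own slot, lets a variable that aliases an argument point at the
// argument's slot.
type locEntry struct {
	variable string
	slotIdx  int
}

// homeSlot finds the slot owned by name with a linear O(N) scan.
func homeSlot(slots []slot, name string) int {
	for i, s := range slots {
		if s.owner == name {
			return i
		}
	}
	return -1
}

// stackOffset reports the frame offset recorded for a location entry.
func stackOffset(slots []slot, e locEntry) (int64, bool) {
	if e.slotIdx < 0 || e.slotIdx >= len(slots) {
		return 0, false
	}
	return slots[e.slotIdx].offset, true
}

func main() {
	// x is an argument; y := x aliases it, so y starts life in x's slot.
	slots := []slot{{owner: "x", offset: 8}, {owner: "y", offset: -16}}
	y := locEntry{variable: "y", slotIdx: homeSlot(slots, "x")}
	off, _ := stackOffset(slots, y)
	fmt.Println("y is at frame offset", off) // not y's own slot at -16
}
```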

6bbe1bc940 cmd/compile: cover control flow insns in location lists

The information that's used to generate DWARF location lists is very ssa.Value-centric; it uses Values as start and end coordinates to define ranges. That mostly works fine, but control flow instructions don't come from Values, so the ranges couldn't cover them.

Control flow instructions are generated when the SSA representation is converted to assembly, so that's the best place to extend the ranges to cover them. (Before that, there's nothing to refer to, and afterward the boundaries between blocks have been lost.) That requires block information in the debugInfo type, which then flows down to make everything else awkward. On the plus side, there's a little less slice copying than there used to be, so it should be a little faster.

Previously, the ranges for empty blocks were not very meaningful. That was fine, because they had no Values to cover, so no debug information was generated for them. But they do have control flow instructions (that's why they exist), and so now it's important that the information be correct. Introduce two sentinel values, BlockStart and BlockEnd, that denote the boundary of a block, even if the block is empty. BlockEnd replaces the previous SurvivedBlock flag.

There's one more problem: the last instruction in the function will be a control flow instruction, so any live ranges need to be extended past it. But there's no instruction after it to use as the end of the range. Instead, leave the EndProg field of those ranges as nil and fix it up to point past the end of the assembled text at the very last moment.

Change-Id: I81f884020ff36fd6fe8d7888fc57c99412c4245b
Reviewed-on: https://go-review.googlesource.com/63010
Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
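The end-of-function fixup described at the end of this commit can be sketched like this. `prog`, `varRange`, and `fixupEnds` are hypothetical stand-ins (the message itself only refers to an `EndProg` field); the sketch shows resolving a nil end to the address just past the last assembled instruction and does not model the BlockStart/BlockEnd sentinels.

```go
package main

import "fmt"

// prog is a hypothetical assembled instruction: its PC and size in bytes.
type prog struct {
	pc, size int64
}

// varRange is a hypothetical live range. A nil end means the variable is
// still live at the function's final (control flow) instruction, so there
// is no following instruction to use as the end.
type varRange struct {
	start, end *prog
}

// fixupEnds converts ranges to [start, end) PC pairs once the function has
// been fully assembled, resolving nil ends to the address just past the
// last instruction.
func fixupEnds(ranges []varRange, text []*prog) [][2]int64 {
	last := text[len(text)-1]
	endOfText := last.pc + last.size
	out := make([][2]int64, 0, len(ranges))
	for _, r := range ranges {
		end := endOfText
		if r.end != nil {
			end = r.end.pc
		}
		out = append(out, [2]int64{r.start.pc, end})
	}
	return out
}

func main() {
	// Three assembled instructions; the last one is the final jump or return.
	text := []*prog{{pc: 0, size: 4}, {pc: 4, size: 3}, {pc: 7, size: 5}}
	ranges := []varRange{
		{start: text[0], end: text[1]}, // ends at a known instruction
		{start: text[1], end: nil},     // live through the final instruction
	}
	for _, r := range fixupEnds(ranges, text) {
		fmt.Printf("[%d, %d)\n", r[0], r[1])
	}
}
```

Deferring the fixup this way avoids inventing a fake instruction after the last one; the end address is only needed once the size of the final instruction is known.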

bf4d8d3d05 cmd/compile: rename SSA Register.Name to Register.String

Just to get rid of lots of .Name() stutter in printf calls.

Change-Id: I86cf00b3f7b2172387a1c6a7f189c1897fab6300
Reviewed-on: https://go-review.googlesource.com/56630
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
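The rename works because a `String() string` method satisfies `fmt.Stringer`, which the `%s` and `%v` verbs call automatically. A minimal sketch; this `Register` type and the `describe` helper are simplified, hypothetical stand-ins, not the SSA package's definitions.

```go
package main

import "fmt"

// Register is a simplified, hypothetical stand-in for the SSA Register
// type; only the name is modeled here.
type Register struct {
	name string
}

// String makes *Register satisfy fmt.Stringer, so %s and %v print the
// register name directly.
func (r *Register) String() string {
	return r.name
}

// describe formats a value's location without an explicit r.Name() call.
func describe(value string, r *Register) string {
	return fmt.Sprintf("%s in %v", value, r)
}

func main() {
	fmt.Println(describe("v7", &Register{name: "AX"})) // v7 in AX
}
```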

4c54a047c6 [dev.debug] cmd/compile: better DWARF with optimizations on

Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct.

There are a number of problems in the current design. One of the easier ones to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example, `type myint int; var a int; b := myint(a)` and `b := (*uint64)(unsafe.Pointer(a))` don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect.

Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it.

A number of changes are included:

- Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.

TODO:

- currently disabled for non-X86/AMD64 because there are no data tables.

`go build -toolexec 'toolstash -cmp' -a std` succeeds.

With -dwarflocationlists false:

before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec

`benchstat -geomean /tmp/220352263 /tmp/621364410`

completed 15 of 15, estimated time remaining 0s (eta 3:52PM)

| name | old time/op | new time/op | delta |
|---|---|---|---|
| Template | 199ms ± 3% | 198ms ± 2% | ~ (p=0.400 n=15+14) |
| Unicode | 96.6ms ± 5% | 96.4ms ± 5% | ~ (p=0.838 n=15+15) |
| GoTypes | 653ms ± 2% | 647ms ± 2% | ~ (p=0.102 n=15+14) |
| Flate | 133ms ± 6% | 129ms ± 3% | -2.62% (p=0.041 n=15+15) |
| GoParser | 164ms ± 5% | 159ms ± 3% | -3.05% (p=0.000 n=15+15) |
| Reflect | 428ms ± 4% | 422ms ± 3% | ~ (p=0.156 n=15+13) |
| Tar | 123ms ±10% | 124ms ± 8% | ~ (p=0.461 n=15+15) |
| XML | 228ms ± 3% | 224ms ± 3% | -1.57% (p=0.045 n=15+15) |
| [Geo mean] | 206ms | 377ms | +82.86% |

| name | old user-time/op | new user-time/op | delta |
|---|---|---|---|
| Template | 292ms ±10% | 301ms ±12% | ~ (p=0.189 n=15+15) |
| Unicode | 166ms ±37% | 158ms ±14% | ~ (p=0.418 n=15+14) |
| GoTypes | 962ms ± 6% | 963ms ± 7% | ~ (p=0.976 n=15+15) |
| Flate | 207ms ±19% | 200ms ±14% | ~ (p=0.345 n=14+15) |
| GoParser | 246ms ±22% | 240ms ±15% | ~ (p=0.587 n=15+15) |
| Reflect | 611ms ±13% | 587ms ±14% | ~ (p=0.085 n=15+13) |
| Tar | 211ms ±12% | 217ms ±14% | ~ (p=0.355 n=14+15) |
| XML | 335ms ±15% | 320ms ±18% | ~ (p=0.169 n=15+15) |
| [Geo mean] | 317ms | 583ms | +83.72% |

| name | old alloc/op | new alloc/op | delta |
|---|---|---|---|
| Template | 40.2MB ± 0% | 40.2MB ± 0% | -0.15% (p=0.000 n=14+15) |
| Unicode | 29.2MB ± 0% | 29.3MB ± 0% | ~ (p=0.624 n=15+15) |
| GoTypes | 114MB ± 0% | 114MB ± 0% | -0.15% (p=0.000 n=15+14) |
| Flate | 25.7MB ± 0% | 25.6MB ± 0% | -0.18% (p=0.000 n=13+15) |
| GoParser | 32.2MB ± 0% | 32.2MB ± 0% | -0.14% (p=0.003 n=15+15) |
| Reflect | 77.8MB ± 0% | 77.9MB ± 0% | ~ (p=0.061 n=15+15) |
| Tar | 27.1MB ± 0% | 27.0MB ± 0% | -0.11% (p=0.029 n=15+15) |
| XML | 42.7MB ± 0% | 42.5MB ± 0% | -0.29% (p=0.000 n=15+15) |
| [Geo mean] | 42.1MB | 75.0MB | +78.05% |

| name | old allocs/op | new allocs/op | delta |
|---|---|---|---|
| Template | 402k ± 1% | 398k ± 0% | -0.91% (p=0.000 n=15+15) |
| Unicode | 344k ± 1% | 344k ± 0% | ~ (p=0.715 n=15+14) |
| GoTypes | 1.18M ± 0% | 1.17M ± 0% | -0.91% (p=0.000 n=15+14) |
| Flate | 243k ± 0% | 240k ± 1% | -1.05% (p=0.000 n=13+15) |
| GoParser | 327k ± 1% | 324k ± 1% | -0.96% (p=0.000 n=15+15) |
| Reflect | 984k ± 1% | 982k ± 0% | ~ (p=0.050 n=15+15) |
| Tar | 261k ± 1% | 259k ± 1% | -0.77% (p=0.000 n=15+15) |
| XML | 411k ± 0% | 404k ± 1% | -1.55% (p=0.000 n=15+15) |
| [Geo mean] | 439k | 755k | +72.01% |

| name | metric | old | new | delta |
|---|---|---|---|---|
| HelloSize | text-bytes | 694kB ± 0% | 694kB ± 0% | -0.00% (p=0.000 n=15+15) |
| HelloSize | data-bytes | 5.55kB ± 0% | 5.55kB ± 0% | ~ (all equal) |
| HelloSize | bss-bytes | 133kB ± 0% | 133kB ± 0% | ~ (all equal) |
| HelloSize | exe-bytes | 1.04MB ± 0% | 1.04MB ± 0% | ~ (all equal) |

Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
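The location-list construction listed in this commit (track where a variable lives, and cut a range whenever its location changes or a RegKill clobbers its register) can be sketched as one pass over per-instruction events. This is a simplified model under stated assumptions: `event`, `piece`, and `buildList` are hypothetical, each event stream describes a single variable, and locations are plain strings rather than registers or stack slots.

```go
package main

import "fmt"

// event is a hypothetical per-instruction record for one variable.
type event struct {
	pc   int64
	loc  string // new location of the variable ("" if unchanged)
	kill string // register clobbered at this pc ("" if none), like a RegKill
}

// piece is one location list entry: the variable is in loc for [start, end).
type piece struct {
	start, end int64
	loc        string
}

// buildList walks events in PC order and closes the current piece whenever
// the variable moves or the register holding it is clobbered.
func buildList(events []event, endPC int64) []piece {
	var out []piece
	cur, since := "", int64(0)
	flush := func(pc int64) {
		if cur != "" && pc > since {
			out = append(out, piece{since, pc, cur})
		}
	}
	for _, e := range events {
		if e.kill != "" && e.kill == cur {
			flush(e.pc) // the variable is no longer recoverable here
			cur = ""
		}
		if e.loc != "" {
			flush(e.pc)
			cur, since = e.loc, e.pc
		}
	}
	flush(endPC)
	return out
}

func main() {
	events := []event{
		{pc: 0, loc: "AX"},     // variable computed into AX
		{pc: 4, kill: "BX"},    // BX clobbered; the variable is unaffected
		{pc: 8, kill: "AX"},    // RegKill: AX reused for something else
		{pc: 12, loc: "stack"}, // variable now found in its stack slot
	}
	for _, p := range buildList(events, 20) {
		fmt.Printf("[%d, %d) %s\n", p.start, p.end, p.loc)
	}
}
```

The gap between pc 8 and 12 in this example is where a debugger would report the variable as unavailable, which is more honest than the old behavior of always pointing at the stack.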