go/src/cmd/internal/obj/plist.go

209 lines
5.1 KiB
Go
Raw Normal View History

// Copyright 2013 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
package obj
import (
"cmd/internal/objabi"
"fmt"
"strings"
)
type Plist struct {
Firstpc *Prog
Curfn interface{} // holds a *gc.Node, if non-nil
}
cmd/compile: teach assemblers to accept a Prog allocator The existing bulk Prog allocator is not concurrency-safe. To allow for concurrency-safe bulk allocation of Progs, I want to move Prog allocation and caching upstream, to the clients of cmd/internal/obj. This is a preliminary enabling refactoring. After this CL, instead of calling Ctxt.NewProg throughout the assemblers, we thread through a newprog function that returns a new Prog. That function is set up to be Ctxt.NewProg, so there are no real changes in this CL; this CL only establishes the plumbing. Passes toolstash-check -all. Negligible compiler performance impact. Updates #15756 name old time/op new time/op delta Template 213ms ± 3% 214ms ± 4% ~ (p=0.574 n=49+47) Unicode 90.1ms ± 5% 89.9ms ± 4% ~ (p=0.417 n=50+49) GoTypes 585ms ± 4% 584ms ± 3% ~ (p=0.466 n=49+49) SSA 6.50s ± 3% 6.52s ± 2% ~ (p=0.251 n=49+49) Flate 128ms ± 4% 128ms ± 4% ~ (p=0.673 n=49+50) GoParser 152ms ± 3% 152ms ± 3% ~ (p=0.810 n=48+49) Reflect 372ms ± 4% 372ms ± 5% ~ (p=0.778 n=49+50) Tar 113ms ± 5% 111ms ± 4% -0.98% (p=0.016 n=50+49) XML 208ms ± 3% 208ms ± 2% ~ (p=0.483 n=47+49) [Geo mean] 285ms 285ms -0.17% name old user-ns/op new user-ns/op delta Template 253M ± 8% 254M ± 9% ~ (p=0.899 n=50+50) Unicode 106M ± 9% 106M ±11% ~ (p=0.642 n=50+50) GoTypes 736M ± 4% 740M ± 4% ~ (p=0.121 n=50+49) SSA 8.82G ± 3% 8.88G ± 2% +0.65% (p=0.006 n=49+48) Flate 147M ± 4% 147M ± 5% ~ (p=0.844 n=47+48) GoParser 179M ± 4% 178M ± 6% ~ (p=0.785 n=50+50) Reflect 443M ± 6% 441M ± 5% ~ (p=0.850 n=48+47) Tar 126M ± 5% 126M ± 5% ~ (p=0.734 n=50+50) XML 244M ± 5% 244M ± 5% ~ (p=0.594 n=49+50) [Geo mean] 341M 341M +0.11% Change-Id: Ice962f61eb3a524c2db00a166cb582c22caa7d68 Reviewed-on: https://go-review.googlesource.com/39633 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-04 14:31:55 -07:00
// ProgAlloc is a function that allocates Progs.
// It is used to provide access to cached/bulk-allocated Progs to the assemblers.
type ProgAlloc func() *Prog
func Flushplist(ctxt *Link, plist *Plist, newprog ProgAlloc, myimportpath string) {
// Build list of symbols, and assign instructions to lists.
var curtext *LSym
var etext *Prog
var text []*LSym
var plink *Prog
for p := plist.Firstpc; p != nil; p = plink {
if ctxt.Debugasm > 0 && ctxt.Debugvlog {
fmt.Printf("obj: %v\n", p)
}
plink = p.Link
p.Link = nil
switch p.As {
case AEND:
continue
case ATEXT:
s := p.From.Sym
if s == nil {
// func _() { }
curtext = nil
continue
}
text = append(text, s)
etext = p
curtext = s
continue
case AFUNCDATA:
// Rewrite reference to go_args_stackmap(SB) to the Go-provided declaration information.
if curtext == nil { // func _() {}
continue
}
if p.To.Sym.Name == "go_args_stackmap" {
if p.From.Type != TYPE_CONST || p.From.Offset != objabi.FUNCDATA_ArgsPointerMaps {
ctxt.Diag("FUNCDATA use of go_args_stackmap(SB) without FUNCDATA_ArgsPointerMaps")
}
p.To.Sym = ctxt.LookupDerived(curtext, curtext.Name+".args_stackmap")
}
}
if curtext == nil {
etext = nil
continue
}
etext.Link = p
etext = p
}
cmd/compile: add Prog cache to Progs The existing bulk/cached Prog allocator, Ctxt.NewProg, is not concurrency-safe. This CL moves Prog allocation to its clients, the compiler and the assembler. The assembler is so fast and generates so few Progs that it does not need optimization of Prog allocation. I could not generate measureable changes. And even if I could, the assembly is a miniscule portion of build times. The compiler already has a natural place to manage Prog allocation; this CL migrates the Prog cache there. It will be made concurrency-safe in a later CL by partitioning the Prog cache into chunks and assigning each chunk to a different goroutine to manage. This CL does cause a performance degradation when the compiler is invoked with the -S flag (to dump assembly). However, such usage is rare and almost always done manually. The one instance I know of in a test is TestAssembly in cmd/compile/internal/gc, and I did not detect a measurable performance impact there. Passes toolstash-check -all. Minor compiler performance impact. Updates #15756 Performance impact from just this CL: name old time/op new time/op delta Template 213ms ± 4% 213ms ± 4% ~ (p=0.571 n=49+49) Unicode 89.1ms ± 3% 89.4ms ± 3% ~ (p=0.388 n=47+48) GoTypes 581ms ± 2% 584ms ± 3% +0.56% (p=0.019 n=47+48) SSA 6.48s ± 2% 6.53s ± 2% +0.84% (p=0.000 n=47+49) Flate 128ms ± 4% 128ms ± 4% ~ (p=0.832 n=49+49) GoParser 152ms ± 3% 152ms ± 3% ~ (p=0.815 n=48+47) Reflect 371ms ± 4% 371ms ± 3% ~ (p=0.617 n=50+47) Tar 112ms ± 4% 112ms ± 3% ~ (p=0.724 n=49+49) XML 208ms ± 3% 208ms ± 4% ~ (p=0.678 n=49+50) [Geo mean] 284ms 285ms +0.18% name old user-ns/op new user-ns/op delta Template 251M ± 7% 252M ±11% ~ (p=0.704 n=49+50) Unicode 107M ± 7% 108M ± 5% +1.25% (p=0.036 n=50+49) GoTypes 738M ± 3% 740M ± 3% ~ (p=0.305 n=49+48) SSA 8.83G ± 2% 8.86G ± 4% ~ (p=0.098 n=47+50) Flate 146M ± 6% 147M ± 3% ~ (p=0.584 n=48+41) GoParser 178M ± 6% 179M ± 5% +0.93% (p=0.036 n=49+48) Reflect 441M ± 4% 446M ± 7% ~ (p=0.218 n=44+49) Tar 126M ± 5% 126M ± 5% ~ (p=0.766 n=48+49) XML 245M ± 5% 244M ± 4% ~ (p=0.359 n=50+50) [Geo mean] 341M 342M +0.51% Performance impact from this CL combined with its parent: name old time/op new time/op delta Template 213ms ± 3% 214ms ± 4% ~ (p=0.685 n=47+50) Unicode 89.8ms ± 6% 90.5ms ± 6% ~ (p=0.055 n=50+50) GoTypes 584ms ± 3% 585ms ± 2% ~ (p=0.710 n=49+47) SSA 6.50s ± 2% 6.53s ± 2% +0.39% (p=0.011 n=46+50) Flate 128ms ± 3% 128ms ± 4% ~ (p=0.855 n=47+49) GoParser 152ms ± 3% 152ms ± 3% ~ (p=0.666 n=49+49) Reflect 371ms ± 3% 372ms ± 3% ~ (p=0.298 n=48+48) Tar 112ms ± 5% 113ms ± 3% ~ (p=0.107 n=49+49) XML 208ms ± 3% 208ms ± 2% ~ (p=0.881 n=50+49) [Geo mean] 285ms 285ms +0.26% name old user-ns/op new user-ns/op delta Template 254M ± 9% 252M ± 8% ~ (p=0.290 n=49+50) Unicode 106M ± 6% 108M ± 7% +1.44% (p=0.034 n=50+50) GoTypes 741M ± 4% 743M ± 4% ~ (p=0.992 n=50+49) SSA 8.86G ± 2% 8.83G ± 3% ~ (p=0.158 n=47+49) Flate 147M ± 4% 148M ± 5% ~ (p=0.832 n=50+49) GoParser 179M ± 5% 178M ± 5% ~ (p=0.370 n=48+50) Reflect 441M ± 6% 445M ± 7% ~ (p=0.246 n=45+47) Tar 126M ± 6% 126M ± 6% ~ (p=0.815 n=49+50) XML 244M ± 3% 245M ± 4% ~ (p=0.190 n=50+50) [Geo mean] 342M 342M +0.17% Change-Id: I020f1c079d495fbe2e15ccb51e1ea2cc1b5a1855 Reviewed-on: https://go-review.googlesource.com/39634 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-05 07:05:35 -07:00
if newprog == nil {
newprog = ctxt.NewProg
}
cmd/compile: teach assemblers to accept a Prog allocator The existing bulk Prog allocator is not concurrency-safe. To allow for concurrency-safe bulk allocation of Progs, I want to move Prog allocation and caching upstream, to the clients of cmd/internal/obj. This is a preliminary enabling refactoring. After this CL, instead of calling Ctxt.NewProg throughout the assemblers, we thread through a newprog function that returns a new Prog. That function is set up to be Ctxt.NewProg, so there are no real changes in this CL; this CL only establishes the plumbing. Passes toolstash-check -all. Negligible compiler performance impact. Updates #15756 name old time/op new time/op delta Template 213ms ± 3% 214ms ± 4% ~ (p=0.574 n=49+47) Unicode 90.1ms ± 5% 89.9ms ± 4% ~ (p=0.417 n=50+49) GoTypes 585ms ± 4% 584ms ± 3% ~ (p=0.466 n=49+49) SSA 6.50s ± 3% 6.52s ± 2% ~ (p=0.251 n=49+49) Flate 128ms ± 4% 128ms ± 4% ~ (p=0.673 n=49+50) GoParser 152ms ± 3% 152ms ± 3% ~ (p=0.810 n=48+49) Reflect 372ms ± 4% 372ms ± 5% ~ (p=0.778 n=49+50) Tar 113ms ± 5% 111ms ± 4% -0.98% (p=0.016 n=50+49) XML 208ms ± 3% 208ms ± 2% ~ (p=0.483 n=47+49) [Geo mean] 285ms 285ms -0.17% name old user-ns/op new user-ns/op delta Template 253M ± 8% 254M ± 9% ~ (p=0.899 n=50+50) Unicode 106M ± 9% 106M ±11% ~ (p=0.642 n=50+50) GoTypes 736M ± 4% 740M ± 4% ~ (p=0.121 n=50+49) SSA 8.82G ± 3% 8.88G ± 2% +0.65% (p=0.006 n=49+48) Flate 147M ± 4% 147M ± 5% ~ (p=0.844 n=47+48) GoParser 179M ± 4% 178M ± 6% ~ (p=0.785 n=50+50) Reflect 443M ± 6% 441M ± 5% ~ (p=0.850 n=48+47) Tar 126M ± 5% 126M ± 5% ~ (p=0.734 n=50+50) XML 244M ± 5% 244M ± 5% ~ (p=0.594 n=49+50) [Geo mean] 341M 341M +0.11% Change-Id: Ice962f61eb3a524c2db00a166cb582c22caa7d68 Reviewed-on: https://go-review.googlesource.com/39633 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-04 14:31:55 -07:00
// Add reference to Go arguments for C or assembly functions without them.
for _, s := range text {
if !strings.HasPrefix(s.Name, "\"\".") {
continue
}
found := false
for p := s.Func.Text; p != nil; p = p.Link {
if p.As == AFUNCDATA && p.From.Type == TYPE_CONST && p.From.Offset == objabi.FUNCDATA_ArgsPointerMaps {
found = true
break
}
}
if !found {
p := Appendp(s.Func.Text, newprog)
p.As = AFUNCDATA
p.From.Type = TYPE_CONST
p.From.Offset = objabi.FUNCDATA_ArgsPointerMaps
p.To.Type = TYPE_MEM
p.To.Name = NAME_EXTERN
p.To.Sym = ctxt.LookupDerived(s, s.Name+".args_stackmap")
}
}
// Turn functions into machine code images.
for _, s := range text {
mkfwd(s)
cmd/compile: teach assemblers to accept a Prog allocator The existing bulk Prog allocator is not concurrency-safe. To allow for concurrency-safe bulk allocation of Progs, I want to move Prog allocation and caching upstream, to the clients of cmd/internal/obj. This is a preliminary enabling refactoring. After this CL, instead of calling Ctxt.NewProg throughout the assemblers, we thread through a newprog function that returns a new Prog. That function is set up to be Ctxt.NewProg, so there are no real changes in this CL; this CL only establishes the plumbing. Passes toolstash-check -all. Negligible compiler performance impact. Updates #15756 name old time/op new time/op delta Template 213ms ± 3% 214ms ± 4% ~ (p=0.574 n=49+47) Unicode 90.1ms ± 5% 89.9ms ± 4% ~ (p=0.417 n=50+49) GoTypes 585ms ± 4% 584ms ± 3% ~ (p=0.466 n=49+49) SSA 6.50s ± 3% 6.52s ± 2% ~ (p=0.251 n=49+49) Flate 128ms ± 4% 128ms ± 4% ~ (p=0.673 n=49+50) GoParser 152ms ± 3% 152ms ± 3% ~ (p=0.810 n=48+49) Reflect 372ms ± 4% 372ms ± 5% ~ (p=0.778 n=49+50) Tar 113ms ± 5% 111ms ± 4% -0.98% (p=0.016 n=50+49) XML 208ms ± 3% 208ms ± 2% ~ (p=0.483 n=47+49) [Geo mean] 285ms 285ms -0.17% name old user-ns/op new user-ns/op delta Template 253M ± 8% 254M ± 9% ~ (p=0.899 n=50+50) Unicode 106M ± 9% 106M ±11% ~ (p=0.642 n=50+50) GoTypes 736M ± 4% 740M ± 4% ~ (p=0.121 n=50+49) SSA 8.82G ± 3% 8.88G ± 2% +0.65% (p=0.006 n=49+48) Flate 147M ± 4% 147M ± 5% ~ (p=0.844 n=47+48) GoParser 179M ± 4% 178M ± 6% ~ (p=0.785 n=50+50) Reflect 443M ± 6% 441M ± 5% ~ (p=0.850 n=48+47) Tar 126M ± 5% 126M ± 5% ~ (p=0.734 n=50+50) XML 244M ± 5% 244M ± 5% ~ (p=0.594 n=49+50) [Geo mean] 341M 341M +0.11% Change-Id: Ice962f61eb3a524c2db00a166cb582c22caa7d68 Reviewed-on: https://go-review.googlesource.com/39633 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-04 14:31:55 -07:00
linkpatch(ctxt, s, newprog)
ctxt.Arch.Preprocess(ctxt, s, newprog)
ctxt.Arch.Assemble(ctxt, s, newprog)
if ctxt.Errors > 0 {
continue
}
linkpcln(ctxt, s)
ctxt.populateDWARF(plist.Curfn, s, myimportpath)
}
}
cmd/compile: move Text.From.Sym initialization earlier The initialization of an ATEXT Prog's From.Sym can race with the assemblers in a concurrent compiler. CL 40254 contains an initial, failed attempt to fix that race. This CL takes a different approach: Rather than expose an API to initialize the Prog, expose an API to initialize the Sym. The initialization of the Sym can then be moved earlier in the compiler, avoiding the race. The growth of gc.Func has negligible performance impact; see below. Passes toolstash -cmp. Updates #15756 name old alloc/op new alloc/op delta Template 38.8MB ± 0% 38.8MB ± 0% ~ (p=0.968 n=9+10) Unicode 29.8MB ± 0% 29.8MB ± 0% ~ (p=0.684 n=10+10) GoTypes 113MB ± 0% 113MB ± 0% ~ (p=0.912 n=10+10) SSA 1.25GB ± 0% 1.25GB ± 0% ~ (p=0.481 n=10+10) Flate 25.3MB ± 0% 25.3MB ± 0% ~ (p=0.105 n=10+10) GoParser 31.7MB ± 0% 31.8MB ± 0% +0.09% (p=0.016 n=8+10) Reflect 78.3MB ± 0% 78.2MB ± 0% ~ (p=0.190 n=10+10) Tar 26.5MB ± 0% 26.6MB ± 0% +0.13% (p=0.011 n=10+10) XML 42.4MB ± 0% 42.4MB ± 0% ~ (p=0.971 n=10+10) name old allocs/op new allocs/op delta Template 378k ± 1% 378k ± 0% ~ (p=0.315 n=10+9) Unicode 321k ± 1% 321k ± 0% ~ (p=0.436 n=10+10) GoTypes 1.14M ± 0% 1.14M ± 0% ~ (p=0.079 n=10+9) SSA 9.70M ± 0% 9.70M ± 0% -0.04% (p=0.035 n=10+10) Flate 233k ± 1% 234k ± 1% ~ (p=0.529 n=10+10) GoParser 315k ± 0% 316k ± 0% ~ (p=0.095 n=9+10) Reflect 980k ± 0% 980k ± 0% ~ (p=0.436 n=10+10) Tar 249k ± 1% 250k ± 0% ~ (p=0.280 n=10+10) XML 391k ± 1% 391k ± 1% ~ (p=0.481 n=10+10) Change-Id: I3c93033dddd2e1df8cc54a106a6e615d27859e71 Reviewed-on: https://go-review.googlesource.com/40496 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-04-12 13:23:07 -07:00
func (ctxt *Link) InitTextSym(s *LSym, flag int) {
if s == nil {
// func _() { }
return
}
if s.Func != nil {
ctxt.Diag("InitTextSym double init for %s", s.Name)
}
s.Func = new(FuncInfo)
if s.OnList() {
ctxt.Diag("symbol %s listed multiple times", s.Name)
}
s.Set(AttrOnList, true)
cmd/internal/obj: stop storing Text flags in From3 Prior to this CL, flags such as NOSPLIT on ATEXT Progs were stored in From3.Offset. Some but not all of those flags were also duplicated into From.Sym.Attribute. This CL migrates all of those flags into From.Sym.Attribute and stops creating a From3. A side-effect of this is that printing an ATEXT Prog can no longer simply dump From3.Offset. That's kind of good, since the raw flag value wasn't very informative anyway, but it did necessitate a bunch of updates to the cmd/asm tests. The reason I'm doing this work now is that avoiding storing flags in both From.Sym and From3.Offset simplifies some other changes to fix the data race first described in CL 40254. This CL almost passes toolstash-check -all. The only changes are in cases where the assembler has decided that a function's flags may be altered, e.g. to make a function with no calls in it NOSPLIT. Prior to this CL, that information was not printed. Sample before: "".Ctz64 t=1 size=63 args=0x10 locals=0x0 0x0000 00000 (/Users/josh/go/tip/src/runtime/internal/sys/intrinsics.go:35) TEXT "".Ctz64(SB), $0-16 0x0000 00000 (/Users/josh/go/tip/src/runtime/internal/sys/intrinsics.go:35) FUNCDATA $0, gclocals·f207267fbf96a0178e8758c6e3e0ce28(SB) Sample after: "".Ctz64 t=1 nosplit size=63 args=0x10 locals=0x0 0x0000 00000 (/Users/josh/go/tip/src/runtime/internal/sys/intrinsics.go:35) TEXT "".Ctz64(SB), NOSPLIT, $0-16 0x0000 00000 (/Users/josh/go/tip/src/runtime/internal/sys/intrinsics.go:35) FUNCDATA $0, gclocals·f207267fbf96a0178e8758c6e3e0ce28(SB) Observe the additional "nosplit" in the first line and the additional "NOSPLIT" in the second line. Updates #15756 Change-Id: I5c59bd8f3bdc7c780361f801d94a261f0aef3d13 Reviewed-on: https://go-review.googlesource.com/40495 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-04-11 15:15:04 -07:00
s.Set(AttrDuplicateOK, flag&DUPOK != 0)
s.Set(AttrNoSplit, flag&NOSPLIT != 0)
s.Set(AttrReflectMethod, flag&REFLECTMETHOD != 0)
s.Set(AttrWrapper, flag&WRAPPER != 0)
s.Set(AttrNeedCtxt, flag&NEEDCTXT != 0)
s.Set(AttrNoFrame, flag&NOFRAME != 0)
s.Set(AttrTopFrame, flag&TOPFRAME != 0)
s.Type = objabi.STEXT
ctxt.Text = append(ctxt.Text, s)
// Set up DWARF entries for s.
cmd/compile: remove isStmt symbol from FuncInfo As promised in CL 188238, removing the obsolete symbol. Here are the latest stats. This is baselined at "e53edafb66" with only these changes applied, run on magna.cam. The linker looks straight better (in memory and speed). There is still a change I'm working on walking the progs to generate the debug_lines data in the compiler. That will likely result in a compiler speedup. name old time/op new time/op delta Template 324ms ± 3% 317ms ± 3% -2.07% (p=0.043 n=10+10) Unicode 142ms ± 4% 144ms ± 3% ~ (p=0.393 n=10+10) GoTypes 1.05s ± 2% 1.07s ± 2% +1.59% (p=0.019 n=9+9) Compiler 4.09s ± 2% 4.11s ± 1% ~ (p=0.218 n=10+10) SSA 12.5s ± 1% 12.7s ± 1% +1.00% (p=0.035 n=10+10) Flate 199ms ± 7% 203ms ± 5% ~ (p=0.481 n=10+10) GoParser 245ms ± 3% 246ms ± 5% ~ (p=0.780 n=9+10) Reflect 672ms ± 4% 688ms ± 3% +2.42% (p=0.015 n=10+10) Tar 280ms ± 4% 284ms ± 4% ~ (p=0.123 n=10+10) XML 379ms ± 4% 381ms ± 2% ~ (p=0.529 n=10+10) LinkCompiler 1.16s ± 4% 1.12s ± 2% -3.03% (p=0.001 n=10+9) ExternalLinkCompiler 2.28s ± 3% 2.23s ± 3% -2.51% (p=0.011 n=8+9) LinkWithoutDebugCompiler 686ms ± 9% 667ms ± 2% ~ (p=0.277 n=9+8) StdCmd 14.1s ± 1% 14.0s ± 1% ~ (p=0.739 n=10+10) name old user-time/op new user-time/op delta Template 604ms ±23% 564ms ± 7% ~ (p=0.661 n=10+9) Unicode 429ms ±40% 418ms ±37% ~ (p=0.579 n=10+10) GoTypes 2.43s ±12% 2.51s ± 7% ~ (p=0.393 n=10+10) Compiler 9.22s ± 3% 9.27s ± 3% ~ (p=0.720 n=9+10) SSA 26.3s ± 3% 26.6s ± 2% ~ (p=0.579 n=10+10) Flate 328ms ±19% 333ms ±12% ~ (p=0.842 n=10+9) GoParser 387ms ± 5% 378ms ± 9% ~ (p=0.356 n=9+10) Reflect 1.36s ±20% 1.43s ±21% ~ (p=0.631 n=10+10) Tar 469ms ±12% 471ms ±21% ~ (p=0.497 n=9+10) XML 685ms ±18% 698ms ±19% ~ (p=0.739 n=10+10) LinkCompiler 1.86s ±10% 1.87s ±11% ~ (p=0.968 n=10+9) ExternalLinkCompiler 3.20s ±13% 3.01s ± 8% -5.70% (p=0.046 n=8+9) LinkWithoutDebugCompiler 1.08s ±15% 1.09s ±20% ~ (p=0.579 n=10+10) name old alloc/op new alloc/op delta Template 36.3MB ± 0% 36.4MB ± 0% +0.26% (p=0.000 n=10+10) Unicode 28.5MB ± 0% 28.5MB ± 0% ~ (p=0.165 n=10+10) GoTypes 120MB ± 0% 121MB ± 0% +0.29% (p=0.000 n=9+10) Compiler 546MB ± 0% 548MB ± 0% +0.32% (p=0.000 n=10+10) SSA 1.84GB ± 0% 1.85GB ± 0% +0.49% (p=0.000 n=10+10) Flate 22.9MB ± 0% 23.0MB ± 0% +0.25% (p=0.000 n=10+10) GoParser 27.8MB ± 0% 27.9MB ± 0% +0.25% (p=0.000 n=10+8) Reflect 77.5MB ± 0% 77.7MB ± 0% +0.27% (p=0.000 n=9+9) Tar 34.5MB ± 0% 34.6MB ± 0% +0.23% (p=0.000 n=10+10) XML 44.2MB ± 0% 44.4MB ± 0% +0.32% (p=0.000 n=10+10) LinkCompiler 239MB ± 0% 230MB ± 0% -3.86% (p=0.000 n=10+10) ExternalLinkCompiler 243MB ± 0% 243MB ± 0% +0.22% (p=0.000 n=10+10) LinkWithoutDebugCompiler 164MB ± 0% 155MB ± 0% -5.45% (p=0.000 n=10+10) name old allocs/op new allocs/op delta Template 371k ± 0% 372k ± 0% +0.44% (p=0.000 n=10+10) Unicode 340k ± 0% 340k ± 0% +0.05% (p=0.000 n=10+10) GoTypes 1.32M ± 0% 1.32M ± 0% +0.46% (p=0.000 n=10+10) Compiler 5.34M ± 0% 5.37M ± 0% +0.59% (p=0.000 n=10+10) SSA 17.6M ± 0% 17.7M ± 0% +0.63% (p=0.000 n=10+10) Flate 233k ± 0% 234k ± 0% +0.48% (p=0.000 n=10+10) GoParser 309k ± 0% 310k ± 0% +0.40% (p=0.000 n=10+10) Reflect 964k ± 0% 969k ± 0% +0.54% (p=0.000 n=10+10) Tar 346k ± 0% 348k ± 0% +0.48% (p=0.000 n=10+9) XML 424k ± 0% 426k ± 0% +0.51% (p=0.000 n=10+10) LinkCompiler 751k ± 0% 645k ± 0% -14.13% (p=0.000 n=10+10) ExternalLinkCompiler 1.79M ± 0% 1.69M ± 0% -5.30% (p=0.000 n=10+10) LinkWithoutDebugCompiler 217k ± 0% 222k ± 0% +2.02% (p=0.000 n=10+10) name old object-bytes new object-bytes delta Template 547kB ± 0% 559kB ± 0% +2.17% (p=0.000 n=10+10) Unicode 215kB ± 0% 216kB ± 0% +0.60% (p=0.000 n=10+10) GoTypes 1.99MB ± 0% 2.03MB ± 0% +2.02% (p=0.000 n=10+10) Compiler 7.86MB ± 0% 8.07MB ± 0% +2.73% (p=0.000 n=10+10) SSA 26.4MB ± 0% 27.2MB ± 0% +3.27% (p=0.000 n=10+10) Flate 337kB ± 0% 343kB ± 0% +2.02% (p=0.000 n=10+10) GoParser 432kB ± 0% 441kB ± 0% +2.11% (p=0.000 n=10+10) Reflect 1.33MB ± 0% 1.36MB ± 0% +1.87% (p=0.000 n=10+10) Tar 477kB ± 0% 487kB ± 0% +2.24% (p=0.000 n=10+10) XML 617kB ± 0% 632kB ± 0% +2.33% (p=0.000 n=10+10) name old export-bytes new export-bytes delta Template 18.5kB ± 0% 18.5kB ± 0% ~ (all equal) Unicode 7.92kB ± 0% 7.92kB ± 0% ~ (all equal) GoTypes 35.0kB ± 0% 35.0kB ± 0% ~ (all equal) Compiler 109kB ± 0% 109kB ± 0% +0.09% (p=0.000 n=10+10) SSA 137kB ± 0% 137kB ± 0% +0.03% (p=0.000 n=10+10) Flate 4.89kB ± 0% 4.89kB ± 0% ~ (all equal) GoParser 8.49kB ± 0% 8.49kB ± 0% ~ (all equal) Reflect 11.4kB ± 0% 11.4kB ± 0% ~ (all equal) Tar 10.5kB ± 0% 10.5kB ± 0% ~ (all equal) XML 16.7kB ± 0% 16.7kB ± 0% ~ (all equal) name old text-bytes new text-bytes delta HelloSize 760kB ± 0% 760kB ± 0% ~ (all equal) CmdGoSize 10.8MB ± 0% 10.8MB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 10.7kB ± 0% 10.7kB ± 0% ~ (all equal) CmdGoSize 312kB ± 0% 312kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 122kB ± 0% 122kB ± 0% ~ (all equal) CmdGoSize 146kB ± 0% 146kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.11MB ± 0% 1.13MB ± 0% +1.10% (p=0.000 n=10+10) CmdGoSize 14.9MB ± 0% 15.0MB ± 0% +0.77% (p=0.000 n=10+10) Change-Id: I42e6087cd6231dbdcfff5464e46d373474e455e1 Reviewed-on: https://go-review.googlesource.com/c/go/+/192417 Reviewed-by: Austin Clements <austin@google.com> Run-TryBot: Austin Clements <austin@google.com>
2019-08-29 16:35:50 -04:00
info, loc, ranges, _, lines := ctxt.dwarfSym(s)
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
info.Type = objabi.SDWARFINFO
info.Set(AttrDuplicateOK, s.DuplicateOK())
if loc != nil {
loc.Type = objabi.SDWARFLOC
loc.Set(AttrDuplicateOK, s.DuplicateOK())
ctxt.Data = append(ctxt.Data, loc)
}
ranges.Type = objabi.SDWARFRANGE
ranges.Set(AttrDuplicateOK, s.DuplicateOK())
ctxt.Data = append(ctxt.Data, info, ranges)
lines.Type = objabi.SDWARFLINES
lines.Set(AttrDuplicateOK, s.DuplicateOK())
ctxt.Data = append(ctxt.Data, lines)
}
func (ctxt *Link) Globl(s *LSym, size int64, flag int) {
if s.SeenGlobl() {
fmt.Printf("duplicate %v\n", s)
}
s.Set(AttrSeenGlobl, true)
if s.OnList() {
ctxt.Diag("symbol %s listed multiple times", s.Name)
}
s.Set(AttrOnList, true)
ctxt.Data = append(ctxt.Data, s)
s.Size = size
if s.Type == 0 {
s.Type = objabi.SBSS
}
if flag&DUPOK != 0 {
s.Set(AttrDuplicateOK, true)
}
if flag&RODATA != 0 {
s.Type = objabi.SRODATA
} else if flag&NOPTR != 0 {
if s.Type == objabi.SDATA {
s.Type = objabi.SNOPTRDATA
} else {
s.Type = objabi.SNOPTRBSS
}
} else if flag&TLSBSS != 0 {
s.Type = objabi.STLSBSS
}
}
// EmitEntryLiveness generates PCDATA Progs after p to switch to the
// liveness map active at the entry of function s. It returns the last
// Prog generated.
func (ctxt *Link) EmitEntryLiveness(s *LSym, p *Prog, newprog ProgAlloc) *Prog {
pcdata := Appendp(p, newprog)
pcdata.Pos = s.Func.Text.Pos
pcdata.As = APCDATA
pcdata.From.Type = TYPE_CONST
pcdata.From.Offset = objabi.PCDATA_StackMapIndex
pcdata.To.Type = TYPE_CONST
pcdata.To.Offset = -1 // pcdata starts at -1 at function entry
cmd/compile, cmd/internal/obj: record register maps in binary This adds FUNCDATA and PCDATA that records the register maps much like the existing live arguments maps and live locals maps. The register map is indexed independently from the argument and locals maps since changes in register liveness tend not to correlate with changes to argument and local liveness. This is the final CL toward adding safe-points everywhere. The following CLs will optimize liveness analysis to bring down the cost. The effect of this CL is: name old time/op new time/op delta Template 195ms ± 2% 197ms ± 1% ~ (p=0.136 n=9+9) Unicode 98.4ms ± 2% 99.7ms ± 1% +1.39% (p=0.004 n=10+10) GoTypes 685ms ± 1% 700ms ± 1% +2.06% (p=0.000 n=9+9) Compiler 3.28s ± 1% 3.34s ± 0% +1.71% (p=0.000 n=9+8) SSA 7.79s ± 1% 7.91s ± 1% +1.55% (p=0.000 n=10+9) Flate 133ms ± 2% 133ms ± 2% ~ (p=0.190 n=10+10) GoParser 161ms ± 2% 164ms ± 3% +1.83% (p=0.015 n=10+10) Reflect 450ms ± 1% 457ms ± 1% +1.62% (p=0.000 n=10+10) Tar 183ms ± 2% 185ms ± 1% +0.91% (p=0.008 n=9+10) XML 234ms ± 1% 238ms ± 1% +1.60% (p=0.000 n=9+9) [Geo mean] 411ms 417ms +1.40% name old exe-bytes new exe-bytes delta HelloSize 1.47M ± 0% 1.51M ± 0% +2.79% (p=0.000 n=10+10) Compared to just before "cmd/internal/obj: consolidate emitting entry stack map", the cumulative effect of adding stack maps everywhere and register maps is: name old time/op new time/op delta Template 185ms ± 2% 197ms ± 1% +6.42% (p=0.000 n=10+9) Unicode 96.3ms ± 3% 99.7ms ± 1% +3.60% (p=0.000 n=10+10) GoTypes 658ms ± 0% 700ms ± 1% +6.37% (p=0.000 n=10+9) Compiler 3.14s ± 1% 3.34s ± 0% +6.53% (p=0.000 n=9+8) SSA 7.41s ± 2% 7.91s ± 1% +6.71% (p=0.000 n=9+9) Flate 126ms ± 1% 133ms ± 2% +6.15% (p=0.000 n=10+10) GoParser 153ms ± 1% 164ms ± 3% +6.89% (p=0.000 n=10+10) Reflect 437ms ± 1% 457ms ± 1% +4.59% (p=0.000 n=10+10) Tar 178ms ± 1% 185ms ± 1% +4.18% (p=0.000 n=10+10) XML 223ms ± 1% 238ms ± 1% +6.39% (p=0.000 n=10+9) [Geo mean] 394ms 417ms +5.78% name old alloc/op new alloc/op delta Template 34.5MB ± 0% 38.0MB ± 0% +10.19% (p=0.000 n=10+10) Unicode 29.3MB ± 0% 30.3MB ± 0% +3.56% (p=0.000 n=8+9) GoTypes 113MB ± 0% 125MB ± 0% +10.89% (p=0.000 n=10+10) Compiler 510MB ± 0% 575MB ± 0% +12.79% (p=0.000 n=10+10) SSA 1.46GB ± 0% 1.64GB ± 0% +12.40% (p=0.000 n=10+10) Flate 23.9MB ± 0% 25.9MB ± 0% +8.56% (p=0.000 n=10+10) GoParser 28.0MB ± 0% 30.8MB ± 0% +10.08% (p=0.000 n=10+10) Reflect 77.6MB ± 0% 84.3MB ± 0% +8.63% (p=0.000 n=10+10) Tar 34.1MB ± 0% 37.0MB ± 0% +8.44% (p=0.000 n=10+10) XML 42.7MB ± 0% 47.2MB ± 0% +10.75% (p=0.000 n=10+10) [Geo mean] 76.0MB 83.3MB +9.60% name old allocs/op new allocs/op delta Template 321k ± 0% 337k ± 0% +4.98% (p=0.000 n=10+10) Unicode 337k ± 0% 340k ± 0% +1.04% (p=0.000 n=10+9) GoTypes 1.13M ± 0% 1.18M ± 0% +4.85% (p=0.000 n=10+10) Compiler 4.67M ± 0% 4.96M ± 0% +6.25% (p=0.000 n=10+10) SSA 11.7M ± 0% 12.3M ± 0% +5.69% (p=0.000 n=10+10) Flate 216k ± 0% 226k ± 0% +4.52% (p=0.000 n=10+9) GoParser 271k ± 0% 283k ± 0% +4.52% (p=0.000 n=10+10) Reflect 927k ± 0% 972k ± 0% +4.78% (p=0.000 n=10+10) Tar 318k ± 0% 333k ± 0% +4.56% (p=0.000 n=10+10) XML 376k ± 0% 395k ± 0% +5.04% (p=0.000 n=10+10) [Geo mean] 730k 764k +4.61% name old exe-bytes new exe-bytes delta HelloSize 1.46M ± 0% 1.51M ± 0% +3.66% (p=0.000 n=10+10) For #24543. Change-Id: I91e003dc64151916b384274884bf02a2d6862547 Reviewed-on: https://go-review.googlesource.com/109353 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2018-03-27 15:50:45 -04:00
// Same, with register map.
pcdata = Appendp(pcdata, newprog)
pcdata.Pos = s.Func.Text.Pos
pcdata.As = APCDATA
pcdata.From.Type = TYPE_CONST
pcdata.From.Offset = objabi.PCDATA_RegMapIndex
pcdata.To.Type = TYPE_CONST
pcdata.To.Offset = -1
return pcdata
}