// Copyright 2013 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

package ld

import (
	"cmd/internal/goobj"
	"cmd/internal/objabi"
	"cmd/internal/sys"
	"cmd/link/internal/loader"
	"cmd/link/internal/sym"
	"cmp"
	"fmt"
	"internal/abi"
	"internal/buildcfg"
	"path/filepath"
	"slices"
	"strings"
)

const funcSize = 11 * 4 // funcSize is the size of the _func object in runtime/runtime2.go

// pclntab holds the state needed for pclntab generation.
type pclntab struct {
	// The first and last functions found.
	firstFunc, lastFunc loader.Sym

	// Running total size of pclntab.
	size int64

	// runtime.pclntab's symbols
	carrier     loader.Sym
	pclntab     loader.Sym
	pcheader    loader.Sym
	funcnametab loader.Sym
	findfunctab loader.Sym
	cutab       loader.Sym
	filetab     loader.Sym
	pctab       loader.Sym
	funcdata    loader.Sym

	// The number of functions + number of TEXT sections - 1. This unexpected
	// value arises because platforms with more than one TEXT section get a
	// dummy function inserted between sections: the external linker can place
	// functions in those gaps, and we mark those areas as not covered by the
	// Go runtime.
	//
	// On most platforms this is the number of reachable functions.
	nfunc int32

	// The number of filenames in runtime.filetab.
	nfiles uint32
}

// addGeneratedSym adds a generator symbol to pclntab, returning the new Sym.
// It is the caller's responsibility to save the symbol in state.
func (state *pclntab) addGeneratedSym(ctxt *Link, name string, size int64, f generatorFunc) loader.Sym {
	size = Rnd(size, int64(ctxt.Arch.PtrSize))
	state.size += size
	s := ctxt.createGeneratorSymbol(name, 0, sym.SPCLNTAB, size, f)
	ctxt.loader.SetAttrReachable(s, true)
	ctxt.loader.SetCarrierSym(s, state.carrier)
	ctxt.loader.SetAttrNotInSymbolTable(s, true)
	return s
}
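/*
A minimal sketch of the padding in addGeneratedSym, assuming a 64-bit target with pointer size 8. Each generated table is padded to a multiple of the pointer size before it is added to the running total in state.size. The helper roundUp is hypothetical: it shows the rounding that the linker's Rnd is used for here, not Rnd's actual source.

```go
package main

import "fmt"

// roundUp rounds v up to the next multiple of r (assumes v >= 0, r > 0).
// Hypothetical stand-in for the linker's Rnd helper.
func roundUp(v, r int64) int64 {
	if r <= 0 {
		return v
	}
	return (v + r - 1) / r * r
}

func main() {
	const ptrSize = 8 // assumed 64-bit target
	var total int64
	for _, size := range []int64{13, 16, 1} {
		padded := roundUp(size, ptrSize)
		total += padded
		fmt.Printf("size %d -> %d (running total %d)\n", size, padded, total)
	}
}
```

With these inputs the padded sizes are 16, 16 and 8, for a running total of 40.
*/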

// makePclntab makes a pclntab object, and assembles all the compilation units
// we'll need to write pclntab. Returns the pclntab structure, a slice of the
// CompilationUnits we need, and a slice of the function symbols we need to
// generate pclntab.
func makePclntab(ctxt *Link, container loader.Bitmap) (*pclntab, []*sym.CompilationUnit, []loader.Sym) {
	ldr := ctxt.loader
	state := new(pclntab)

	// Gather some basic stats and info.
	seenCUs := make(map[*sym.CompilationUnit]struct{})
	compUnits := []*sym.CompilationUnit{}
	funcs := []loader.Sym{}

	for _, s := range ctxt.Textp {
		if !emitPcln(ctxt, s, container) {
			continue
		}
		funcs = append(funcs, s)
		state.nfunc++
		if state.firstFunc == 0 {
			state.firstFunc = s
		}
		state.lastFunc = s

		// We need to keep track of all compilation units we see. Some symbols
		// (e.g., go.buildid, _cgoexp_, etc.) won't have a compilation unit.
		cu := ldr.SymUnit(s)
		if _, ok := seenCUs[cu]; cu != nil && !ok {
			seenCUs[cu] = struct{}{}
			cu.PclnIndex = len(compUnits)
			compUnits = append(compUnits, cu)
		}
	}
	return state, compUnits, funcs
}
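/*
The compilation-unit bookkeeping in makePclntab is a first-seen deduplication that also assigns each unit its index. The sketch below shows that technique with hypothetical stand-in types (compUnit, textSym) in place of sym.CompilationUnit and loader.Sym. It does not reproduce the linker's own data structures.

```go
package main

import "fmt"

// compUnit is a hypothetical stand-in for sym.CompilationUnit.
type compUnit struct {
	Name      string
	PclnIndex int
}

// textSym is a hypothetical stand-in for a text symbol. Unit may be nil,
// as it is for symbols such as go.buildid that have no compilation unit.
type textSym struct {
	Name string
	Unit *compUnit
}

// collectCUs returns the distinct non-nil compilation units in the order
// they are first seen, recording each unit's position in PclnIndex.
func collectCUs(syms []textSym) []*compUnit {
	seen := make(map[*compUnit]struct{})
	var units []*compUnit
	for _, s := range syms {
		cu := s.Unit
		if _, ok := seen[cu]; cu != nil && !ok {
			seen[cu] = struct{}{}
			cu.PclnIndex = len(units)
			units = append(units, cu)
		}
	}
	return units
}

func main() {
	a, b := &compUnit{Name: "a"}, &compUnit{Name: "b"}
	syms := []textSym{{"a.f", a}, {"go.buildid", nil}, {"b.g", b}, {"a.h", a}}
	for _, cu := range collectCUs(syms) {
		fmt.Println(cu.Name, cu.PclnIndex)
	}
}
```

Symbols without a unit are skipped, matching the cu != nil check in the original loop. A later function from an already-seen unit does not change that unit's index.
*/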

// emitPcln reports whether s should get an entry in the pcln table.
func emitPcln(ctxt *Link, s loader.Sym, container loader.Bitmap) bool {
	if ctxt.Target.IsRISCV64() {
		// Avoid adding local symbols to the pcln table - RISC-V
		// linking generates a very large number of these, particularly
		// for HI20 symbols (which we need to load in order to be able
		// to resolve relocations). Unnecessarily including all of
		// these symbols quickly blows out the size of the pcln table
		// and overflows hash buckets.
		symName := ctxt.loader.SymName(s)
		if symName == "" || strings.HasPrefix(symName, ".L") {
			return false
		}
	}

	// We want to generate func table entries only for the "lowest
	// level" symbols, not containers of subsymbols.
	return !container.Has(s)
}
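/*
Restating the filter in emitPcln over plain values makes its two independent conditions easier to see. shouldEmitPcln is a hypothetical helper. In the linker, its inputs come from ctxt.Target.IsRISCV64(), the loader's SymName, and the container bitmap.

```go
package main

import (
	"fmt"
	"strings"
)

// shouldEmitPcln mirrors the decision in emitPcln using plain values:
// on riscv64, unnamed symbols and assembler-local ".L" labels are skipped,
// and container symbols (which hold sub-symbols) never get an entry.
func shouldEmitPcln(name string, isRISCV64, isContainer bool) bool {
	if isRISCV64 && (name == "" || strings.HasPrefix(name, ".L")) {
		return false
	}
	return !isContainer
}

func main() {
	fmt.Println(shouldEmitPcln(".Lpcrel_hi0", true, false))  // false
	fmt.Println(shouldEmitPcln(".Lpcrel_hi0", false, false)) // true
	fmt.Println(shouldEmitPcln("main.main", true, true))     // false
}
```

The ".L" check applies only on riscv64. On other targets such names fall through to the container test.
*/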
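
// computeDeferReturn returns the offset, relative to the start of s, of the
// call to deferreturn (or to a deferreturn trampoline) in s, adjusted back to
// the start of the call sequence. It returns 0 if s has no such call.
func computeDeferReturn(ctxt *Link, deferReturnSym, s loader.Sym) uint32 {
	ldr := ctxt.loader
	target := ctxt.Target
	deferreturn := uint32(0)
	lastWasmAddr := uint32(0)

	relocs := ldr.Relocs(s)
	for ri := 0; ri < relocs.Count(); ri++ {
		r := relocs.At(ri)
		if target.IsWasm() && r.Type() == objabi.R_ADDR {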
			// wasm/ssa.go generates an ARESUMEPOINT just
			// before the deferreturn call. The "PC" of
			// the deferreturn call is stored in the
			// R_ADDR relocation on the ARESUMEPOINT.
			lastWasmAddr = uint32(r.Add())
		}
		if r.Type().IsDirectCall() && (r.Sym() == deferReturnSym || ldr.IsDeferReturnTramp(r.Sym())) {
			if target.IsWasm() {
				deferreturn = lastWasmAddr - 1
			} else {
				// Note: the relocation target is in the call instruction, but
				// is not necessarily the whole instruction (for instance, on
				// x86 the relocation applies to bytes [1:5] of the 5 byte call
				// instruction).
				deferreturn = uint32(r.Off())
				switch target.Arch.Family {
				case sys.I386:
					deferreturn--
					if ctxt.BuildMode == BuildModeShared || ctxt.linkShared || ctxt.BuildMode == BuildModePlugin {
						// In this mode, we need to get the address from GOT,
						// with two additional instructions like
						//
						// CALL    __x86.get_pc_thunk.bx(SB)       // 5 bytes
						// LEAL    _GLOBAL_OFFSET_TABLE_<>(BX), BX // 6 bytes
						//
						// We need to back off to the get_pc_thunk call.
						// (See progedit in cmd/internal/obj/x86/obj6.go)
						deferreturn -= 11
					}
				case sys.AMD64:
					deferreturn--

				case sys.ARM, sys.ARM64, sys.Loong64, sys.MIPS, sys.MIPS64, sys.PPC64, sys.RISCV64:
					// no change
				case sys.S390X:
					deferreturn -= 2
				default:
					panic(fmt.Sprintf("Unhandled architecture: %v", target.Arch.Family))
				}
			}
			break // only need one
		}
	}
	return deferreturn
}
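/*
The per-architecture switch at the end of computeDeferReturn maps a relocation offset back to the start of the call sequence. The sketch below restates that switch using hypothetical string architecture names; the linker itself switches on sys.ArchFamily. Wasm is left out because it derives the PC from the preceding R_ADDR relocation instead.

```go
package main

import "fmt"

// callStart converts the offset of a direct-call relocation into the offset
// of the start of the call sequence, following the adjustments made in
// computeDeferReturn.
func callStart(arch string, relocOff uint32, sharedOrPlugin bool) (uint32, error) {
	switch arch {
	case "386":
		off := relocOff - 1 // x86 CALL: the relocation starts at byte 1
		if sharedOrPlugin {
			off -= 11 // back off over the get_pc_thunk CALL (5) and LEAL (6)
		}
		return off, nil
	case "amd64":
		return relocOff - 1, nil // x86 CALL: the relocation starts at byte 1
	case "arm", "arm64", "loong64", "mips", "mips64", "ppc64", "riscv64":
		return relocOff, nil // no change
	case "s390x":
		return relocOff - 2, nil // back off 2 bytes, as computeDeferReturn does
	default:
		return 0, fmt.Errorf("unhandled architecture: %s", arch)
	}
}

func main() {
	for _, arch := range []string{"amd64", "386", "arm64", "s390x"} {
		off, err := callStart(arch, 17, false)
		fmt.Println(arch, off, err)
	}
}
```

For example, an amd64 call whose relocation sits at offset 17 starts at offset 16. A 386 call in shared or plugin mode with the same relocation offset backs off a further 11 bytes, to offset 5.
*/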

// genInlTreeSym generates the InlTree sym for a function with the
// specified FuncInfo.
func genInlTreeSym(ctxt *Link, cu *sym.CompilationUnit, fi loader.FuncInfo, arch *sys.Arch, nameOffsets map[loader.Sym]uint32) loader.Sym {
	ldr := ctxt.loader
	its := ldr.CreateExtSym("", 0)
	inlTreeSym := ldr.MakeSymbolUpdater(its)
	// Note: the generated symbol is given a type of sym.SGOFUNC, as a
	// signal to the symtab() phase that it needs to be grouped in with
	// other similar symbols (gcdata, etc); the dodata() phase will
	// eventually switch the type back to SRODATA.
	inlTreeSym.SetType(sym.SPCLNTAB)
	ldr.SetAttrReachable(its, true)
	ldr.SetSymAlign(its, 4) // it has 32-bit fields
	ninl := fi.NumInlTree()
	for i := 0; i < int(ninl); i++ {
		call := fi.InlTree(i)
		nameOff, ok := nameOffsets[call.Func]
		if !ok {
			panic("couldn't find function name offset")
		}

		inlFunc := ldr.FuncInfo(call.Func)
		var funcID abi.FuncID
		startLine := int32(0)
		if inlFunc.Valid() {
			funcID = inlFunc.FuncID()
			startLine = inlFunc.StartLine()
		} else if !ctxt.linkShared {
			// Inlined functions are always Go functions, and thus
			// must have FuncInfo.
			//
			// Unfortunately, with -linkshared, the inlined
			// function may be external symbols (from another
			// shared library), and we don't load FuncInfo from the
			// shared library. We will report potentially incorrect
			// FuncID in this case. See https://go.dev/issue/55954.
			panic(fmt.Sprintf("inlined function %s missing func info", ldr.SymName(call.Func)))
		}

		// Construct runtime.inlinedCall value.
		const size = 16
		inlTreeSym.SetUint8(arch, int64(i*size+0), uint8(funcID))
		// Bytes 1-3 are unused.
		inlTreeSym.SetUint32(arch, int64(i*size+4), nameOff)
		inlTreeSym.SetUint32(arch, int64(i*size+8), uint32(call.ParentPC))
		inlTreeSym.SetUint32(arch, int64(i*size+12), uint32(startLine))
	}
	return its
}

// makeInlSyms generates an inline tree symbol for each function in funcs that
// has inlined calls, and returns a map from each such function symbol to its
// inline tree symbol.
func makeInlSyms(ctxt *Link, funcs []loader.Sym, nameOffsets map[loader.Sym]uint32) map[loader.Sym]loader.Sym {
	ldr := ctxt.loader
	// Create the inline symbols we need.
	inlSyms := make(map[loader.Sym]loader.Sym)
	for _, s := range funcs {
		if fi := ldr.FuncInfo(s); fi.Valid() {
			fi.Preload()
			if fi.NumInlTree() > 0 {
				inlSyms[s] = genInlTreeSym(ctxt, ldr.SymUnit(s), fi, ctxt.Arch, nameOffsets)
			}
		}
	}
	return inlSyms
}

// generatePCHeader creates the runtime.pcheader symbol, setting it up as a
// generator to fill in its data later.
func (state *pclntab) generatePCHeader(ctxt *Link) {
	ldr := ctxt.loader
	size := int64(8 + 8*ctxt.Arch.PtrSize)
	writeHeader := func(ctxt *Link, s loader.Sym) {
		header := ctxt.loader.MakeSymbolUpdater(s)

		writeSymOffset := func(off int64, ws loader.Sym) int64 {
			diff := ldr.SymValue(ws) - ldr.SymValue(s)
			if diff <= 0 {
				name := ldr.SymName(ws)
				panic(fmt.Sprintf("expected runtime.pcheader(%x) to be placed before %s(%x)", ldr.SymValue(s), name, ldr.SymValue(ws)))
			}
			return header.SetUintptr(ctxt.Arch, off, uintptr(diff))
		}

		// Write header.
		// Keep in sync with runtime/symtab.go:pcHeader and package debug/gosym.
		header.SetUint32(ctxt.Arch, 0, 0xfffffff1)
		header.SetUint8(ctxt.Arch, 6, uint8(ctxt.Arch.MinLC))
		header.SetUint8(ctxt.Arch, 7, uint8(ctxt.Arch.PtrSize))
		off := header.SetUint(ctxt.Arch, 8, uint64(state.nfunc))
		off = header.SetUint(ctxt.Arch, off, uint64(state.nfiles))
		off = header.SetUintptr(ctxt.Arch, off, 0) // unused
		off = writeSymOffset(off, state.funcnametab)
		off = writeSymOffset(off, state.cutab)
		off = writeSymOffset(off, state.filetab)
		off = writeSymOffset(off, state.pctab)
		off = writeSymOffset(off, state.pclntab)
		if off != size {
			panic(fmt.Sprintf("pcHeader size: %d != %d", off, size))
		}
	}

	state.pcheader = state.addGeneratedSym(ctxt, "runtime.pcheader", size, writeHeader)
}
// walkFuncs iterates over funcs, calling f exactly once for each unique
// function, including functions inlined into them.
func walkFuncs(ctxt *Link, funcs []loader.Sym, f func(loader.Sym)) {
ldr := ctxt.loader
seen := make(map[loader.Sym]struct{})
for _, s := range funcs {
if _, ok := seen[s]; !ok {
f(s)
seen[s] = struct{}{}
}
fi := ldr.FuncInfo(s)
if !fi.Valid() {
continue
}
fi.Preload()
for i, ni := 0, fi.NumInlTree(); i < int(ni); i++ {
call := fi.InlTree(i).Func
if _, ok := seen[call]; !ok {
f(call)
seen[call] = struct{}{}
}
}
}
}
// generateFuncnametab creates the function name table. It returns a map from
// func symbol to its name's offset in runtime.funcnametab.
func (state *pclntab) generateFuncnametab(ctxt *Link, funcs []loader.Sym) map[loader.Sym]uint32 {
nameOffsets := make(map[loader.Sym]uint32, state.nfunc)
// Write the null-terminated strings.
writeFuncNameTab := func(ctxt *Link, s loader.Sym) {
symtab := ctxt.loader.MakeSymbolUpdater(s)
for s, off := range nameOffsets {
symtab.AddCStringAt(int64(off), ctxt.loader.SymName(s))
}
}
// Walk the functions (including inlined ones) and calculate the size needed.
var size int64
walkFuncs(ctxt, funcs, func(s loader.Sym) {
nameOffsets[s] = uint32(size)
size += int64(len(ctxt.loader.SymName(s)) + 1) // +1 for the NUL terminator
})
state.funcnametab = state.addGeneratedSym(ctxt, "runtime.funcnametab", size, writeFuncNameTab)
return nameOffsets
}
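// sketchFuncnametab and sketchFuncnameAt are hypothetical, illustration-only
// helpers (not used by the linker or the runtime) that mirror the layout
// built by generateFuncnametab: each name is stored once as a NUL-terminated
// string, and its offset is the running total of the bytes (name plus
// terminator) stored before it. For simplicity this sketch deduplicates by
// name, whereas the linker deduplicates by symbol via walkFuncs.
func sketchFuncnametab(names []string) (tab []byte, offsets map[string]uint32) {
	offsets = make(map[string]uint32, len(names))
	for _, name := range names {
		if _, ok := offsets[name]; ok {
			continue // already stored
		}
		offsets[name] = uint32(len(tab))
		tab = append(tab, name...)
		tab = append(tab, 0) // NUL terminator
	}
	return tab, offsets
}

// sketchFuncnameAt reads the NUL-terminated name that starts at off.
func sketchFuncnameAt(tab []byte, off uint32) string {
	end := off
	for end < uint32(len(tab)) && tab[end] != 0 {
		end++
	}
	return string(tab[off:end])
}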
// walkFilenames walks funcs, calling a function for each filename used in each
// function's line table.
func walkFilenames(ctxt *Link, funcs []loader.Sym, f func(*sym.CompilationUnit, goobj.CUFileIndex)) {
ldr := ctxt.loader
// Loop through all functions, finding the filenames we need.
for _, s := range funcs {
fi := ldr.FuncInfo(s)
if !fi.Valid() {
continue
}
fi.Preload()
cu := ldr.SymUnit(s)
for i, nf := 0, int(fi.NumFile()); i < nf; i++ {
f(cu, fi.File(i))
}
for i, ninl := 0, int(fi.NumInlTree()); i < ninl; i++ {
call := fi.InlTree(i)
f(cu, call.File)
}
}
}
// generateFilenameTabs creates the LUTs needed for filename lookup. It returns,
// for each CU, the index at which that CU's entries begin in runtime.cutab.
//
// Function objects keep track of the files they reference so that stack
// traces can be printed. This function creates a per-CU list of filenames:
// if CU[M] references files[1-N], the following is generated:
//
// runtime.cutab:
// CU[M]
// offsetToFilename[0]
// offsetToFilename[1]
// ..
//
// runtime.filetab:
// filename[0]
// filename[1]
//
// Looking up a filename then becomes:
// 0. Given a func, and filename index [K]
// 1. Get Func.CUIndex: M := func.cuOffset
// 2. Find filename offset: fileOffset := runtime.cutab[M+K]
// 3. Get the filename: getcstring(runtime.filetab[fileOffset])
func (state *pclntab) generateFilenameTabs(ctxt *Link, compUnits []*sym.CompilationUnit, funcs []loader.Sym) []uint32 {
// On a per-CU basis, keep track of all the filenames we need.
//
// Note that we store the filenames in a separate section in the object
// files, and deduplicate based on the actual value. It would be better to
// store the filenames as symbols, using content addressable symbols (and
// then not loading extra filenames), and just use the hash value of the
// symbol name to do this cataloging.
//
// TODO: Store filenames as symbols. (Note this would be easiest if you
// also move strings to ALWAYS using the larger content addressable hash
// function, and use that hash value for uniqueness testing.)
cuEntries := make([]goobj.CUFileIndex, len(compUnits))
fileOffsets := make(map[string]uint32)
// Walk the filenames.
// We store the total filename string length we need to load, and the max
// file index we've seen per CU so we can calculate how large the
// CU->global table needs to be.
var fileSize int64
walkFilenames(ctxt, funcs, func(cu *sym.CompilationUnit, i goobj.CUFileIndex) {
// Note we use the raw filename for lookup, but use the expanded filename
// when we save the size.
filename := cu.FileTable[i]
if _, ok := fileOffsets[filename]; !ok {
fileOffsets[filename] = uint32(fileSize)
fileSize += int64(len(expandFile(filename)) + 1) // +1 for the NUL terminator
}
// Find the maximum file index we've seen.
if cuEntries[cu.PclnIndex] < i+1 {
cuEntries[cu.PclnIndex] = i + 1 // Store max + 1
}
})
// Calculate the size of the runtime.cutab variable.
var totalEntries uint32
cuOffsets := make([]uint32, len(cuEntries))
for i, entries := range cuEntries {
// Note that cutab is a slice of uint32, so the offset of a CU's entries is
// just the running total of all CU entries stored so far, not the number of
// bytes stored so far.
cuOffsets[i] = totalEntries
totalEntries += uint32(entries)
}
// Write cutab.
writeCutab := func(ctxt *Link, s loader.Sym) {
sb := ctxt.loader.MakeSymbolUpdater(s)
var off int64
for i, max := range cuEntries {
// Write the per CU LUT.
cu := compUnits[i]
for j := goobj.CUFileIndex(0); j < max; j++ {
fileOffset, ok := fileOffsets[cu.FileTable[j]]
if !ok {
// We're looping through all possible file indices. It's possible a file's
// been deadcode eliminated, and although it's a valid file in the CU, it's
// not needed in this binary. When that happens, use an invalid offset.
fileOffset = ^uint32(0)
}
off = sb.SetUint32(ctxt.Arch, off, fileOffset)
}
}
}
state.cutab = state.addGeneratedSym(ctxt, "runtime.cutab", int64(totalEntries*4), writeCutab)
// Write filetab.
writeFiletab := func(ctxt *Link, s loader.Sym) {
sb := ctxt.loader.MakeSymbolUpdater(s)
// Write the strings.
for filename, loc := range fileOffsets {
sb.AddStringAt(int64(loc), expandFile(filename))
}
}
state.nfiles = uint32(len(fileOffsets))
state.filetab = state.addGeneratedSym(ctxt, "runtime.filetab", fileSize, writeFiletab)
return cuOffsets
}
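// sketchFileLookup is a hypothetical, illustration-only sketch (not the
// runtime's actual code) of the lookup steps documented on
// generateFilenameTabs: given a function's CU offset M into runtime.cutab and
// a per-CU file index K, read fileOffset := cutab[M+K], then read the
// NUL-terminated string at filetab[fileOffset]. It assumes cutab has already
// been decoded into a []uint32. The linker writes ^uint32(0) for files that
// were dead-code eliminated; ok is false for such entries.
func sketchFileLookup(cutab []uint32, filetab []byte, cuOffset, k uint32) (name string, ok bool) {
	fileOffset := cutab[cuOffset+k]
	if fileOffset == ^uint32(0) {
		return "", false
	}
	end := fileOffset
	for end < uint32(len(filetab)) && filetab[end] != 0 {
		end++
	}
	return string(filetab[fileOffset:end]), true
}

// Because filetab is shared by all CUs, two CUs that reference the same file
// point at the same filetab offset; only the small per-CU cutab rows differ.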
// generatePctab creates the runtime.pctab variable, holding all the
// deduplicated pcdata.
func (state *pclntab) generatePctab(ctxt *Link, funcs []loader.Sym) {
ldr := ctxt.loader
// Pctab offsets of 0 are considered invalid in the runtime. We respect
// that by padding a single byte at the beginning of runtime.pctab, so
// that no real offset can be zero.
size := int64(1)
// Walk the functions, finding the offset at which to store each pcdata.
seen := make(map[loader.Sym]struct{})
saveOffset := func(pcSym loader.Sym) {
if _, ok := seen[pcSym]; !ok {
datSize := ldr.SymSize(pcSym)
if datSize != 0 {
ldr.SetSymValue(pcSym, size)
} else {
// Invalid PC data, record as zero.
ldr.SetSymValue(pcSym, 0)
}
size += datSize
seen[pcSym] = struct{}{}
}
}
var pcsp, pcline, pcfile, pcinline loader.Sym
var pcdata []loader.Sym
for _, s := range funcs {
fi := ldr.FuncInfo(s)
if !fi.Valid() {
continue
}
fi.Preload()
pcsp, pcfile, pcline, pcinline, pcdata = ldr.PcdataAuxs(s, pcdata)
pcSyms := []loader.Sym{pcsp, pcfile, pcline}
for _, pcSym := range pcSyms {
saveOffset(pcSym)
}
for _, pcSym := range pcdata {
saveOffset(pcSym)
}
if fi.NumInlTree() > 0 {
saveOffset(pcinline)
}
}
// TODO: There is no reason we need a generator for this variable, and it
// could be moved to a carrier symbol. However, carrier symbols containing
// carrier symbols don't work yet (as of Aug 2020). Once this is fixed,
// runtime.pctab could just be a carrier sym.
writePctab := func(ctxt *Link, s loader.Sym) {
ldr := ctxt.loader
sb := ldr.MakeSymbolUpdater(s)
for sym := range seen {
sb.SetBytesAt(ldr.SymValue(sym), ldr.Data(sym))
}
}
state.pctab = state.addGeneratedSym(ctxt, "runtime.pctab", size, writePctab)
}
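// sketchPctabOffsets is a hypothetical, illustration-only sketch (not used by
// the linker) of the offset assignment in generatePctab: offset 0 is reserved
// by a leading pad byte, each distinct table is placed at the current size,
// and empty tables are recorded as offset 0 ("no data"). As a stand-in for
// the linker's per-symbol deduplication, this sketch deduplicates by content
// using a string key.
func sketchPctabOffsets(tables []string) (offsets []int64, size int64) {
	size = 1 // pad byte so that no real offset is zero
	seen := make(map[string]int64)
	offsets = make([]int64, len(tables))
	for i, t := range tables {
		if len(t) == 0 {
			offsets[i] = 0 // invalid/empty PC data
			continue
		}
		off, ok := seen[t]
		if !ok {
			off = size
			seen[t] = off
			size += int64(len(t))
		}
		offsets[i] = off
	}
	return offsets, size
}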
// generateFuncdata writes out the funcdata information.
func (state *pclntab) generateFuncdata(ctxt *Link, funcs []loader.Sym, inlsyms map[loader.Sym]loader.Sym) {
ldr := ctxt.loader
// Walk the functions and collect the funcdata.
seen := make(map[loader.Sym]struct{}, len(funcs))
fdSyms := make([]loader.Sym, 0, len(funcs))
fd := []loader.Sym{}
for _, s := range funcs {
fi := ldr.FuncInfo(s)
if !fi.Valid() {
continue
}
fi.Preload()
fd = funcData(ldr, s, fi, inlsyms[s], fd) // reuse fd as scratch space
for j, fdSym := range fd {
if ignoreFuncData(ldr, s, j, fdSym) {
continue
}
if _, ok := seen[fdSym]; !ok {
fdSyms = append(fdSyms, fdSym)
seen[fdSym] = struct{}{}
}
}
}
seen = nil
// Sort the funcdata in reverse order by alignment
// to minimize alignment gaps. Use a stable sort
// for reproducible results.
var maxAlign int32
slices.SortStableFunc(fdSyms, func(a, b loader.Sym) int {
aAlign := symalign(ldr, a)
bAlign := symalign(ldr, b)
// Remember maximum alignment.
maxAlign = max(maxAlign, aAlign, bAlign)
// Negate to sort by decreasing alignment.
return -cmp.Compare(aAlign, bAlign)
})
// We will output the symbols in the order of fdSyms.
// Set the value of each symbol to its offset in the funcdata.
// This way when writeFuncs writes out the funcdata offset,
// it can simply write out the symbol value.
// Accumulated size of funcdata info.
size := int64(0)
for _, fdSym := range fdSyms {
datSize := ldr.SymSize(fdSym)
if datSize == 0 {
ctxt.Errorf(fdSym, "zero size funcdata")
continue
}
size = Rnd(size, int64(symalign(ldr, fdSym)))
ldr.SetSymValue(fdSym, size)
size += datSize
// We do not put the funcdata symbols in the symbol table.
ldr.SetAttrNotInSymbolTable(fdSym, true)
// Mark the symbol as special so that it does not get
// adjusted by the section offset.
ldr.SetAttrSpecial(fdSym, true)
}
// Funcdata symbols are permitted to have R_ADDROFF relocations,
// which the linker can fully resolve.
resolveRelocs := func(ldr *loader.Loader, fdSym loader.Sym, data []byte) {
relocs := ldr.Relocs(fdSym)
for i := 0; i < relocs.Count(); i++ {
r := relocs.At(i)
if r.Type() != objabi.R_ADDROFF {
ctxt.Errorf(fdSym, "unsupported reloc %d (%s) for funcdata symbol", r.Type(), sym.RelocName(ctxt.Target.Arch, r.Type()))
return
}
if r.Siz() != 4 {
ctxt.Errorf(fdSym, "unsupported ADDROFF reloc size %d for funcdata symbol", r.Siz())
return
}
rs := r.Sym()
if r.Weak() && !ldr.AttrReachable(rs) {
return
}
sect := ldr.SymSect(rs)
if sect == nil {
ctxt.Errorf(fdSym, "missing section for relocation target %s for funcdata symbol", ldr.SymName(rs))
return
}
o := ldr.SymValue(rs)
if sect.Name != ".text" {
o -= int64(sect.Vaddr)
} else {
// With multiple .text sections the offset
// is from the start of the first one.
o -= int64(Segtext.Sections[0].Vaddr)
if ctxt.Target.IsWasm() {
if o&(1<<16-1) != 0 {
ctxt.Errorf(fdSym, "textoff relocation does not target function entry for funcdata symbol: %s %#x", ldr.SymName(rs), o)
}
o >>= 16
}
}
o += r.Add()
if o != int64(int32(o)) && o != int64(uint32(o)) {
ctxt.Errorf(fdSym, "ADDROFF relocation out of range for funcdata symbol: %#x", o)
}
ctxt.Target.Arch.ByteOrder.PutUint32(data[r.Off():], uint32(o))
}
}
writeFuncData := func(ctxt *Link, s loader.Sym) {
ldr := ctxt.loader
sb := ldr.MakeSymbolUpdater(s)
for _, fdSym := range fdSyms {
off := ldr.SymValue(fdSym)
fdSymData := ldr.Data(fdSym)
sb.SetBytesAt(off, fdSymData)
// Resolve any R_ADDROFF relocations.
resolveRelocs(ldr, fdSym, sb.Data()[off:off+int64(len(fdSymData))])
}
}
state.funcdata = state.addGeneratedSym(ctxt, "go:func.*", size, writeFuncData)
// Because the funcdata previously was not in pclntab,
// we need to keep the visible symbol so that tools can find it.
ldr.SetAttrNotInSymbolTable(state.funcdata, false)
}
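// sketchFuncdataLayout is a hypothetical, illustration-only sketch (not used
// by the linker) of how generateFuncdata lays out go:func.*: blobs are stably
// sorted by decreasing alignment, then each one is placed at the running size
// rounded up to its alignment. Alignments are assumed to be powers of two.
// It returns each blob's offset (indexed as in the input) and the total size.
func sketchFuncdataLayout(sizes, aligns []int64) (offsets []int64, total int64) {
	order := make([]int, len(sizes))
	for i := range order {
		order[i] = i
	}
	// Stable insertion sort by decreasing alignment (ties keep input order).
	for i := 1; i < len(order); i++ {
		for j := i; j > 0 && aligns[order[j]] > aligns[order[j-1]]; j-- {
			order[j], order[j-1] = order[j-1], order[j]
		}
	}
	offsets = make([]int64, len(sizes))
	for _, i := range order {
		a := aligns[i]
		total = (total + a - 1) &^ (a - 1) // round up to the alignment
		offsets[i] = total
		total += sizes[i]
	}
	return offsets, total
}

// For sizes {3, 8, 2} with alignments {1, 8, 2}, input order would need 18
// bytes (5 bytes of padding before the 8-aligned blob); sorting by alignment
// needs only 13.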
// ignoreFuncData reports whether we should ignore a funcdata symbol.
//
// cmd/internal/obj optimistically populates ArgsPointerMaps and
// ArgInfo for assembly functions, hoping that the compiler will
// emit appropriate symbols from their Go stub declarations. If
// it didn't though, just ignore it.
//
// TODO(cherryyz): Fix arg map generation (see discussion on CL 523335).
func ignoreFuncData(ldr *loader.Loader, s loader.Sym, j int, fdSym loader.Sym) bool {
if fdSym == 0 {
return true
}
if (j == abi.FUNCDATA_ArgsPointerMaps || j == abi.FUNCDATA_ArgInfo) && ldr.IsFromAssembly(s) && ldr.Data(fdSym) == nil {
return true
}
return false
}
// numPCData returns the number of PCData syms for the FuncInfo.
// NB: Preload must be called on valid FuncInfos before calling this function.
func numPCData(ldr *loader.Loader, s loader.Sym, fi loader.FuncInfo) uint32 {
if !fi.Valid() {
return 0
}
numPCData := uint32(ldr.NumPcdata(s))
if fi.NumInlTree() > 0 {
if numPCData < abi.PCDATA_InlTreeIndex+1 {
numPCData = abi.PCDATA_InlTreeIndex + 1
}
}
return numPCData
}
// generateFunctab creates runtime.functab.
//
// runtime.functab contains two things:
//
// - the pc->func lookup table.
// - an array of func objects, interleaved with pcdata and funcdata offsets.
func (state *pclntab) generateFunctab(ctxt *Link, funcs []loader.Sym, inlSyms map[loader.Sym]loader.Sym, cuOffsets []uint32, nameOffsets map[loader.Sym]uint32) {
// Calculate the size of the table.
size, startLocations := state.calculateFunctabSize(ctxt, funcs)
writePcln := func(ctxt *Link, s loader.Sym) {
ldr := ctxt.loader
sb := ldr.MakeSymbolUpdater(s)
// Write the data.
writePCToFunc(ctxt, sb, funcs, startLocations)
writeFuncs(ctxt, sb, funcs, inlSyms, startLocations, cuOffsets, nameOffsets)
}
state.pclntab = state.addGeneratedSym(ctxt, "runtime.functab", size, writePcln)
}
// funcData returns the funcdata symbols for the FuncInfo, with the
// FUNCDATA_InlTree slot set to inlSym if the function has inlined calls.
// The funcdata offsets are written into runtime.functab after each func
// object. This is a helper function to make querying the FuncInfo object
// cleaner.
//
// NB: Preload must be called on the FuncInfo before calling.
// NB: fdSyms is used as scratch space.
func funcData(ldr *loader.Loader, s loader.Sym, fi loader.FuncInfo, inlSym loader.Sym, fdSyms []loader.Sym) []loader.Sym {
fdSyms = fdSyms[:0]
if fi.Valid() {
fdSyms = ldr.Funcdata(s, fdSyms)
if fi.NumInlTree() > 0 {
if len(fdSyms) < abi.FUNCDATA_InlTree+1 {
fdSyms = append(fdSyms, make([]loader.Sym, abi.FUNCDATA_InlTree+1-len(fdSyms))...)
}
fdSyms[abi.FUNCDATA_InlTree] = inlSym
}
}
return fdSyms
}
// calculateFunctabSize calculates the size of the pclntab and the offsets in
// the output buffer of the individual func entries.
func (state pclntab) calculateFunctabSize(ctxt *Link, funcs []loader.Sym) (int64, []uint32) {
ldr := ctxt.loader
startLocations := make([]uint32, len(funcs))
// Allocate space for the pc->func table. It consists of one (pc offset,
// func offset) pair of uint32s per function, followed by a single pc
// offset that marks the end of the last function in the binary.
size := int64(int(state.nfunc)*2*4 + 4)
// Now find the space for the func objects. We do this in a running manner,
// so that we can find individual starting locations.
for i, s := range funcs {
size = Rnd(size, int64(ctxt.Arch.PtrSize))
startLocations[i] = uint32(size)
fi := ldr.FuncInfo(s)
size += funcSize
if fi.Valid() {
fi.Preload()
numFuncData := ldr.NumFuncdata(s)
if fi.NumInlTree() > 0 {
if numFuncData < abi.FUNCDATA_InlTree+1 {
numFuncData = abi.FUNCDATA_InlTree + 1
}
}
size += int64(numPCData(ldr, s, fi) * 4)
size += int64(numFuncData * 4)
}
}
return size, startLocations
}
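// sketchFindFunc is a hypothetical, illustration-only sketch (not the
// runtime's findfunc, which also uses a separate bucket index) of a lookup in
// the pc->func table sized above: nfunc (entry pc offset, func offset) pairs
// followed by one end pc offset. It assumes the table has been decoded into a
// []uint32 with entries sorted by pc. It binary-searches for the last entry
// whose pc offset is <= pc and returns that entry's func offset; ok is false
// if pc lies outside the table.
func sketchFindFunc(tab []uint32, pc uint32) (funcOff uint32, ok bool) {
	n := (len(tab) - 1) / 2 // number of functions
	if n == 0 || pc < tab[0] || pc >= tab[2*n] {
		return 0, false
	}
	lo, hi := 0, n // invariant: tab[2*lo] <= pc, and hi == n or tab[2*hi] > pc
	for hi-lo > 1 {
		mid := (lo + hi) / 2
		if tab[2*mid] <= pc {
			lo = mid
		} else {
			hi = mid
		}
	}
	return tab[2*lo+1], true
}

// The trailing end pc is what lets a lookup reject pcs past the last
// function without knowing that function's size.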
// textOff computes the offset of a text symbol, relative to textStart,
// similar to an R_ADDROFF relocation, for various runtime metadata and
// tables (see runtime/symtab.go:(*moduledata).textAddr).
func textOff(ctxt *Link, s loader.Sym, textStart int64) uint32 {
ldr := ctxt.loader
off := ldr.SymValue(s) - textStart
if off < 0 {
panic(fmt.Sprintf("expected func %s(%x) to be placed at or after textStart (%x)", ldr.SymName(s), ldr.SymValue(s), textStart))
}
if ctxt.IsWasm() {
// On Wasm, the function table contains just the function index, whereas
// the "PC" (s's Value) is function index << 16 + block index (see
// ../wasm/asm.go:assignAddress).
if off&(1<<16-1) != 0 {
ctxt.Errorf(s, "nonzero PC_B at function entry: %#x", off)
}
off >>= 16
}
if int64(uint32(off)) != off {
ctxt.Errorf(s, "textOff overflow: %#x", off)
}
return uint32(off)
}
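// sketchTextOff is a hypothetical, illustration-only restatement (not used by
// the linker) of the arithmetic in textOff, with errors reported as ok ==
// false instead of via ctxt.Errorf or panic. The offset is value - textStart;
// on Wasm, where a "PC" is function index << 16 + block index, the block
// index of a function entry must be zero and only the function index is kept.
func sketchTextOff(value, textStart int64, wasm bool) (off uint32, ok bool) {
	d := value - textStart
	if d < 0 {
		return 0, false // symbol placed before textStart
	}
	if wasm {
		if d&(1<<16-1) != 0 {
			return 0, false // entry must have block index 0
		}
		d >>= 16
	}
	if int64(uint32(d)) != d {
		return 0, false // does not fit in 32 bits
	}
	return uint32(d), true
}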
// writePCToFunc writes the PC->func lookup table.
func writePCToFunc(ctxt *Link, sb *loader.SymbolBuilder, funcs []loader.Sym, startLocations []uint32) {
ldr := ctxt.loader
textStart := ldr.SymValue(ldr.Lookup("runtime.text", 0))
pcOff := func(s loader.Sym) uint32 {
return textOff(ctxt, s, textStart)
}
for i, s := range funcs {
sb.SetUint32(ctxt.Arch, int64(i*2*4), pcOff(s))
sb.SetUint32(ctxt.Arch, int64((i*2+1)*4), startLocations[i])
}
	// The final entry of the table is just the end PC offset.
lastFunc := funcs[len(funcs)-1]
lastPC := pcOff(lastFunc) + uint32(ldr.SymSize(lastFunc))
if ctxt.IsWasm() {
lastPC = pcOff(lastFunc) + 1 // On Wasm it is function index (see above)
}
sb.SetUint32(ctxt.Arch, int64(len(funcs))*2*4, lastPC)
}
// writeFuncs writes the func structures and pcdata to runtime.functab.
func writeFuncs(ctxt *Link, sb *loader.SymbolBuilder, funcs []loader.Sym, inlSyms map[loader.Sym]loader.Sym, startLocations, cuOffsets []uint32, nameOffsets map[loader.Sym]uint32) {
ldr := ctxt.loader
deferReturnSym := ldr.Lookup("runtime.deferreturn", abiInternalVer)
textStart := ldr.SymValue(ldr.Lookup("runtime.text", 0))
funcdata := []loader.Sym{}
var pcsp, pcfile, pcline, pcinline loader.Sym
var pcdata []loader.Sym
// Write the individual func objects (runtime._func struct).
for i, s := range funcs {
startLine := int32(0)
fi := ldr.FuncInfo(s)
if fi.Valid() {
fi.Preload()
pcsp, pcfile, pcline, pcinline, pcdata = ldr.PcdataAuxs(s, pcdata)
startLine = fi.StartLine()
}
off := int64(startLocations[i])
// entryOff uint32 (offset of func entry PC from textStart)
entryOff := textOff(ctxt, s, textStart)
off = sb.SetUint32(ctxt.Arch, off, entryOff)
// nameOff int32
nameOff, ok := nameOffsets[s]
if !ok {
panic("couldn't find function name offset")
}
off = sb.SetUint32(ctxt.Arch, off, nameOff)
// args int32
// TODO: Move into funcinfo.
args := uint32(0)
if fi.Valid() {
args = uint32(fi.Args())
}
off = sb.SetUint32(ctxt.Arch, off, args)
		// deferreturn uint32
deferreturn := computeDeferReturn(ctxt, deferReturnSym, s)
off = sb.SetUint32(ctxt.Arch, off, deferreturn)
		// pcsp, pcfile, pcln uint32 (offsets into the pctab)
if fi.Valid() {
off = sb.SetUint32(ctxt.Arch, off, uint32(ldr.SymValue(pcsp)))
off = sb.SetUint32(ctxt.Arch, off, uint32(ldr.SymValue(pcfile)))
off = sb.SetUint32(ctxt.Arch, off, uint32(ldr.SymValue(pcline)))
} else {
			off += 12 // skip pcsp, pcfile, pcln (3 x uint32)
}
		// npcdata uint32
		off = sb.SetUint32(ctxt.Arch, off, numPCData(ldr, s, fi))
// Store the offset to compilation unit's file table.
cuIdx := ^uint32(0)
if cu := ldr.SymUnit(s); cu != nil {
cuIdx = cuOffsets[cu.PclnIndex]
}
off = sb.SetUint32(ctxt.Arch, off, cuIdx)
// startLine int32
off = sb.SetUint32(ctxt.Arch, off, uint32(startLine))
// funcID uint8
var funcID abi.FuncID
if fi.Valid() {
funcID = fi.FuncID()
}
off = sb.SetUint8(ctxt.Arch, off, uint8(funcID))
// flag uint8
var flag abi.FuncFlag
if fi.Valid() {
flag = fi.FuncFlag()
}
off = sb.SetUint8(ctxt.Arch, off, uint8(flag))
off += 1 // pad
// nfuncdata must be the final entry.
funcdata = funcData(ldr, s, fi, 0, funcdata)
off = sb.SetUint8(ctxt.Arch, off, uint8(len(funcdata)))
// Output the pcdata.
if fi.Valid() {
for j, pcSym := range pcdata {
sb.SetUint32(ctxt.Arch, off+int64(j*4), uint32(ldr.SymValue(pcSym)))
}
if fi.NumInlTree() > 0 {
sb.SetUint32(ctxt.Arch, off+abi.PCDATA_InlTreeIndex*4, uint32(ldr.SymValue(pcinline)))
}
}
// Write funcdata refs as offsets from go:func.* and go:funcrel.*.
funcdata = funcData(ldr, s, fi, inlSyms[s], funcdata)
// Missing funcdata will be ^0. See runtime/symtab.go:funcdata.
off = int64(startLocations[i] + funcSize + numPCData(ldr, s, fi)*4)
for j := range funcdata {
dataoff := off + int64(4*j)
fdsym := funcdata[j]
if ignoreFuncData(ldr, s, j, fdsym) {
sb.SetUint32(ctxt.Arch, dataoff, ^uint32(0)) // ^0 is a sentinel for "no value"
				continue
			}

			sb.SetUint32(ctxt.Arch, dataoff, uint32(ldr.SymValue(fdsym)))
		}
	}
}
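
/*
Illustrative sketch, not part of the linker. The loop above writes one uint32
per funcdata slot. It writes ^uint32(0) when ignoreFuncData reports that the slot
has no value. Otherwise it writes the symbol's value as an offset. The program
below models that encoding and one way to decode it. It assumes that offsets are
relative to the module's go:func.* base, as runtime/symtab.go:funcdata appears to
treat them, and that the table is little-endian. encodeFuncdata, decodeFuncdata
and noValue are hypothetical names. Using a negative input to mean "no funcdata"
is a convention of this sketch only.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// noValue mirrors the ^uint32(0) sentinel written for an absent funcdata slot.
const noValue = ^uint32(0)

// encodeFuncdata writes one little-endian uint32 per slot. A negative
// offset (this sketch's convention) is encoded as the noValue sentinel.
func encodeFuncdata(offsets []int64) []byte {
	buf := make([]byte, 4*len(offsets))
	for j, off := range offsets {
		v := noValue
		if off >= 0 {
			v = uint32(off)
		}
		binary.LittleEndian.PutUint32(buf[4*j:], v)
	}
	return buf
}

// decodeFuncdata returns base+offset for slot j, or ok=false when the
// slot holds the sentinel.
func decodeFuncdata(buf []byte, base uintptr, j int) (addr uintptr, ok bool) {
	v := binary.LittleEndian.Uint32(buf[4*j:])
	if v == noValue {
		return 0, false
	}
	return base + uintptr(v), true
}

func main() {
	tab := encodeFuncdata([]int64{0x10, -1, 0x2a0})
	for j := 0; j < 3; j++ {
		addr, ok := decodeFuncdata(tab, 0x1000, j)
		fmt.Printf("slot %d: ok=%v addr=%#x\n", j, ok, addr)
	}
}
```

Storing a 32-bit offset instead of a pointer is what lets the linker avoid a
relocation per funcdata slot.
*/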

// pclntab generates the pcln table for the link output: it initializes the
// runtime.pclntab carrier symbol and fills it with runtime function and file
// name information.
func (ctxt *Link) pclntab(container loader.Bitmap) *pclntab {
	// Go 1.2's symtab layout is documented in golang.org/s/go12symtab, but the
	// layout and data have changed since that time.
	//
	// As of August 2020, here's the layout of pclntab:
	//
	//  .gopclntab/__gopclntab [elf/macho section]
	//    runtime.pclntab
	//      Carrier symbol for the entire pclntab section.
	//
	//      runtime.pcheader  (see: runtime/symtab.go:pcHeader)
	//        8-byte magic
	//        nfunc [thearch.ptrsize bytes]
	//        offset to runtime.funcnametab from the beginning of runtime.pcheader
	//        offset to runtime.pclntab_old from beginning of runtime.pcheader
	//
	//      runtime.funcnametab
	//        []list of null terminated function names
	//
	//      runtime.cutab
	//        for i=0..#CUs
	//          for j=0..#max used file index in CU[i]
	//            uint32 offset into runtime.filetab for the filename[j]
	//
	//      runtime.filetab
	//        []null terminated filename strings
	//
	//      runtime.pctab
	//        []byte of deduplicated pc data.
	//
	//      runtime.functab
	//        function table, alternating PC and offset to func struct [each entry thearch.ptrsize bytes]
	//        end PC [thearch.ptrsize bytes]
	//        func structures, pcdata offsets, func data.
	//
	//      runtime.funcdata
	//        []byte of deduplicated funcdata
	state, compUnits, funcs := makePclntab(ctxt, container)

	ldr := ctxt.loader
	state.carrier = ldr.LookupOrCreateSym("runtime.pclntab", 0)
	ldr.MakeSymbolUpdater(state.carrier).SetType(sym.SPCLNTAB)
	ldr.SetAttrReachable(state.carrier, true)
	setCarrierSym(sym.SPCLNTAB, state.carrier)

	state.generatePCHeader(ctxt)
	nameOffsets := state.generateFuncnametab(ctxt, funcs)
	cuOffsets := state.generateFilenameTabs(ctxt, compUnits, funcs)
	state.generatePctab(ctxt, funcs)
	inlSyms := makeInlSyms(ctxt, funcs, nameOffsets)
	state.generateFunctab(ctxt, funcs, inlSyms, cuOffsets, nameOffsets)
	state.generateFuncdata(ctxt, funcs, inlSyms)

	return state
}
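
/*
A toy model for exploration only; the linker does not use it. The layout comment
in pclntab describes two tables. runtime.cutab holds, for each compilation unit,
one uint32 offset into runtime.filetab per file index. runtime.filetab holds
NUL-terminated file names. The program below builds small versions of both tables
and resolves a (CU, file index) pair to a name. buildFileTabs and lookupFile are
hypothetical helpers. Two choices are assumptions of this sketch: each name is
stored once, and cutab is indexed in uint32 units. The real tables may also hold
a ^uint32(0) marker for unused file indexes, which this sketch omits.

```go
package main

import (
	"bytes"
	"fmt"
)

// buildFileTabs lays out a filetab of NUL-terminated names (each stored
// once) and a flat cutab with one filetab offset per file index per CU.
// cuOffsets[i] is the index in cutab where CU i's entries begin.
func buildFileTabs(cus [][]string) (filetab []byte, cutab []uint32, cuOffsets []uint32) {
	seen := map[string]uint32{}
	for _, files := range cus {
		cuOffsets = append(cuOffsets, uint32(len(cutab)))
		for _, name := range files {
			off, ok := seen[name]
			if !ok {
				off = uint32(len(filetab))
				seen[name] = off
				filetab = append(filetab, name...)
				filetab = append(filetab, 0)
			}
			cutab = append(cutab, off)
		}
	}
	return filetab, cutab, cuOffsets
}

// lookupFile resolves file index fileno of the CU whose entries start at
// cuOffset, reading the NUL-terminated name out of filetab.
func lookupFile(filetab []byte, cutab []uint32, cuOffset uint32, fileno int) string {
	off := cutab[cuOffset+uint32(fileno)]
	end := bytes.IndexByte(filetab[off:], 0)
	return string(filetab[off : int(off)+end])
}

func main() {
	filetab, cutab, cuOffsets := buildFileTabs([][]string{
		{"a.go", "b.go"},
		{"b.go", "c.go"},
	})
	fmt.Println(lookupFile(filetab, cutab, cuOffsets[1], 0))
	fmt.Println(len(filetab))
}
```

Both CUs reference b.go, so both of their cutab entries point at the same
filetab offset, and filetab holds only three names (15 bytes).
*/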
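
// expandGoroot rewrites a path of the form "$GOROOT/..." (or "$GOROOT\...")
// to be rooted at buildcfg.GOROOT, using forward slashes. It returns s
// unchanged if s has no such prefix or buildcfg.GOROOT is empty.
func expandGoroot(s string) string {
	const n = len("$GOROOT")
	if len(s) >= n+1 && s[:n] == "$GOROOT" && (s[n] == '/' || s[n] == '\\') {
		if final := buildcfg.GOROOT; final != "" {
			return filepath.ToSlash(filepath.Join(final, s[n:]))
		}
	}
	return s
}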
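
/*
A standalone experiment to try outside the linker. expandGoroot rewrites a path
only when "$GOROOT" is immediately followed by a path separator, and only when a
final GOROOT is known. expandGorootWith below is a hypothetical copy of
expandGoroot. It takes the GOROOT as a parameter in place of buildcfg.GOROOT,
which keeps it self-contained.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// expandGorootWith behaves like expandGoroot but takes the final GOROOT
// as a parameter instead of reading buildcfg.GOROOT.
func expandGorootWith(s, goroot string) string {
	const n = len("$GOROOT")
	if len(s) >= n+1 && s[:n] == "$GOROOT" && (s[n] == '/' || s[n] == '\\') {
		if goroot != "" {
			return filepath.ToSlash(filepath.Join(goroot, s[n:]))
		}
	}
	return s
}

func main() {
	fmt.Println(expandGorootWith("$GOROOT/src/fmt/print.go", "/usr/local/go"))
	fmt.Println(expandGorootWith("$GOROOTX/a.go", "/usr/local/go"))
	fmt.Println(expandGorootWith("$GOROOT/src/fmt/print.go", ""))
}
```

The path "$GOROOTX/a.go" is left alone because the byte after "$GOROOT" is 'X',
not a separator. With an empty GOROOT the input also comes back unchanged.
*/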
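
const (
	// SUBBUCKETS is the number of sub-buckets in each findfunctab bucket.
	SUBBUCKETS = 16
	// SUBBUCKETSIZE is the number of bytes of text covered by one sub-bucket.
	SUBBUCKETSIZE = abi.FuncTabBucketSize / SUBBUCKETS
	// NOIDX marks a sub-bucket that has not been assigned a function index.
	NOIDX = 0x7fffffff
)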
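
/*
A worked example of the bucket arithmetic, separate from the linker.
findfunctab splits the text segment into buckets of abi.FuncTabBucketSize bytes
and divides each bucket into SUBBUCKETS sub-buckets. Each bucket is stored as a
4-byte base index followed by one byte per sub-bucket. Only the last bucket can
have fewer than SUBBUCKETS sub-buckets, so the table takes 4*nbuckets + n bytes.
The program below works through that arithmetic. It assumes
abi.FuncTabBucketSize is 4096, which makes SUBBUCKETSIZE 256. tableSize and
locate are hypothetical names.

```go
package main

import "fmt"

// Assumed value: abi.FuncTabBucketSize is taken to be 4096 here.
const (
	bucketSize    = 4096
	subBuckets    = 16
	subBucketSize = bucketSize / subBuckets // 256 under the assumption above
)

// tableSize repeats findfunctab's size computation for text spanning
// [min, max): n sub-buckets, nbuckets buckets, and the table size in bytes.
func tableSize(min, max int64) (n, nbuckets, size int64) {
	n = (max - min + subBucketSize - 1) / subBucketSize
	nbuckets = (max - min + bucketSize - 1) / bucketSize
	return n, nbuckets, 4*nbuckets + n
}

// locate returns the bucket holding pc and the sub-bucket within it.
func locate(min, pc int64) (bucket, sub int64) {
	off := pc - min
	return off / bucketSize, off % bucketSize / subBucketSize
}

func main() {
	fmt.Println(tableSize(0x1000, 0x3100)) // 0x2100 = 8448 bytes of text
	fmt.Println(locate(0x1000, 0x2345))
}
```

For text spanning 0x1000 to 0x3100 this gives 33 sub-buckets in 3 buckets. That
is two full 20-byte buckets plus a last bucket with a base and one sub-bucket
byte, 45 bytes in all.
*/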

// findfunctab generates a lookup table to quickly find the containing
// function for a pc. See src/runtime/symtab.go:findfunc for details.
func (ctxt *Link) findfunctab(state *pclntab, container loader.Bitmap) {
	ldr := ctxt.loader

	// find min and max address
	min := ldr.SymValue(ctxt.Textp[0])
	lastp := ctxt.Textp[len(ctxt.Textp)-1]
	max := ldr.SymValue(lastp) + ldr.SymSize(lastp)

	// for each subbucket, compute the minimum of all symbol indexes
	// that map to that subbucket.
	n := int32((max - min + SUBBUCKETSIZE - 1) / SUBBUCKETSIZE)

	nbuckets := int32((max - min + abi.FuncTabBucketSize - 1) / abi.FuncTabBucketSize)

	// Each bucket is a 4-byte base index followed by one byte per
	// sub-bucket, so the table needs 4 bytes per bucket plus n bytes.
	size := 4*int64(nbuckets) + int64(n)

	writeFindFuncTab := func(_ *Link, s loader.Sym) {
		t := ldr.MakeSymbolUpdater(s)

		indexes := make([]int32, n)
		for i := int32(0); i < n; i++ {
			indexes[i] = NOIDX
		}
		idx := int32(0)
		for i, s := range ctxt.Textp {
			if !emitPcln(ctxt, s, container) {
				continue
			}
			p := ldr.SymValue(s)

			// Find the next symbol with a pcln entry, if any; its start
			// address ends the range [p, q) covered by s.
			var e loader.Sym
			i++
			if i < len(ctxt.Textp) {
				e = ctxt.Textp[i]
			}
			for e != 0 && !emitPcln(ctxt, e, container) && i < len(ctxt.Textp) {
				e = ctxt.Textp[i]
				i++
			}
			q := max
			if e != 0 {
				q = ldr.SymValue(e)
			}

			//fmt.Printf("%d: [%x %x] %s\n", idx, p, q, ldr.SymName(s))
			for ; p < q; p += SUBBUCKETSIZE {
				i = int((p - min) / SUBBUCKETSIZE)
				if indexes[i] > idx {
					indexes[i] = idx
				}
			}

			i = int((q - 1 - min) / SUBBUCKETSIZE)
			if indexes[i] > idx {
				indexes[i] = idx
			}
			idx++
		}

		// fill in table
		for i := int32(0); i < nbuckets; i++ {
			base := indexes[i*SUBBUCKETS]
			if base == NOIDX {
				Errorf("hole in findfunctab")
			}
			t.SetUint32(ctxt.Arch, int64(i)*(4+SUBBUCKETS), uint32(base))
			for j := int32(0); j < SUBBUCKETS && i*SUBBUCKETS+j < n; j++ {
				idx = indexes[i*SUBBUCKETS+j]
				if idx == NOIDX {
					Errorf("hole in findfunctab")
				}
				if idx-base >= 256 {
					Errorf("too many functions in a findfunc bucket! %d/%d %d %d", i, nbuckets, j, idx-base)
				}

				t.SetUint8(ctxt.Arch, int64(i)*(4+SUBBUCKETS)+4+int64(j), uint8(idx-base))
			}
		}
	}

	state.findfunctab = ctxt.createGeneratorSymbol("runtime.findfunctab", 0, sym.SRODATA, size, writeFindFuncTab)
	ldr.SetAttrReachable(state.findfunctab, true)
	ldr.SetAttrLocal(state.findfunctab, true)
}

// findContainerSyms returns a bitmap, indexed by symbol number, where there's
// a 1 for every container symbol.
func (ctxt *Link) findContainerSyms() loader.Bitmap {
	ldr := ctxt.loader
	container := loader.MakeBitmap(ldr.NSym())
	// Find container symbols and mark them as such.
	for _, s := range ctxt.Textp {
		outer := ldr.OuterSym(s)
		if outer != 0 {
			container.Set(outer)
		}
	}
	return container
}
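
/*
An approximate model of findContainerSyms, not the linker's code.
findContainerSyms marks each symbol that is the outer symbol of some text
symbol, and findfunctab passes the resulting bitmap to emitPcln. The program
below models this with a hypothetical bitmap type, a stand-in for loader.Bitmap
that packs one bit per symbol index into uint32 words. A hypothetical outer map
takes the place of ldr.OuterSym, and 0 means "no outer symbol" as in the
original.

```go
package main

import "fmt"

// bitmap is a minimal stand-in for loader.Bitmap: one bit per symbol
// index, packed into uint32 words.
type bitmap []uint32

func makeBitmap(n int) bitmap { return make(bitmap, (n+31)/32) }

func (b bitmap) set(i int)      { b[i/32] |= 1 << uint(i%32) }
func (b bitmap) has(i int) bool { return b[i/32]&(1<<uint(i%32)) != 0 }

// findContainers marks every symbol that is the outer symbol of some
// text symbol. outer[s] == 0 (or a missing key) means s has no outer symbol.
func findContainers(nsym int, textp []int, outer map[int]int) bitmap {
	c := makeBitmap(nsym)
	for _, s := range textp {
		if o := outer[s]; o != 0 {
			c.set(o)
		}
	}
	return c
}

func main() {
	// Hypothetical symbols: 5 and 6 are sub-symbols of container 4.
	textp := []int{1, 5, 6, 7}
	outer := map[int]int{5: 4, 6: 4}
	c := findContainers(40, textp, outer)
	fmt.Println(c.has(4), c.has(5), c.has(1))
}
```

Only symbol 4 is marked. The sub-symbols themselves and the free-standing symbols
1 and 7 stay clear.
*/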