go/src/cmd/compile/internal/gc/obj.go


// Copyright 2009 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

package gc

import (
	"cmd/compile/internal/types"
	"cmd/internal/bio"
	"cmd/internal/obj"
	"cmd/internal/objabi"
	"cmd/internal/src"
	"crypto/sha256"
	"encoding/json"
	"fmt"
"go/constant"
"io"
"io/ioutil"
"os"
"sort"
"strconv"
)

// architecture-independent object file output
const ArhdrSize = 60
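
// formathdr fills in a 60-byte ar(5) archive entry header. Per the
// format string below, the fields are a 16-byte name, 12-byte
// modification time, 6-byte uid, 6-byte gid, 8-byte octal mode,
// 10-byte decimal size, and the 2-byte terminator "`\n" (field
// meanings are the conventional ar ones). For name "__.PKGDEF" and
// size 1234, a hypothetical header would read:
//
//	__.PKGDEF       0           0     0     644     1234      `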
func formathdr(arhdr []byte, name string, size int64) {
	copy(arhdr[:], fmt.Sprintf("%-16s%-12d%-6d%-6d%-8o%-10d`\n", name, 0, 0, 0, 0644, size))
}

// These modes say which kind of object file to generate.
// The default use of the toolchain is to set both bits,
// generating a combined compiler+linker object, one that
// serves to describe the package to both the compiler and the linker.
// In fact the compiler and linker read nearly disjoint sections of
// that file, though, so in a distributed build setting it can be more
// efficient to split the output into two files, supplying the compiler
// object only to future compilations and the linker object only to
// future links.
//
// By default a combined object is written, but if -linkobj is specified
// on the command line then the default -o output is a compiler object
// and the -linkobj output is a linker object.
const (
	modeCompilerObj = 1 << iota
	modeLinkerObj
)
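
// As a sketch of the split described above, a build system might run
// (the file names here are hypothetical):
//
//	go tool compile -o p.a -linkobj p.o p.go
//
// and then ship only p.a to future compilations of importing packages
// and only p.o to the final link.
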
func dumpobj() {
if linkobj == "" {
dumpobj1(outfile, modeCompilerObj|modeLinkerObj)
return
	}
	dumpobj1(outfile, modeCompilerObj)
	dumpobj1(linkobj, modeLinkerObj)
}

func dumpobj1(outfile string, mode int) {
	bout, err := bio.Create(outfile)
	if err != nil {
		flusherrors()
		fmt.Printf("can't create %s: %v\n", outfile, err)
		errorexit()
	}
	defer bout.Close()
	bout.WriteString("!<arch>\n")

	if mode&modeCompilerObj != 0 {
		start := startArchiveEntry(bout)
		dumpCompilerObj(bout)
		finishArchiveEntry(bout, start, "__.PKGDEF")
	}
	if mode&modeLinkerObj != 0 {
		start := startArchiveEntry(bout)
		dumpLinkerObj(bout)
		finishArchiveEntry(bout, start, "_go_.o")
	}
}

func printObjHeader(bout *bio.Writer) {
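	// The header written here looks like, e.g. (values vary with the
	// toolchain and build; the build id shown is a made-up example):
	//
	//	go object linux amd64 go1.15 X:none
	//	build id "abc123"
	//	main
	//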
fmt.Fprintf(bout, "go object %s %s %s %s\n", objabi.GOOS, objabi.GOARCH, objabi.Version, objabi.Expstring())
if buildid != "" {
fmt.Fprintf(bout, "build id %q\n", buildid)
}
if localpkg.Name == "main" {
fmt.Fprintf(bout, "main\n")
	}
	fmt.Fprintf(bout, "\n") // header ends with blank line
}

func startArchiveEntry(bout *bio.Writer) int64 {
	var arhdr [ArhdrSize]byte
	bout.Write(arhdr[:])
	return bout.Offset()
}

func finishArchiveEntry(bout *bio.Writer, start int64, name string) {
	bout.Flush()
	size := bout.Offset() - start
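	// Entries in an ar archive must start at even offsets, so an
	// odd-sized entry is followed by one byte of padding below.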
	if size&1 != 0 {
		bout.WriteByte(0)
	}
	bout.MustSeek(start-ArhdrSize, 0)
	var arhdr [ArhdrSize]byte
	formathdr(arhdr[:], name, size)
	bout.Write(arhdr[:])
	bout.Flush()
	bout.MustSeek(start+size+(size&1), 0)
}

func dumpCompilerObj(bout *bio.Writer) {
	printObjHeader(bout)
	dumpexport(bout)
}

func dumpdata() {
	externs := len(externdcl)
	xtops := len(xtop)

	dumpglobls()
	addptabs()
	exportlistLen := len(exportlist)
	addsignats(externdcl)
	dumpsignats()
	dumptabs()
	ptabsLen := len(ptabs)
	itabsLen := len(itabs)
	dumpimportstrings()
	dumpbasictypes()
	dumpembeds()

	// Calls to dumpsignats can generate functions,
	// like method wrappers and hash and equality routines.
	// Compile any generated functions, process any new resulting types, repeat.
	// This can't loop forever, because there is no way to generate an infinite
	// number of types in a finite amount of code.
	// In the typical case, we loop 0 or 1 times.
	// It was not until issue 24761 that we found any code that required a loop at all.
	for {
		for i := xtops; i < len(xtop); i++ {
			n := xtop[i]
			if n.Op == ODCLFUNC {
				funccompile(n)
			}
		}
		xtops = len(xtop)
		compileFunctions()
		dumpsignats()
		if xtops == len(xtop) {
			break
		}
	}

	// Dump extra globals.
	tmp := externdcl

	if externdcl != nil {
		externdcl = externdcl[externs:]
	}
	dumpglobls()
	externdcl = tmp

	if zerosize > 0 {
		zero := mappkg.Lookup("zero")
		ggloblsym(zero.Linksym(), int32(zerosize), obj.DUPOK|obj.RODATA)
	}

	addGCLocals()

	if exportlistLen != len(exportlist) {
		Fatalf("exportlist changed after compile functions loop")
	}
	if ptabsLen != len(ptabs) {
		Fatalf("ptabs changed after compile functions loop")
	}
	if itabsLen != len(itabs) {
		Fatalf("itabs changed after compile functions loop")
	}
}

func dumpLinkerObj(bout *bio.Writer) {
	printObjHeader(bout)

	if len(pragcgobuf) != 0 {
		// write empty export section; must be before cgo section
		fmt.Fprintf(bout, "\n$$\n\n$$\n\n")
		fmt.Fprintf(bout, "\n$$ // cgo\n")
		if err := json.NewEncoder(bout).Encode(pragcgobuf); err != nil {
			Fatalf("serializing pragcgobuf: %v", err)
		}
		fmt.Fprintf(bout, "\n$$\n\n")
	}

	fmt.Fprintf(bout, "\n!\n")

	obj.WriteObjFile(Ctxt, bout)
}

func addptabs() {
	if !Ctxt.Flag_dynlink || localpkg.Name != "main" {
		return
	}
	for _, exportn := range exportlist {
		s := exportn.Sym
		n := asNode(s.Def)
		if n == nil {
			continue
		}
		if n.Op != ONAME {
			continue
		}
		if !types.IsExported(s.Name) {
			continue
		}
		if s.Pkg.Name != "main" {
			continue
		}
		if n.Type.Etype == TFUNC && n.Class() == PFUNC {
			// function
			ptabs = append(ptabs, ptabEntry{s: s, t: asNode(s.Def).Type})
		} else {
			// variable
			ptabs = append(ptabs, ptabEntry{s: s, t: types.NewPtr(asNode(s.Def).Type)})
		}
	}
}

func dumpGlobal(n *Node) {
	if n.Type == nil {
		Fatalf("external %v nil type\n", n)
	}
	if n.Class() == PFUNC {
		return
	}
	if n.Sym.Pkg != localpkg {
		return
	}
	dowidth(n.Type)
	ggloblnod(n)
}

func dumpGlobalConst(n *Node) {
	// only export typed constants
	t := n.Type
	if t == nil {
		return
	}
	if n.Sym.Pkg != localpkg {
		return
	}
	// only export integer constants for now
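	// Go switch cases do not fall through: each empty case below simply
	// accepts that integer type and drops out of the switch, reaching
	// the DwarfIntConst call at the end of the function.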
	switch t.Etype {
	case TINT8:
	case TINT16:
	case TINT32:
	case TINT64:
	case TINT:
	case TUINT8:
	case TUINT16:
	case TUINT32:
	case TUINT64:
	case TUINT:
	case TUINTPTR:
		// ok
	case TIDEAL:
		if !Isconst(n, constant.Int) {
			return
		}
		x := n.Val().U.(*Mpint)
		if x.Cmp(minintval[TINT]) < 0 || x.Cmp(maxintval[TINT]) > 0 {
			return
		}
		// Ideal integers we export as int (if they fit).
		t = types.Types[TINT]
	default:
		return
	}
	Ctxt.DwarfIntConst(myimportpath, n.Sym.Name, typesymname(t), n.Int64Val())
}

func dumpglobls() {
	// add globals
	for _, n := range externdcl {
		switch n.Op {
		case ONAME:
			dumpGlobal(n)
		case OLITERAL:
			dumpGlobalConst(n)
		}
	}

	sort.Slice(funcsyms, func(i, j int) bool {
		return funcsyms[i].LinksymName() < funcsyms[j].LinksymName()
	})
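	// Each funcsym gets a pointer-sized, read-only, dup-ok data symbol
	// whose single word is the address of the function it names.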
	for _, s := range funcsyms {
		sf := s.Pkg.Lookup(funcsymname(s)).Linksym()
		dsymptr(sf, 0, s.Linksym(), 0)
		ggloblsym(sf, int32(Widthptr), obj.DUPOK|obj.RODATA)
	}
	// Do not reprocess funcsyms on next dumpglobls call.
	funcsyms = nil
}

// addGCLocals adds gcargs, gclocals, gcregs, and stack object symbols to Ctxt.Data.
//
// This is done during the sequential phase after compilation, since
// global symbols can't be declared during parallel compilation.
func addGCLocals() {
	for _, s := range Ctxt.Text {
		fn := s.Func()
		if fn == nil {
			continue
		}
		for _, gcsym := range []*obj.LSym{fn.GCArgs, fn.GCLocals} {
			if gcsym != nil && !gcsym.OnList() {
				ggloblsym(gcsym, int32(len(gcsym.P)), obj.RODATA|obj.DUPOK)
			}
		}
		if x := fn.StackObjects; x != nil {
			attr := int16(obj.RODATA)
			ggloblsym(x, int32(len(x.P)), attr)
			x.Set(obj.AttrStatic, true)
		}
		if x := fn.OpenCodedDeferInfo; x != nil {
ggloblsym(x, int32(len(x.P)), obj.RODATA|obj.DUPOK)
}
}
}
func duintxx(s *obj.LSym, off int, v uint64, wid int) int {
if off&(wid-1) != 0 {
Fatalf("duintxxLSym: misaligned: v=%d wid=%d off=%d", v, wid, off)
}
s.WriteInt(Ctxt, int64(off), wid, int64(v))
return off + wid
}
func duint8(s *obj.LSym, off int, v uint8) int {
return duintxx(s, off, uint64(v), 1)
}
func duint16(s *obj.LSym, off int, v uint16) int {
return duintxx(s, off, uint64(v), 2)
}
func duint32(s *obj.LSym, off int, v uint32) int {
return duintxx(s, off, uint64(v), 4)
}
func duintptr(s *obj.LSym, off int, v uint64) int {
return duintxx(s, off, v, Widthptr)
}
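// Illustrative sketch (not part of the compiler source): the duint*
// helpers each write one value and return the next free offset, so a
// static struct is laid out by chaining calls. Assuming a hypothetical
// *obj.LSym s:
//
//	off := duint8(s, 0, 1)     // byte at offset 0; returns 1
//	off = duint16(s, off+1, 2) // caller pads to offset 2 by hand; returns 4
//	off = duint32(s, off, 3)   // offset 4 is already 4-aligned; returns 8
//
// duintxx only checks alignment (off&(wid-1) must be zero); it never
// inserts padding itself.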
func dbvec(s *obj.LSym, off int, bv bvec) int {
// Runtime reads the bitmaps as byte arrays. Oblige.
for j := 0; int32(j) < bv.n; j += 8 {
word := bv.b[j/32]
off = duint8(s, off, uint8(word>>(uint(j)%32)))
}
return off
}
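// Worked example (illustrative): with bv.n = 16 and bv.b[0] = 0xBBAA,
// the loop above emits uint8(0xBBAA>>0) = 0xAA at j = 0 and
// uint8(0xBBAA>>8) = 0xBB at j = 8; each 32-bit word is written out
// byte by byte, low bits first, which is the byte-array form the
// runtime reads.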
const (
stringSymPrefix = "go.string."
stringSymPattern = ".gostring.%d.%x"
)
// stringsym returns a symbol containing the string s.
// The symbol contains the string data, not a string header.
func stringsym(pos src.XPos, s string) (data *obj.LSym) {
var symname string
if len(s) > 100 {
// Huge strings are hashed to avoid long names in object files.
// Indulge in some paranoia by writing the length of s, too,
// as protection against length extension attacks.
// The same naming pattern is used by fileStringSym below.
h := sha256.New()
io.WriteString(h, s)
symname = fmt.Sprintf(stringSymPattern, len(s), h.Sum(nil))
} else {
// Small strings get named directly by their contents.
symname = strconv.Quote(s)
}
symdata := Ctxt.Lookup(stringSymPrefix + symname)
if !symdata.OnList() {
off := dstringdata(symdata, 0, s, pos, "string")
ggloblsym(symdata, int32(off), obj.DUPOK|obj.RODATA|obj.LOCAL)
symdata.Set(obj.AttrContentAddressable, true)
}
return symdata
}
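// For example (illustrative): stringsym for "abc" yields a symbol named
// go.string."abc", while a 200-byte string s yields a name of the form
// go.string..gostring.200.<64 hex digits of sha256(s)>. Either way the
// symbol is DUPOK, so identical string data emitted by different
// packages collapses to one copy at link time.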
// fileStringSym returns a symbol for the contents and the size of file.
// If readonly is true, the symbol shares storage with any literal string
// or other file with the same content and is placed in a read-only section.
// If readonly is false, the symbol is a read-write copy separate from any other,
// for use as the backing store of a []byte.
// The content hash of file is copied into hash. (If hash is nil, nothing is copied.)
// The returned symbol contains the data itself, not a string header.
func fileStringSym(pos src.XPos, file string, readonly bool, hash []byte) (*obj.LSym, int64, error) {
f, err := os.Open(file)
if err != nil {
return nil, 0, err
}
defer f.Close()
info, err := f.Stat()
if err != nil {
return nil, 0, err
}
if !info.Mode().IsRegular() {
return nil, 0, fmt.Errorf("not a regular file")
}
size := info.Size()
if size <= 1*1024 {
data, err := ioutil.ReadAll(f)
if err != nil {
return nil, 0, err
}
if int64(len(data)) != size {
return nil, 0, fmt.Errorf("file changed between reads")
}
var sym *obj.LSym
if readonly {
sym = stringsym(pos, string(data))
} else {
sym = slicedata(pos, string(data)).Sym.Linksym()
}
if len(hash) > 0 {
sum := sha256.Sum256(data)
copy(hash, sum[:])
}
return sym, size, nil
}
if size > 2e9 {
// ggloblsym takes an int32,
// and probably the rest of the toolchain
// can't handle such big symbols either.
// See golang.org/issue/9862.
return nil, 0, fmt.Errorf("file too large")
}
// File is too big to read and keep in memory.
// Compute hash if needed for read-only content hashing or if the caller wants it.
var sum []byte
if readonly || len(hash) > 0 {
h := sha256.New()
n, err := io.Copy(h, f)
if err != nil {
return nil, 0, err
}
if n != size {
return nil, 0, fmt.Errorf("file changed between reads")
}
sum = h.Sum(nil)
copy(hash, sum)
}
var symdata *obj.LSym
if readonly {
symname := fmt.Sprintf(stringSymPattern, size, sum)
symdata = Ctxt.Lookup(stringSymPrefix + symname)
if !symdata.OnList() {
info := symdata.NewFileInfo()
info.Name = file
info.Size = size
ggloblsym(symdata, int32(size), obj.DUPOK|obj.RODATA|obj.LOCAL)
// Note: AttrContentAddressable cannot be set here,
// because the content-addressable-handling code
// does not know about file symbols.
}
} else {
// Emit a zero-length data symbol
// and then fix up length and content to use file.
symdata = slicedata(pos, "").Sym.Linksym()
symdata.Size = size
symdata.Type = objabi.SNOPTRDATA
info := symdata.NewFileInfo()
info.Name = file
info.Size = size
}
return symdata, size, nil
}
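// Usage sketch (hypothetical caller, not taken from this file): embedding
// the contents of a path p as read-only data might look like
//
//	sym, size, err := fileStringSym(pos, p, true, nil)
//	if err != nil {
//		yyerrorl(pos, "%v", err)
//	}
//
// With readonly=true the data may be shared with equal string literals;
// with readonly=false the caller gets a private writable copy suitable
// as the backing store of a []byte.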
var slicedataGen int
func slicedata(pos src.XPos, s string) *Node {
slicedataGen++
symname := fmt.Sprintf(".gobytes.%d", slicedataGen)
sym := localpkg.Lookup(symname)
symnode := newname(sym)
sym.Def = asTypesNode(symnode)
lsym := sym.Linksym()
off := dstringdata(lsym, 0, s, pos, "slice")
ggloblsym(lsym, int32(off), obj.NOPTR|obj.LOCAL)
return symnode
}
func slicebytes(nam *Node, s string) {
if nam.Op != ONAME {
Fatalf("slicebytes %v", nam)
}
slicesym(nam, slicedata(nam.Pos, s), int64(len(s)))
}
func dstringdata(s *obj.LSym, off int, t string, pos src.XPos, what string) int {
// Objects that are too large will overflow the data section immediately,
// triggering a cryptic error message from the linker. Check for oversize
// objects here and provide a useful error message instead.
if int64(len(t)) > 2e9 {
yyerrorl(pos, "%v with length %v is too big", what, len(t))
return 0
}
s.WriteString(Ctxt, int64(off), len(t), t)
return off + len(t)
}
func dsymptr(s *obj.LSym, off int, x *obj.LSym, xoff int) int {
off = int(Rnd(int64(off), int64(Widthptr)))
s.WriteAddr(Ctxt, int64(off), Widthptr, x, int64(xoff))
off += Widthptr
return off
}
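// Illustrative behavior: a misaligned offset is rounded up rather than
// rejected. With Widthptr = 8, dsymptr(s, 3, x, 0) rounds the offset up
// to 8, writes the address of x there, and returns 16.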
func dsymptrOff(s *obj.LSym, off int, x *obj.LSym) int {
s.WriteOff(Ctxt, int64(off), x, 0)
off += 4
return off
}
func dsymptrWeakOff(s *obj.LSym, off int, x *obj.LSym) int {
s.WriteWeakOff(Ctxt, int64(off), x, 0)
off += 4
return off
}
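// The intent of the weak variant (a sketch, not a guarantee): dsymptrOff
// keeps x alive through the relocation, while dsymptrWeakOff records the
// offset without forcing the linker to retain x, for targets that may
// legitimately be dead-code-eliminated.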
// slicesym writes a static slice symbol {&arr, lencap, lencap} to n.
// arr must be an ONAME. slicesym does not modify n.
func slicesym(n, arr *Node, lencap int64) {
s := n.Sym.Linksym()
off := n.Xoffset
if arr.Op != ONAME {
Fatalf("slicesym non-name arr %v", arr)
}
s.WriteAddr(Ctxt, off, Widthptr, arr.Sym.Linksym(), arr.Xoffset)
s.WriteInt(Ctxt, off+sliceLenOffset, Widthptr, lencap)
s.WriteInt(Ctxt, off+sliceCapOffset, Widthptr, lencap)
}
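// Resulting layout (illustrative, assuming a 64-bit target, Widthptr = 8):
// starting at n.Xoffset the symbol holds the runtime slice-header shape:
//
//	+0  pointer to arr
//	+8  len = lencap (sliceLenOffset)
//	+16 cap = lencap (sliceCapOffset)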
// addrsym writes the static address of a to n. a must be an ONAME.
// Neither n nor a is modified.
func addrsym(n, a *Node) {
if n.Op != ONAME {
Fatalf("addrsym n op %v", n.Op)
}
if n.Sym == nil {
Fatalf("addrsym nil n sym")
}
if a.Op != ONAME {
Fatalf("addrsym a op %v", a.Op)
}
s := n.Sym.Linksym()
s.WriteAddr(Ctxt, n.Xoffset, Widthptr, a.Sym.Linksym(), a.Xoffset)
}
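// Note: addrsym resolves nothing at compile time; it records a
// relocation so that the linker fills the word at n.Xoffset with the
// final address of a plus a.Xoffset.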
// pfuncsym writes the static address of f to n. f must be a global function.
// Neither n nor f is modified.
func pfuncsym(n, f *Node) {
if n.Op != ONAME {
Fatalf("pfuncsym n op %v", n.Op)
}
if n.Sym == nil {
Fatalf("pfuncsym nil n sym")
}
if f.Class() != PFUNC {
Fatalf("pfuncsym class not PFUNC %d", f.Class())
}
s := n.Sym.Linksym()
s.WriteAddr(Ctxt, n.Xoffset, Widthptr, funcsym(f.Sym).Linksym(), f.Xoffset)
}
// litsym writes the static literal c to n.
// Neither n nor c is modified.
func litsym(n, c *Node, wid int) {
if n.Op != ONAME {
Fatalf("litsym n op %v", n.Op)
}
if n.Sym == nil {
Fatalf("litsym nil n sym")
}
if c.Op == ONIL {
return
}
if c.Op != OLITERAL {
Fatalf("litsym c op %v", c.Op)
}
s := n.Sym.Linksym()
switch u := c.Val().U.(type) {
case bool:
i := int64(obj.Bool2int(u))
s.WriteInt(Ctxt, n.Xoffset, wid, i)
case *Mpint:
s.WriteInt(Ctxt, n.Xoffset, wid, u.Int64())
case *Mpflt:
f := u.Float64()
switch n.Type.Etype {
case TFLOAT32:
s.WriteFloat32(Ctxt, n.Xoffset, float32(f))
case TFLOAT64:
s.WriteFloat64(Ctxt, n.Xoffset, f)
}
case *Mpcplx:
r := u.Real.Float64()
i := u.Imag.Float64()
switch n.Type.Etype {
case TCOMPLEX64:
s.WriteFloat32(Ctxt, n.Xoffset, float32(r))
s.WriteFloat32(Ctxt, n.Xoffset+4, float32(i))
case TCOMPLEX128:
s.WriteFloat64(Ctxt, n.Xoffset, r)
s.WriteFloat64(Ctxt, n.Xoffset+8, i)
}
case string:
symdata := stringsym(n.Pos, u)
s.WriteAddr(Ctxt, n.Xoffset, Widthptr, symdata, 0)
s.WriteInt(Ctxt, n.Xoffset+int64(Widthptr), Widthptr, int64(len(u)))
default:
Fatalf("litsym unhandled OLITERAL %v", c)
}
}
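// For example (illustrative): a complex128 literal 1+2i destined for
// n.Xoffset = 0 is written as float64(1) at offset 0 and float64(2) at
// offset 8, real part first, matching the runtime layout of complex128.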