Commit graph

322 commits

Author SHA1 Message Date
Dan Scales
be64a19d99 cmd/compile, cmd/link, runtime: make defers low-cost through inline code and extra funcdata
Generate inline code at defer time to save the args of defer calls to unique
(autotmp) stack slots, and generate inline code at exit time to check which defer
calls were made and make the associated function/method/interface calls. We
remember that a particular defer statement was reached by storing in the deferBits
variable (always stored on the stack). At exit time, we check the bits of the
deferBits variable to determine which defer function calls to make (in reverse
order). These low-cost defers are only used for functions where no defers
appear in loops. In addition, we don't do these low-cost defers if there are too
many defer statements or too many exits in a function (to limit code increase).

When a function uses open-coded defers, we produce extra
FUNCDATA_OpenCodedDeferInfo information that specifies the number of defers, and
for each defer, the stack slots where the closure and associated args have been
stored. The funcdata also includes the location of the deferBits variable.
Therefore, for panics, we can use this funcdata to determine exactly which defers
are active, and call the appropriate functions/methods/closures with the correct
arguments for each active defer.

In order to unwind the stack correctly after a recover(), we need to add an extra
code segment to functions with open-coded defers that simply calls deferreturn()
and returns. This segment is not reachable by the normal function, but is returned
to by the runtime during recovery. We set the liveness information of this
deferreturn() to be the same as the liveness at the first function call during the
last defer exit code (so all return values and all stack slots needed by the defer
calls will be live).

I needed to increase the stackguard constant from 880 to 896, because of a small
amount of new code in deferreturn().

The -N flag disables open-coded defers. '-d defer' prints out the kind of defer
being used at each defer statement (heap-allocated, stack-allocated, or
open-coded).

Cost of defer statement  [ go test -run NONE -bench BenchmarkDefer$ runtime ]
  With normal (stack-allocated) defers only:         35.4  ns/op
  With open-coded defers:                             5.6  ns/op
  Cost of function call alone (remove defer keyword): 4.4  ns/op

Text size increase (including funcdata) for go binary without/with open-coded defers:  0.09%

The average size increase (including funcdata) for only the functions that use
open-coded defers is 1.1%.

The cost of a panic followed by a recover got noticeably slower, since panic
processing now requires a scan of the stack for open-coded defer frames. This scan
is required, even if no frames are using open-coded defers:

Cost of panic and recover [ go test -run NONE -bench BenchmarkPanicRecover runtime ]
  Without open-coded defers:        62.0 ns/op
  With open-coded defers:           255  ns/op

A CGO Go-to-C-to-Go benchmark got noticeably faster because of open-coded defers:

CGO Go-to-C-to-Go benchmark [cd misc/cgo/test; go test -run NONE -bench BenchmarkCGoCallback ]
  Without open-coded defers:        443 ns/op
  With open-coded defers:           347 ns/op

Updates #14939 (defer performance)
Updates #34481 (design doc)

Change-Id: I63b1a60d1ebf28126f55ee9fd7ecffe9cb23d1ff
Reviewed-on: https://go-review.googlesource.com/c/go/+/202340
Reviewed-by: Austin Clements <austin@google.com>
2019-10-24 13:54:11 +00:00
Cherry Zhang
97e497b253 [dev.link] cmd: reference symbols by name when linking against Go shared library
When building a program that links against Go shared libraries,
it needs to reference symbols defined in the shared library. At
compile time, we don't know where the shared library boundary is.
If we reference a symbol in package p by index, and package p is
actually part of a shared library, we cannot resolve the index at
link time, as the linker doesn't see the object file of p.

So when linking against Go shared libraries, always use named
reference for now.

To do this, the compiler needs to know whether we will be linking
against Go shared libraries. The -dynlink flag kind of indicates
that (as the document says), but currently it is actually
overloaded: it is also used when building a plugin or a shared
library, which is self-contained (if -linkshared is not otherwise
specified) and could use index for symbol reference. So we
introduce another compiler flag, -linkshared, specifically for
linking against Go shared libraries. The go command will pass
this flag if its -linkshared flag is specified
("go build -linkshared").

There may be better way to handle this. For example, we can
put the symbol indices in a special section in the shared library
that the linker can read. Or we can generate some per-package
description file to include the indices. (Currently we generate
a .shlibname file for each package that is included in a shared
library, which contains the path of the library. We could
consider extending this.) That said, this CL is a stop-gap
solution. And it is no worse than the old object files.

If we were to redesign the build system so that the shared
library boundary is known at compile time, we could use indices
for symbol references that do not cross shared library boundary,
as well as doing other things better.

Change-Id: I9c02aad36518051cc4785dbe25c4b4cef8f3faeb
Reviewed-on: https://go-review.googlesource.com/c/go/+/201818
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
2019-10-21 21:57:01 +00:00
Bryan C. Mills
b76e6f8825 Revert "cmd/compile, cmd/link, runtime: make defers low-cost through inline code and extra funcdata"
This reverts CL 190098.

Reason for revert: broke several builders.

Change-Id: I69161352f9ded02537d8815f259c4d391edd9220
Reviewed-on: https://go-review.googlesource.com/c/go/+/201519
Run-TryBot: Bryan C. Mills <bcmills@google.com>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Dan Scales <danscales@google.com>
2019-10-16 20:59:53 +00:00
Dan Scales
dad616375f cmd/compile, cmd/link, runtime: make defers low-cost through inline code and extra funcdata
Generate inline code at defer time to save the args of defer calls to unique
(autotmp) stack slots, and generate inline code at exit time to check which defer
calls were made and make the associated function/method/interface calls. We
remember that a particular defer statement was reached by storing in the deferBits
variable (always stored on the stack). At exit time, we check the bits of the
deferBits variable to determine which defer function calls to make (in reverse
order). These low-cost defers are only used for functions where no defers
appear in loops. In addition, we don't do these low-cost defers if there are too
many defer statements or too many exits in a function (to limit code increase).

When a function uses open-coded defers, we produce extra
FUNCDATA_OpenCodedDeferInfo information that specifies the number of defers, and
for each defer, the stack slots where the closure and associated args have been
stored. The funcdata also includes the location of the deferBits variable.
Therefore, for panics, we can use this funcdata to determine exactly which defers
are active, and call the appropriate functions/methods/closures with the correct
arguments for each active defer.

In order to unwind the stack correctly after a recover(), we need to add an extra
code segment to functions with open-coded defers that simply calls deferreturn()
and returns. This segment is not reachable by the normal function, but is returned
to by the runtime during recovery. We set the liveness information of this
deferreturn() to be the same as the liveness at the first function call during the
last defer exit code (so all return values and all stack slots needed by the defer
calls will be live).

I needed to increase the stackguard constant from 880 to 896, because of a small
amount of new code in deferreturn().

The -N flag disables open-coded defers. '-d defer' prints out the kind of defer
being used at each defer statement (heap-allocated, stack-allocated, or
open-coded).

Cost of defer statement  [ go test -run NONE -bench BenchmarkDefer$ runtime ]
  With normal (stack-allocated) defers only:         35.4  ns/op
  With open-coded defers:                             5.6  ns/op
  Cost of function call alone (remove defer keyword): 4.4  ns/op

Text size increase (including funcdata) for go cmd without/with open-coded defers:  0.09%

The average size increase (including funcdata) for only the functions that use
open-coded defers is 1.1%.

The cost of a panic followed by a recover got noticeably slower, since panic
processing now requires a scan of the stack for open-coded defer frames. This scan
is required, even if no frames are using open-coded defers:

Cost of panic and recover [ go test -run NONE -bench BenchmarkPanicRecover runtime ]
  Without open-coded defers:        62.0 ns/op
  With open-coded defers:           255  ns/op

A CGO Go-to-C-to-Go benchmark got noticeably faster because of open-coded defers:

CGO Go-to-C-to-Go benchmark [cd misc/cgo/test; go test -run NONE -bench BenchmarkCGoCallback ]
  Without open-coded defers:        443 ns/op
  With open-coded defers:           347 ns/op

Updates #14939 (defer performance)
Updates #34481 (design doc)

Change-Id: I51a389860b9676cfa1b84722f5fb84d3c4ee9e28
Reviewed-on: https://go-review.googlesource.com/c/go/+/190098
Reviewed-by: Austin Clements <austin@google.com>
2019-10-16 18:27:16 +00:00
Cherry Zhang
2c484c0356 [dev.link] cmd/internal/obj: write object file in new format
If -newobj is set, write object file in new format, which uses
indices for symbol references instead of symbol names. The file
format is described at the beginning of
cmd/internal/goobj2/objfile.go.

A new package, cmd/internal/goobj2, is introduced for reading and
writing new object files. (The package name is temporary.) It is
written in a way that trys to make the encoding as regular as
possible, and the reader and writer as symmetric as possible.

This is incomplete, and currently nothing will consume the new
object file.

Change-Id: Ifefedbf6456d760d15a9f40a28af6486c93100fe
Reviewed-on: https://go-review.googlesource.com/c/go/+/196030
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
2019-10-03 18:49:41 +00:00
Cherry Zhang
53b7c18284 [dev.link] cmd/compile, cmd/asm: assign index to symbols
We are planning to use indices for symbol references, instead of
symbol names. Here we assign indices to symbols defined in the
package being compiled, and propagate the indices to the
dependent packages in the export data.

A symbol is referenced by a tuple, (package index, symbol index).
Normally, for a given symbol, this index is unique, and the
symbol index is globally consistent (but with exceptions, see
below). The package index is local to a compilation. For example,
when compiling the fmt package, fmt.Println gets assigned index
25, then all packages that reference fmt.Println will refer it
as (X, 25) with some X. X is the index for the fmt package, which
may differ in different compilations.

There are some symbols that do not have clear package affiliation,
such as dupOK symbols and linknamed symbols. We cannot give them
globally consistent indices. We categorize them as non-package
symbols, assign them with package index 1 and a symbol index that
is only meaningful locally.

Currently nothing will consume the indices.

All this is behind a flag, -newobj. The flag needs to be set for
all builds (-gcflags=all=-newobj -asmflags=all=-newobj), or none.

Change-Id: I18e489c531e9a9fbc668519af92c6116b7308cab
Reviewed-on: https://go-review.googlesource.com/c/go/+/196029
Reviewed-by: Than McIntosh <thanm@google.com>
2019-10-02 19:07:17 +00:00
Than McIntosh
cdd59205c4 cmd/compile: don't emit autom's into object file
Don't write Autom records when writing a function to the object file;
we no longer need them in the linker for DWARF processing. So as to
keep the object file format unchanged, write out a zero-length list of
automs to the object, as opposed to removing all references.

Updates #34554.

Change-Id: I42a1d67207ea7114ae4f3a315cf37effba57f190
Reviewed-on: https://go-review.googlesource.com/c/go/+/197499
Reviewed-by: Jeremy Faller <jeremy@golang.org>
2019-09-27 13:58:59 +00:00
Than McIntosh
0b486d2a87 cmd/compile: add R_USETYPE relocs to func syms for autom types
During DWARF processing, keep track of the go type symbols for types
directly or indirectly referenced by auto variables in a function,
and add a set of dummy R_USETYPE relocations to the function's DWARF
subprogram DIE symbol.

This change is not useful on its own, but is part of a series of
changes intended to clean up handling of autom's in the compiler
and linker.

Updates #34554.

Change-Id: I974afa9b7092aa5dba808f74e00aa931249d6fe9
Reviewed-on: https://go-review.googlesource.com/c/go/+/197497
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Jeremy Faller <jeremy@golang.org>
2019-09-27 13:56:32 +00:00
Jeremy Faller
7defbffcda cmd/compile: remove isStmt symbol from FuncInfo
As promised in CL 188238, removing the obsolete symbol.

Here are the latest stats. This is baselined at "e53edafb66" with only
these changes applied, run on magna.cam. The linker looks straight
better (in memory and speed).

There is still a change I'm working on walking the progs to generate the
debug_lines data in the compiler. That will likely result in a compiler
speedup.

name                      old time/op       new time/op       delta
Template                        324ms ± 3%        317ms ± 3%   -2.07%  (p=0.043 n=10+10)
Unicode                         142ms ± 4%        144ms ± 3%     ~     (p=0.393 n=10+10)
GoTypes                         1.05s ± 2%        1.07s ± 2%   +1.59%  (p=0.019 n=9+9)
Compiler                        4.09s ± 2%        4.11s ± 1%     ~     (p=0.218 n=10+10)
SSA                             12.5s ± 1%        12.7s ± 1%   +1.00%  (p=0.035 n=10+10)
Flate                           199ms ± 7%        203ms ± 5%     ~     (p=0.481 n=10+10)
GoParser                        245ms ± 3%        246ms ± 5%     ~     (p=0.780 n=9+10)
Reflect                         672ms ± 4%        688ms ± 3%   +2.42%  (p=0.015 n=10+10)
Tar                             280ms ± 4%        284ms ± 4%     ~     (p=0.123 n=10+10)
XML                             379ms ± 4%        381ms ± 2%     ~     (p=0.529 n=10+10)
LinkCompiler                    1.16s ± 4%        1.12s ± 2%   -3.03%  (p=0.001 n=10+9)
ExternalLinkCompiler            2.28s ± 3%        2.23s ± 3%   -2.51%  (p=0.011 n=8+9)
LinkWithoutDebugCompiler        686ms ± 9%        667ms ± 2%     ~     (p=0.277 n=9+8)
StdCmd                          14.1s ± 1%        14.0s ± 1%     ~     (p=0.739 n=10+10)

name                      old user-time/op  new user-time/op  delta
Template                        604ms ±23%        564ms ± 7%     ~     (p=0.661 n=10+9)
Unicode                         429ms ±40%        418ms ±37%     ~     (p=0.579 n=10+10)
GoTypes                         2.43s ±12%        2.51s ± 7%     ~     (p=0.393 n=10+10)
Compiler                        9.22s ± 3%        9.27s ± 3%     ~     (p=0.720 n=9+10)
SSA                             26.3s ± 3%        26.6s ± 2%     ~     (p=0.579 n=10+10)
Flate                           328ms ±19%        333ms ±12%     ~     (p=0.842 n=10+9)
GoParser                        387ms ± 5%        378ms ± 9%     ~     (p=0.356 n=9+10)
Reflect                         1.36s ±20%        1.43s ±21%     ~     (p=0.631 n=10+10)
Tar                             469ms ±12%        471ms ±21%     ~     (p=0.497 n=9+10)
XML                             685ms ±18%        698ms ±19%     ~     (p=0.739 n=10+10)
LinkCompiler                    1.86s ±10%        1.87s ±11%     ~     (p=0.968 n=10+9)
ExternalLinkCompiler            3.20s ±13%        3.01s ± 8%   -5.70%  (p=0.046 n=8+9)
LinkWithoutDebugCompiler        1.08s ±15%        1.09s ±20%     ~     (p=0.579 n=10+10)

name                      old alloc/op      new alloc/op      delta
Template                       36.3MB ± 0%       36.4MB ± 0%   +0.26%  (p=0.000 n=10+10)
Unicode                        28.5MB ± 0%       28.5MB ± 0%     ~     (p=0.165 n=10+10)
GoTypes                         120MB ± 0%        121MB ± 0%   +0.29%  (p=0.000 n=9+10)
Compiler                        546MB ± 0%        548MB ± 0%   +0.32%  (p=0.000 n=10+10)
SSA                            1.84GB ± 0%       1.85GB ± 0%   +0.49%  (p=0.000 n=10+10)
Flate                          22.9MB ± 0%       23.0MB ± 0%   +0.25%  (p=0.000 n=10+10)
GoParser                       27.8MB ± 0%       27.9MB ± 0%   +0.25%  (p=0.000 n=10+8)
Reflect                        77.5MB ± 0%       77.7MB ± 0%   +0.27%  (p=0.000 n=9+9)
Tar                            34.5MB ± 0%       34.6MB ± 0%   +0.23%  (p=0.000 n=10+10)
XML                            44.2MB ± 0%       44.4MB ± 0%   +0.32%  (p=0.000 n=10+10)
LinkCompiler                    239MB ± 0%        230MB ± 0%   -3.86%  (p=0.000 n=10+10)
ExternalLinkCompiler            243MB ± 0%        243MB ± 0%   +0.22%  (p=0.000 n=10+10)
LinkWithoutDebugCompiler        164MB ± 0%        155MB ± 0%   -5.45%  (p=0.000 n=10+10)

name                      old allocs/op     new allocs/op     delta
Template                         371k ± 0%         372k ± 0%   +0.44%  (p=0.000 n=10+10)
Unicode                          340k ± 0%         340k ± 0%   +0.05%  (p=0.000 n=10+10)
GoTypes                         1.32M ± 0%        1.32M ± 0%   +0.46%  (p=0.000 n=10+10)
Compiler                        5.34M ± 0%        5.37M ± 0%   +0.59%  (p=0.000 n=10+10)
SSA                             17.6M ± 0%        17.7M ± 0%   +0.63%  (p=0.000 n=10+10)
Flate                            233k ± 0%         234k ± 0%   +0.48%  (p=0.000 n=10+10)
GoParser                         309k ± 0%         310k ± 0%   +0.40%  (p=0.000 n=10+10)
Reflect                          964k ± 0%         969k ± 0%   +0.54%  (p=0.000 n=10+10)
Tar                              346k ± 0%         348k ± 0%   +0.48%  (p=0.000 n=10+9)
XML                              424k ± 0%         426k ± 0%   +0.51%  (p=0.000 n=10+10)
LinkCompiler                     751k ± 0%         645k ± 0%  -14.13%  (p=0.000 n=10+10)
ExternalLinkCompiler            1.79M ± 0%        1.69M ± 0%   -5.30%  (p=0.000 n=10+10)
LinkWithoutDebugCompiler         217k ± 0%         222k ± 0%   +2.02%  (p=0.000 n=10+10)

name                      old object-bytes  new object-bytes  delta
Template                        547kB ± 0%        559kB ± 0%   +2.17%  (p=0.000 n=10+10)
Unicode                         215kB ± 0%        216kB ± 0%   +0.60%  (p=0.000 n=10+10)
GoTypes                        1.99MB ± 0%       2.03MB ± 0%   +2.02%  (p=0.000 n=10+10)
Compiler                       7.86MB ± 0%       8.07MB ± 0%   +2.73%  (p=0.000 n=10+10)
SSA                            26.4MB ± 0%       27.2MB ± 0%   +3.27%  (p=0.000 n=10+10)
Flate                           337kB ± 0%        343kB ± 0%   +2.02%  (p=0.000 n=10+10)
GoParser                        432kB ± 0%        441kB ± 0%   +2.11%  (p=0.000 n=10+10)
Reflect                        1.33MB ± 0%       1.36MB ± 0%   +1.87%  (p=0.000 n=10+10)
Tar                             477kB ± 0%        487kB ± 0%   +2.24%  (p=0.000 n=10+10)
XML                             617kB ± 0%        632kB ± 0%   +2.33%  (p=0.000 n=10+10)

name                      old export-bytes  new export-bytes  delta
Template                       18.5kB ± 0%       18.5kB ± 0%     ~     (all equal)
Unicode                        7.92kB ± 0%       7.92kB ± 0%     ~     (all equal)
GoTypes                        35.0kB ± 0%       35.0kB ± 0%     ~     (all equal)
Compiler                        109kB ± 0%        109kB ± 0%   +0.09%  (p=0.000 n=10+10)
SSA                             137kB ± 0%        137kB ± 0%   +0.03%  (p=0.000 n=10+10)
Flate                          4.89kB ± 0%       4.89kB ± 0%     ~     (all equal)
GoParser                       8.49kB ± 0%       8.49kB ± 0%     ~     (all equal)
Reflect                        11.4kB ± 0%       11.4kB ± 0%     ~     (all equal)
Tar                            10.5kB ± 0%       10.5kB ± 0%     ~     (all equal)
XML                            16.7kB ± 0%       16.7kB ± 0%     ~     (all equal)

name                      old text-bytes    new text-bytes    delta
HelloSize                       760kB ± 0%        760kB ± 0%     ~     (all equal)
CmdGoSize                      10.8MB ± 0%       10.8MB ± 0%     ~     (all equal)

name                      old data-bytes    new data-bytes    delta
HelloSize                      10.7kB ± 0%       10.7kB ± 0%     ~     (all equal)
CmdGoSize                       312kB ± 0%        312kB ± 0%     ~     (all equal)

name                      old bss-bytes     new bss-bytes     delta
HelloSize                       122kB ± 0%        122kB ± 0%     ~     (all equal)
CmdGoSize                       146kB ± 0%        146kB ± 0%     ~     (all equal)

name                      old exe-bytes     new exe-bytes     delta
HelloSize                      1.11MB ± 0%       1.13MB ± 0%   +1.10%  (p=0.000 n=10+10)
CmdGoSize                      14.9MB ± 0%       15.0MB ± 0%   +0.77%  (p=0.000 n=10+10)

Change-Id: I42e6087cd6231dbdcfff5464e46d373474e455e1
Reviewed-on: https://go-review.googlesource.com/c/go/+/192417
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
2019-09-26 15:22:09 +00:00
Jeremy Faller
376fc48338 cmd/compile: add new symbol for debug line numbers
This is broken out from: CL 187117

This new symbol will be populated by the compiler and contain debug line
information that's currently generated in the linker. One might say it's
sad to create a new symbol, but this symbol will replace the isStmt
symbols.

Testing: Ran go build -toolexec 'toolstash -cmp'

Change-Id: If8f7ae4b43b7247076605b6429b7d03a1fd239c5
Reviewed-on: https://go-review.googlesource.com/c/go/+/188238
Reviewed-by: Austin Clements <austin@google.com>
2019-09-23 19:40:07 +00:00
Joel Sing
7ef890db91 cmd/internal/obj: instructions and registers for RISC-V
Start implementing an assembler for RISC-V - this provides register
definitions and instruction mnemonics as defined in the RISC-V
Instruction Set Manual, along with instruction encoding.

The instruction encoding is generated by the parse_opcodes script with
the "opcodes" and "opcodes-pseudo" files from (`make inst.go`):

  https://github.com/riscv/riscv-opcodes

This is based on the riscv-go port:

  https://github.com/riscv/riscv-go

Contributors to the riscv-go port are:

  Amol Bhave <ammubhave@gmail.com>
  Benjamin Barenblat <bbaren@google.com>
  Josh Bleecher Snyder <josharian@gmail.com>
  Michael Pratt <michael@pratt.im>
  Michael Yenik <myenik@google.com>
  Ronald G. Minnich <rminnich@gmail.com>
  Stefan O'Rear <sorear2@gmail.com>

This port has been updated to Go 1.13:

  https://github.com/4a6f656c/riscv-go

Updates #27532

Change-Id: I257b6de87e9864df61a2b0ce9be15968c1227b49
Reviewed-on: https://go-review.googlesource.com/c/go/+/193677
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-09-07 13:24:59 +00:00
LE Manh Cuong
2d7cb295fd cmd/compile: clarify the difference between types.Sym and obj.LSym
Both types.Sym and obj.LSym have the field Name, and that field is
widely used in compiler source. It can lead to confusion that when to
use which one.

So, adding documentation for clarifying the difference between them,
eliminate the confusion, or at least, make the code which use them
clearer for the reader.

See https://github.com/golang/go/issues/31252#issuecomment-481929174

Change-Id: I31f7fc6e4de4cf68f67ab2e3a385a7f451c796f5
Reviewed-on: https://go-review.googlesource.com/c/go/+/175019
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2019-05-21 03:03:01 +00:00
David Chase
807761f334 cmd/link: revert/revise CL 98075 because LLDB is very picky now
This was originally

Revert "cmd/link: fix up debug_range for dsymutil (revert CL 72371)"

which has the effect of no longer using Base Address Selection
Entries in DWARF.  However, the build-time costs of that are
about 2%, so instead the hacky fixup that generated technically
incorrect DWARF was removed from the linker, and the choice
is instead made in the compiler, dependent on platform, but
also under control of a flag so that we can report this bug
against LLDB/dsymutil/dwarfdump (really, the LLVM dwarf
libraries).

This however does not solve #31188; debugging still fails,
but dwarfdump no longer complains.  There are at least two
LLDB bugs involved, and this change will at allow us
to report them without them being rejected because our
now-obsolete workaround for the first bug creates
not-quite-DWARF.

Updates #31188.

Change-Id: I5300c51ad202147bab7333329ebe961623d2b47d
Reviewed-on: https://go-review.googlesource.com/c/go/+/170638
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
2019-04-23 20:52:23 +00:00
Michael Munday
aafe257390 cmd/link, runtime: mark goexit as the top of the call stack
This CL adds a new attribute, TOPFRAME, which can be used to mark
functions that should be treated as being at the top of the call
stack. The function `runtime.goexit` has been marked this way on
architectures that use a link register.

This will stop programs that use DWARF to unwind the call stack
from unwinding past `runtime.goexit` on architectures that use a
link register. For example, it eliminates "corrupt stack?"
warnings when generating a backtrace that hits `runtime.goexit`
in GDB on s390x.

Similar code should be added for non-link-register architectures
(i.e. amd64, 386). They mark the top of the call stack slightly
differently to link register architectures so I haven't added
that code (they need to mark "rip" as undefined).

Fixes #24385.

Change-Id: I15b4c69ac75b491daa0acf0d981cb80eb06488de
Reviewed-on: https://go-review.googlesource.com/c/go/+/169726
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-04-15 13:17:28 +00:00
Austin Clements
a2e79571a9 cmd/compile: separate data and function LSyms
Currently, obj.Ctxt's symbol table does not distinguish between ABI0
and ABIInternal symbols. This is *almost* okay, since a given symbol
name in the final object file is only going to belong to one ABI or
the other, but it requires that the compiler mark a Sym as being a
function symbol before it retrieves its LSym. If it retrieves the LSym
first, that LSym will be created as ABI0, and later marking the Sym as
a function symbol won't change the LSym's ABI.

Marking a Sym as a function symbol before looking up its LSym sounds
easy, except Syms have a dual purpose: they are used just as interned
strings (every function, variable, parameter, etc with the same
textual name shares a Sym), and *also* to store state for whatever
package global has that name. As a result, it's easy to slip up and
look up an LSym when a Sym is serving as the name of a local variable,
and then later mark it as a function when it's serving as the global
with the name.

In general, we were careful to avoid this, but #29610 demonstrates one
case where we messed up. Because of on-demand importing from indexed
export data, it's possible to compile a method wrapper for a type
imported from another package before importing an init function from
that package. If the argument of the method is named "init", the
"init" LSym will be created as a data symbol when compiling the
wrapper, before it gets marked as a function symbol.

To fix this, we separate obj.Ctxt's symbol tables for ABI0 and
ABIInternal symbols. This way, the compiler will simply get a
different LSym once the Sym takes on its package-global meaning as a
function.

This fixes the above ordering issue, and means we no longer need to go
out of our way to create the "init" function early and mark it as a
function symbol.

Fixes #29610.
Updates #27539.

Change-Id: Id9458b40017893d46ef9e4a3f9b47fc49e1ce8df
Reviewed-on: https://go-review.googlesource.com/c/157017
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2019-01-11 00:45:49 +00:00
Keith Randall
69c2c56453 cmd/compile,runtime: redo mid-stack inlining tracebacks
Work involved in getting a stack trace is divided between
runtime.Callers and runtime.CallersFrames.

Before this CL, runtime.Callers returns a pc per runtime frame.
runtime.CallersFrames is responsible for expanding a runtime frame
into potentially multiple user frames.

After this CL, runtime.Callers returns a pc per user frame.
runtime.CallersFrames just maps those to user frame info.

Entries in the result of runtime.Callers are now pcs
of the calls (or of the inline marks), not of the instruction
just after the call.

Fixes #29007
Fixes #28640
Update #26320

Change-Id: I1c9567596ff73dc73271311005097a9188c3406f
Reviewed-on: https://go-review.googlesource.com/c/152537
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-12-28 20:55:36 +00:00
Keith Randall
01a1eaa10c cmd/compile: use innermost line number for -S
When functions are inlined, for instructions in the inlined body, does
-S print the location of the call, or the location of the body? Right
now, we do the former. I'd like to do the latter by default, it makes
much more sense when reading disassembly. With mid-stack inlining
enabled in more cases, this quandry will come up more often.

The original behavior is still available with -S=2. Some tests
use this mode (so they can find assembly generated by a particular
source line).

This helped me with understanding what the compiler was doing
while fixing #29007.

Change-Id: Id14a3a41e1b18901e7c5e460aa4caf6d940ed064
Reviewed-on: https://go-review.googlesource.com/c/153241
Reviewed-by: David Chase <drchase@google.com>
2018-12-11 20:24:45 +00:00
Clément Chigot
4295ed9bef cmd: fix symbols addressing for aix/ppc64
This commit changes the code generated for addressing symbols on AIX
operating system.

On AIX, every symbol accesses must be done via another symbol near the TOC,
named TOC anchor or TOC entry. This TOC anchor is a pointer to the symbol
address.
During Progedit function, when a symbol access is detected, its instructions
are modified to create a load on its TOC anchor and retrieve the symbol.

Change-Id: I00cf8f49c13004bc99fa8af13d549a709320f797
Reviewed-on: https://go-review.googlesource.com/c/151039
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-11-27 21:06:16 +00:00
Austin Clements
c5718b6b26 cmd/internal/obj, cmd/link: record ABIs and aliases in Go obj files
This repurposes the "version" field of a symbol reference in the Go
object file format to be an ABI field. Currently, this is just 0 or 1
depending on whether the symbol is static (the linker turns it into a
different internal version number), so it's already only tenuously a
symbol version. We change this to be -1 for static symbols and
otherwise by the ABI number.

This also adds a separate list of ABI alias symbols to be recorded in
the object file. The ABI aliases must be a separate list and not just
part of the symbol definitions because it's possible to have a symbol
defined in one package and the alias "defined" in a different package.
For example, this can happen if a symbol is defined in assembly in one
package and stubbed in a different package. The stub triggers the
generation of the ABI alias, but in a different package from the
definition.

For #27539.

Change-Id: I015c9fe54690c027de6ef77e22b5585976a01587
Reviewed-on: https://go-review.googlesource.com/c/147157
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-11-12 20:46:48 +00:00
Austin Clements
97e4010fd4 cmd/compile: accept and parse symabis
This doesn't yet do anything with this information.

For #27539.

Change-Id: Ia12c905812aa1ed425eedd6ab2f55ec75d81c0ce
Reviewed-on: https://go-review.googlesource.com/c/147099
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-11-12 20:46:37 +00:00
Austin Clements
15265ec421 cmd/compile: avoid duplicate GC bitmap symbols
Currently, liveness produces a distinct obj.LSym for each GC bitmap
for each function. These are then named by content hash and only
ultimately deduplicated by WriteObjFile.

For various reasons (see next commit), we want to remove this
deduplication behavior from WriteObjFile. Furthermore, it's
inefficient to produce these duplicate symbols in the first place.

GC bitmaps are the only source of duplicate symbols in the compiler.
This commit eliminates these duplicate symbols by declaring them in
the Ctxt symbol hash just like every other obj.LSym. As a result, all
GC bitmaps with the same content now refer to the same obj.LSym.

The next commit will remove deduplication from WriteObjFile.

For #27539.

Change-Id: I4f15e3d99530122cdf473b7a838c69ef5f79db59
Reviewed-on: https://go-review.googlesource.com/c/146557
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-11-03 15:12:34 +00:00
Lynn Boger
bdba55653f cmd/asm/internal,cmd/internal/obj/ppc64: add alignment directive to asm for ppc64x
This adds support for an alignment directive that can be used
within Go asm to indicate preferred code alignment for ppc64x.
This is intended to be used with loops to improve
performance.

This change only adds the directive and aligns the code based
on it. Follow up changes will modify asm functions for
ppc64x that benefit from preferred alignment.

Fixes #14935

Here is one example of the improvement in memmove when the
directive is used on the loops in the code:

Memmove/64      8.74ns ± 0%    8.64ns ± 0%   -1.19%  (p=0.000 n=8+8)
Memmove/128     11.5ns ± 0%    11.0ns ± 0%   -4.35%  (p=0.000 n=8+8)
Memmove/256     23.0ns ± 0%    15.3ns ± 0%  -33.48%  (p=0.000 n=8+8)
Memmove/512     31.7ns ± 0%    31.8ns ± 0%   +0.32%  (p=0.000 n=8+8)
Memmove/1024    52.3ns ± 0%    43.9ns ± 0%  -16.10%  (p=0.000 n=8+8)
Memmove/2048    93.2ns ± 0%    76.2ns ± 0%  -18.24%  (p=0.000 n=8+8)
Memmove/4096     174ns ± 0%     141ns ± 0%  -18.97%  (p=0.000 n=8+8)

Change-Id: I200d77e923dd5d78c22fe3f8eb142a8fbaff57bf
Reviewed-on: https://go-review.googlesource.com/c/144218
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-10-23 20:37:29 +00:00
Keith Randall
cbafcc55e8 cmd/compile,runtime: implement stack objects
Rework how the compiler+runtime handles stack-allocated variables
whose address is taken.

Direct references to such variables work as before. References through
pointers, however, use a new mechanism. The new mechanism is more
precise than the old "ambiguously live" mechanism. It computes liveness
at runtime based on the actual references among objects on the stack.

Each function records all of its address-taken objects in a FUNCDATA.
These are called "stack objects". The runtime then uses that
information while scanning a stack to find all of the stack objects on
a stack. It then does a mark phase on the stack objects, using all the
pointers found on the stack (and ancillary structures, like defer
records) as the root set. Only stack objects which are found to be
live during this mark phase will be scanned and thus retain any heap
objects they point to.

A subsequent CL will remove all the "ambiguously live" logic from
the compiler, so that the stack object tracing will be required.
For this CL, the stack tracing is all redundant with the current
ambiguously live logic.

Update #22350

Change-Id: Ide19f1f71a5b6ec8c4d54f8f66f0e9a98344772f
Reviewed-on: https://go-review.googlesource.com/c/134155
Reviewed-by: Austin Clements <austin@google.com>
2018-10-03 19:52:49 +00:00
Austin Clements
9f95c9db23 cmd/compile, cmd/internal/obj: record register maps in binary
This adds FUNCDATA and PCDATA that records the register maps much like
the existing live arguments maps and live locals maps. The register
map is indexed independently from the argument and locals maps since
changes in register liveness tend not to correlate with changes to
argument and local liveness.

This is the final CL toward adding safe-points everywhere. The
following CLs will optimize liveness analysis to bring down the cost.
The effect of this CL is:

name        old time/op       new time/op       delta
Template          195ms ± 2%        197ms ± 1%    ~     (p=0.136 n=9+9)
Unicode          98.4ms ± 2%       99.7ms ± 1%  +1.39%  (p=0.004 n=10+10)
GoTypes           685ms ± 1%        700ms ± 1%  +2.06%  (p=0.000 n=9+9)
Compiler          3.28s ± 1%        3.34s ± 0%  +1.71%  (p=0.000 n=9+8)
SSA               7.79s ± 1%        7.91s ± 1%  +1.55%  (p=0.000 n=10+9)
Flate             133ms ± 2%        133ms ± 2%    ~     (p=0.190 n=10+10)
GoParser          161ms ± 2%        164ms ± 3%  +1.83%  (p=0.015 n=10+10)
Reflect           450ms ± 1%        457ms ± 1%  +1.62%  (p=0.000 n=10+10)
Tar               183ms ± 2%        185ms ± 1%  +0.91%  (p=0.008 n=9+10)
XML               234ms ± 1%        238ms ± 1%  +1.60%  (p=0.000 n=9+9)
[Geo mean]        411ms             417ms       +1.40%

name        old exe-bytes     new exe-bytes     delta
HelloSize         1.47M ± 0%        1.51M ± 0%  +2.79%  (p=0.000 n=10+10)

Compared to just before "cmd/internal/obj: consolidate emitting entry
stack map", the cumulative effect of adding stack maps everywhere and
register maps is:

name        old time/op       new time/op       delta
Template          185ms ± 2%        197ms ± 1%   +6.42%  (p=0.000 n=10+9)
Unicode          96.3ms ± 3%       99.7ms ± 1%   +3.60%  (p=0.000 n=10+10)
GoTypes           658ms ± 0%        700ms ± 1%   +6.37%  (p=0.000 n=10+9)
Compiler          3.14s ± 1%        3.34s ± 0%   +6.53%  (p=0.000 n=9+8)
SSA               7.41s ± 2%        7.91s ± 1%   +6.71%  (p=0.000 n=9+9)
Flate             126ms ± 1%        133ms ± 2%   +6.15%  (p=0.000 n=10+10)
GoParser          153ms ± 1%        164ms ± 3%   +6.89%  (p=0.000 n=10+10)
Reflect           437ms ± 1%        457ms ± 1%   +4.59%  (p=0.000 n=10+10)
Tar               178ms ± 1%        185ms ± 1%   +4.18%  (p=0.000 n=10+10)
XML               223ms ± 1%        238ms ± 1%   +6.39%  (p=0.000 n=10+9)
[Geo mean]        394ms             417ms        +5.78%

name        old alloc/op      new alloc/op      delta
Template         34.5MB ± 0%       38.0MB ± 0%  +10.19%  (p=0.000 n=10+10)
Unicode          29.3MB ± 0%       30.3MB ± 0%   +3.56%  (p=0.000 n=8+9)
GoTypes           113MB ± 0%        125MB ± 0%  +10.89%  (p=0.000 n=10+10)
Compiler          510MB ± 0%        575MB ± 0%  +12.79%  (p=0.000 n=10+10)
SSA              1.46GB ± 0%       1.64GB ± 0%  +12.40%  (p=0.000 n=10+10)
Flate            23.9MB ± 0%       25.9MB ± 0%   +8.56%  (p=0.000 n=10+10)
GoParser         28.0MB ± 0%       30.8MB ± 0%  +10.08%  (p=0.000 n=10+10)
Reflect          77.6MB ± 0%       84.3MB ± 0%   +8.63%  (p=0.000 n=10+10)
Tar              34.1MB ± 0%       37.0MB ± 0%   +8.44%  (p=0.000 n=10+10)
XML              42.7MB ± 0%       47.2MB ± 0%  +10.75%  (p=0.000 n=10+10)
[Geo mean]       76.0MB            83.3MB        +9.60%

name        old allocs/op     new allocs/op     delta
Template           321k ± 0%         337k ± 0%   +4.98%  (p=0.000 n=10+10)
Unicode            337k ± 0%         340k ± 0%   +1.04%  (p=0.000 n=10+9)
GoTypes           1.13M ± 0%        1.18M ± 0%   +4.85%  (p=0.000 n=10+10)
Compiler          4.67M ± 0%        4.96M ± 0%   +6.25%  (p=0.000 n=10+10)
SSA               11.7M ± 0%        12.3M ± 0%   +5.69%  (p=0.000 n=10+10)
Flate              216k ± 0%         226k ± 0%   +4.52%  (p=0.000 n=10+9)
GoParser           271k ± 0%         283k ± 0%   +4.52%  (p=0.000 n=10+10)
Reflect            927k ± 0%         972k ± 0%   +4.78%  (p=0.000 n=10+10)
Tar                318k ± 0%         333k ± 0%   +4.56%  (p=0.000 n=10+10)
XML                376k ± 0%         395k ± 0%   +5.04%  (p=0.000 n=10+10)
[Geo mean]         730k              764k        +4.61%

name        old exe-bytes     new exe-bytes     delta
HelloSize         1.46M ± 0%        1.51M ± 0%   +3.66%  (p=0.000 n=10+10)

For #24543.

Change-Id: I91e003dc64151916b384274884bf02a2d6862547
Reviewed-on: https://go-review.googlesource.com/109353
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-05-22 15:55:03 +00:00
isharipo
5437cde96c cmd/asm: enable AVX512
- Uncomment tests for AVX512 encoder
- Permit instruction suffixes for x86
- Permit limited reg list [reg-reg] syntax for x86 for multi-source ops
- EVEX encoding support in obj/x86 (Z-cases, asmevex, etc.)
- optabs and ytabs generated by x86avxgen (https://golang.org/cl/107216)

Note: suffix formatting implemented with updated CConv function.
Now arch asm backend should register formatting function by
calling RegisterOpSuffix.

Updates #22779

Change-Id: I076a167ee49582700e058c56ad74e6696710c8c8
Reviewed-on: https://go-review.googlesource.com/113315
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-05-22 14:57:15 +00:00
Richard Musiol
3b137dd2df cmd/compile: add wasm architecture
This commit adds the wasm architecture to the compile command.
A later commit will contain the corresponding linker changes.

Design doc: https://docs.google.com/document/d/131vjr4DH6JFnb-blm_uRdaC0_Nv3OUwjEY5qVCxCup4

The following files are generated:
- src/cmd/compile/internal/ssa/opGen.go
- src/cmd/compile/internal/ssa/rewriteWasm.go
- src/cmd/internal/obj/wasm/anames.go

Updates #18892

Change-Id: Ifb4a96a3e427aac2362a1c97967d5667450fba3b
Reviewed-on: https://go-review.googlesource.com/103295
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-05-04 17:56:12 +00:00
Wei Xiao
bd8a88729c cmd/compile: intrinsify runtime.getcallerpc on arm64
Add a compiler intrinsic for getcallerpc on arm64 for better code generation.

Change-Id: I897e670a2b8ffa1a8c2fdc638f5b2c44bda26318
Reviewed-on: https://go-review.googlesource.com/109276
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-30 13:29:14 +00:00
David Chase
dead03b794 cmd/link: process is_stmt data into dwarf line tables
To improve debugging, instructions should be annotated with
DWARF is_stmt.  The DWARF default before was is_stmt=1, and
to remove "jumpy" stepping the optimizer was tagging
instructions with a no-position position, which interferes
with the accuracy of profiling information.  This allows
that to be corrected, and also allows more "jumpy" positions
to be annotated with is_stmt=0 (these changes were not made
for 1.10 because of worries about further messing up
profiling).

The is_stmt values are placed in a pc-encoded table and
passed through a symbol derived from the name of the
function and processed in the linker alongside its
processing of each function's pc/line tables.

The only change in binary size is in the .debug_line tables
measured with "objdump -h --section=.debug_line go1.test"
For go1.test, these are 2614 bytes larger,
or 0.72% of the size of .debug_line,
or 0.025% of the file size.

This will increase in proportion to how much the is_stmt
flag is used (toggled).

Change-Id: Ic1f1aeccff44591ad0494d29e1a0202a3c506a7a
Reviewed-on: https://go-review.googlesource.com/93664
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
2018-04-04 22:14:29 +00:00
Daniel Martí
c15b7b2a54 cmd: re-generate all stringer files
The tool has gotten better over time, so re-generating the files brings
some advantages like fewer objects, dropping the use of fmt, and
dropping unnecessary bounds checks.

While at it, add the missing go:generate line for obj.AddrType.

Change-Id: I120c9795ee8faddf5961ff0384b9dcaf58d831ff
Reviewed-on: https://go-review.googlesource.com/100015
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-10 21:20:50 +00:00
Than McIntosh
88c2fb9d04 cmd/compile: fix bug in DWARF inl handling of unused autos
The DWARF inline info generation hooks weren't properly
handling unused auto vars in certain cases, triggering an assert (now
fixed). Also with this change, introduce a new autom "flavor" to
use for autom entries that are added to insure that a specific
auto type makes it into the linker (this is a follow-on to the fix
for 22941).

Fixes #22962.

Change-Id: I7a2d8caf47f6ca897b12acb6a6de0eb25f5cac8f
Reviewed-on: https://go-review.googlesource.com/81557
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-12-04 18:36:11 +00:00
Than McIntosh
4435fcfd6c compiler,linker: support for DWARF inlined instances
Compiler and linker changes to support DWARF inlined instances,
see https://go.googlesource.com/proposal/+/HEAD/design/22080-dwarf-inlining.md
for design details.

This functionality is gated via the cmd/compile option -gendwarfinl=N,
where N={0,1,2}, where a value of 0 disables dwarf inline generation,
a value of 1 turns on dwarf generation without tracking of formal/local
vars from inlined routines, and a value of 2 enables inlines with
variable tracking.

Updates #22080

Change-Id: I69309b3b815d9fed04aebddc0b8d33d0dbbfad6e
Reviewed-on: https://go-review.googlesource.com/75550
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-11-30 14:39:19 +00:00
isharipo
1e83f883c5 cmd/internal/obj: make it possible to have all AVX1/2 insts
Current AllowedOpCodes is 1024, which is not enough for modern x86.
Changed limit to 2048 (though AVX512 will exceed this).

Additional Z-cases and ytab tables are added to make it possible
to handle missing AVX1 and AVX2 instructions.

This CL is required by x86avxgen to work properly:
https://go-review.googlesource.com/c/arch/+/66972

Change-Id: I290214bbda554d2cba53349f50dcd34014fe4cee
Reviewed-on: https://go-review.googlesource.com/70650
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2017-11-02 16:12:17 +00:00
Wei Xiao
531e6c06c4 cmd/asm: refine Go assembly for ARM64
Some ARM64-specific instructions (such as SIMD instructions) are not supported.
This patch adds support for the following:
1. Extended register, e.g.:
     ADD	Rm.<ext>[<<amount], Rn, Rd
     <ext> can have the following values:
       UXTB, UXTH, UXTW, UXTX, SXTB, SXTH, SXTW and SXTX
2. Arrangement for SIMD instructions, e.g.:
     VADDP	Vm.<T>, Vn.<T>, Vd.<T>
     <T> can have the following values:
       B8, B16, H4, H8, S2, S4 and D2
3. Width specifier and element index for SIMD instructions, e.g.:
     VMOV	Vn.<T>[index], Rd // MOV(to general register)
     <T> can have the following values:
       S and D
4. Register List, e.g.:
     VLD1	(Rn), [Vt1.<T>, Vt2.<T>, Vt3.<T>]
5. Register offset variant, e.g.:
     VLD1.P	(Rn)(Rm), [Vt1.<T>, Vt2.<T>] // Rm is the post-index register
6. Go assembly for ARM64 reference manual
     new added instructions are required to have according explanation items in
     the manual and items for existed instructions will be added incrementally

For more information about the refinement background, please refer to the
discussion (https://groups.google.com/forum/#!topic/golang-dev/rWgDxCrL4GU)

This patch only adds syntax and doesn't break any assembly that already exists.

Change-Id: I34e90b7faae032820593a0e417022c354a882008
Reviewed-on: https://go-review.googlesource.com/41654
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-10-13 13:41:19 +00:00
Keith Randall
e130dcf051 cmd/compile: abort earlier if stack frame too large
If the stack frame is too large, abort immediately.
We used to generate code first, then abort.
In issue 22200, generating code raised a panic
so we got an ICE instead of an error message.

Change the max frame size to 1GB (from 2GB).
Stack frames between 1.1GB and 2GB didn't used to work anyway,
the pcln table generation would have failed and generated an ICE.

Fixes #22200

Change-Id: I1d918ab27ba6ebf5c87ec65d1bccf973f8c8541e
Reviewed-on: https://go-review.googlesource.com/69810
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-10-11 18:24:13 +00:00
isharipo
8c67f210a1 cmd/internal/obj: change Prog.From3 to RestArgs ([]Addr)
This change makes it easier to express instructions
with arbitrary number of operands.

Rationale: previous approach with operand "hiding" does
not scale well, AVX and especially AVX512 have many
instructions with 3+ operands.

x86 asm backend is updated to handle up to 6 explicit operands.
It also fixes issue with 4-th immediate operand type checks.
All `ytab` tables are updated accordingly.

Changes to non-x86 backends only include these patterns:
`p.From3 = X` => `p.SetFrom3(X)`
`p.From3.X = Y` => `p.GetFrom3().X = Y`

Over time, other backends can adapt Prog.RestArgs
and reduce the amount of workarounds.

-- Performance --

x/benchmark/build:

$ benchstat upstream.bench patched.bench
name      old time/op                 new time/op                 delta
Build-48                  21.7s ± 2%                  21.8s ± 2%   ~     (p=0.218 n=10+10)

name      old binary-size             new binary-size             delta
Build-48                  10.3M ± 0%                  10.3M ± 0%   ~     (all equal)

name      old build-time/op           new build-time/op           delta
Build-48                  21.7s ± 2%                  21.8s ± 2%   ~     (p=0.218 n=10+10)

name      old build-peak-RSS-bytes    new build-peak-RSS-bytes    delta
Build-48                  145MB ± 5%                  148MB ± 5%   ~     (p=0.218 n=10+10)

name      old build-user+sys-time/op  new build-user+sys-time/op  delta
Build-48                  21.0s ± 2%                  21.2s ± 2%   ~     (p=0.075 n=10+10)

Microbenchmark shows a slight slowdown.

name        old time/op  new time/op  delta
AMD64asm-4  49.5ms ± 1%  49.9ms ± 1%  +0.67%  (p=0.001 n=23+15)

func BenchmarkAMD64asm(b *testing.B) {
  for i := 0; i < b.N; i++ {
    TestAMD64EndToEnd(nil)
    TestAMD64Encoder(nil)
  }
}

Change-Id: I4f1d37b5c2c966da3f2127705ccac9bff0038183
Reviewed-on: https://go-review.googlesource.com/63490
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-09-15 21:05:03 +00:00
Heschi Kreinick
4c54a047c6 [dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.

After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.

There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:

  type myint int
  var a int
  b := myint(int)
and
  b := (*uint64)(unsafe.Pointer(a))

don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.

Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.

A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.

TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.

go build -toolexec 'toolstash -cmp' -a std succeeds.

With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after:  06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean  /tmp/220352263 /tmp/621364410
completed   15 of   15, estimated time remaining 0s (eta 3:52PM)
name        old time/op       new time/op       delta
Template          199ms ± 3%        198ms ± 2%     ~     (p=0.400 n=15+14)
Unicode          96.6ms ± 5%       96.4ms ± 5%     ~     (p=0.838 n=15+15)
GoTypes           653ms ± 2%        647ms ± 2%     ~     (p=0.102 n=15+14)
Flate             133ms ± 6%        129ms ± 3%   -2.62%  (p=0.041 n=15+15)
GoParser          164ms ± 5%        159ms ± 3%   -3.05%  (p=0.000 n=15+15)
Reflect           428ms ± 4%        422ms ± 3%     ~     (p=0.156 n=15+13)
Tar               123ms ±10%        124ms ± 8%     ~     (p=0.461 n=15+15)
XML               228ms ± 3%        224ms ± 3%   -1.57%  (p=0.045 n=15+15)
[Geo mean]        206ms             377ms       +82.86%

name        old user-time/op  new user-time/op  delta
Template          292ms ±10%        301ms ±12%     ~     (p=0.189 n=15+15)
Unicode           166ms ±37%        158ms ±14%     ~     (p=0.418 n=15+14)
GoTypes           962ms ± 6%        963ms ± 7%     ~     (p=0.976 n=15+15)
Flate             207ms ±19%        200ms ±14%     ~     (p=0.345 n=14+15)
GoParser          246ms ±22%        240ms ±15%     ~     (p=0.587 n=15+15)
Reflect           611ms ±13%        587ms ±14%     ~     (p=0.085 n=15+13)
Tar               211ms ±12%        217ms ±14%     ~     (p=0.355 n=14+15)
XML               335ms ±15%        320ms ±18%     ~     (p=0.169 n=15+15)
[Geo mean]        317ms             583ms       +83.72%

name        old alloc/op      new alloc/op      delta
Template         40.2MB ± 0%       40.2MB ± 0%   -0.15%  (p=0.000 n=14+15)
Unicode          29.2MB ± 0%       29.3MB ± 0%     ~     (p=0.624 n=15+15)
GoTypes           114MB ± 0%        114MB ± 0%   -0.15%  (p=0.000 n=15+14)
Flate            25.7MB ± 0%       25.6MB ± 0%   -0.18%  (p=0.000 n=13+15)
GoParser         32.2MB ± 0%       32.2MB ± 0%   -0.14%  (p=0.003 n=15+15)
Reflect          77.8MB ± 0%       77.9MB ± 0%     ~     (p=0.061 n=15+15)
Tar              27.1MB ± 0%       27.0MB ± 0%   -0.11%  (p=0.029 n=15+15)
XML              42.7MB ± 0%       42.5MB ± 0%   -0.29%  (p=0.000 n=15+15)
[Geo mean]       42.1MB            75.0MB       +78.05%

name        old allocs/op     new allocs/op     delta
Template           402k ± 1%         398k ± 0%   -0.91%  (p=0.000 n=15+15)
Unicode            344k ± 1%         344k ± 0%     ~     (p=0.715 n=15+14)
GoTypes           1.18M ± 0%        1.17M ± 0%   -0.91%  (p=0.000 n=15+14)
Flate              243k ± 0%         240k ± 1%   -1.05%  (p=0.000 n=13+15)
GoParser           327k ± 1%         324k ± 1%   -0.96%  (p=0.000 n=15+15)
Reflect            984k ± 1%         982k ± 0%     ~     (p=0.050 n=15+15)
Tar                261k ± 1%         259k ± 1%   -0.77%  (p=0.000 n=15+15)
XML                411k ± 0%         404k ± 1%   -1.55%  (p=0.000 n=15+15)
[Geo mean]         439k              755k       +72.01%

name        old text-bytes    new text-bytes    delta
HelloSize         694kB ± 0%        694kB ± 0%   -0.00%  (p=0.000 n=15+15)

name        old data-bytes    new data-bytes    delta
HelloSize        5.55kB ± 0%       5.55kB ± 0%     ~     (all equal)

name        old bss-bytes     new bss-bytes     delta
HelloSize         133kB ± 0%        133kB ± 0%     ~     (all equal)

name        old exe-bytes     new exe-bytes     delta
HelloSize        1.04MB ± 0%       1.04MB ± 0%     ~     (all equal)

Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-27 20:19:44 +00:00
Alessandro Arzilli
2ad41a3090 cmd/compile: output DWARF lexical blocks for local variables
Change compiler and linker to emit DWARF lexical blocks in .debug_info
section when compiling with -N -l.

Version of debug_info is updated from DWARF v2 to DWARF v3 since
version 2 does not allow lexical blocks with discontinuous PC ranges.

Remaining open problems:
- scope information is removed from inlined functions
- variables records do not have DW_AT_start_scope attributes so a
variable will shadow other variables with the same name as soon as its
containing scope begins, even before its declaration.

Updates #6913.
Updates #12899.

Change-Id: Idc6808788512ea20e7e45bcf782453acb416fb49
Reviewed-on: https://go-review.googlesource.com/40095
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-05-18 23:10:50 +00:00
Josh Bleecher Snyder
756b9ce3a5 cmd/compile: add initial backend concurrency support
This CL adds initial support for concurrent backend compilation.

BACKGROUND

The compiler currently consists (very roughly) of the following phases:

1. Initialization.
2. Lexing and parsing into the cmd/compile/internal/syntax AST.
3. Translation into the cmd/compile/internal/gc AST.
4. Some gc AST passes: typechecking, escape analysis, inlining,
   closure handling, expression evaluation ordering (order.go),
   and some lowering and optimization (walk.go).
5. Translation into the cmd/compile/internal/ssa SSA form.
6. Optimization and lowering of SSA form.
7. Translation from SSA form to assembler instructions.
8. Translation from assembler instructions to machine code.
9. Writing lots of output: machine code, DWARF symbols,
   type and reflection info, export data.

Phase 2 was already concurrent as of Go 1.8.

Phase 3 is planned for eventual removal;
we hope to go straight from syntax AST to SSA.

Phases 5–8 are per-function; this CL adds support for
processing multiple functions concurrently.
The slowest phases in the compiler are 5 and 6,
so this offers the opportunity for some good speed-ups.

Unfortunately, it's not quite that straightforward.
In the current compiler, the latter parts of phase 4
(order, walk) are done function-at-a-time as needed.
Making order and walk concurrency-safe proved hard,
and they're not particularly slow, so there wasn't much reward.
To enable phases 5–8 to be done concurrently,
when concurrent backend compilation is requested,
we complete phase 4 for all functions
before starting later phases for any functions.

Also, in reality, we automatically generate new
functions in phase 9, such as method wrappers
and equality and has routines.
Those new functions then go through phases 4–8.
This CL disables concurrent backend compilation
after the first, big, user-provided batch of
functions has been compiled.
This is done to keep things simple,
and because the autogenerated functions
tend to be small, few, simple, and fast to compile.

USAGE

Concurrent backend compilation still defaults to off.
To set the number of functions that may be backend-compiled
concurrently, use the compiler flag -c.
In future work, cmd/go will automatically set -c.

Furthermore, this CL has been intentionally written
so that the c=1 path has no backend concurrency whatsoever,
not even spawning any goroutines.
This helps ensure that, should problems arise
late in the development cycle,
we can simply have cmd/go set c=1 always,
and revert to the original compiler behavior.

MUTEXES

Most of the work required to make concurrent backend
compilation safe has occurred over the past month.
This CL adds a handful of mutexes to get the rest of the way there;
they are the mutexes that I didn't see a clean way to avoid.
Some of them may still be eliminable in future work.

In no particular order:

* gc.funcsymsmu. The global funcsyms slice is populated
  lazily when we need function symbols for closures.
  This occurs during gc AST to SSA translation.
  The function funcsym also does a package lookup,
  which is a source of races on types.Pkg.Syms;
  funcsymsmu also covers that package lookup.
  This mutex is low priority: it adds a single global,
  it is in an infrequently used code path, and it is low contention.
  Since funcsyms may now be added in any order,
  we must sort them to preserve reproducible builds.

* gc.largeStackFramesMu. We don't discover until after SSA compilation
  that a function's stack frame is gigantic.
  Recording that error happens basically never,
  but it does happen concurrently.
  Fix with a low priority mutex and sorting.

* obj.Link.hashmu. ctxt.hash stores the mapping from
  types.Syms (compiler symbols) to obj.LSyms (linker symbols).
  It is accessed fairly heavily through all the phases.
  This is the only heavily contended mutex.

* gc.signatlistmu. The global signatlist map is
  populated with types through several of the concurrent phases,
  including notably via ngotype during DWARF generation.
  It is low priority for removal.

* gc.typepkgmu. Looking up symbols in the types package
  happens a fair amount during backend compilation
  and DWARF generation, particularly via ngotype.
  This mutex helps us to avoid a broader mutex on types.Pkg.Syms.
  It has low-to-moderate contention.

* types.internedStringsmu. gc AST to SSA conversion and
  some SSA work introduce new autotmps.
  Those autotmps have their names interned to reduce allocations.
  That interning requires protecting types.internedStrings.
  The autotmp names are heavily re-used, and the mutex
  overhead and contention here are low, so it is probably
  a worthwhile performance optimization to keep this mutex.

TESTING

I have been testing this code locally by running
'go install -race cmd/compile'
and then doing
'go build -a -gcflags=-c=128 std cmd'
for all architectures and a variety of compiler flags.
This obviously needs to be made part of the builders,
but it is too expensive to make part of all.bash.
I have filed #19962 for this.

REPRODUCIBLE BUILDS

This version of the compiler generates reproducible builds.
Testing reproducible builds also needs automation, however,
and is also too expensive for all.bash.
This is #19961.

Also of note is that some of the compiler flags used by 'toolstash -cmp'
are currently incompatible with concurrent backend compilation.
They still work fine with c=1.
Time will tell whether this is a problem.

NEXT STEPS

* Continue to find and fix races and bugs,
  using a combination of code inspection, fuzzing,
  and hopefully some community experimentation.
  I do not know of any outstanding races,
  but there probably are some.
* Improve testing.
* Improve performance, for many values of c.
* Integrate with cmd/go and fine tune.
* Support concurrent compilation with the -race flag.
  It is a sad irony that it does not yet work.
* Minor code cleanup that has been deferred during
  the last month due to uncertainty about the
  ultimate shape of this CL.

PERFORMANCE

Here's the buried lede, at last. :)

All benchmarks are from my 8 core 2.9 GHz Intel Core i7 darwin/amd64 laptop.

First, going from tip to this CL with c=1 has almost no impact.

name        old time/op       new time/op       delta
Template          195ms ± 3%        194ms ± 5%    ~     (p=0.370 n=30+29)
Unicode          86.6ms ± 3%       87.0ms ± 7%    ~     (p=0.958 n=29+30)
GoTypes           548ms ± 3%        555ms ± 4%  +1.35%  (p=0.001 n=30+28)
Compiler          2.51s ± 2%        2.54s ± 2%  +1.17%  (p=0.000 n=28+30)
SSA               5.16s ± 3%        5.16s ± 2%    ~     (p=0.910 n=30+29)
Flate             124ms ± 5%        124ms ± 4%    ~     (p=0.947 n=30+30)
GoParser          146ms ± 3%        146ms ± 3%    ~     (p=0.150 n=29+28)
Reflect           354ms ± 3%        352ms ± 4%    ~     (p=0.096 n=29+29)
Tar               107ms ± 5%        106ms ± 3%    ~     (p=0.370 n=30+29)
XML               200ms ± 4%        201ms ± 4%    ~     (p=0.313 n=29+28)
[Geo mean]        332ms             333ms       +0.10%

name        old user-time/op  new user-time/op  delta
Template          227ms ± 5%        225ms ± 5%    ~     (p=0.457 n=28+27)
Unicode           109ms ± 4%        109ms ± 5%    ~     (p=0.758 n=29+29)
GoTypes           713ms ± 4%        721ms ± 5%    ~     (p=0.051 n=30+29)
Compiler          3.36s ± 2%        3.38s ± 3%    ~     (p=0.146 n=30+30)
SSA               7.46s ± 3%        7.47s ± 3%    ~     (p=0.804 n=30+29)
Flate             146ms ± 7%        147ms ± 3%    ~     (p=0.833 n=29+27)
GoParser          179ms ± 5%        179ms ± 5%    ~     (p=0.866 n=30+30)
Reflect           431ms ± 4%        429ms ± 4%    ~     (p=0.593 n=29+30)
Tar               124ms ± 5%        123ms ± 5%    ~     (p=0.140 n=29+29)
XML               243ms ± 4%        242ms ± 7%    ~     (p=0.404 n=29+29)
[Geo mean]        415ms             415ms       +0.02%

name        old obj-bytes     new obj-bytes     delta
Template           382k ± 0%         382k ± 0%    ~     (all equal)
Unicode            203k ± 0%         203k ± 0%    ~     (all equal)
GoTypes           1.18M ± 0%        1.18M ± 0%    ~     (all equal)
Compiler          3.98M ± 0%        3.98M ± 0%    ~     (all equal)
SSA               8.28M ± 0%        8.28M ± 0%    ~     (all equal)
Flate              230k ± 0%         230k ± 0%    ~     (all equal)
GoParser           287k ± 0%         287k ± 0%    ~     (all equal)
Reflect           1.00M ± 0%        1.00M ± 0%    ~     (all equal)
Tar                190k ± 0%         190k ± 0%    ~     (all equal)
XML                416k ± 0%         416k ± 0%    ~     (all equal)
[Geo mean]         660k              660k       +0.00%

Comparing this CL to itself, from c=1 to c=2
improves real times 20-30%, costs 5-10% more CPU time,
and adds about 2% alloc.
The allocation increase comes from allocating more ssa.Caches.

name       old time/op       new time/op       delta
Template         202ms ± 3%        149ms ± 3%  -26.15%  (p=0.000 n=49+49)
Unicode         87.4ms ± 4%       84.2ms ± 3%   -3.68%  (p=0.000 n=48+48)
GoTypes          560ms ± 2%        398ms ± 2%  -28.96%  (p=0.000 n=49+49)
Compiler         2.46s ± 3%        1.76s ± 2%  -28.61%  (p=0.000 n=48+46)
SSA              6.17s ± 2%        4.04s ± 1%  -34.52%  (p=0.000 n=49+49)
Flate            126ms ± 3%         92ms ± 2%  -26.81%  (p=0.000 n=49+48)
GoParser         148ms ± 4%        107ms ± 2%  -27.78%  (p=0.000 n=49+48)
Reflect          361ms ± 3%        281ms ± 3%  -22.10%  (p=0.000 n=49+49)
Tar              109ms ± 4%         86ms ± 3%  -20.81%  (p=0.000 n=49+47)
XML              204ms ± 3%        144ms ± 2%  -29.53%  (p=0.000 n=48+45)

name       old user-time/op  new user-time/op  delta
Template         246ms ± 9%        246ms ± 4%     ~     (p=0.401 n=50+48)
Unicode          109ms ± 4%        111ms ± 4%   +1.47%  (p=0.000 n=44+50)
GoTypes          728ms ± 3%        765ms ± 3%   +5.04%  (p=0.000 n=46+50)
Compiler         3.33s ± 3%        3.41s ± 2%   +2.31%  (p=0.000 n=49+48)
SSA              8.52s ± 2%        9.11s ± 2%   +6.93%  (p=0.000 n=49+47)
Flate            149ms ± 4%        161ms ± 3%   +8.13%  (p=0.000 n=50+47)
GoParser         181ms ± 5%        192ms ± 2%   +6.40%  (p=0.000 n=49+46)
Reflect          452ms ± 9%        474ms ± 2%   +4.99%  (p=0.000 n=50+48)
Tar              126ms ± 6%        136ms ± 4%   +7.95%  (p=0.000 n=50+49)
XML              247ms ± 5%        264ms ± 3%   +6.94%  (p=0.000 n=48+50)

name       old alloc/op      new alloc/op      delta
Template        38.8MB ± 0%       39.3MB ± 0%   +1.48%  (p=0.008 n=5+5)
Unicode         29.8MB ± 0%       30.2MB ± 0%   +1.19%  (p=0.008 n=5+5)
GoTypes          113MB ± 0%        114MB ± 0%   +0.69%  (p=0.008 n=5+5)
Compiler         443MB ± 0%        447MB ± 0%   +0.95%  (p=0.008 n=5+5)
SSA             1.25GB ± 0%       1.26GB ± 0%   +0.89%  (p=0.008 n=5+5)
Flate           25.3MB ± 0%       25.9MB ± 1%   +2.35%  (p=0.008 n=5+5)
GoParser        31.7MB ± 0%       32.2MB ± 0%   +1.59%  (p=0.008 n=5+5)
Reflect         78.2MB ± 0%       78.9MB ± 0%   +0.91%  (p=0.008 n=5+5)
Tar             26.6MB ± 0%       27.0MB ± 0%   +1.80%  (p=0.008 n=5+5)
XML             42.4MB ± 0%       43.4MB ± 0%   +2.35%  (p=0.008 n=5+5)

name       old allocs/op     new allocs/op     delta
Template          379k ± 0%         378k ± 0%     ~     (p=0.421 n=5+5)
Unicode           322k ± 0%         321k ± 0%     ~     (p=0.222 n=5+5)
GoTypes          1.14M ± 0%        1.14M ± 0%     ~     (p=0.548 n=5+5)
Compiler         4.12M ± 0%        4.11M ± 0%   -0.14%  (p=0.032 n=5+5)
SSA              9.72M ± 0%        9.72M ± 0%     ~     (p=0.421 n=5+5)
Flate             234k ± 1%         234k ± 0%     ~     (p=0.421 n=5+5)
GoParser          316k ± 1%         315k ± 0%     ~     (p=0.222 n=5+5)
Reflect           980k ± 0%         979k ± 0%     ~     (p=0.095 n=5+5)
Tar               249k ± 1%         249k ± 1%     ~     (p=0.841 n=5+5)
XML               392k ± 0%         391k ± 0%     ~     (p=0.095 n=5+5)

From c=1 to c=4, real time is down ~40%, CPU usage up 10-20%, alloc up ~5%:

name       old time/op       new time/op       delta
Template         203ms ± 3%        131ms ± 5%  -35.45%  (p=0.000 n=50+50)
Unicode         87.2ms ± 4%       84.1ms ± 2%   -3.61%  (p=0.000 n=48+47)
GoTypes          560ms ± 4%        310ms ± 2%  -44.65%  (p=0.000 n=50+49)
Compiler         2.47s ± 3%        1.41s ± 2%  -43.10%  (p=0.000 n=50+46)
SSA              6.17s ± 2%        3.20s ± 2%  -48.06%  (p=0.000 n=49+49)
Flate            126ms ± 4%         74ms ± 2%  -41.06%  (p=0.000 n=49+48)
GoParser         148ms ± 4%         89ms ± 3%  -39.97%  (p=0.000 n=49+50)
Reflect          360ms ± 3%        242ms ± 3%  -32.81%  (p=0.000 n=49+49)
Tar              108ms ± 4%         73ms ± 4%  -32.48%  (p=0.000 n=50+49)
XML              203ms ± 3%        119ms ± 3%  -41.56%  (p=0.000 n=49+48)

name       old user-time/op  new user-time/op  delta
Template         246ms ± 9%        287ms ± 9%  +16.98%  (p=0.000 n=50+50)
Unicode          109ms ± 4%        118ms ± 5%   +7.56%  (p=0.000 n=46+50)
GoTypes          735ms ± 4%        806ms ± 2%   +9.62%  (p=0.000 n=50+50)
Compiler         3.34s ± 4%        3.56s ± 2%   +6.78%  (p=0.000 n=49+49)
SSA              8.54s ± 3%       10.04s ± 3%  +17.55%  (p=0.000 n=50+50)
Flate            149ms ± 6%        176ms ± 3%  +17.82%  (p=0.000 n=50+48)
GoParser         181ms ± 5%        213ms ± 3%  +17.47%  (p=0.000 n=50+50)
Reflect          453ms ± 6%        499ms ± 2%  +10.11%  (p=0.000 n=50+48)
Tar              126ms ± 5%        149ms ±11%  +18.76%  (p=0.000 n=50+50)
XML              246ms ± 5%        287ms ± 4%  +16.53%  (p=0.000 n=49+50)

name       old alloc/op      new alloc/op      delta
Template        38.8MB ± 0%       40.4MB ± 0%   +4.21%  (p=0.008 n=5+5)
Unicode         29.8MB ± 0%       30.9MB ± 0%   +3.68%  (p=0.008 n=5+5)
GoTypes          113MB ± 0%        116MB ± 0%   +2.71%  (p=0.008 n=5+5)
Compiler         443MB ± 0%        455MB ± 0%   +2.75%  (p=0.008 n=5+5)
SSA             1.25GB ± 0%       1.27GB ± 0%   +1.84%  (p=0.008 n=5+5)
Flate           25.3MB ± 0%       26.9MB ± 1%   +6.31%  (p=0.008 n=5+5)
GoParser        31.7MB ± 0%       33.2MB ± 0%   +4.61%  (p=0.008 n=5+5)
Reflect         78.2MB ± 0%       80.2MB ± 0%   +2.53%  (p=0.008 n=5+5)
Tar             26.6MB ± 0%       27.9MB ± 0%   +5.19%  (p=0.008 n=5+5)
XML             42.4MB ± 0%       44.6MB ± 0%   +5.20%  (p=0.008 n=5+5)

name       old allocs/op     new allocs/op     delta
Template          380k ± 0%         379k ± 0%   -0.39%  (p=0.032 n=5+5)
Unicode           321k ± 0%         321k ± 0%     ~     (p=0.841 n=5+5)
GoTypes          1.14M ± 0%        1.14M ± 0%     ~     (p=0.421 n=5+5)
Compiler         4.12M ± 0%        4.14M ± 0%   +0.52%  (p=0.008 n=5+5)
SSA              9.72M ± 0%        9.76M ± 0%   +0.37%  (p=0.008 n=5+5)
Flate             234k ± 1%         234k ± 1%     ~     (p=0.690 n=5+5)
GoParser          316k ± 0%         317k ± 1%     ~     (p=0.841 n=5+5)
Reflect           981k ± 0%         981k ± 0%     ~     (p=1.000 n=5+5)
Tar               250k ± 0%         249k ± 1%     ~     (p=0.151 n=5+5)
XML               393k ± 0%         392k ± 0%     ~     (p=0.056 n=5+5)

Going beyond c=4 on my machine tends to increase CPU time and allocs
without impacting real time.

The CPU time numbers matter, because when there are many concurrent
compilation processes, that will impact the overall throughput.

The numbers above are in many ways the best case scenario;
we can take full advantage of all cores.
Fortunately, the most common compilation scenario is incremental
re-compilation of a single package during a build/test cycle.

Updates #15756

Change-Id: I6725558ca2069edec0ac5b0d1683105a9fff6bea
Reviewed-on: https://go-review.googlesource.com/40693
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Robert Griesemer <gri@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-04-27 00:59:07 +00:00
Josh Bleecher Snyder
405a280d01 cmd/internal/obj: eliminate LSym.Version
There were only two versions, 0 and 1,
and the only user of version 1 was the assembler,
to indicate that a symbol was static.

Rename LSym.Version to Static,
and add it to LSym.Attributes.
Simplify call-sites.

Passes toolstash-check.

Change-Id: Iabd39918f5019cce78f381d13f0481ae09f3871f
Reviewed-on: https://go-review.googlesource.com/41201
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-04-20 21:56:23 +00:00
Josh Bleecher Snyder
6e97c71cb7 cmd/internal/obj: split Link.hash into version 0 and 1
Though LSym.Version is an int, it can only have the value 0 or 1.
Using that, split Link.hash into two maps, one for version 0
(which is far more common) and one for version 1.
This lets use just the name for lookups,
which is both faster and more compact.
This matters because Link.hash map lookups are frequent,
and will be contended once the backend is concurrent.

name        old time/op       new time/op       delta
Template          194ms ± 3%        192ms ± 5%  -1.46%  (p=0.000 n=47+49)
Unicode          84.5ms ± 3%       83.8ms ± 3%  -0.81%  (p=0.011 n=50+49)
GoTypes           543ms ± 2%        545ms ± 4%    ~     (p=0.566 n=46+49)
Compiler          2.48s ± 2%        2.48s ± 3%    ~     (p=0.706 n=47+50)
SSA               5.94s ± 3%        5.98s ± 2%  +0.55%  (p=0.040 n=49+50)
Flate             119ms ± 6%        119ms ± 4%    ~     (p=0.681 n=48+47)
GoParser          145ms ± 4%        145ms ± 3%    ~     (p=0.662 n=47+49)
Reflect           348ms ± 3%        344ms ± 3%  -1.17%  (p=0.000 n=47+47)
Tar               105ms ± 4%        104ms ± 3%    ~     (p=0.155 n=50+47)
XML               197ms ± 2%        197ms ± 3%    ~     (p=0.666 n=49+49)
[Geo mean]        332ms             331ms       -0.37%

name        old user-time/op  new user-time/op  delta
Template          230ms ±10%        226ms ±10%  -1.85%  (p=0.041 n=50+50)
Unicode           104ms ± 6%        103ms ± 5%    ~     (p=0.076 n=49+49)
GoTypes           707ms ± 4%        705ms ± 5%    ~     (p=0.521 n=50+50)
Compiler          3.30s ± 3%        3.33s ± 4%  +0.76%  (p=0.003 n=50+49)
SSA               8.17s ± 4%        8.23s ± 3%  +0.66%  (p=0.030 n=50+49)
Flate             139ms ± 6%        138ms ± 8%    ~     (p=0.184 n=49+48)
GoParser          174ms ± 5%        172ms ± 6%    ~     (p=0.107 n=48+49)
Reflect           431ms ± 8%        420ms ± 5%  -2.57%  (p=0.000 n=50+46)
Tar               119ms ± 6%        118ms ± 7%  -0.95%  (p=0.033 n=50+49)
XML               236ms ± 4%        236ms ± 4%    ~     (p=0.935 n=50+48)
[Geo mean]        410ms             407ms       -0.67%

name        old alloc/op      new alloc/op      delta
Template         38.7MB ± 0%       38.6MB ± 0%  -0.29%  (p=0.008 n=5+5)
Unicode          29.8MB ± 0%       29.7MB ± 0%  -0.24%  (p=0.008 n=5+5)
GoTypes           113MB ± 0%        113MB ± 0%  -0.29%  (p=0.008 n=5+5)
Compiler          462MB ± 0%        462MB ± 0%  -0.12%  (p=0.008 n=5+5)
SSA              1.27GB ± 0%       1.27GB ± 0%  -0.05%  (p=0.008 n=5+5)
Flate            25.2MB ± 0%       25.1MB ± 0%  -0.37%  (p=0.008 n=5+5)
GoParser         31.7MB ± 0%       31.6MB ± 0%    ~     (p=0.056 n=5+5)
Reflect          77.5MB ± 0%       77.2MB ± 0%  -0.38%  (p=0.008 n=5+5)
Tar              26.4MB ± 0%       26.3MB ± 0%    ~     (p=0.151 n=5+5)
XML              41.9MB ± 0%       41.9MB ± 0%  -0.20%  (p=0.032 n=5+5)
[Geo mean]       74.5MB            74.3MB       -0.23%

name        old allocs/op     new allocs/op     delta
Template           378k ± 1%         377k ± 1%    ~     (p=0.690 n=5+5)
Unicode            321k ± 0%         322k ± 0%    ~     (p=0.595 n=5+5)
GoTypes           1.14M ± 0%        1.14M ± 0%    ~     (p=0.310 n=5+5)
Compiler          4.25M ± 0%        4.25M ± 0%    ~     (p=0.151 n=5+5)
SSA               9.84M ± 0%        9.84M ± 0%    ~     (p=0.841 n=5+5)
Flate              232k ± 1%         232k ± 0%    ~     (p=0.690 n=5+5)
GoParser           315k ± 1%         315k ± 1%    ~     (p=0.841 n=5+5)
Reflect            970k ± 0%         970k ± 0%    ~     (p=0.841 n=5+5)
Tar                248k ± 0%         248k ± 1%    ~     (p=0.841 n=5+5)
XML                389k ± 0%         389k ± 0%    ~     (p=1.000 n=5+5)
[Geo mean]         724k              724k       +0.01%

Updates #15756

Change-Id: I2646332e89f0444ca9d5a41d7172537d904ed636
Reviewed-on: https://go-review.googlesource.com/41050
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-04-20 13:07:00 +00:00
Matthew Dempsky
1e3570ac86 cmd/internal/objabi: extract shared functionality from obj
Now only cmd/asm and cmd/compile depend on cmd/internal/obj. Changing
the assembler backends no longer requires reinstalling cmd/link or
cmd/addr2line.

There's also now one canonical definition of the object file format in
cmd/internal/objabi/doc.go, with a warning to update all three
implementations.

objabi is still something of a grab bag of unrelated code (e.g., flag
and environment variable handling probably belong in a separate "tool"
package), but this is still progress.

Fixes #15165.
Fixes #20026.

Change-Id: Ic4b92fac7d0d35438e0d20c9579aad4085c5534c
Reviewed-on: https://go-review.googlesource.com/40972
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-04-19 00:00:09 +00:00
Matthew Dempsky
1747078695 cmd/internal/obj: un-embed FuncInfo field in LSym
Automated refactoring using github.com/mdempsky/unbed (to rewrite
s.Foo to s.FuncInfo.Foo) and then gorename (to rename the FuncInfo
field to just Func).

Passes toolstash-check -all.

Change-Id: I802c07a1239e0efea058a91a87c5efe12170083a
Reviewed-on: https://go-review.googlesource.com/40670
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-04-18 17:29:50 +00:00
Josh Bleecher Snyder
9d4a84677b cmd/internal/obj: unexport Link.Hash
A prior CL eliminated the last reference to Ctxt.Hash
from the compiler.

Change-Id: Ic97ff84ed1a14e0c93fb0e8ec0b2617c3397c0e8
Reviewed-on: https://go-review.googlesource.com/40699
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-04-18 02:14:02 +00:00
Josh Bleecher Snyder
da15fe6870 cmd/internal/obj: rework gclocals handling
The compiler handled gcargs and gclocals LSyms unusually.
It generated placeholder symbols (makefuncdatasym),
filled them in, and then renamed them for content-addressability.
This is an important binary size optimization;
the same locals information occurs over and over.

This CL continues to treat these LSyms unusually,
but in a slightly more explicit way,
and importantly for concurrent compilation,
in a way that does not require concurrent
modification of Ctxt.Hash.

Instead of creating gcargs and gclocals in the usual way,
by creating a types.Sym and then an obj.LSym,
we add them directly to obj.FuncInfo,
initialize them in obj.InitTextSym,
and deduplicate and add them to ctxt.Data at the end.
Then the backend's job is simply to fill them in
and rename them appropriately.

Updates #15756

name       old alloc/op      new alloc/op      delta
Template        38.8MB ± 0%       38.7MB ± 0%  -0.22%  (p=0.016 n=5+5)
Unicode         29.8MB ± 0%       29.8MB ± 0%    ~     (p=0.690 n=5+5)
GoTypes          113MB ± 0%        113MB ± 0%  -0.24%  (p=0.008 n=5+5)
SSA             1.25GB ± 0%       1.24GB ± 0%  -0.39%  (p=0.008 n=5+5)
Flate           25.3MB ± 0%       25.2MB ± 0%  -0.43%  (p=0.008 n=5+5)
GoParser        31.7MB ± 0%       31.7MB ± 0%  -0.22%  (p=0.008 n=5+5)
Reflect         78.2MB ± 0%       77.6MB ± 0%  -0.80%  (p=0.008 n=5+5)
Tar             26.6MB ± 0%       26.3MB ± 0%  -0.85%  (p=0.008 n=5+5)
XML             42.4MB ± 0%       41.9MB ± 0%  -1.04%  (p=0.008 n=5+5)

name       old allocs/op     new allocs/op     delta
Template          378k ± 0%         377k ± 1%    ~     (p=0.151 n=5+5)
Unicode           321k ± 1%         321k ± 0%    ~     (p=0.841 n=5+5)
GoTypes          1.14M ± 0%        1.14M ± 0%  -0.47%  (p=0.016 n=5+5)
SSA              9.71M ± 0%        9.67M ± 0%  -0.33%  (p=0.008 n=5+5)
Flate             233k ± 1%         232k ± 1%    ~     (p=0.151 n=5+5)
GoParser          316k ± 0%         315k ± 0%  -0.49%  (p=0.016 n=5+5)
Reflect           979k ± 0%         972k ± 0%  -0.75%  (p=0.008 n=5+5)
Tar               250k ± 0%         247k ± 1%  -0.92%  (p=0.008 n=5+5)
XML               392k ± 1%         389k ± 0%  -0.67%  (p=0.008 n=5+5)

Change-Id: Idc36186ca9d2f8214b5f7720bbc27b6bb22fdc48
Reviewed-on: https://go-review.googlesource.com/40697
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-04-18 02:13:32 +00:00
Josh Bleecher Snyder
7b5f94e76c cmd/internal/obj: cache dwarfSym
Follow-up to review feedback from
mdempsky on CL 40507.

Reduces mutex contention by about 1%.

Change-Id: I540ea6772925f4a59e58f55a3458eff15880c328
Reviewed-on: https://go-review.googlesource.com/40575
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-04-13 15:18:54 +00:00
Josh Bleecher Snyder
ce3ee7cdae cmd/internal/obj: stop storing Text flags in From3
Prior to this CL, flags such as NOSPLIT
on ATEXT Progs were stored in From3.Offset.
Some but not all of those flags were also
duplicated into From.Sym.Attribute.

This CL migrates all of those flags into
From.Sym.Attribute and stops creating a From3.

A side-effect of this is that printing an
ATEXT Prog can no longer simply dump From3.Offset.
That's kind of good, since the raw flag value
wasn't very informative anyway, but it did
necessitate a bunch of updates to the cmd/asm tests.

The reason I'm doing this work now is that
avoiding storing flags in both From.Sym and From3.Offset
simplifies some other changes to fix the data
race first described in CL 40254.

This CL almost passes toolstash-check -all.
The only changes are in cases where the assembler
has decided that a function's flags may be altered,
e.g. to make a function with no calls in it NOSPLIT.
Prior to this CL, that information was not printed.

Sample before:

"".Ctz64 t=1 size=63 args=0x10 locals=0x0
	0x0000 00000 (/Users/josh/go/tip/src/runtime/internal/sys/intrinsics.go:35)	TEXT	"".Ctz64(SB), $0-16
	0x0000 00000 (/Users/josh/go/tip/src/runtime/internal/sys/intrinsics.go:35)	FUNCDATA	$0, gclocals·f207267fbf96a0178e8758c6e3e0ce28(SB)

Sample after:

"".Ctz64 t=1 nosplit size=63 args=0x10 locals=0x0
	0x0000 00000 (/Users/josh/go/tip/src/runtime/internal/sys/intrinsics.go:35)	TEXT	"".Ctz64(SB), NOSPLIT, $0-16
	0x0000 00000 (/Users/josh/go/tip/src/runtime/internal/sys/intrinsics.go:35)	FUNCDATA	$0, gclocals·f207267fbf96a0178e8758c6e3e0ce28(SB)

Observe the additional "nosplit" in the first line
and the additional "NOSPLIT" in the second line.

Updates #15756

Change-Id: I5c59bd8f3bdc7c780361f801d94a261f0aef3d13
Reviewed-on: https://go-review.googlesource.com/40495
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-04-12 21:53:39 +00:00
Josh Bleecher Snyder
49f4b5a4f5 cmd/internal/obj: remove Link.Plan9privates
Move it to the x86 package, matching our handling
of deferreturn in x86 and arm.
While we're here, improve the concurrency safety
of both Plan9privates and deferreturn
by eagerly initializing them in instinit.

Updates #15756

Change-Id: If3b1995c1e4ec816a5443a18f8d715631967a8b1
Reviewed-on: https://go-review.googlesource.com/40408
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-04-12 15:56:14 +00:00
Josh Bleecher Snyder
f30de83d79 cmd/internal/obj: remove Link.Version
It is zeroed pointlessly and never read.

Change-Id: I65390501a878f545122ec558cb621b91e394a538
Reviewed-on: https://go-review.googlesource.com/40406
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-04-12 15:54:39 +00:00
Josh Bleecher Snyder
30ddffadd5 cmd/internal/obj: remove Link.Debugdivmod
It is only used once and never written to.
Switch to a local constant instead.

Change-Id: Icdd84e47b81f0de44ad9ed56ab5f4f91df22e6b6
Reviewed-on: https://go-review.googlesource.com/40405
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-04-12 14:38:15 +00:00
Josh Bleecher Snyder
eacfa59220 cmd/internal/obj: remove dead Link fields
These are unused after CLs 39922, 40252, 40370, 40371, and 40372.

Change-Id: I76f9276c581067a8cb555de761550d960f6e39b8
Reviewed-on: https://go-review.googlesource.com/40404
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-04-12 14:38:06 +00:00