Commit graph

149 commits

Author SHA1 Message Date
Quim Muntal
a9780d8dfd Revert "runtime: use explicit NOFRAME on darwin/amd64"
This reverts CL 460235.

Reason for revert: This breaks darwin 10 and 11

Change-Id: I3c663ebe3b77eba45a006a3ebec5cabe667faa9e
Reviewed-on: https://go-review.googlesource.com/c/go/+/463635
Auto-Submit: Quim Muntal <quimmuntal@gmail.com>
Run-TryBot: Quim Muntal <quimmuntal@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-01-26 15:41:28 +00:00
qmuntal
cea70301e2 runtime: use explicit NOFRAME on darwin/amd64
This CL marks some darwin assembly functions as NOFRAME to avoid relying
on the implicit amd64 NOFRAME heuristic, where NOSPLIT functions
without stack were also marked as NOFRAME.

Change-Id: I797f3909bcf7f7aad304e4ede820c884231e54f6
Reviewed-on: https://go-review.googlesource.com/c/go/+/460235
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2023-01-26 07:17:26 +00:00
qmuntal
083d94f69c runtime: use explicit NOFRAME on windows/amd64
This CL marks non-leaf nosplit assembly functions as NOFRAME to avoid
relying on the implicit amd64 NOFRAME heuristic, where NOSPLIT functions
without stack were also marked as NOFRAME.

Updates #57302
Updates #40044

Change-Id: Ia4d26f8420dcf2b54528969ffbf40a73f1315d61
Reviewed-on: https://go-review.googlesource.com/c/go/+/459395
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Quim Muntal <quimmuntal@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2023-01-24 19:29:51 +00:00
qmuntal
28f8dbd7b9 runtime,cmd/internal/obj/x86: use TEB TLS slots on windows/i386
This CL redesign how we get the TLS pointer on windows/i386.
It applies the same changes as done in CL 431775 for windows/amd64.

We were previously reading it from the [TEB] arbitrary data slot,
located at 0x14(FS), which can only hold 1 TLS pointer.

With this CL, we will read the TLS pointer from the TEB TLS slot array,
located at 0xE10(GS). The TLS slot array can hold multiple
TLS pointers, up to 64, so multiple Go runtimes running on the
same thread can coexists with different TLS.

Each new TLS slot has to be allocated via [TlsAlloc],
which returns the slot index. This index can then be used to get the
slot offset from GS with the following formula: 0xE10 + index*4.

The slot index is fixed per Go runtime, so we can store it
in runtime.tls_g and use it latter on to read/update the TLS pointer.

Loading the TLS pointer requires the following asm instructions:

  MOVQ runtime.tls_g, AX
  MOVQ AX(FS), AX

Notice that this approach will now be implemented in all the supported
windows arches.

[TEB]: https://en.wikipedia.org/wiki/Win32_Thread_Information_Block
[TlsAlloc]: https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-tlsalloc

Change-Id: If4550b0d44694ee6480d4093b851f4991a088b32
Reviewed-on: https://go-review.googlesource.com/c/go/+/454675
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Quim Muntal <quimmuntal@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-01-23 18:15:35 +00:00
qmuntal
da564d0006 runtime,cmd/internal/obj/x86: use TEB TLS slots on windows/amd64
This CL redesign how we get the TLS pointer on windows/amd64.

We were previously reading it from the [TEB] arbitrary data slot,
located at 0x28(GS), which can only hold 1 TLS pointer.

With this CL, we will read the TLS pointer from the TEB TLS slot array,
located at 0x1480(GS). The TLS slot array can hold multiple
TLS pointers, up to 64, so multiple Go runtimes running on the
same thread can coexists with different TLS.

Each new TLS slot has to be allocated via [TlsAlloc],
which returns the slot index. This index can then be used to get the
slot offset from GS with the following formula: 0x1480 + index*8

The slot index is fixed per Go runtime, so we can store it
in runtime.tls_g and use it latter on to read/update the TLS pointer.

Loading the TLS pointer requires the following asm instructions:

  MOVQ runtime.tls_g, AX
  MOVQ AX(GS), AX

Notice that this approach is also implemented on windows/arm64.

[TEB]: https://en.wikipedia.org/wiki/Win32_Thread_Information_Block
[TlsAlloc]: https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-tlsalloc

Updates #22192

Change-Id: Idea7119fd76a3cd083979a4d57ed64b552fa101b
Reviewed-on: https://go-review.googlesource.com/c/go/+/431775
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Quim Muntal <quimmuntal@gmail.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
2022-11-14 20:43:12 +00:00
Austin Clements
3839b60014 cmd/{asm,compile,internal/obj}: add "maymorestack" support
This adds a debugging hook for optionally calling a "maymorestack"
function in the prologue of any function that might call morestack
(whether it does at run time or not). The maymorestack function will
let us improve lock checking and add debugging modes that stress
function preemption and stack growth.

Passes toolstash-check -all (except on js/wasm, where toolstash
appears to be broken)

Fixes #48297.

Change-Id: I27197947482b329af75dafb9971fc0d3a52eaf31
Reviewed-on: https://go-review.googlesource.com/c/go/+/359795
Trust: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-11-05 00:52:06 +00:00
Cherry Mui
c93d5d1a52 [dev.typeparams] all: always enable regabig on AMD64
Always enable regabig on AMD64, which enables the G register and
the X15 zero register. Remove the fallback path.

Also remove the regabig GOEXPERIMENT. On AMD64 it is always
enabled (this CL). Other architectures already have a G register,
except for 386, where there are too few registers and it is
unlikely that we will reserve one. (If we really do, we can just
add a new experiment).

Change-Id: I229cac0060f48fe58c9fdaabd38d6fa16b8a0855
Reviewed-on: https://go-review.googlesource.com/c/go/+/327272
Trust: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
2021-06-11 20:52:41 +00:00
Russ Cox
95ed5c3800 internal/buildcfg: move build configuration out of cmd/internal/objabi
The go/build package needs access to this configuration,
so move it into a new package available to the standard library.

Change-Id: I868a94148b52350c76116451f4ad9191246adcff
Reviewed-on: https://go-review.googlesource.com/c/go/+/310731
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Jay Conrod <jayconrod@google.com>
2021-04-16 19:20:53 +00:00
Austin Clements
dba2eab826 runtime,runtime/cgo: save all necessary registers on entry to Go on Windows
There are several assembly functions that transition from the Windows
ABI to the Go ABI. These all need to save all registers that are
callee-save in the Windows ABI and caller-save in the Go ABI and
prepare the register state for Go. However, they all do this slightly
differently and most of them don't save the necessary XMM registers
for this transition (which could corrupt them in the C caller).
Furthermore, now that we have a carefully specified Go ABI, it's clear
that none of these actually get all of the details 100% right.

So, unify this code into two macros in a shared header in
runtime/cgo/abi_amd64.h that handle all necessary registers and setup
and use these macros everywhere on Windows that handles transitions
from C to Go.

Change-Id: I62f41345a507aad1ca383814ac8b7e2a9ffb821e
Reviewed-on: https://go-review.googlesource.com/c/go/+/309769
Trust: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2021-04-15 12:38:11 +00:00
Cherry Zhang
5cc5576a9c cmd/compile: untangle Wrapper and ABIWrapper flags
Currently, there are Wrapper and ABIWrapper attributes. Wrapper
is set when compiler generates an wrapper function (e.g. method
wrapper). ABIWrapper is set when compiler generates an ABI
wrapper. It also sets Wrapper flag for ABI wrappers.

Currently, they have the following meanings:
- Wrapper flag hides the frame from (normal) traceback.
- Wrapper flag enables special panic+recover adjustment, so it
  can correctly recover when a wrapper function is deferred.
- ABIWrapper flag disables the panic+recover adjustment, because
  we never defer an ABI wrapper that can recover.

This CL changes them to:
- Both Wrapper and ABIWrapper flags hide the frame from (normal)
  traceback. (Setting one is enough.)
- Wrapper flag enables special panic+recover adjustment.
  ABIWrapper flag no longer has effect on this.

This makes it clearer if we do want an ABI wrapper that also does
the panic+recover adjustment. In the old mechanism we'd have to
unset ABIWrapper flag, even if the function is actually an ABI
wrapper. In the new mechanism we just need to set both ABIWrapper
and Wrapper flags.

Updates #40724.

Change-Id: I7fbc83f85d23676dc94db51dfda63dcacdf1fc19
Reviewed-on: https://go-review.googlesource.com/c/go/+/307235
Trust: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Austin Clements <austin@google.com>
2021-04-05 17:27:18 +00:00
Austin Clements
ef3122e909 cmd/internal/obj/x86: simplify huge frame prologue
For stack frames larger than StackBig, the stack split prologue needs
to guard against potential wraparound. Currently, it carefully
arranges to avoid underflow, but this is complicated and requires a
special check for StackPreempt. StackPreempt is no longer the only
stack poison value, so this check will incorrectly succeed if the
stack bound is poisoned with any other value.

This CL simplifies the logic of the check, reduces its length, and
accounts for any possible poison value by directly checking for
underflow.

Change-Id: I917a313102d6a21895ef7c4b0f304fb84b292c81
Reviewed-on: https://go-review.googlesource.com/c/go/+/307010
Trust: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
2021-04-05 16:22:10 +00:00
Austin Clements
2ebe77a2fd cmd/internal/obj: use REGENTRYTMP* in a few more places
There are a few remaining places in obj6 where we hard-code
safe-on-entry registers. Fix those to use the consts.

For #40724.

Change-Id: Ie640521aa67d6c99bc057553dc122160049c6edc
Reviewed-on: https://go-review.googlesource.com/c/go/+/307009
Trust: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2021-04-02 21:59:37 +00:00
Michael Anthony Knyszek
87c6fa4f47 cmd/internal/obj/x86: use ABI scratch registers for WRAPPER prologue
Currently the prologue generated for WRAPPER assembly functions uses BX
and DI, but these are argument registers in the register-based calling
convention. Thus, these end up being clobbered when we want to have an
ABIInternal assembly function.

Define REGENTRYTMP0 and REGENTRYTMP1, aliases for the dedicated function
entry scratch registers R12 and R13, and use those instead.

For #40724.

Change-Id: Ica78c4ccc67a757359900a66b56ef28c83d88b3e
Reviewed-on: https://go-review.googlesource.com/c/go/+/303314
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2021-03-31 20:28:39 +00:00
Austin Clements
eaa1ddee84 all: explode GOEXPERIMENT=regabi into 5 sub-experiments
This separates GOEXPERIMENT=regabi into five sub-experiments:
regabiwrappers, regabig, regabireflect, regabidefer, and regabiargs.
Setting GOEXPERIMENT=regabi now implies the working subset of these
(currently, regabiwrappers, regabig, and regabireflect).

This simplifies testing, helps derisk the register ABI project,
and will also help with performance comparisons.

This replaces the -abiwrap flag to the compiler and linker with
the regabiwrappers experiment.

As part of this, regabiargs now enables registers for all calls
in the compiler. Previously, this was statically disabled in
regabiEnabledForAllCompilation, but now that we can control it
independently, this isn't necessary.

For #40724.

Change-Id: I5171e60cda6789031f2ef034cc2e7c5d62459122
Reviewed-on: https://go-review.googlesource.com/c/go/+/302070
Trust: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
2021-03-18 16:51:27 +00:00
Keith Randall
c870e86329 cmd/asm: when dynamic linking, reject code that uses a clobbered R15
The assember uses R15 as scratch space when assembling global variable
references in dynamically linked code. If the assembly code uses the
clobbered value of R15, report an error. The user is probably expecting
some other value in that register.

Getting rid of the R15 use isn't very practical (we could save a
register to a field in the G maybe, but that gets cumbersome).

Fixes #43661

Change-Id: I43f848a3d8b8a28931ec733386b85e6e9a42d8ff
Reviewed-on: https://go-review.googlesource.com/c/go/+/283474
Trust: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2021-03-16 21:18:43 +00:00
David Chase
a2d92b5143 cmd/compile: register abi, morestack work and mole whacking
Morestack works for non-pointer register parameters

Within a function body, pointer-typed parameters are correctly
tracked.

Results still not hooked up.

For #40724.

Change-Id: Icaee0b51d0da54af983662d945d939b756088746
Reviewed-on: https://go-review.googlesource.com/c/go/+/294410
Trust: David Chase <drchase@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2021-03-04 16:19:12 +00:00
Russ Cox
4dd77bdc91 cmd/asm, cmd/link, runtime: introduce FuncInfo flag bits
The runtime traceback code has its own definition of which functions
mark the top frame of a stack, separate from the TOPFRAME bits that
exist in the assembly and are passed along in DWARF information.
It's error-prone and redundant to have two different sources of truth.
This CL provides the actual TOPFRAME bits to the runtime, so that
the runtime can use those bits instead of reinventing its own category.

This CL also adds a new bit, SPWRITE, which marks functions that
write directly to SP (anything but adding and subtracting constants).
Such functions must stop a traceback, because the traceback has no
way to rederive the SP on entry. Again, the runtime has its own definition
which is mostly correct, but also missing some functions. During ordinary
goroutine context switches, such functions do not appear on the stack,
so the incompleteness in the runtime usually doesn't matter.
But profiling signals can arrive at any moment, and the runtime may
crash during traceback if it attempts to unwind an SP-writing frame
and gets out-of-sync with the actual stack. The runtime contains code
to try to detect likely candidates but again it is incomplete.
Deriving the SPWRITE bit automatically from the actual assembly code
provides the complete truth, and passing it to the runtime lets the
runtime use it.

This CL is part of a stack adding windows/arm64
support (#36439), intended to land in the Go 1.17 cycle.
This CL is, however, not windows/arm64-specific.
It is cleanup meant to make the port (and future ports) easier.

Change-Id: I227f53b23ac5b3dabfcc5e8ee3f00df4e113cf58
Reviewed-on: https://go-review.googlesource.com/c/go/+/288800
Trust: Russ Cox <rsc@golang.org>
Trust: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-02-19 00:02:30 +00:00
Cherry Zhang
2e60c00f56 [dev.regabi] cmd/internal/obj/x86: use g register in stack bounds check
In ABIInternal context, we can directly use the g register for
stack bounds check.

Change-Id: I8b1073a3343984a6cd76cf5734ddc4a8cd5dc73f
Reviewed-on: https://go-review.googlesource.com/c/go/+/289711
Trust: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2021-02-08 17:49:38 +00:00
David Chase
c1370e918f [dev.regabi] cmd/compile: add code to support register ABI spills around morestack calls
This is a selected copy from the register ABI experiment CL, focused
on the files and data structures that handle spilling around morestack.
Unnecessary code from the experiment was removed, other code was adapted.

Would it make sense to leave comments in the experiment as pieces are
brought over?

Experiment CL (for comparison purposes)
https://go-review.googlesource.com/c/go/+/28832

Change-Id: I92136f070351d4fcca1407b52ecf9b80898fed95
Reviewed-on: https://go-review.googlesource.com/c/go/+/279520
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Jeremy Faller <jeremy@golang.org>
2021-01-13 15:47:13 +00:00
Than McIntosh
cb28c96be8 [dev.regabi] cmd/compile,cmd/link: initial support for ABI wrappers
Add compiler support for emitting ABI wrappers by creating real IR as
opposed to introducing ABI aliases. At the moment these are "no-op"
wrappers in the sense that they make a simple call (using the existing
ABI) to their target. The assumption here is that once late call
expansion can handle both ABI0 and the "new" ABIInternal (register
version), it can expand the call to do the right thing.

Note that the runtime contains functions that do not strictly follow
the rules of the current Go ABI0; this has been handled in most cases
by treating these as ABIInternal instead (these changes have been made
in previous patches).

Generation of ABI wrappers (as opposed to ABI aliases) is currently
gated by GOEXPERIMENT=regabi -- wrapper generation is on by default if
GOEXPERIMENT=regabi is set and off otherwise (but can be turned on
using "-gcflags=all=-abiwrap -ldflags=-abiwrap"). Wrapper generation
currently only workd on AMD64; explicitly enabling wrapper for other
architectures (via the command line) is not supported.

Also in this patch are a few other command line options for debugging
(tracing and/or limiting wrapper creation). These will presumably go
away at some point.

Updates #27539, #40724.

Change-Id: I1ee3226fc15a3c32ca2087b8ef8e41dbe6df4a75
Reviewed-on: https://go-review.googlesource.com/c/go/+/270863
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Trust: Than McIntosh <thanm@google.com>
2020-12-22 18:13:48 +00:00
Than McIntosh
a313eec386 reflect,runtime: use internal ABI for selected ASM routines, attempt 2
[This is a roll-forward of CL 262319, with a fix for some Darwin test
failures].

Change the definitions of selected runtime assembly routines
from ABI0 (the default) to ABIInternal. The ABIInternal def is
intended to indicate that these functions don't follow the existing Go
runtime ABI. In addition, convert the assembly reference to
runtime.main (from runtime.mainPC) to ABIInternal. Finally, for
functions such as "runtime.duffzero" that are called directly from
generated code, make sure that the compiler looks up the correct
ABI version.

This is intended to support the register abi work, however these
changes should not have any issues even when GOEXPERIMENT=regabi is
not in effect.

Updates #27539, #40724.

Change-Id: Idf507f1c06176073563845239e1a54dad51a9ea9
Reviewed-on: https://go-review.googlesource.com/c/go/+/266638
Trust: Than McIntosh <thanm@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
2020-10-30 17:41:35 +00:00
Russ Cox
912262b806 cmd/internal/obj: move LSym.Func into LSym.Extra
This creates space for a different kind of extension field
in LSym without making the struct any larger.
(There are many LSym, so we care about keeping the struct small.)

Change-Id: Ib16edb9e15f54c2a7351c8b875e19684058711e5
Reviewed-on: https://go-review.googlesource.com/c/go/+/243943
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-10-16 03:02:36 +00:00
Keith Randall
9e70564f63 cmd/compile,cmd/asm: simplify recording of branch targets, take 2
We currently use two fields to store the targets of branches.
Some phases use p.To.Val, some use p.Pcond. Rewrite so that
every branch instruction uses p.To.Val.
p.From.Val is also used in rare instances.
Introduce a Pool link for use by arm/arm64, instead of
repurposing Pcond.

This is a cleanup CL in preparation for some stack frame CLs.

Change-Id: If8239177e4b1ea2bccd0608eb39553d23210d405
Reviewed-on: https://go-review.googlesource.com/c/go/+/251437
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-08-31 17:36:08 +00:00
Keith Randall
26ad27bb02 Revert "cmd/compile,cmd/asm: simplify recording of branch targets"
This reverts CL 243318.

Reason for revert: Seems to be crashing some builders.

Change-Id: I2ffc59bc5535be60b884b281c8d0eff4647dc756
Reviewed-on: https://go-review.googlesource.com/c/go/+/251169
Reviewed-by: Bryan C. Mills <bcmills@google.com>
2020-08-28 02:10:13 +00:00
Keith Randall
8247da3662 cmd/compile,cmd/asm: simplify recording of branch targets
We currently use two fields to store the targets of branches.
Some phases use p.To.Val, some use p.Pcond. Rewrite so that
every branch instruction uses p.To.Val.
p.From.Val is also used in rare instances.
Introduce a Pool link for use by arm/arm64, instead of
repurposing Pcond.

This is a cleanup CL in preparation for some stack frame CLs.

Change-Id: I9055bf0a1d986aff421e47951a1dedc301c846f8
Reviewed-on: https://go-review.googlesource.com/c/go/+/243318
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-08-27 22:35:45 +00:00
Keith Randall
5c2c6d3fbf runtime: framepointers are no longer an experiment - hard code them
I think they are no longer experimental status. Might as well promote
them to permanent.

Change-Id: Id1259601b3dd2061dd60df86ee48080bfb575d2f
Reviewed-on: https://go-review.googlesource.com/c/go/+/249857
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2020-08-27 21:15:47 +00:00
Tobias Klauser
bffb8818e7 all: fix dead links to inferno-os bitbucket repository
Generated using:

  perl -i -npe 's#inferno-os/src/default#inferno-os/src/master#' $(git grep -l "inferno-os/src/default" | grep -v vendor)

Change-Id: I4b6443bd09a8ea4c8aaeb40a1c73520d1f7ca648
Reviewed-on: https://go-review.googlesource.com/c/go/+/235821
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Austin Clements <austin@google.com>
2020-06-04 07:25:06 +00:00
Cherry Zhang
b2482e4817 cmd/internal/obj: mark split-stack prologue nonpreemptible
When there are both a synchronous preemption request (by
clobbering the stack guard) and an asynchronous one (by signal),
the running goroutine may observe the synchronous request first
in stack bounds check, and go to the path of calling morestack.
If the preemption signal arrives at this point before the call to
morestack, the goroutine will be asynchronously preempted,
entering the scheduler. When it is resumed, the scheduler clears
the preemption request, unclobbers the stack guard. But the
resumed goroutine will still call morestack, as it is already on
its way. morestack will, as there is no preemption request,
double the stack unnecessarily. If this happens multiple times,
the stack may grow too big, although only a small amount is
actually used.

To fix this, we mark the stack bounds check and the call to
morestack async-nonpreemptible, starting after the memory
instruction (mostly a load, on x86 CMP with memory).

Not done for Wasm as it does not support async preemption.

Fixes #35470.

Change-Id: Ibd7f3d935a3649b80f47539116ec9b9556680cf2
Reviewed-on: https://go-review.googlesource.com/c/go/+/207350
Reviewed-by: David Chase <drchase@google.com>
2019-11-27 01:30:19 +00:00
Austin Clements
9d032026d6 cmd/internal/obj/x86: correct pcsp for ADJSP
The x86 assembler supports an "ADJSP" pseudo-op that compiles to an
ADD/SUB from SP. Unfortunately, while this seems perfect for an
instruction that would allow obj to continue to track the SP/FP delta,
obj currently doesn't do that. As a result, FP-relative references
won't work and, perhaps worse, the pcsp table will have the wrong
frame size.

We don't currently use this instruction in any assembly or generate it
in the compiler, but this is a perfect instruction for solving a
problem in #24543.

This CL makes ADJSP useful by incorporating it into the SP delta
logic.

One subtlety is that we do generate ADJSP in obj itself to open a
function's stack frame. Currently, when preprocess enters the loop to
compute the SP delta, it may or may not start at this ADJSP
instruction depending on various factors. We clean this up by instead
always starting the SP delta at 0 and always starting this loop at the
entry to the function.

Why not just recognize ADD/SUB of SP? The danger is that could change
the meaning of existing code. For example, walltime1 in
sys_linux_amd64.s saves SP, SUBs from it, and aligns it. Later, it
restores the saved copy and then does a few FP-relative references.
Currently obj doesn't know any of this is happening, but that's fine
once it gets to the FP-relative references. If we taught obj to
recognize the SUB, it would start to miscompile this code. An
alternative would be to recognize unknown instructions that write to
SP and refuse subsequent FP-relative references, but that's kind of
annoying.

This passes toolstash -cmp for std on both amd64 and 386.

Change-Id: Ic6c6a7cbf980bca904576676c07b44c0aaa9c82d
Reviewed-on: https://go-review.googlesource.com/c/go/+/200877
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2019-10-25 16:34:10 +00:00
Dan Scales
be64a19d99 cmd/compile, cmd/link, runtime: make defers low-cost through inline code and extra funcdata
Generate inline code at defer time to save the args of defer calls to unique
(autotmp) stack slots, and generate inline code at exit time to check which defer
calls were made and make the associated function/method/interface calls. We
remember that a particular defer statement was reached by storing in the deferBits
variable (always stored on the stack). At exit time, we check the bits of the
deferBits variable to determine which defer function calls to make (in reverse
order). These low-cost defers are only used for functions where no defers
appear in loops. In addition, we don't do these low-cost defers if there are too
many defer statements or too many exits in a function (to limit code increase).

When a function uses open-coded defers, we produce extra
FUNCDATA_OpenCodedDeferInfo information that specifies the number of defers, and
for each defer, the stack slots where the closure and associated args have been
stored. The funcdata also includes the location of the deferBits variable.
Therefore, for panics, we can use this funcdata to determine exactly which defers
are active, and call the appropriate functions/methods/closures with the correct
arguments for each active defer.

In order to unwind the stack correctly after a recover(), we need to add an extra
code segment to functions with open-coded defers that simply calls deferreturn()
and returns. This segment is not reachable by the normal function, but is returned
to by the runtime during recovery. We set the liveness information of this
deferreturn() to be the same as the liveness at the first function call during the
last defer exit code (so all return values and all stack slots needed by the defer
calls will be live).

I needed to increase the stackguard constant from 880 to 896, because of a small
amount of new code in deferreturn().

The -N flag disables open-coded defers. '-d defer' prints out the kind of defer
being used at each defer statement (heap-allocated, stack-allocated, or
open-coded).

Cost of defer statement  [ go test -run NONE -bench BenchmarkDefer$ runtime ]
  With normal (stack-allocated) defers only:         35.4  ns/op
  With open-coded defers:                             5.6  ns/op
  Cost of function call alone (remove defer keyword): 4.4  ns/op

Text size increase (including funcdata) for go binary without/with open-coded defers:  0.09%

The average size increase (including funcdata) for only the functions that use
open-coded defers is 1.1%.

The cost of a panic followed by a recover got noticeably slower, since panic
processing now requires a scan of the stack for open-coded defer frames. This scan
is required, even if no frames are using open-coded defers:

Cost of panic and recover [ go test -run NONE -bench BenchmarkPanicRecover runtime ]
  Without open-coded defers:        62.0 ns/op
  With open-coded defers:           255  ns/op

A CGO Go-to-C-to-Go benchmark got noticeably faster because of open-coded defers:

CGO Go-to-C-to-Go benchmark [cd misc/cgo/test; go test -run NONE -bench BenchmarkCGoCallback ]
  Without open-coded defers:        443 ns/op
  With open-coded defers:           347 ns/op

Updates #14939 (defer performance)
Updates #34481 (design doc)

Change-Id: I63b1a60d1ebf28126f55ee9fd7ecffe9cb23d1ff
Reviewed-on: https://go-review.googlesource.com/c/go/+/202340
Reviewed-by: Austin Clements <austin@google.com>
2019-10-24 13:54:11 +00:00
Artem Alekseev
24494440e0 cmd/asm: add missing x86 instructions
Instructions added: CLDEMOTE, CLWB, TPAUSE, UMWAIT, UMONITOR.

Change-Id: I1ba550d4d5acc41a2fd97068ff5834e0412d3bcf
Reviewed-on: https://go-review.googlesource.com/c/go/+/183225
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-10-17 16:30:15 +00:00
Brad Fitzpatrick
03ef105dae all: remove nacl (part 3, more amd64p32)
Part 1: CL 199499 (GOOS nacl)
Part 2: CL 200077 (amd64p32 files, toolchain)
Part 3: stuff that arguably should've been part of Part 2, but I forgot
        one of my grep patterns when splitting the original CL up into
        two parts.

This one might also have interesting stuff to resurrect for any future
x32 ABI support.

Updates #30439

Change-Id: I2b4143374a253a003666f3c69e776b7e456bdb9c
Reviewed-on: https://go-review.googlesource.com/c/go/+/200318
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-10-10 22:38:38 +00:00
Brad Fitzpatrick
07b4abd62e all: remove the nacl port (part 2, amd64p32 + toolchain)
This is part two if the nacl removal. Part 1 was CL 199499.

This CL removes amd64p32 support, which might be useful in the future
if we implement the x32 ABI. It also removes the nacl bits in the
toolchain, and some remaining nacl bits.

Updates #30439

Change-Id: I2475d5bb066d1b474e00e40d95b520e7c2e286e1
Reviewed-on: https://go-review.googlesource.com/c/go/+/200077
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-10-09 22:34:34 +00:00
Iskander Sharipov
720af3c8c4 cmd/asm: reject BSWAPW on amd64
Since BSWAP operation on 16-bit registers is undefined,
forbid the usage of BSWAPW. Users should rely on XCHGB instead.

This behavior is consistent with what GAS does.

Fixes #29167

Change-Id: I3b31e3dd2acfd039f7564a1c17e6068617bcde8d
Reviewed-on: https://go-review.googlesource.com/c/go/+/174312
Run-TryBot: Iskander Sharipov <quasilyte@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-04-30 21:21:14 +00:00
Keith Randall
48ef01051a cmd/compile: handle new panicindex/slice names in optimizations
These new calls should not prevent NOSPLIT promotion, like the old ones.
These new calls should not prevent racefuncenter/exit removal.

(The latter was already true, as the new calls are not yet lowered
to StaticCalls at the point where racefuncenter/exit removal is done.)

Add tests to make sure we don't regress (again).

Fixes #31219

Change-Id: I3fb6b17cdd32c425829f1e2498defa813a5a9ace
Reviewed-on: https://go-review.googlesource.com/c/go/+/170639
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2019-04-03 21:24:17 +00:00
Elias Naur
1d10b17589 cmd/link/ld,cmd/internal/obj,runtime: make the Android TLS offset dynamic
We're going to need a different TLS offset for Android Q, so the static
offsets used for 386 and amd64 are no longer viable on Android.

Introduce runtime·tls_g and use that for indexing into TLS storage. As
an added benefit, we can then merge the TLS setup code for all android
GOARCHs.

While we're at it, remove a bunch of android special cases no longer
needed.

Updates #29674
Updates #29249 (perhaps fixes it)

Change-Id: I77c7385aec7de8f1f6a4da7c9c79999157e39572
Reviewed-on: https://go-review.googlesource.com/c/go/+/169817
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-03-29 17:16:32 +00:00
Keith Randall
585c9e8412 cmd/compile: implement shifts by signed amounts
Allow shifts by signed amounts. Panic if the shift amount is negative.

TODO: We end up doing two compares per shift, see Ian's comment
https://github.com/golang/go/issues/19113#issuecomment-443241799 that
we could do it with a single comparison in the normal case.

The prove pass mostly handles this code well. For instance, it removes the
<0 check for cases like this:
    if s >= 0 { _ = x << s }
    _ = x << len(a)

This case isn't handled well yet:
    _ = x << (y & 0xf)
I'll do followon CLs for unhandled cases as needed.

Update #19113

R=go1.13

Change-Id: I839a5933d94b54ab04deb9dd5149f32c51c90fa1
Reviewed-on: https://go-review.googlesource.com/c/158719
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2019-02-15 23:13:09 +00:00
Clément Chigot
20ac64a2dd cmd/dist, cmd/link, runtime: fix stack size when cross-compiling aix/ppc64
This commit allows to cross-compiling aix/ppc64. The nosplit limit must
twice as large as on others platforms because of AIX syscalls.
The stack limit, especially stackGuardMultiplier, was set by cmd/dist
during the bootstrap and doesn't depend on GOOS/GOARCH target.

Fixes #29572

Change-Id: Id51e38885e1978d981aa9e14972eaec17294322e
Reviewed-on: https://go-review.googlesource.com/c/157117
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-01-09 22:06:51 +00:00
Austin Clements
02495da6b6 cmd/internal/obj: consolidate emitting entry stack map
The obj package needs to emit the PCDATA to select the entry stack map
before calling morestack. Currently this is copied for every
architecture. Since we're about to change how this works, consolidate
all of these copies into a single helper function.

For #24543.

Change-Id: Ia92d94de78f8e23fd06dba747c43e03e5989f67b
Reviewed-on: https://go-review.googlesource.com/109346
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-05-22 14:43:35 +00:00
David Chase
7bac2a95f6 cmd/compile: plumb prologueEnd into DWARF
This marks the first instruction after the prologue for
consumption by debuggers, specifically Delve, who asked
for it.  gdb appears to ignore it, lldb appears to use it.

The bits for end-of-prologue and beginning-of-epilogue
are added to Pos (reducing maximum line number by 4x, to
1048575).  They're added in cmd/internal/obj/<ARCH>.go
(currently x86 only), so the compiler-proper need not
deal with them.

The linker currently does nothing with beginning-of-epilogue,
but the plumbing exists to make it easier in the future.

This also upgrades the line number table to DWARF version 3.

This CL includes a regression in the coverage for
testdata/i22558.gdb-dbg.nexts, this appears to be a gdb
artifact but the fix would be in the preceding CL in the
stack.

Change-Id: I3bda5f46a0ed232d137ad48f65a14835c742c506
Reviewed-on: https://go-review.googlesource.com/110416
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-05-14 14:12:31 +00:00
Daniel Martí
2722650415 cmd: remove some unused parameters
Change-Id: I9d2a4b8df324897e264d30801e95ddc0f0e75f3a
Reviewed-on: https://go-review.googlesource.com/102337
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2018-04-02 15:51:31 +00:00
Tim Wright
88129f0cb2 all: enable c-shared/c-archive support for freebsd/amd64
Fixes #14327
Much of the code is based on the linux/amd64 code that implements these
build modes, and code is shared where possible.

Change-Id: Ia510f2023768c0edbc863aebc585929ec593b332
Reviewed-on: https://go-review.googlesource.com/93875
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-03-21 21:56:20 +00:00
isharipo
b80b4a23d1 cmd/internal/obj/x86: add missing legacy insts
Minimizes the amount of "TODO" stuff in test suite
of cmd/asm/internal/asm/testdata/amd64enc.s.

Some instructions were already implemented, but
test cases for them were commented-out.

Does not enable MMX instructions, calls/jumps and some
segment registers instructions.

-- Affected instructions --
BLENDVPD, BLENDVPS
BSWAPW
CBW
CDQE
CLAC
CLFLUSHOPT
CMPXCHG16B
CRC32B, CRC32L, CRC32W
CWDE
FBLD
FBSTP
FCMOVB
FCMOVBE
FCMOVE
FCMOVNB
FCMOVNBE
FCMOVU
FCOMI
FCOMIP
IMUL3L, IMUL3Q, IMUL3W
ICEBP, INT
INVPCID
LARQ
LGDT, LIDT, LLDT
LMSW
LTR
LZCNTL, LZCNTQ, LZCNTW
MONITOR
MOVBELL, MOVBEQQ, MOVBEWW
MOVBQZX
MOVQ
MOVSWW, MOVZWW
MWAIT
NOPL, NOPW
PBLENDVB
PEXTRW
RDPKRU
RDRANDL, RDRANDQ, RDRANDW
RDSEEDL, RDSEEDQ, RDSEEDW
RDTSCP
SAHF
SGDT, SIDT
SLDTL, SLDTQ, SLDTW
SMSWL, SMSWQ, SMSWW
STAC
STRL, STRQ, STRW
SYSENTER, SYSENTER64
SYSEXIT, SYSEXIT64
SHA256RNDS2
TZCNTL, TZCNTQ, TZCNTW
UD1, UD2
WRPKRU
XRSTOR, XRSTOR64
XRSTORS, XRSTORS64
XSAVE, XSAVE64
XSAVEC, XSAVEC64
XSAVEOPT, XSAVEOPT64
XSAVES, XSAVES64
XSETBV

Fixes #6739

Change-Id: I8b125d9a5ea39bb4b9da7e66a63a16f609cef376
Reviewed-on: https://go-review.googlesource.com/97235
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-02-27 21:55:25 +00:00
Austin Clements
9b331189c1 cmd/internal/obj/x86: adjust SP correctly for tail calls
Currently, tail calls on x86 don't adjust the SP on return, so it's
important that the compiler produce a zero-sized frame and disable the
frame pointer. However, these constraints aren't necessary. For
example, on other architectures it's generally necessary to restore
the saved LR before a tail call, so obj simply makes this work.
Likewise, on x86, there's no reason we can't simply make this work.

Hence, this CL adjusts the compiler to use the same tail call
convention for x86 that we use on LR machines by producing a RET with
a target, rather than a JMP with a target. In fact, obj already
understands this convention for x86 except that it's buggy with
non-zero frame sizes. So we also fix this bug obj. As a result of
these fixes, the compiler no longer needs to mark wrappers as
NoFramePointer since it's now perfectly fine to save the frame
pointer.

In fact, this eliminates the only use of NoFramePointer in the
compiler, which will enable further cleanups.

This also fixes what is very nearly, but not quite, a code generation
bug. NoFramePointer becomes obj.NOFRAME in the object file, which on
ppc64 and s390x means to omit the saved LR. Hence, on these
architectures, NoFramePointer (and NOFRAME) is only safe to set on
leaf functions. However, on *most* architectures, wrappers aren't
necessarily leaf functions because they may call DUFFZERO. We're saved
on ppc64 and s390x only because the compiler doesn't have the rules to
produce DUFFZERO calls on these architectures. Hence, this only works
because the set of LR architectures that implement NOFRAME is disjoint
from the set where the compiler produces DUFFZERO operations. (I
discovered this whole mess when I attempted to add NOFRAME support to
arm.)

Change-Id: Icc589aeb86beacb850d0a6a80bd3024974a33947
Reviewed-on: https://go-review.googlesource.com/92035
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-02-12 21:41:19 +00:00
Than McIntosh
c09ee9d1ce cmd/compile: fix buglet/typo in DWARF x86 setup
Fix typo in DWARF register config for GOOARCH=x86; was
picking up the AMD64 set, should have been selecting
x86 set.

Change-Id: I9a4c6f1378baf3cb2f0ad8d60f3ee2f24cd5dc91
Reviewed-on: https://go-review.googlesource.com/77990
Run-TryBot: Than McIntosh <thanm@google.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
2017-11-16 13:32:24 +00:00
isharipo
8c67f210a1 cmd/internal/obj: change Prog.From3 to RestArgs ([]Addr)
This change makes it easier to express instructions
with arbitrary number of operands.

Rationale: previous approach with operand "hiding" does
not scale well, AVX and especially AVX512 have many
instructions with 3+ operands.

x86 asm backend is updated to handle up to 6 explicit operands.
It also fixes issue with 4-th immediate operand type checks.
All `ytab` tables are updated accordingly.

Changes to non-x86 backends only include these patterns:
`p.From3 = X` => `p.SetFrom3(X)`
`p.From3.X = Y` => `p.GetFrom3().X = Y`

Over time, other backends can adapt Prog.RestArgs
and reduce the amount of workarounds.

-- Performance --

x/benchmark/build:

$ benchstat upstream.bench patched.bench
name      old time/op                 new time/op                 delta
Build-48                  21.7s ± 2%                  21.8s ± 2%   ~     (p=0.218 n=10+10)

name      old binary-size             new binary-size             delta
Build-48                  10.3M ± 0%                  10.3M ± 0%   ~     (all equal)

name      old build-time/op           new build-time/op           delta
Build-48                  21.7s ± 2%                  21.8s ± 2%   ~     (p=0.218 n=10+10)

name      old build-peak-RSS-bytes    new build-peak-RSS-bytes    delta
Build-48                  145MB ± 5%                  148MB ± 5%   ~     (p=0.218 n=10+10)

name      old build-user+sys-time/op  new build-user+sys-time/op  delta
Build-48                  21.0s ± 2%                  21.2s ± 2%   ~     (p=0.075 n=10+10)

Microbenchmark shows a slight slowdown.

name        old time/op  new time/op  delta
AMD64asm-4  49.5ms ± 1%  49.9ms ± 1%  +0.67%  (p=0.001 n=23+15)

func BenchmarkAMD64asm(b *testing.B) {
  for i := 0; i < b.N; i++ {
    TestAMD64EndToEnd(nil)
    TestAMD64Encoder(nil)
  }
}

Change-Id: I4f1d37b5c2c966da3f2127705ccac9bff0038183
Reviewed-on: https://go-review.googlesource.com/63490
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-09-15 21:05:03 +00:00
isharipo
4074e4e5be cmd/asm: add amd64 CLFLUSH instruction
This is the last instruction I found missing in SSE2 set.

It does not reuse 'yprefetch' ytabs due to differences in
operands SRC/DST roles:
- PREFETCHx: ModRM:r/m(r) -> FROM
- CLFLUSH:   ModRM:r/m(w) -> TO

unaryDst map is extended accordingly.

Change-Id: I89e34ebb81cc0ee5f9ebbb1301bad417f7ee437f
Reviewed-on: https://go-review.googlesource.com/56833
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
Reviewed-by: Russ Cox <rsc@golang.org>
2017-09-06 15:37:00 +00:00
Daniel Martí
99da8730b0 all: remove some double spaces from comments
Went mainly for the ones that make no sense, such as the ones
mid-sentence or after commas.

Change-Id: Ie245d2c19cc7428a06295635cf6a9482ade25ff0
Reviewed-on: https://go-review.googlesource.com/57293
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-08-26 15:09:09 +00:00
Heschi Kreinick
4c54a047c6 [dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.

After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.

There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:

  type myint int
  var a int
  b := myint(int)
and
  b := (*uint64)(unsafe.Pointer(a))

don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.

Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.

A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.

TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.

go build -toolexec 'toolstash -cmp' -a std succeeds.

With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after:  06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean  /tmp/220352263 /tmp/621364410
completed   15 of   15, estimated time remaining 0s (eta 3:52PM)
name        old time/op       new time/op       delta
Template          199ms ± 3%        198ms ± 2%     ~     (p=0.400 n=15+14)
Unicode          96.6ms ± 5%       96.4ms ± 5%     ~     (p=0.838 n=15+15)
GoTypes           653ms ± 2%        647ms ± 2%     ~     (p=0.102 n=15+14)
Flate             133ms ± 6%        129ms ± 3%   -2.62%  (p=0.041 n=15+15)
GoParser          164ms ± 5%        159ms ± 3%   -3.05%  (p=0.000 n=15+15)
Reflect           428ms ± 4%        422ms ± 3%     ~     (p=0.156 n=15+13)
Tar               123ms ±10%        124ms ± 8%     ~     (p=0.461 n=15+15)
XML               228ms ± 3%        224ms ± 3%   -1.57%  (p=0.045 n=15+15)
[Geo mean]        206ms             377ms       +82.86%

name        old user-time/op  new user-time/op  delta
Template          292ms ±10%        301ms ±12%     ~     (p=0.189 n=15+15)
Unicode           166ms ±37%        158ms ±14%     ~     (p=0.418 n=15+14)
GoTypes           962ms ± 6%        963ms ± 7%     ~     (p=0.976 n=15+15)
Flate             207ms ±19%        200ms ±14%     ~     (p=0.345 n=14+15)
GoParser          246ms ±22%        240ms ±15%     ~     (p=0.587 n=15+15)
Reflect           611ms ±13%        587ms ±14%     ~     (p=0.085 n=15+13)
Tar               211ms ±12%        217ms ±14%     ~     (p=0.355 n=14+15)
XML               335ms ±15%        320ms ±18%     ~     (p=0.169 n=15+15)
[Geo mean]        317ms             583ms       +83.72%

name        old alloc/op      new alloc/op      delta
Template         40.2MB ± 0%       40.2MB ± 0%   -0.15%  (p=0.000 n=14+15)
Unicode          29.2MB ± 0%       29.3MB ± 0%     ~     (p=0.624 n=15+15)
GoTypes           114MB ± 0%        114MB ± 0%   -0.15%  (p=0.000 n=15+14)
Flate            25.7MB ± 0%       25.6MB ± 0%   -0.18%  (p=0.000 n=13+15)
GoParser         32.2MB ± 0%       32.2MB ± 0%   -0.14%  (p=0.003 n=15+15)
Reflect          77.8MB ± 0%       77.9MB ± 0%     ~     (p=0.061 n=15+15)
Tar              27.1MB ± 0%       27.0MB ± 0%   -0.11%  (p=0.029 n=15+15)
XML              42.7MB ± 0%       42.5MB ± 0%   -0.29%  (p=0.000 n=15+15)
[Geo mean]       42.1MB            75.0MB       +78.05%

name        old allocs/op     new allocs/op     delta
Template           402k ± 1%         398k ± 0%   -0.91%  (p=0.000 n=15+15)
Unicode            344k ± 1%         344k ± 0%     ~     (p=0.715 n=15+14)
GoTypes           1.18M ± 0%        1.17M ± 0%   -0.91%  (p=0.000 n=15+14)
Flate              243k ± 0%         240k ± 1%   -1.05%  (p=0.000 n=13+15)
GoParser           327k ± 1%         324k ± 1%   -0.96%  (p=0.000 n=15+15)
Reflect            984k ± 1%         982k ± 0%     ~     (p=0.050 n=15+15)
Tar                261k ± 1%         259k ± 1%   -0.77%  (p=0.000 n=15+15)
XML                411k ± 0%         404k ± 1%   -1.55%  (p=0.000 n=15+15)
[Geo mean]         439k              755k       +72.01%

name        old text-bytes    new text-bytes    delta
HelloSize         694kB ± 0%        694kB ± 0%   -0.00%  (p=0.000 n=15+15)

name        old data-bytes    new data-bytes    delta
HelloSize        5.55kB ± 0%       5.55kB ± 0%     ~     (all equal)

name        old bss-bytes     new bss-bytes     delta
HelloSize         133kB ± 0%        133kB ± 0%     ~     (all equal)

name        old exe-bytes     new exe-bytes     delta
HelloSize        1.04MB ± 0%       1.04MB ± 0%     ~     (all equal)

Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-27 20:19:44 +00:00
Michael Hudson-Doyle
1f85d3ad09 cmd/internal/obj/x86: use LEAx rather than ADDx when calling DUFFxxxx via GOT
DUFFZERO on 386 is not marked as clobbering flags, but rewriteToUseGot rewrote
"ADUFFZERO $offset" to "MOVL runtime.duffxxx@GOT, CX; ADDL $offset, CX; CALL CX"
which does. Luckily the fix is easier than figuring out what the problem was:
replace the ADDL $offset, CX with LEAL $offset(CX), CX.

On amd64 DUFFZERO clobbers flags, on arm, arm64 and ppc64 ADD does not clobber
flags and s390x does not use the duff functions, so I'm fairly confident this
is the only fix required.

I don't know how to write a test though.

Change-Id: I69b0958f5f45771d61db5f5ecb4ded94e8960d4d
Reviewed-on: https://go-review.googlesource.com/41821
Run-TryBot: Michael Hudson-Doyle <michael.hudson@canonical.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-05-01 18:57:35 +00:00