This CL instructs the Go x86 compiler to load the frame pointer address
using a MOV instead of a LEA instruction, being MOV 1 byte shorter:
Before
55 PUSHQ BP
48 8d 2c 24 LEAQ 0(SP), BP
After
55 PUSHQ BP
48 89 e5 MOVQ SP, BP
This reduces the size of the Go toolchain ~0.06%.
Updates #6853
Change-Id: I5557cf34c47e871d264ba0deda9b78338681a12c
Reviewed-on: https://go-review.googlesource.com/c/go/+/463845
Auto-Submit: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Quim Muntal <quimmuntal@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
This CL changes how the x86 compiler stores and loads the frame pointer
on each function prologue and epilogue, with the goal to reduce the
final binary size without affecting performance.
The compiler is currently using MOV instructions to load and store BP,
which can take from 5 to 8 bytes each.
This CL changes this approach so it emits PUSH/POP instructions instead,
which always take only 1 byte each (when operating with BP). It can also
avoid using the SUBQ/ADDQ to grow the stack for functions that have
frame pointer but does not have local variables.
On Windows, this CL reduces the go toolchain size from 15,697,920 bytes
to 15,584,768 bytes, a reduction of 0.7%.
Example of epilog and prologue for a function with 0x10 bytes of
local variables:
Before
===
SUBQ $0x18, SP
MOVQ BP, 0x10(SP)
LEAQ 0x10(SP), BP
... function body ...
MOVQ 0x10(SP), BP
ADDQ $0x18, SP
RET
===
After
===
PUSHQ BP
LEAQ 0(SP), BP
SUBQ $0x10, SP
... function body ...
MOVQ ADDQ $0x10, SP
POPQ BP
RET
===
Updates #6853
Change-Id: Ice9e14bbf8dff083c5f69feb97e9a764c3ca7785
Reviewed-on: https://go-review.googlesource.com/c/go/+/462300
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Quim Muntal <quimmuntal@gmail.com>
This CL marks some darwin assembly functions as NOFRAME to avoid relying
on the implicit amd64 NOFRAME heuristic, where NOSPLIT functions
without stack were also marked as NOFRAME.
Change-Id: I797f3909bcf7f7aad304e4ede820c884231e54f6
Reviewed-on: https://go-review.googlesource.com/c/go/+/460235
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
This CL marks non-leaf nosplit assembly functions as NOFRAME to avoid
relying on the implicit amd64 NOFRAME heuristic, where NOSPLIT functions
without stack were also marked as NOFRAME.
Updates #57302
Updates #40044
Change-Id: Ia4d26f8420dcf2b54528969ffbf40a73f1315d61
Reviewed-on: https://go-review.googlesource.com/c/go/+/459395
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Quim Muntal <quimmuntal@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
This CL redesign how we get the TLS pointer on windows/i386.
It applies the same changes as done in CL 431775 for windows/amd64.
We were previously reading it from the [TEB] arbitrary data slot,
located at 0x14(FS), which can only hold 1 TLS pointer.
With this CL, we will read the TLS pointer from the TEB TLS slot array,
located at 0xE10(GS). The TLS slot array can hold multiple
TLS pointers, up to 64, so multiple Go runtimes running on the
same thread can coexists with different TLS.
Each new TLS slot has to be allocated via [TlsAlloc],
which returns the slot index. This index can then be used to get the
slot offset from GS with the following formula: 0xE10 + index*4.
The slot index is fixed per Go runtime, so we can store it
in runtime.tls_g and use it latter on to read/update the TLS pointer.
Loading the TLS pointer requires the following asm instructions:
MOVQ runtime.tls_g, AX
MOVQ AX(FS), AX
Notice that this approach will now be implemented in all the supported
windows arches.
[TEB]: https://en.wikipedia.org/wiki/Win32_Thread_Information_Block
[TlsAlloc]: https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-tlsalloc
Change-Id: If4550b0d44694ee6480d4093b851f4991a088b32
Reviewed-on: https://go-review.googlesource.com/c/go/+/454675
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Quim Muntal <quimmuntal@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
This CL redesign how we get the TLS pointer on windows/amd64.
We were previously reading it from the [TEB] arbitrary data slot,
located at 0x28(GS), which can only hold 1 TLS pointer.
With this CL, we will read the TLS pointer from the TEB TLS slot array,
located at 0x1480(GS). The TLS slot array can hold multiple
TLS pointers, up to 64, so multiple Go runtimes running on the
same thread can coexists with different TLS.
Each new TLS slot has to be allocated via [TlsAlloc],
which returns the slot index. This index can then be used to get the
slot offset from GS with the following formula: 0x1480 + index*8
The slot index is fixed per Go runtime, so we can store it
in runtime.tls_g and use it latter on to read/update the TLS pointer.
Loading the TLS pointer requires the following asm instructions:
MOVQ runtime.tls_g, AX
MOVQ AX(GS), AX
Notice that this approach is also implemented on windows/arm64.
[TEB]: https://en.wikipedia.org/wiki/Win32_Thread_Information_Block
[TlsAlloc]: https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-tlsalloc
Updates #22192
Change-Id: Idea7119fd76a3cd083979a4d57ed64b552fa101b
Reviewed-on: https://go-review.googlesource.com/c/go/+/431775
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Quim Muntal <quimmuntal@gmail.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
This adds a debugging hook for optionally calling a "maymorestack"
function in the prologue of any function that might call morestack
(whether it does at run time or not). The maymorestack function will
let us improve lock checking and add debugging modes that stress
function preemption and stack growth.
Passes toolstash-check -all (except on js/wasm, where toolstash
appears to be broken)
Fixes#48297.
Change-Id: I27197947482b329af75dafb9971fc0d3a52eaf31
Reviewed-on: https://go-review.googlesource.com/c/go/+/359795
Trust: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Always enable regabig on AMD64, which enables the G register and
the X15 zero register. Remove the fallback path.
Also remove the regabig GOEXPERIMENT. On AMD64 it is always
enabled (this CL). Other architectures already have a G register,
except for 386, where there are too few registers and it is
unlikely that we will reserve one. (If we really do, we can just
add a new experiment).
Change-Id: I229cac0060f48fe58c9fdaabd38d6fa16b8a0855
Reviewed-on: https://go-review.googlesource.com/c/go/+/327272
Trust: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
The go/build package needs access to this configuration,
so move it into a new package available to the standard library.
Change-Id: I868a94148b52350c76116451f4ad9191246adcff
Reviewed-on: https://go-review.googlesource.com/c/go/+/310731
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Jay Conrod <jayconrod@google.com>
There are several assembly functions that transition from the Windows
ABI to the Go ABI. These all need to save all registers that are
callee-save in the Windows ABI and caller-save in the Go ABI and
prepare the register state for Go. However, they all do this slightly
differently and most of them don't save the necessary XMM registers
for this transition (which could corrupt them in the C caller).
Furthermore, now that we have a carefully specified Go ABI, it's clear
that none of these actually get all of the details 100% right.
So, unify this code into two macros in a shared header in
runtime/cgo/abi_amd64.h that handle all necessary registers and setup
and use these macros everywhere on Windows that handles transitions
from C to Go.
Change-Id: I62f41345a507aad1ca383814ac8b7e2a9ffb821e
Reviewed-on: https://go-review.googlesource.com/c/go/+/309769
Trust: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Currently, there are Wrapper and ABIWrapper attributes. Wrapper
is set when compiler generates an wrapper function (e.g. method
wrapper). ABIWrapper is set when compiler generates an ABI
wrapper. It also sets Wrapper flag for ABI wrappers.
Currently, they have the following meanings:
- Wrapper flag hides the frame from (normal) traceback.
- Wrapper flag enables special panic+recover adjustment, so it
can correctly recover when a wrapper function is deferred.
- ABIWrapper flag disables the panic+recover adjustment, because
we never defer an ABI wrapper that can recover.
This CL changes them to:
- Both Wrapper and ABIWrapper flags hide the frame from (normal)
traceback. (Setting one is enough.)
- Wrapper flag enables special panic+recover adjustment.
ABIWrapper flag no longer has effect on this.
This makes it clearer if we do want an ABI wrapper that also does
the panic+recover adjustment. In the old mechanism we'd have to
unset ABIWrapper flag, even if the function is actually an ABI
wrapper. In the new mechanism we just need to set both ABIWrapper
and Wrapper flags.
Updates #40724.
Change-Id: I7fbc83f85d23676dc94db51dfda63dcacdf1fc19
Reviewed-on: https://go-review.googlesource.com/c/go/+/307235
Trust: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Austin Clements <austin@google.com>
For stack frames larger than StackBig, the stack split prologue needs
to guard against potential wraparound. Currently, it carefully
arranges to avoid underflow, but this is complicated and requires a
special check for StackPreempt. StackPreempt is no longer the only
stack poison value, so this check will incorrectly succeed if the
stack bound is poisoned with any other value.
This CL simplifies the logic of the check, reduces its length, and
accounts for any possible poison value by directly checking for
underflow.
Change-Id: I917a313102d6a21895ef7c4b0f304fb84b292c81
Reviewed-on: https://go-review.googlesource.com/c/go/+/307010
Trust: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
There are a few remaining places in obj6 where we hard-code
safe-on-entry registers. Fix those to use the consts.
For #40724.
Change-Id: Ie640521aa67d6c99bc057553dc122160049c6edc
Reviewed-on: https://go-review.googlesource.com/c/go/+/307009
Trust: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Currently the prologue generated for WRAPPER assembly functions uses BX
and DI, but these are argument registers in the register-based calling
convention. Thus, these end up being clobbered when we want to have an
ABIInternal assembly function.
Define REGENTRYTMP0 and REGENTRYTMP1, aliases for the dedicated function
entry scratch registers R12 and R13, and use those instead.
For #40724.
Change-Id: Ica78c4ccc67a757359900a66b56ef28c83d88b3e
Reviewed-on: https://go-review.googlesource.com/c/go/+/303314
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
This separates GOEXPERIMENT=regabi into five sub-experiments:
regabiwrappers, regabig, regabireflect, regabidefer, and regabiargs.
Setting GOEXPERIMENT=regabi now implies the working subset of these
(currently, regabiwrappers, regabig, and regabireflect).
This simplifies testing, helps derisk the register ABI project,
and will also help with performance comparisons.
This replaces the -abiwrap flag to the compiler and linker with
the regabiwrappers experiment.
As part of this, regabiargs now enables registers for all calls
in the compiler. Previously, this was statically disabled in
regabiEnabledForAllCompilation, but now that we can control it
independently, this isn't necessary.
For #40724.
Change-Id: I5171e60cda6789031f2ef034cc2e7c5d62459122
Reviewed-on: https://go-review.googlesource.com/c/go/+/302070
Trust: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
The assember uses R15 as scratch space when assembling global variable
references in dynamically linked code. If the assembly code uses the
clobbered value of R15, report an error. The user is probably expecting
some other value in that register.
Getting rid of the R15 use isn't very practical (we could save a
register to a field in the G maybe, but that gets cumbersome).
Fixes#43661
Change-Id: I43f848a3d8b8a28931ec733386b85e6e9a42d8ff
Reviewed-on: https://go-review.googlesource.com/c/go/+/283474
Trust: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Morestack works for non-pointer register parameters
Within a function body, pointer-typed parameters are correctly
tracked.
Results still not hooked up.
For #40724.
Change-Id: Icaee0b51d0da54af983662d945d939b756088746
Reviewed-on: https://go-review.googlesource.com/c/go/+/294410
Trust: David Chase <drchase@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
The runtime traceback code has its own definition of which functions
mark the top frame of a stack, separate from the TOPFRAME bits that
exist in the assembly and are passed along in DWARF information.
It's error-prone and redundant to have two different sources of truth.
This CL provides the actual TOPFRAME bits to the runtime, so that
the runtime can use those bits instead of reinventing its own category.
This CL also adds a new bit, SPWRITE, which marks functions that
write directly to SP (anything but adding and subtracting constants).
Such functions must stop a traceback, because the traceback has no
way to rederive the SP on entry. Again, the runtime has its own definition
which is mostly correct, but also missing some functions. During ordinary
goroutine context switches, such functions do not appear on the stack,
so the incompleteness in the runtime usually doesn't matter.
But profiling signals can arrive at any moment, and the runtime may
crash during traceback if it attempts to unwind an SP-writing frame
and gets out-of-sync with the actual stack. The runtime contains code
to try to detect likely candidates but again it is incomplete.
Deriving the SPWRITE bit automatically from the actual assembly code
provides the complete truth, and passing it to the runtime lets the
runtime use it.
This CL is part of a stack adding windows/arm64
support (#36439), intended to land in the Go 1.17 cycle.
This CL is, however, not windows/arm64-specific.
It is cleanup meant to make the port (and future ports) easier.
Change-Id: I227f53b23ac5b3dabfcc5e8ee3f00df4e113cf58
Reviewed-on: https://go-review.googlesource.com/c/go/+/288800
Trust: Russ Cox <rsc@golang.org>
Trust: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
In ABIInternal context, we can directly use the g register for
stack bounds check.
Change-Id: I8b1073a3343984a6cd76cf5734ddc4a8cd5dc73f
Reviewed-on: https://go-review.googlesource.com/c/go/+/289711
Trust: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
This is a selected copy from the register ABI experiment CL, focused
on the files and data structures that handle spilling around morestack.
Unnecessary code from the experiment was removed, other code was adapted.
Would it make sense to leave comments in the experiment as pieces are
brought over?
Experiment CL (for comparison purposes)
https://go-review.googlesource.com/c/go/+/28832
Change-Id: I92136f070351d4fcca1407b52ecf9b80898fed95
Reviewed-on: https://go-review.googlesource.com/c/go/+/279520
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Jeremy Faller <jeremy@golang.org>
Add compiler support for emitting ABI wrappers by creating real IR as
opposed to introducing ABI aliases. At the moment these are "no-op"
wrappers in the sense that they make a simple call (using the existing
ABI) to their target. The assumption here is that once late call
expansion can handle both ABI0 and the "new" ABIInternal (register
version), it can expand the call to do the right thing.
Note that the runtime contains functions that do not strictly follow
the rules of the current Go ABI0; this has been handled in most cases
by treating these as ABIInternal instead (these changes have been made
in previous patches).
Generation of ABI wrappers (as opposed to ABI aliases) is currently
gated by GOEXPERIMENT=regabi -- wrapper generation is on by default if
GOEXPERIMENT=regabi is set and off otherwise (but can be turned on
using "-gcflags=all=-abiwrap -ldflags=-abiwrap"). Wrapper generation
currently only workd on AMD64; explicitly enabling wrapper for other
architectures (via the command line) is not supported.
Also in this patch are a few other command line options for debugging
(tracing and/or limiting wrapper creation). These will presumably go
away at some point.
Updates #27539, #40724.
Change-Id: I1ee3226fc15a3c32ca2087b8ef8e41dbe6df4a75
Reviewed-on: https://go-review.googlesource.com/c/go/+/270863
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Trust: Than McIntosh <thanm@google.com>
[This is a roll-forward of CL 262319, with a fix for some Darwin test
failures].
Change the definitions of selected runtime assembly routines
from ABI0 (the default) to ABIInternal. The ABIInternal def is
intended to indicate that these functions don't follow the existing Go
runtime ABI. In addition, convert the assembly reference to
runtime.main (from runtime.mainPC) to ABIInternal. Finally, for
functions such as "runtime.duffzero" that are called directly from
generated code, make sure that the compiler looks up the correct
ABI version.
This is intended to support the register abi work, however these
changes should not have any issues even when GOEXPERIMENT=regabi is
not in effect.
Updates #27539, #40724.
Change-Id: Idf507f1c06176073563845239e1a54dad51a9ea9
Reviewed-on: https://go-review.googlesource.com/c/go/+/266638
Trust: Than McIntosh <thanm@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
This creates space for a different kind of extension field
in LSym without making the struct any larger.
(There are many LSym, so we care about keeping the struct small.)
Change-Id: Ib16edb9e15f54c2a7351c8b875e19684058711e5
Reviewed-on: https://go-review.googlesource.com/c/go/+/243943
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
We currently use two fields to store the targets of branches.
Some phases use p.To.Val, some use p.Pcond. Rewrite so that
every branch instruction uses p.To.Val.
p.From.Val is also used in rare instances.
Introduce a Pool link for use by arm/arm64, instead of
repurposing Pcond.
This is a cleanup CL in preparation for some stack frame CLs.
Change-Id: If8239177e4b1ea2bccd0608eb39553d23210d405
Reviewed-on: https://go-review.googlesource.com/c/go/+/251437
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
This reverts CL 243318.
Reason for revert: Seems to be crashing some builders.
Change-Id: I2ffc59bc5535be60b884b281c8d0eff4647dc756
Reviewed-on: https://go-review.googlesource.com/c/go/+/251169
Reviewed-by: Bryan C. Mills <bcmills@google.com>
We currently use two fields to store the targets of branches.
Some phases use p.To.Val, some use p.Pcond. Rewrite so that
every branch instruction uses p.To.Val.
p.From.Val is also used in rare instances.
Introduce a Pool link for use by arm/arm64, instead of
repurposing Pcond.
This is a cleanup CL in preparation for some stack frame CLs.
Change-Id: I9055bf0a1d986aff421e47951a1dedc301c846f8
Reviewed-on: https://go-review.googlesource.com/c/go/+/243318
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
I think they are no longer experimental status. Might as well promote
them to permanent.
Change-Id: Id1259601b3dd2061dd60df86ee48080bfb575d2f
Reviewed-on: https://go-review.googlesource.com/c/go/+/249857
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
When there are both a synchronous preemption request (by
clobbering the stack guard) and an asynchronous one (by signal),
the running goroutine may observe the synchronous request first
in stack bounds check, and go to the path of calling morestack.
If the preemption signal arrives at this point before the call to
morestack, the goroutine will be asynchronously preempted,
entering the scheduler. When it is resumed, the scheduler clears
the preemption request, unclobbers the stack guard. But the
resumed goroutine will still call morestack, as it is already on
its way. morestack will, as there is no preemption request,
double the stack unnecessarily. If this happens multiple times,
the stack may grow too big, although only a small amount is
actually used.
To fix this, we mark the stack bounds check and the call to
morestack async-nonpreemptible, starting after the memory
instruction (mostly a load, on x86 CMP with memory).
Not done for Wasm as it does not support async preemption.
Fixes#35470.
Change-Id: Ibd7f3d935a3649b80f47539116ec9b9556680cf2
Reviewed-on: https://go-review.googlesource.com/c/go/+/207350
Reviewed-by: David Chase <drchase@google.com>
The x86 assembler supports an "ADJSP" pseudo-op that compiles to an
ADD/SUB from SP. Unfortunately, while this seems perfect for an
instruction that would allow obj to continue to track the SP/FP delta,
obj currently doesn't do that. As a result, FP-relative references
won't work and, perhaps worse, the pcsp table will have the wrong
frame size.
We don't currently use this instruction in any assembly or generate it
in the compiler, but this is a perfect instruction for solving a
problem in #24543.
This CL makes ADJSP useful by incorporating it into the SP delta
logic.
One subtlety is that we do generate ADJSP in obj itself to open a
function's stack frame. Currently, when preprocess enters the loop to
compute the SP delta, it may or may not start at this ADJSP
instruction depending on various factors. We clean this up by instead
always starting the SP delta at 0 and always starting this loop at the
entry to the function.
Why not just recognize ADD/SUB of SP? The danger is that could change
the meaning of existing code. For example, walltime1 in
sys_linux_amd64.s saves SP, SUBs from it, and aligns it. Later, it
restores the saved copy and then does a few FP-relative references.
Currently obj doesn't know any of this is happening, but that's fine
once it gets to the FP-relative references. If we taught obj to
recognize the SUB, it would start to miscompile this code. An
alternative would be to recognize unknown instructions that write to
SP and refuse subsequent FP-relative references, but that's kind of
annoying.
This passes toolstash -cmp for std on both amd64 and 386.
Change-Id: Ic6c6a7cbf980bca904576676c07b44c0aaa9c82d
Reviewed-on: https://go-review.googlesource.com/c/go/+/200877
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Generate inline code at defer time to save the args of defer calls to unique
(autotmp) stack slots, and generate inline code at exit time to check which defer
calls were made and make the associated function/method/interface calls. We
remember that a particular defer statement was reached by storing in the deferBits
variable (always stored on the stack). At exit time, we check the bits of the
deferBits variable to determine which defer function calls to make (in reverse
order). These low-cost defers are only used for functions where no defers
appear in loops. In addition, we don't do these low-cost defers if there are too
many defer statements or too many exits in a function (to limit code increase).
When a function uses open-coded defers, we produce extra
FUNCDATA_OpenCodedDeferInfo information that specifies the number of defers, and
for each defer, the stack slots where the closure and associated args have been
stored. The funcdata also includes the location of the deferBits variable.
Therefore, for panics, we can use this funcdata to determine exactly which defers
are active, and call the appropriate functions/methods/closures with the correct
arguments for each active defer.
In order to unwind the stack correctly after a recover(), we need to add an extra
code segment to functions with open-coded defers that simply calls deferreturn()
and returns. This segment is not reachable by the normal function, but is returned
to by the runtime during recovery. We set the liveness information of this
deferreturn() to be the same as the liveness at the first function call during the
last defer exit code (so all return values and all stack slots needed by the defer
calls will be live).
I needed to increase the stackguard constant from 880 to 896, because of a small
amount of new code in deferreturn().
The -N flag disables open-coded defers. '-d defer' prints out the kind of defer
being used at each defer statement (heap-allocated, stack-allocated, or
open-coded).
Cost of defer statement [ go test -run NONE -bench BenchmarkDefer$ runtime ]
With normal (stack-allocated) defers only: 35.4 ns/op
With open-coded defers: 5.6 ns/op
Cost of function call alone (remove defer keyword): 4.4 ns/op
Text size increase (including funcdata) for go binary without/with open-coded defers: 0.09%
The average size increase (including funcdata) for only the functions that use
open-coded defers is 1.1%.
The cost of a panic followed by a recover got noticeably slower, since panic
processing now requires a scan of the stack for open-coded defer frames. This scan
is required, even if no frames are using open-coded defers:
Cost of panic and recover [ go test -run NONE -bench BenchmarkPanicRecover runtime ]
Without open-coded defers: 62.0 ns/op
With open-coded defers: 255 ns/op
A CGO Go-to-C-to-Go benchmark got noticeably faster because of open-coded defers:
CGO Go-to-C-to-Go benchmark [cd misc/cgo/test; go test -run NONE -bench BenchmarkCGoCallback ]
Without open-coded defers: 443 ns/op
With open-coded defers: 347 ns/op
Updates #14939 (defer performance)
Updates #34481 (design doc)
Change-Id: I63b1a60d1ebf28126f55ee9fd7ecffe9cb23d1ff
Reviewed-on: https://go-review.googlesource.com/c/go/+/202340
Reviewed-by: Austin Clements <austin@google.com>
Part 1: CL 199499 (GOOS nacl)
Part 2: CL 200077 (amd64p32 files, toolchain)
Part 3: stuff that arguably should've been part of Part 2, but I forgot
one of my grep patterns when splitting the original CL up into
two parts.
This one might also have interesting stuff to resurrect for any future
x32 ABI support.
Updates #30439
Change-Id: I2b4143374a253a003666f3c69e776b7e456bdb9c
Reviewed-on: https://go-review.googlesource.com/c/go/+/200318
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
This is part two if the nacl removal. Part 1 was CL 199499.
This CL removes amd64p32 support, which might be useful in the future
if we implement the x32 ABI. It also removes the nacl bits in the
toolchain, and some remaining nacl bits.
Updates #30439
Change-Id: I2475d5bb066d1b474e00e40d95b520e7c2e286e1
Reviewed-on: https://go-review.googlesource.com/c/go/+/200077
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Since BSWAP operation on 16-bit registers is undefined,
forbid the usage of BSWAPW. Users should rely on XCHGB instead.
This behavior is consistent with what GAS does.
Fixes#29167
Change-Id: I3b31e3dd2acfd039f7564a1c17e6068617bcde8d
Reviewed-on: https://go-review.googlesource.com/c/go/+/174312
Run-TryBot: Iskander Sharipov <quasilyte@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
These new calls should not prevent NOSPLIT promotion, like the old ones.
These new calls should not prevent racefuncenter/exit removal.
(The latter was already true, as the new calls are not yet lowered
to StaticCalls at the point where racefuncenter/exit removal is done.)
Add tests to make sure we don't regress (again).
Fixes#31219
Change-Id: I3fb6b17cdd32c425829f1e2498defa813a5a9ace
Reviewed-on: https://go-review.googlesource.com/c/go/+/170639
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
We're going to need a different TLS offset for Android Q, so the static
offsets used for 386 and amd64 are no longer viable on Android.
Introduce runtime·tls_g and use that for indexing into TLS storage. As
an added benefit, we can then merge the TLS setup code for all android
GOARCHs.
While we're at it, remove a bunch of android special cases no longer
needed.
Updates #29674
Updates #29249 (perhaps fixes it)
Change-Id: I77c7385aec7de8f1f6a4da7c9c79999157e39572
Reviewed-on: https://go-review.googlesource.com/c/go/+/169817
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Allow shifts by signed amounts. Panic if the shift amount is negative.
TODO: We end up doing two compares per shift, see Ian's comment
https://github.com/golang/go/issues/19113#issuecomment-443241799 that
we could do it with a single comparison in the normal case.
The prove pass mostly handles this code well. For instance, it removes the
<0 check for cases like this:
if s >= 0 { _ = x << s }
_ = x << len(a)
This case isn't handled well yet:
_ = x << (y & 0xf)
I'll do followon CLs for unhandled cases as needed.
Update #19113
R=go1.13
Change-Id: I839a5933d94b54ab04deb9dd5149f32c51c90fa1
Reviewed-on: https://go-review.googlesource.com/c/158719
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
This commit allows to cross-compiling aix/ppc64. The nosplit limit must
twice as large as on others platforms because of AIX syscalls.
The stack limit, especially stackGuardMultiplier, was set by cmd/dist
during the bootstrap and doesn't depend on GOOS/GOARCH target.
Fixes#29572
Change-Id: Id51e38885e1978d981aa9e14972eaec17294322e
Reviewed-on: https://go-review.googlesource.com/c/157117
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
The obj package needs to emit the PCDATA to select the entry stack map
before calling morestack. Currently this is copied for every
architecture. Since we're about to change how this works, consolidate
all of these copies into a single helper function.
For #24543.
Change-Id: Ia92d94de78f8e23fd06dba747c43e03e5989f67b
Reviewed-on: https://go-review.googlesource.com/109346
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
This marks the first instruction after the prologue for
consumption by debuggers, specifically Delve, who asked
for it. gdb appears to ignore it, lldb appears to use it.
The bits for end-of-prologue and beginning-of-epilogue
are added to Pos (reducing maximum line number by 4x, to
1048575). They're added in cmd/internal/obj/<ARCH>.go
(currently x86 only), so the compiler-proper need not
deal with them.
The linker currently does nothing with beginning-of-epilogue,
but the plumbing exists to make it easier in the future.
This also upgrades the line number table to DWARF version 3.
This CL includes a regression in the coverage for
testdata/i22558.gdb-dbg.nexts, this appears to be a gdb
artifact but the fix would be in the preceding CL in the
stack.
Change-Id: I3bda5f46a0ed232d137ad48f65a14835c742c506
Reviewed-on: https://go-review.googlesource.com/110416
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Fixes#14327
Much of the code is based on the linux/amd64 code that implements these
build modes, and code is shared where possible.
Change-Id: Ia510f2023768c0edbc863aebc585929ec593b332
Reviewed-on: https://go-review.googlesource.com/93875
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Currently, tail calls on x86 don't adjust the SP on return, so it's
important that the compiler produce a zero-sized frame and disable the
frame pointer. However, these constraints aren't necessary. For
example, on other architectures it's generally necessary to restore
the saved LR before a tail call, so obj simply makes this work.
Likewise, on x86, there's no reason we can't simply make this work.
Hence, this CL adjusts the compiler to use the same tail call
convention for x86 that we use on LR machines by producing a RET with
a target, rather than a JMP with a target. In fact, obj already
understands this convention for x86 except that it's buggy with
non-zero frame sizes. So we also fix this bug obj. As a result of
these fixes, the compiler no longer needs to mark wrappers as
NoFramePointer since it's now perfectly fine to save the frame
pointer.
In fact, this eliminates the only use of NoFramePointer in the
compiler, which will enable further cleanups.
This also fixes what is very nearly, but not quite, a code generation
bug. NoFramePointer becomes obj.NOFRAME in the object file, which on
ppc64 and s390x means to omit the saved LR. Hence, on these
architectures, NoFramePointer (and NOFRAME) is only safe to set on
leaf functions. However, on *most* architectures, wrappers aren't
necessarily leaf functions because they may call DUFFZERO. We're saved
on ppc64 and s390x only because the compiler doesn't have the rules to
produce DUFFZERO calls on these architectures. Hence, this only works
because the set of LR architectures that implement NOFRAME is disjoint
from the set where the compiler produces DUFFZERO operations. (I
discovered this whole mess when I attempted to add NOFRAME support to
arm.)
Change-Id: Icc589aeb86beacb850d0a6a80bd3024974a33947
Reviewed-on: https://go-review.googlesource.com/92035
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Fix typo in DWARF register config for GOOARCH=x86; was
picking up the AMD64 set, should have been selecting
x86 set.
Change-Id: I9a4c6f1378baf3cb2f0ad8d60f3ee2f24cd5dc91
Reviewed-on: https://go-review.googlesource.com/77990
Run-TryBot: Than McIntosh <thanm@google.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
This change makes it easier to express instructions
with arbitrary number of operands.
Rationale: previous approach with operand "hiding" does
not scale well, AVX and especially AVX512 have many
instructions with 3+ operands.
x86 asm backend is updated to handle up to 6 explicit operands.
It also fixes issue with 4-th immediate operand type checks.
All `ytab` tables are updated accordingly.
Changes to non-x86 backends only include these patterns:
`p.From3 = X` => `p.SetFrom3(X)`
`p.From3.X = Y` => `p.GetFrom3().X = Y`
Over time, other backends can adapt Prog.RestArgs
and reduce the amount of workarounds.
-- Performance --
x/benchmark/build:
$ benchstat upstream.bench patched.bench
name old time/op new time/op delta
Build-48 21.7s ± 2% 21.8s ± 2% ~ (p=0.218 n=10+10)
name old binary-size new binary-size delta
Build-48 10.3M ± 0% 10.3M ± 0% ~ (all equal)
name old build-time/op new build-time/op delta
Build-48 21.7s ± 2% 21.8s ± 2% ~ (p=0.218 n=10+10)
name old build-peak-RSS-bytes new build-peak-RSS-bytes delta
Build-48 145MB ± 5% 148MB ± 5% ~ (p=0.218 n=10+10)
name old build-user+sys-time/op new build-user+sys-time/op delta
Build-48 21.0s ± 2% 21.2s ± 2% ~ (p=0.075 n=10+10)
Microbenchmark shows a slight slowdown.
name old time/op new time/op delta
AMD64asm-4 49.5ms ± 1% 49.9ms ± 1% +0.67% (p=0.001 n=23+15)
func BenchmarkAMD64asm(b *testing.B) {
for i := 0; i < b.N; i++ {
TestAMD64EndToEnd(nil)
TestAMD64Encoder(nil)
}
}
Change-Id: I4f1d37b5c2c966da3f2127705ccac9bff0038183
Reviewed-on: https://go-review.googlesource.com/63490
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
This is the last instruction I found missing in SSE2 set.
It does not reuse 'yprefetch' ytabs due to differences in
operands SRC/DST roles:
- PREFETCHx: ModRM:r/m(r) -> FROM
- CLFLUSH: ModRM:r/m(w) -> TO
unaryDst map is extended accordingly.
Change-Id: I89e34ebb81cc0ee5f9ebbb1301bad417f7ee437f
Reviewed-on: https://go-review.googlesource.com/56833
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
Reviewed-by: Russ Cox <rsc@golang.org>
Went mainly for the ones that make no sense, such as the ones
mid-sentence or after commas.
Change-Id: Ie245d2c19cc7428a06295635cf6a9482ade25ff0
Reviewed-on: https://go-review.googlesource.com/57293
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>