When doing a tail call, the link register is live, as the callee
will return directly to the caller (of the function that does the
tail call). Don't allocate or clobber the link register.
Fixes #49032.
Change-Id: I2d60f2354e5b6c14aa285c8983a9786687b90223
Reviewed-on: https://go-review.googlesource.com/c/go/+/358435
Trust: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
For certain types of method wrappers we used to generate a tail
call. That was disabled in CL 307234 when the register ABI is used,
because with the current IR it was difficult to generate a tail
call with the arguments in the right places. The problem was that
the IR does not contain a CALL-like node with arguments; instead,
it contains an OAS node that adjusts the receiver, then an
OTAILCALL node that contains just the target but no arguments
(with the assumption that the OAS node will put the adjusted
receiver in the right place). With the register ABI, putting
arguments in registers is done in SSA. The assignment (OAS)
doesn't put the receiver in a register.
This CL changes the IR of a tail call to take an actual OCALL
node. Specifically, a tail call is represented as
OTAILCALL (OCALL target args...)
This way, the call target and args are connected through the OCALL
node. So the call can be analyzed in SSA and the args can be passed
in the right places.
(Alternatively, we could have the OTAILCALL node directly take the
target and the args, without the OCALL node. Using an OCALL node is
convenient because there is existing code that processes OCALL nodes
which does not need to be changed. Also, a tail call is similar to
ORETURN (OCALL target args...), except that it doesn't preserve the
frame. I did the former but I'm open to change.)
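As a concrete picture, here is a minimal sketch of the kind of
wrapper involved (the types and names are made up; the exact
conditions under which the compiler emits a tail call are checked in
the wrapper generator):

type T struct{ x int }

func (t *T) M() int { return t.x }

type S struct{ T }

For the promoted method (*S).M, the compiler generates a wrapper
roughly equivalent to

	func (s *S) M() int { return (&s.T).M() }

where the receiver adjustment is the OAS described above and the
forwarding call can now be represented as OTAILCALL (OCALL (*T).M).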
The SSA representation is similar. Previously, the IR lowered to
a Store of the receiver followed by a BlockRetJmp which jumps to the
target (without putting the arg in a register). Now we use a TailCall
op, which takes the target and the args. The call expansion pass and
the register allocator handle TailCall pretty much like a
StaticCall, and they do the right ABI analysis and put the args
in the right places. (Args other than the receiver are already in
the right places. For register args it generates no code. For stack
args it currently generates a self copy; I'll work on optimizing
that out.) BlockRetJmp is still used, signaling that this is a
tail call. The actual call is made by the TailCall op, so
BlockRetJmp generates no code (we could use BlockExit if we like).
This slightly reduces binary size:
          old       new
cmd/go    14003088  13953936
cmd/link   6275552   6271456
Change-Id: I2d16d8d419fe1f17554916d317427383e17e27f0
Reviewed-on: https://go-review.googlesource.com/c/go/+/350145
Trust: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: David Chase <drchase@google.com>
Looks like CL 322850 didn't have the change to ARM64Ops.go
properly gofmt'ed.
Change-Id: I1a080bc13ea27b897fbb91f18ded754ce440994b
Reviewed-on: https://go-review.googlesource.com/c/go/+/324109
Trust: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Now they take a variable number of args.
Change-Id: I49c8bce9c3a403947eac03e397ae264a8f4fdd2c
Reviewed-on: https://go-review.googlesource.com/c/go/+/323929
Trust: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Define the registers. They are not actually enabled for now;
otherwise the compiler would start using them for go:registerparams
functions, which is not yet fully working, and some tests would fail.
Now we can compile a simple Add function with registerparams
(with registers enabled).
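A minimal sketch of such a function (assuming the experimental
pragma spelling used during the register-ABI bring-up; it is not
part of the public language):

//go:registerparams
func Add(a, b int) int {
	return a + b
}

With the registers enabled, a and b arrive in registers and the
result is returned in a register, with no stack traffic for the
arguments.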
Change-Id: Ifdfac931052c0196096a1dd8b0687b5fdedb14d5
Reviewed-on: https://go-review.googlesource.com/c/go/+/322850
Trust: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Arm64 constant shift instructions, e.g., LSL (immediate), use only
the low 6 bits of the shift amount. To conform to the semantics of
the hardware instructions, this CL adds comments in ARM64Ops.go about
the restricted AuxInt ranges for the various instructions involved.
Change-Id: I2b6560d6580e22ba7cbfa744a02b046dd5714b8a
Reviewed-on: https://go-review.googlesource.com/c/go/+/303569
Trust: fannie zhang <Fannie.Zhang@arm.com>
Reviewed-by: Keith Randall <khr@golang.org>
On ARM64, (external) linker-generated trampolines may clobber R16
and R17. In CL 183842 we changed Duff's devices not to use those
registers. However, this is not enough. The register allocator
also needs to know that these registers may be clobbered in any
calls that don't follow the standard Go calling convention. This
includes Duff's devices and the write barrier.
Fixes #32773, second attempt.
Change-Id: Ia52a891d9bbb8515c927617dd53aee5af5bd9aa4
Reviewed-on: https://go-review.googlesource.com/c/go/+/184437
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Meng Zhuo <mzh@golangcn.org>
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Meng Zhuo <mzh@golangcn.org>
Optimize some patterns into the rev16/rev16w instructions.
Pattern 1:
	(c & 0xff00ff00)>>8 | (c & 0x00ff00ff)<<8
To:
	rev16w c
Pattern 2:
	(c & 0xff00ff00ff00ff00)>>8 | (c & 0x00ff00ff00ff00ff)<<8
To:
	rev16 c
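In Go source, the 32-bit pattern corresponds to a function like this
sketch:

// rev16w swaps the two bytes within each 16-bit half of c. With this
// CL the compiler matches the expression below to a single REV16W
// instruction on arm64.
func rev16w(c uint32) uint32 {
	return (c&0xff00ff00)>>8 | (c&0x00ff00ff)<<8
}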
This patch is a copy of CL 239637, contributed by Alice Xu (dianhong.xu@arm.com).
Change-Id: I96936c1db87618bc1903c04221c7e9b2779455b3
Reviewed-on: https://go-review.googlesource.com/c/go/+/268377
Trust: fannie zhang <Fannie.Zhang@arm.com>
Run-TryBot: fannie zhang <Fannie.Zhang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
This CL adds rewrite rules for CSETM, CSINC, CSINV, and CSNEG. By adding
these rules, we can save one instruction.
For example,
func test(cond bool, a int) int {
	if cond {
		a++
	}
	return a
}
Before:
	MOVD  "".a+8(RSP), R0
	ADD   $1, R0, R1
	MOVBU "".cond(RSP), R2
	CMPW  $0, R2
	CSEL  NE, R1, R0, R0
After:
	MOVBU "".cond(RSP), R0
	CMPW  $0, R0
	MOVD  "".a+8(RSP), R0
	CSINC EQ, R0, R0, R0
This patch is a copy of CL 285694.
Co-authored-by: JunchenLi <junchen.li@arm.com>
Change-Id: Ic1a79e8b8ece409b533becfcb7950f11e7b76f24
Reviewed-on: https://go-review.googlesource.com/c/go/+/302231
Trust: fannie zhang <Fannie.Zhang@arm.com>
Run-TryBot: fannie zhang <Fannie.Zhang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Add a generic rule to rewrite the single-precision square root
expression with one single-precision instruction. The optimization
removes the two precision conversions between double precision and
single precision.
On the arm64 platform:
previous:
	FCVTSD F0, F0
	FSQRTD F0, F0
	FCVTDS F0, F0
optimized:
	FSQRTS S0, S0
This patch also adds a test case to check correctness. It refers to
CL 241877, contributed by Alice Xu (dianhong.xu@arm.com).
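The Go source shape that triggers the new rule looks like this
sketch:

import "math"

// sqrt32 previously compiled to FCVTSD + FSQRTD + FCVTDS; with the
// new generic rule the float64 round trip is removed and a single
// FSQRTS is emitted on arm64.
func sqrt32(x float32) float32 {
	return float32(math.Sqrt(float64(x)))
}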
Change-Id: I6de5d02281c693017ac4bd4c10963dd55989bd7e
Reviewed-on: https://go-review.googlesource.com/c/go/+/276873
Trust: fannie zhang <Fannie.Zhang@arm.com>
Run-TryBot: fannie zhang <Fannie.Zhang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Consider the following example:

func test(a, b float64, x uint64) uint64 {
	if a < b {
		x = 0
	}
	return x
}

func main() {
	fmt.Println(test(1, math.NaN(), 123))
}
The output is 0, but the expectation is 123.
This is because the rewrite rule
(CSEL [cc] (MOVDconst [0]) y flag) => (CSEL0 [arm64Negate(cc)] y flag)
converts
FCMP NaN, 1
CSEL MI, 0, 123, R0 // if 1 < NaN then R0 = 0 else R0 = 123
to
FCMP NaN, 1
CSEL GE, 123, 0, R0 // if 1 >= NaN then R0 = 123 else R0 = 0
But both 1 < NaN and 1 >= NaN are false. So the output is 0, not 123.
The root cause is that arm64Negate does not handle the negation of
floating-point comparisons correctly. According to the ARM manual,
the meanings of MI, GE, and PL are
MI: Less than
GE: Greater than or equal to
PL: Greater than, equal to, or unordered
Because NaN cannot be compared with other numbers, the result of such
a comparison is unordered. So when NaN is involved, unlike with
integers, the result of !(a < b) is not a >= b; it is
a >= b || a is NaN || b is NaN. This is exactly what PL means. We add
NotLessThanF to represent PL. Then the negation of LessThanF is
NotLessThanF rather than GreaterEqualF. The same reasoning applies to
the other floating-point comparison operations.
Fixes #43619
Change-Id: Ia511b0027ad067436bace9fbfd261dbeaae01bcd
Reviewed-on: https://go-review.googlesource.com/c/go/+/283572
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Keith Randall <khr@golang.org>
These are identical to And8 and Or8, just using LDAXRW/STLXRW instead of
LDAXRB/STLXRB.
Change-Id: I5308832ae165064550bee4bb245809ab952f4cc8
Reviewed-on: https://go-review.googlesource.com/c/go/+/263148
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
This is a prerequisite to moving call expansion later into SSA,
and probably a good idea anyway. Passes tests.
This is the first minimal CL that does a 1-for-1 substitution
of *ssa.AuxCall for *obj.LSym. The next step (next CL) is to make
this change for all calls so that additional information can
be stored in AuxCall.
Change-Id: Ia3a7715648fd9fb1a176850767a726e6f5b959eb
Reviewed-on: https://go-review.googlesource.com/c/go/+/237680
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
The current implementation stores the comparison pseudo-ops of arm64
conditional instructions (CSEL/CSEL0) in Aux; this patch stores them
in AuxInt instead, which avoids the allocation.
Change-Id: I0b69e51f63acd84c6878c6a59ccf6417501a8cfc
Reviewed-on: https://go-review.googlesource.com/c/go/+/252517
Run-TryBot: fannie zhang <Fannie.Zhang@arm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
These operations are async unsafe on architectures that use
frame pointers.
The reason is that they rely on data stored below the stack pointer
not being clobbered. They do:
45da69: 48 89 6c 24 f0 mov %rbp,-0x10(%rsp)
45da6e: 48 8d 6c 24 f0 lea -0x10(%rsp),%rbp
45da73: e8 7d d0 ff ff callq 45aaf5 <runtime.duffzero+0x115>
45da78: 48 8b 6d 00 mov 0x0(%rbp),%rbp
This dance ensures that inside duffzero, it looks like there is a
proper frame pointer set up, so that stack walkbacks work correctly if
the kernel samples during duffzero.
However, this instruction sequence depends on data not being clobbered
even though it is below the stack pointer.
If there is an async interrupt at any of those last 3 instructions,
and the interrupt decides to insert a call to asyncPreempt, then the
saved frame pointer on the stack gets clobbered. The last instruction
above then restores junk to the frame pointer.
To prevent this, mark these instructions as async unsafe.
(The body of duffzero is already async unsafe, as it is in package runtime.)
Change-Id: I5562e82f9f5bd2fb543dcf2b6b9133d87ff83032
Reviewed-on: https://go-review.googlesource.com/c/go/+/248261
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Fixes the *noov opcodes so they handle a constant argument properly.
Most of the infrastructure for this CL is in CL 238077 (the arm32 one).
Fixes #39505
Change-Id: Id424a4e18964b848f05aa42f4d78e5f2e2cdf43b
Reviewed-on: https://go-review.googlesource.com/c/go/+/237999
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Introduces a few casts, mostly to fix rules that mix int64 and int32
off1 and off2.
Passes
GOARCH=arm64 gotip build -toolexec 'toolstash -cmp' -a std
Change-Id: I1ec75211f3bb8e521dcc5217cf29ab0655a84d79
Reviewed-on: https://go-review.googlesource.com/c/go/+/230840
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This CL changes the arm64 TBZ/TBNZ block from using Aux to using
a (typed) AuxInt. The corresponding rules have also been changed
to be typed.
Passes
GOARCH=arm64 gotip build -toolexec 'toolstash -cmp' -a std
Change-Id: I98d0cd2a791948f1db13259c17fb1b9b2807a043
Reviewed-on: https://go-review.googlesource.com/c/go/+/230839
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
PanicBounds and PanicExtend are lowered to runtime calls (with a
non-Go ABI), but are not currently marked as calls. Since liveness
analysis only emits stack maps at calls in the runtime, this means
these panic call sites in the runtime won't get a stack map. These
almost immediately turn into throws in the runtime, but there's still
a chance they'll try to grow the stack first, which would lead to a
different panic.
To fix this, mark these operations as calls.
Outside the runtime, we currently emit stack maps for everything that
isn't an unsafe-point, so these panic calls get stack maps by default.
However, we're about to move to emitting stack maps only at call
sites, at which point this will start to matter outside the runtime as
well.
I confirmed that this has no effect on anything but PCDATA/FUNCDATA in
runtime and net/http.
For #36365.
Change-Id: Ic5bb463fd152cc320c815dc04cf62005261ae169
Reviewed-on: https://go-review.googlesource.com/c/go/+/230539
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
The goal here is improved AuxInt printing in ssa.html.
Instead of displaying an inscrutable encoded integer,
it displays something like
v25 (28) = UBFX <int> [lsb=4,width=8] v52
which is much nicer for debugging.
Change-Id: I40713ff7f4a857c4557486cdf73c2dff137511ca
Reviewed-on: https://go-review.googlesource.com/c/go/+/221420
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
This CL adds support for call injection and async preemption on
ARM64.
There seems to be no way to return from the injected call without
clobbering *any* register. So we have to clobber one, which is
chosen to be REGTMP. Previous CLs have marked code sequences
that use REGTMP async-nonpreemptible.
Change-Id: Ieca4e3ba5557adf3d0f5d923bce5f1769b58e30b
Reviewed-on: https://go-review.googlesource.com/c/go/+/203461
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Introduce a mechanism for marking architecture-specific Ops as
unsafe, and mark the ones that use REGTMP on ARM64, since for async
preemption we will be using REGTMP as a temporary register in the
injected call.
Change-Id: I8ff22e87d8f9cb10d02a2f0af7c12ad6d7d58f54
Reviewed-on: https://go-review.googlesource.com/c/go/+/203459
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Austin Clements <austin@google.com>
For #10958, #24543, but makes sense on its own.
Change-Id: I2a87dab66b82a1863e4b6512b1f8def51463ce2a
Reviewed-on: https://go-review.googlesource.com/c/go/+/203284
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Currently we use R16 and R17 for ARM64's Duff's devices.
According to the ARM64 ABI, R16 and R17 can be used by the (external)
linker as scratch registers in trampolines, so don't use these
registers to pass information across functions.
It seems unlikely that calling Duff's devices would need a
trampoline in normal cases, but it could happen if the call
target is outside the 128 MB direct-jump limit.
The choice of R20 and R21 is somewhat arbitrary. The register
allocator allocates from low-numbered registers, so high-numbered
registers are chosen because they are unlikely to hold a live value
and force a spill.
Fixes #32773.
Change-Id: Id22d555b5afeadd4efcf62797d1580d641c39218
Reviewed-on: https://go-review.googlesource.com/c/go/+/183842
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
This CL intrinsifies Add64 with the arm64 instruction sequence ADDS,
ADCS, and ADC, and optimizes the case of carry chains. The CL also
changes the test code so that the intrinsic implementation can be
tested.
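For example, a 128-bit addition written with math/bits.Add64 (a
sketch of the carry-chain case this CL optimizes):

import "math/bits"

// add128 adds two 128-bit values given as (hi, lo) pairs. The carry
// chain lowers to ADDS, ADCS, ADC on arm64 with this CL.
func add128(xhi, xlo, yhi, ylo uint64) (zhi, zlo uint64) {
	var carry uint64
	zlo, carry = bits.Add64(xlo, ylo, 0)
	zhi, _ = bits.Add64(xhi, yhi, carry)
	return
}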
Benchmarks:
name old time/op new time/op delta
Add-224 2.500000ns +- 0% 2.090000ns +- 4% -16.40% (p=0.000 n=9+10)
Add32-224 2.500000ns +- 0% 2.500000ns +- 0% ~ (all equal)
Add64-224 2.500000ns +- 0% 1.577778ns +- 2% -36.89% (p=0.000 n=10+9)
Add64multiple-224 6.000000ns +- 0% 2.000000ns +- 0% -66.67% (p=0.000 n=10+10)
Change-Id: I6ee91c9a85c16cc72ade5fd94868c579f16c7615
Reviewed-on: https://go-review.googlesource.com/c/go/+/159017
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
A few examples (for accessing a slice of length 3):
s[-1]     runtime error: index out of range [-1]
s[3]      runtime error: index out of range [3] with length 3
s[-1:0]   runtime error: slice bounds out of range [-1:]
s[3:0]    runtime error: slice bounds out of range [3:0]
s[3:-1]   runtime error: slice bounds out of range [:-1]
s[3:4]    runtime error: slice bounds out of range [:4] with capacity 3
s[0:3:4]  runtime error: slice bounds out of range [::4] with capacity 3
Note that in cases where there are multiple things wrong with the
indexes (e.g. s[3:-1]), we report one of those errors somewhat
arbitrarily, currently the rightmost one.
An exhaustive set of examples is in issue30116[u].out in the CL.
The message text has the same prefix as the old message text. That
leads to slightly awkward phrasing but hopefully minimizes the chance
that code depending on the error text will break.
Increases the size of the go binary by 0.5% (amd64). The panic functions
take arguments in registers in order to keep the size of the compiled code
as small as possible.
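A sketch of how the new text surfaces at run time:

package main

import "fmt"

func main() {
	defer func() {
		// Prints: runtime error: index out of range [3] with length 3
		fmt.Println(recover())
	}()
	s := make([]int, 3)
	i := 3
	_ = s[i]
}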
Fixes #30116
Change-Id: Idb99a827b7888822ca34c240eca87b7e44a04fdd
Reviewed-on: https://go-review.googlesource.com/c/go/+/161477
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
The current compiler reverses operands to work around NaN in
"less than" and "less than or equal to" comparisons. But if we
want to use "FCMPD/FCMPS $(0.0), Fn" to do some optimization,
that workaround does not work, because the assembler does not
support the instruction form "FCMPD/FCMPS Fn, $(0.0)".
This CL sets condition flags for floating-point comparisons
to resolve this problem.
Change-Id: Ia48076a1da95da64596d6e68304018cb301ebe33
Reviewed-on: https://go-review.googlesource.com/c/go/+/164718
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
math.RoundToEven can be done with one arm64 instruction, FRINTND, so
intrinsify it to improve performance.
The current pure Go implementation of the function Abs is translated
into five instructions on arm64: str, ldr, and, str, ldr. The
intrinsic implementation requires only one instruction, so in terms
of performance intrinsifying it is worthwhile.
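In Go, the intrinsified calls are ordinary calls into math (a
sketch):

import "math"

// After this CL, math.Abs and math.RoundToEven each lower to a
// single arm64 instruction (FRINTND for RoundToEven) instead of
// the str/ldr sequences above.
func roundAbs(x float64) float64 {
	return math.RoundToEven(math.Abs(x))
}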
Benchmarks:
name old time/op new time/op delta
Abs-8 3.50ns ± 0% 1.50ns ± 0% -57.14% (p=0.000 n=10+10)
RoundToEven-8 9.26ns ± 0% 1.50ns ± 0% -83.80% (p=0.000 n=10+10)
Change-Id: I9456b26ab282b544dfac0154fc86f17aed96ac3d
Reviewed-on: https://go-review.googlesource.com/116535
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Add some rules to match Go code like:
y &= 63
x << y | x >> (64-y)
or
y &= 63
x >> y | x << (64-y)
as a ROR instruction. This makes math/bits.RotateLeft faster on
arm64.
Extends CL 132435 to arm64.
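Written out as a complete function, the first pattern is (a sketch):

// rotl is matched to a single ROR-based rotate by the new rules;
// math/bits.RotateLeft64 reduces to the same shape.
func rotl(x uint64, y uint) uint64 {
	y &= 63
	return x<<y | x>>(64-y)
}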
Benchmarks of math/bits.RotateLeftxxN:
name old time/op new time/op delta
RotateLeft-8 3.548750ns +- 1% 2.003750ns +- 0% -43.54% (p=0.000 n=8+8)
RotateLeft8-8 3.925000ns +- 0% 3.925000ns +- 0% ~ (p=1.000 n=8+8)
RotateLeft16-8 3.925000ns +- 0% 3.927500ns +- 0% ~ (p=0.608 n=8+8)
RotateLeft32-8 3.925000ns +- 0% 2.002500ns +- 0% -48.98% (p=0.000 n=8+8)
RotateLeft64-8 3.536250ns +- 0% 2.003750ns +- 0% -43.34% (p=0.000 n=8+8)
Change-Id: I77622cd7f39b917427e060647321f5513973232c
Reviewed-on: https://go-review.googlesource.com/122542
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Some indexed load/store rules lack type information, and this
CL adds it for them.
Change-Id: Icac315ccb83a2f5bf30b056d4667d5b59eb4e5e2
Reviewed-on: https://go-review.googlesource.com/128455
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
ARMv8.1 added a new instruction (LDADDAL) for atomic memory
operations. This CL improves the existing atomic add intrinsics with
the new instruction. Since the new instruction is only guaranteed to
be present from ARMv8.1 on, we guard its usage with a conditional on
a CPU feature check.
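A sketch of user-level code that benefits:

import "sync/atomic"

// bump is intrinsified; on an ARMv8.1 machine the runtime-detected
// fast path performs the addition with LDADDAL instead of an
// LDAXR/STLXR loop.
func bump(counter *uint64) uint64 {
	return atomic.AddUint64(counter, 1)
}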
Performance result on ARMv8.1 machine:
name old time/op new time/op delta
Xadd-224 1.05µs ± 6% 0.02µs ± 4% -98.06% (p=0.000 n=10+8)
Xadd64-224 1.05µs ± 3% 0.02µs ±13% -98.10% (p=0.000 n=9+10)
[Geo mean] 1.05µs 0.02µs -98.08%
Performance result on ARMv8.0 machine:
name old time/op new time/op delta
Xadd-46 538ns ± 1% 541ns ± 1% +0.62% (p=0.000 n=9+9)
Xadd64-46 505ns ± 1% 508ns ± 0% +0.48% (p=0.003 n=9+8)
[Geo mean] 521ns 524ns +0.55%
Change-Id: If4b5d8d0e2d6f84fe1492a4f5de0789910ad0ee9
Reviewed-on: https://go-review.googlesource.com/81877
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
The ARM64 manual says it is "constrained unpredictable" if the src
and dst registers of STLXRB are the same, although it doesn't seem
to cause any problems on real hardware so far. Fix this by allocating
a different register to hold the updated value for
AtomicAnd8/Or8. We do this by making the ops return <val,mem>
like AtomicAdd, although val will not be used elsewhere.
Fixes #25823.
Change-Id: I735b9822f99877b3c7aee67a65e62b7278dc40df
Reviewed-on: https://go-review.googlesource.com/117976
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Wei Xiao <Wei.Xiao@arm.com>