Commit graph

418 commits

Author SHA1 Message Date
Joel Sing
2e918c3aab cmd/compile: provide Load8/Store8 atomic intrinsics on riscv64
Updates #36765

Change-Id: Ieeb6bbc54e4841a1348ad50e80342ec4bc675e07
Reviewed-on: https://go-review.googlesource.com/c/go/+/223557
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-03-17 06:38:32 +00:00
Joel Sing
26154f31ad cmd/compile: use NEG/NEGW pseudo-instructions on riscv64
Also rewrite subtraction of zero to NEG/NEGW.

Change-Id: I216e286d1860055f2a07fe2f772cd50f366ea097
Reviewed-on: https://go-review.googlesource.com/c/go/+/221691
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-03-15 08:19:07 +00:00
Joel Sing
7b2f0ba5b9 cmd/compile: use NOT pseudo-instruction on riscv64
Change-Id: I24a72c3fb8d72a47cfded4b523c5d7aa2d40419d
Reviewed-on: https://go-review.googlesource.com/c/go/+/221690
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-03-15 08:18:17 +00:00
Russ Cox
877ef86bec cmd/compile: add spectre mitigation mode enabled by -spectre
This commit adds a new cmd/compile flag -spectre,
which accepts a comma-separated list of possible
Spectre mitigations to apply, or the empty string (none),
or "all". The only known mitigation right now is "index",
which uses conditional moves to ensure that x86-64 CPUs
do not speculate past index bounds checks.

Speculating past index bounds checks may be problematic
on systems running privileged servers that accept requests
from untrusted users who can execute their own programs
on the same machine. (And some more constraints that
make it even more unlikely in practice.)

The cases this protects against are analogous to the ones
Microsoft explains in the "Array out of bounds load/store feeding ..."
sections here:
https://docs.microsoft.com/en-us/cpp/security/developer-guidance-speculative-execution?view=vs-2019#array-out-of-bounds-load-feeding-an-indirect-branch

Change-Id: Ib7532d7e12466b17e04c4e2075c2a456dc98f610
Reviewed-on: https://go-review.googlesource.com/c/go/+/222660
Reviewed-by: Keith Randall <khr@golang.org>
2020-03-13 19:05:46 +00:00
Joel Sing
cc6a8bd0d7 cmd/compile: add zero store operations for riscv64
This allows for zero stores to be performed using the zero register, rather
than loading a separate register with zero.

Change-Id: Ic81d8dbcdacbb2ca2c3f77682ff5ad7cdc33d18d
Reviewed-on: https://go-review.googlesource.com/c/go/+/221684
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-03-05 11:56:33 +00:00
Josh Bleecher Snyder
37fc092be1 cmd/compile: remove duplicate ppc64 rules
Const64 gets lowered to MOVDconst.
Change rules using interior Const64 to use MOVDconst instead,
to be less dependent on rule application order.

As a result of doing this, some of the rules end up being
exact duplicates; remove those.

We had those exact duplicates because of the order dependency;
ppc64 had no way to optimize away shifts by a constant
if the initial lowering didn't catch it.

Add those optimizations as well.
The outcome is the same, but this makes the overall rules more robust.

Change-Id: Iadd97a9fe73d52358d571d022ace145e506d160b
Reviewed-on: https://go-review.googlesource.com/c/go/+/220877
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2020-03-02 21:59:19 +00:00
Ruixin(Peter) Bao
2962c96c9f cmd/compile: lower float to uint conversions on s390x
Add rules for lowering float <-> unsigned int on s390x.

During compilation,
Cvt64Uto64F rule triggers around 80 times,
Cvt64Fto64U rule triggers around 20 times,
Cvt64Uto32F rule triggers around 5 times.

Change-Id: If4c9d128b9132fce8c0bea9abc09cb43a5df7989
Reviewed-on: https://go-review.googlesource.com/c/go/+/209177
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-02-29 21:37:47 +00:00
Josh Bleecher Snyder
d889f0cb10 cmd/compile: use correct types in phiopt
We try to preserve type correctness of generic ops.
phiopt modified a bool to be an int without a conversion.
Add a conversion. There are a few random fluctations in the
generated code as a result, but nothing noteworthy or systematic.

no binary size changes

file                        before   after    Δ       %       
math.s                      35966    35961    -5      -0.014% 
debug/dwarf.s               108141   108147   +6      +0.006% 
crypto/dsa.s                6047     6044     -3      -0.050% 
image/png.s                 42882    42885    +3      +0.007% 
go/parser.s                 80281    80278    -3      -0.004% 
cmd/internal/obj.s          115116   115113   -3      -0.003% 
go/types.s                  322130   322118   -12     -0.004% 
cmd/internal/obj/arm64.s    151679   151685   +6      +0.004% 
go/internal/gccgoimporter.s 56487    56493    +6      +0.011% 
cmd/test2json.s             1650     1647     -3      -0.182% 
cmd/link/internal/loadelf.s 35442    35443    +1      +0.003% 
cmd/go/internal/work.s      305039   305035   -4      -0.001% 
cmd/link/internal/ld.s      544835   544834   -1      -0.000% 
net/http.s                  558777   558774   -3      -0.001% 
cmd/compile/internal/ssa.s  3926551  3926994  +443    +0.011% 
cmd/compile/internal/gc.s   1552320  1552321  +1      +0.000% 
total                       18862241 18862670 +429    +0.002% 


Change-Id: I4289e773be6be534ea3f907d68f614441b8f9b46
Reviewed-on: https://go-review.googlesource.com/c/go/+/221607
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
2020-02-29 17:02:29 +00:00
Josh Bleecher Snyder
2cf3ebaf3d cmd/compile: add dedicated ARM64BitField aux type
The goal here is improved AuxInt printing in ssa.html.
Instead of displaying an inscrutable encoded integer,
it displays something like

v25 (28) = UBFX <int> [lsb=4,width=8] v52

which is much nicer for debugging.

Change-Id: I40713ff7f4a857c4557486cdf73c2dff137511ca
Reviewed-on: https://go-review.googlesource.com/c/go/+/221420
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-02-28 14:52:13 +00:00
Joel Sing
8955a56da0 cmd/compile: improve SignExt32to64 on riscv64
SignExt32to64 can be implemented with a single ADDIW instruction, rather than
the two shifts that are in use currently.

Change-Id: Ie1bbaef4018f1ba5162773fc64fa5a887457cfc9
Reviewed-on: https://go-review.googlesource.com/c/go/+/220922
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-02-28 14:33:28 +00:00
Joel Sing
c27dd0c9e5 cmd/compile: improve Eq32/Neq32 on riscv64
Use SUBW to perform a 32-bit subtraction, rather than zero extending from
32 to 64 bits. This reduces Eq32 and Neq32 to two instructions, rather than
the four instructions required previously.

Change-Id: Ib2798324881e9db842c864e91a0c1b1e48c4b67b
Reviewed-on: https://go-review.googlesource.com/c/go/+/220921
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-02-26 17:59:57 +00:00
Michael Munday
cb74dcc172 cmd/compile: remove Greater* and Geq* generic integer ops
The generic Greater and Geq ops can always be replaced with the Less and
Leq ops. This CL therefore removes them. This simplifies the compiler since
it reduces the number of operations that need handling in both code and in
rewrite rules. This will be especially true when adding control flow
optimizations such as the integer-in-range optimizations in CL 165998.

Change-Id: If0648b2b19998ac1bddccbf251283f3be4ec3040
Reviewed-on: https://go-review.googlesource.com/c/go/+/220417
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-02-26 13:11:53 +00:00
Josh Bleecher Snyder
1894842b75 cmd/compile: allow values with aux Sym to fault on nil args
And use this newfound power to more precisely describe some PPC64 ops.

Change-Id: Idb2b669d74fbab5f3508edf19f7e3347306b0daf
Reviewed-on: https://go-review.googlesource.com/c/go/+/217002
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2020-02-22 15:33:55 +00:00
Josh Bleecher Snyder
f53f987ebf cmd/compile: merge more shifts into stores
Updates #36223

(Might fix #36223. I'm not sure whether there are more outstanding.)

This helps a bit, but not as much as I'd expected/hoped.

file                                      before  after   Δ       %       
runtime.s                                 477286  477256  -30     -0.006% 
bytes.s                                   31089   31085   -4      -0.013% 
time.s                                    83561   83547   -14     -0.017% 
strings.s                                 43284   43280   -4      -0.009% 
compress/flate.s                          51374   51295   -79     -0.154% 
math/big.s                                184283  184256  -27     -0.015% 
crypto/elliptic.s                         51649   51577   -72     -0.139% 
crypto/sha512.s                           8661    8644    -17     -0.196% 
crypto/sha1.s                             6975    6959    -16     -0.229% 
crypto/sha256.s                           6412    6393    -19     -0.296% 
vendor/golang.org/x/text/unicode/bidi.s   27158   27146   -12     -0.044% 
vendor/golang.org/x/text/unicode/norm.s   66802   66788   -14     -0.021% 
net/http.s                                560936  560929  -7      -0.001% 
text/template.s                           96475   96467   -8      -0.008% 
go/parser.s                               80284   80280   -4      -0.005% 
text/tabwriter.s                          9618    9611    -7      -0.073% 
go/printer.s                              78502   78499   -3      -0.004% 
go/types.s                                321815  321807  -8      -0.002% 
internal/xcoff.s                          23175   23171   -4      -0.017% 
image/jpeg.s                              36609   36587   -22     -0.060% 
cmd/vendor/golang.org/x/arch/x86/x86asm.s 81274   81001   -273    -0.336% 
cmd/internal/obj.s                        115184  115126  -58     -0.050% 
cmd/internal/obj/arm64.s                  151502  151487  -15     -0.010% 
cmd/internal/obj/s390x.s                  128054  128046  -8      -0.006% 
cmd/internal/obj/wasm.s                   44295   44291   -4      -0.009% 
cmd/compile/internal/ssa.s                4201992 4209504 +7512   +0.179% 
cmd/compile/internal/gc.s                 1555029 1555011 -18     -0.001% 
total                                     9792875 9799640 +6765   +0.069% 


Change-Id: If4a857c0953a766578e68aa299b112a20d9b2b86
Reviewed-on: https://go-review.googlesource.com/c/go/+/213704
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-02-20 18:37:38 +00:00
Josh Bleecher Snyder
6817210edf cmd/compile: mark amd64 HMUL ops as not commutative
HMUL is commutative. However, it has asymmetric register requirements.
There are existing rewrite rules to place arguments in preferable slots.

Due to a bug, the existing rulegen commutativity engine doesn't generate
the commuted form of the HMUL rules.
The commuted form of those rewrite rules cause infinite loops.
In order to fix the rulegen commutativity bug,
we need to choose between eliminating
those rewrite rules and marking HMUL ops as not commutative.

This change chooses the latter, since doing so yields better
optimization results on std+cmd.

Removing the rewrite rules yields only text size regressions:

file                                before  after   Δ       %
runtime.s                           477257  477269  +12     +0.003%
time.s                              83552   83612   +60     +0.072%
encoding/asn1.s                     57378   57382   +4      +0.007%
cmd/go/internal/modfetch/codehost.s 89822   89829   +7      +0.008%
cmd/internal/test2json.s            9459    9466    +7      +0.074%
cmd/go/internal/test.s              57665   57678   +13     +0.023%

Marking HMUL as not commutative actually yields (mostly) improvements:

file                               before   after    Δ       %
runtime.s                          477257   477247   -10     -0.002%
math.s                             35985    35992    +7      +0.019%
strconv.s                          53486    53462    -24     -0.045%
syscall.s                          82483    82446    -37     -0.045%
time.s                             83552    83561    +9      +0.011%
os.s                               52691    52684    -7      -0.013%
archive/zip.s                      42285    42272    -13     -0.031%
encoding/asn1.s                    57378    57329    -49     -0.085%
encoding/base64.s                  12156    12094    -62     -0.510%
net.s                              296286   296276   -10     -0.003%
encoding/base32.s                  9720     9658     -62     -0.638%
net/http.s                         560931   560907   -24     -0.004%
net/smtp.s                         14421    14411    -10     -0.069%
cmd/vendor/golang.org/x/sys/unix.s 74307    74266    -41     -0.055%

The regressions are minor, and are in functions math.cbrt,
time.Time.String, and time.Date.

Change-Id: I9f6d9ee71654e5b70381cac77b0ac26011f4ea12
Reviewed-on: https://go-review.googlesource.com/c/go/+/213701
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
2020-02-20 14:58:45 +00:00
Joel Sing
98d2717499 cmd/compile: implement compiler for riscv64
Based on riscv-go port.

Updates #27532

Change-Id: Ia329daa243db63ff334053b8807ea96b97ce3acf
Reviewed-on: https://go-review.googlesource.com/c/go/+/204631
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-01-18 14:41:40 +00:00
Cherry Zhang
7673884a7f cmd/compile: don't fuse branches with side effects
Count Values with side effects but no use as live, and don't fuse
branches that contain such Values. (This can happen e.g. when it
is followed by an infinite loop.) Otherwise this may lead to
miscompilation (side effect fired at wrong condition) or ICE (two
stores live simultaneously).

Fixes #36005.

Change-Id: If202eae4b37cb7f0311d6ca120ffa46609925157
Reviewed-on: https://go-review.googlesource.com/c/go/+/210179
Reviewed-by: Keith Randall <khr@golang.org>
2019-12-06 00:57:11 +00:00
Michael Munday
b3885dbc93 cmd/compile, runtime: intrinsify atomic And8 and Or8 on s390x
Intrinsify these functions to match other platforms. Update the
sequence of instructions used in the assembly implementations to
match the intrinsics.

Also, add a micro benchmark so we can more easily measure the
performance of these two functions:

name            old time/op  new time/op  delta
And8-8          5.33ns ± 7%  2.55ns ± 8%  -52.12%  (p=0.000 n=20+20)
And8Parallel-8  7.39ns ± 5%  3.74ns ± 4%  -49.34%  (p=0.000 n=20+20)
Or8-8           4.84ns ±15%  2.64ns ±11%  -45.50%  (p=0.000 n=20+20)
Or8Parallel-8   7.27ns ± 3%  3.84ns ± 4%  -47.10%  (p=0.000 n=19+20)

By using a 'rotate then xor selected bits' instruction combined with
either a 'load and and' or a 'load and or' instruction we can
implement And8 and Or8 with far fewer instructions. Replacing
'compare and swap' with atomic instructions may also improve
performance when there is contention.

Change-Id: I28bb8032052b73ae8ccdf6e4c612d2877085fa01
Reviewed-on: https://go-review.googlesource.com/c/go/+/204277
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-11-11 15:23:59 +00:00
Cherry Zhang
ceca99bdeb cmd/compile, cmd/internal/obj/ppc64: mark unsafe points
We'll use CTR as a scratch register for call injection. Mark code
sequences that use CTR as unsafe for async preemption. Currently
it is only used in LoweredZero and LoweredMove. It is unfortunate
that they are nonpreemptible. But I think it is still better than
using LR for call injection and marking all leaf functions
nonpreemptible.

Also mark the prologue of large frame functions nonpreemptible,
as we write below SP.

Change-Id: I05a75431499f3f4b2f23651a7b17f7fcf2afbe06
Reviewed-on: https://go-review.googlesource.com/c/go/+/203823
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-11-07 19:20:35 +00:00
Cherry Zhang
a96cfa75c6 cmd/compile: mark unsafe points for MIPS and MIPS64
Mark atomic LL/SC loops as unsafe for async preemption, as they
use REGTMP.

Change-Id: I5be7f93ad3ee337049ec7c3efd6fdc30eef87d97
Reviewed-on: https://go-review.googlesource.com/c/go/+/203719
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2019-11-07 19:18:59 +00:00
Russ Cox
543c6d2e0d math, cmd/compile: rename Fma to FMA
This API was added for #25819, where it was discussed as math.FMA.
The commit adding it used math.Fma, presumably for consistency
with the rest of the unusual names in package math
(Sincos, Acosh, Erfcinv, Float32bits, etc).

I believe that using an idiomatic Go name is more important here
than consistency with these other names, most of which are historical
baggage from C's standard library.

Early additions like Float32frombits happened before "uppercase for export"
(so they were originally like "float32frombits") and they were not properly
reconsidered when we uppercased the symbols to export them.
That's a mistake we live with.

The names of functions we have added since then, and even a few
that were legacy, are more properly Go-cased, such as IsNaN, IsInf,
and RoundToEven, rather than Isnan, Isinf, and Roundtoeven.
And also constants like MaxFloat32.

For new API, we should keep using proper Go-cased symbols
instead of minimally-upper-cased-C symbols.

So math.FMA, not math.Fma.

This API has not yet been released, so this change does not break
the compatibility promise.

This CL also modifies cmd/compile, since the compiler knows
the name of the function. I could have stopped at changing the
string constants, but it seemed to make more sense to use a
consistent casing everywhere.

Change-Id: I0f6f3407f41e99bfa8239467345c33945088896e
Reviewed-on: https://go-review.googlesource.com/c/go/+/205317
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-11-07 14:51:06 +00:00
Cherry Zhang
4a7ed1fab7 cmd/compile: mark architecture-specific unsafe points
Introduce a mechanism for marking architecture-specific Ops
unsafe. And mark ones that use REGTMP on ARM64, as for async
preemption we will be using REGTMP as a temporary register in the
injected call.

Change-Id: I8ff22e87d8f9cb10d02a2f0af7c12ad6d7d58f54
Reviewed-on: https://go-review.googlesource.com/c/go/+/203459
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Austin Clements <austin@google.com>
2019-11-05 02:55:11 +00:00
Austin Clements
ec10e6f364 cmd/compile: fix missing lowering of atomic {Load,Store}8
CL 203284 added a compiler intrinsics from atomic Load8 and Store8 on
several architectures, but missed the lowering on MIPS. This CL fixes
that.

Updates #10958, #24543.

Change-Id: I82e88971554fe8c33ad2bf195a633c44b9ac4cf7
Reviewed-on: https://go-review.googlesource.com/c/go/+/203977
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-10-29 13:48:29 +00:00
Austin Clements
97592b3c14 cmd/compile: intrinsics for runtime/internal/atomic.Store8
For #10958, #24543, but makes sense on its own.

Change-Id: I2a87dab66b82a1863e4b6512b1f8def51463ce2a
Reviewed-on: https://go-review.googlesource.com/c/go/+/203284
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-10-29 03:18:55 +00:00
smasher164
58b031949b cmd/compile: add fma intrinsic for arm
This change introduces an arm intrinsic that generates the FMULAD
instruction for the fused-multiply-add operation on systems that
support it. System support is detected via cpu.ARM.HasVFPv4. A rewrite
rule translates the generic intrinsic to FMULAD.

Updates #25819.

Change-Id: I8459e5dd1cdbdca35f88a78dbeb7d387f1e20efa
Reviewed-on: https://go-review.googlesource.com/c/go/+/142117
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-10-21 17:42:47 +00:00
smasher164
7a6da218b1 cmd/compile: add fma intrinsic for amd64
To permit ssa-level optimization, this change introduces an amd64 intrinsic
that generates the VFMADD231SD instruction for the fused-multiply-add
operation on systems that support it. System support is detected via
cpu.X86.HasFMA. A rewrite rule can then translate the generic ssa intrinsic
("Fma") to VFMADD231SD.

The benchmark compares the software implementation (old) with the intrinsic
(new).

name   old time/op  new time/op  delta
Fma-4  27.2ns ± 1%   1.0ns ± 9%  -96.48%  (p=0.008 n=5+5)

Updates #25819.

Change-Id: I966655e5f96817a5d06dff5942418a3915b09584
Reviewed-on: https://go-review.googlesource.com/c/go/+/137156
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-10-21 16:42:10 +00:00
smasher164
33425ab8db cmd/compile: introduce generic ssa intrinsic for fused-multiply-add
In order to make math.FMA a compiler intrinsic for ISAs like ARM64,
PPC64[le], and S390X, a generic 3-argument opcode "Fma" is provided and
rewritten as

    ARM64: (Fma x y z) -> (FMADDD z x y)
    PPC64: (Fma x y z) -> (FMADD x y z)
    S390X: (Fma x y z) -> (FMADD z x y)

Updates #25819.

Change-Id: Ie5bc628311e6feeb28ddf9adaa6e702c8c291efa
Reviewed-on: https://go-review.googlesource.com/c/go/+/131959
Run-TryBot: Akhil Indurti <aindurti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-10-21 16:24:15 +00:00
Ben Shi
11d7775c9f cmd/compile: remove some nacl SSA rules
Updates golang/go#30439

Change-Id: I7ef5301fbd650d26a37a1241ddf7ca1ccd58b89d
Reviewed-on: https://go-review.googlesource.com/c/go/+/200941
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-10-15 16:45:31 +00:00
Michael Munday
ac8966aa58 cmd/compile/internal/ssa: fix block AuxIntType lookup
Avoid an out-of-range error when calling LongString on a generic
block.

Change-Id: I33ca88940d899bc71e3155bc63d2aa925cf83230
Reviewed-on: https://go-review.googlesource.com/c/go/+/200737
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
2019-10-11 16:30:20 +00:00
Lynn Boger
816ff44479 cmd/compile: use vsx loads and stores for LoweredMove, LoweredZero on ppc64x
This improves the code generated for LoweredMove and LoweredZero by
using LXVD2X and STXVD2X to move 16 bytes at a time. These instructions
are now used if the size to be moved or zeroed is >= 64. These same
instructions have already been used in the asm implementations for
memmove and memclr.

Some examples where this shows an improvement on power8:

MakeSlice/Byte                                  27.3ns ± 1%     25.2ns ± 0%    -7.69%
MakeSlice/Int16                                 40.2ns ± 0%     35.2ns ± 0%   -12.39%
MakeSlice/Int                                   94.9ns ± 1%     77.9ns ± 0%   -17.92%
MakeSlice/Ptr                                    129ns ± 1%      103ns ± 0%   -20.16%
MakeSlice/Struct/24                              176ns ± 1%      131ns ± 0%   -25.67%
MakeSlice/Struct/32                              200ns ± 1%      142ns ± 0%   -29.09%
MakeSlice/Struct/40                              220ns ± 2%      156ns ± 0%   -28.82%
GrowSlice/Byte                                  81.4ns ± 0%     73.4ns ± 0%    -9.88%
GrowSlice/Int16                                  118ns ± 1%       98ns ± 0%   -17.03%
GrowSlice/Int                                    178ns ± 1%      134ns ± 1%   -24.65%
GrowSlice/Ptr                                    249ns ± 4%      212ns ± 0%   -14.94%
GrowSlice/Struct/24                              294ns ± 5%      215ns ± 0%   -27.08%
GrowSlice/Struct/32                              315ns ± 1%      248ns ± 0%   -21.49%
GrowSlice/Struct/40                              382ns ± 4%      289ns ± 1%   -24.38%
ExtendSlice/IntSlice                             109ns ± 1%       90ns ± 1%   -17.51%
ExtendSlice/PointerSlice                         142ns ± 2%      118ns ± 0%   -16.75%
ExtendSlice/NoGrow                              6.02ns ± 0%     5.88ns ± 0%    -2.33%
Append                                          27.2ns ± 0%     27.6ns ± 0%    +1.38%
AppendGrowByte                                  4.20ms ± 3%     2.60ms ± 0%   -38.18%
AppendGrowString                                 134ms ± 3%      102ms ± 2%   -23.62%
AppendSlice/1Bytes                              5.65ns ± 0%     5.67ns ± 0%    +0.35%
AppendSlice/4Bytes                              6.40ns ± 0%     6.55ns ± 0%    +2.34%
AppendSlice/7Bytes                              8.74ns ± 0%     8.84ns ± 0%    +1.14%
AppendSlice/8Bytes                              5.68ns ± 0%     5.70ns ± 0%    +0.40%
AppendSlice/15Bytes                             9.31ns ± 0%     9.39ns ± 0%    +0.86%
AppendSlice/16Bytes                             14.0ns ± 0%      5.8ns ± 0%   -58.32%
AppendSlice/32Bytes                             5.72ns ± 0%     5.68ns ± 0%    -0.66%
AppendSliceLarge/1024Bytes                       918ns ± 8%      615ns ± 1%   -33.00%
AppendSliceLarge/4096Bytes                      3.25µs ± 1%     1.92µs ± 1%   -40.84%
AppendSliceLarge/16384Bytes                     8.70µs ± 2%     4.69µs ± 0%   -46.08%
AppendSliceLarge/65536Bytes                     18.1µs ± 3%      7.9µs ± 0%   -56.30%
AppendSliceLarge/262144Bytes                    69.8µs ± 2%     25.9µs ± 0%   -62.91%
AppendSliceLarge/1048576Bytes                    258µs ± 1%       93µs ± 0%   -63.96%

Change-Id: I21625dbe231a2029ddb9f7d73f5a6417b35c1e49
Reviewed-on: https://go-review.googlesource.com/c/go/+/199639
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-10-08 16:41:02 +00:00
Michael Munday
6ec4c71eef cmd/compile: add SSA rules for s390x compare-and-branch instructions
This commit adds SSA rules for the s390x combined compare-and-branch
instructions. These have a shorter encoding than separate compare
and branch instructions and they also don't clobber the condition
code (a.k.a. flag register) reducing pressure on the flag allocator.

I have deleted the 'loop_test.go' file and replaced it with a new
codegen test which performs a wider range of checks.

Object sizes from compilebench:

name                      old object-bytes  new object-bytes  delta
Template                        562kB ± 0%        561kB ± 0%   -0.28%  (p=0.000 n=10+10)
Unicode                         217kB ± 0%        217kB ± 0%   -0.17%  (p=0.000 n=10+10)
GoTypes                        2.03MB ± 0%       2.02MB ± 0%   -0.59%  (p=0.000 n=10+10)
Compiler                       8.16MB ± 0%       8.11MB ± 0%   -0.62%  (p=0.000 n=10+10)
SSA                            27.4MB ± 0%       27.0MB ± 0%   -1.45%  (p=0.000 n=10+10)
Flate                           356kB ± 0%        356kB ± 0%   -0.12%  (p=0.000 n=10+10)
GoParser                        438kB ± 0%        436kB ± 0%   -0.51%  (p=0.000 n=10+10)
Reflect                        1.37MB ± 0%       1.37MB ± 0%   -0.42%  (p=0.000 n=10+10)
Tar                             485kB ± 0%        483kB ± 0%   -0.39%  (p=0.000 n=10+10)
XML                             630kB ± 0%        621kB ± 0%   -1.45%  (p=0.000 n=10+10)
[Geo mean]                     1.14MB            1.13MB        -0.60%

name                      old text-bytes    new text-bytes    delta
HelloSize                       763kB ± 0%        754kB ± 0%   -1.30%  (p=0.000 n=10+10)
CmdGoSize                      10.7MB ± 0%       10.6MB ± 0%   -0.91%  (p=0.000 n=10+10)
[Geo mean]                     2.86MB            2.82MB        -1.10%

Change-Id: Ibca55d9c0aa1254aee69433731ab5d26a43a7c18
Reviewed-on: https://go-review.googlesource.com/c/go/+/198037
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-10-08 10:03:04 +00:00
Michael Munday
cf03238020 cmd/compile: use numeric condition code masks on s390x
Prior to this CL conditional branches on s390x always used an
extended mnemonic such as BNE, BLT and so on to represent branch
instructions with different condition code masks. This CL adds
support for numeric condition code masks to the s390x SSA backend
so that we can encode the condition under which a Block's
successor is chosen as a field in that Block rather than in its
type.

This change will be useful as we come to add support for combined
compare-and-branch instructions. Rather than trying to add extended
mnemonics for every possible combination of mask and compare-and-
branch instruction we can instead use a single mnemonic for each
instruction.

Change-Id: Idb7458f187b50906877d683695c291dff5279553
Reviewed-on: https://go-review.googlesource.com/c/go/+/197178
Reviewed-by: Keith Randall <khr@golang.org>
2019-09-26 14:47:12 +00:00
Cherry Zhang
55924135ee cmd/compile: fix register masks of ANDCC et al. on PPC64
PPC64's ANDCC, ORCC, XORCC SSA ops produce a flags value, which
should not have register mask of an integer register.

Fixes #34468.

Change-Id: Ic762e423b20275fd9f8118dae7951c258d59738c
Reviewed-on: https://go-review.googlesource.com/c/go/+/196960
Reviewed-by: Keith Randall <khr@golang.org>
2019-09-23 21:16:33 +00:00
Richard Musiol
1c50fcf853 cmd/compile: add 32 bit float registers/variables on wasm
Before this change, wasm only used float variables with a size of 64 bit
and applied rounding to 32 bit precision where necessary. This change
adds proper 32 bit float variables.

Reduces the size of pkg/js_wasm by 254 bytes.

Change-Id: Ieabe846a8cb283d66def3cdf11e2523b3b31f345
Reviewed-on: https://go-review.googlesource.com/c/go/+/195117
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-09-19 20:26:22 +00:00
Lynn Boger
7987238d9c cmd/asm,cmd/compile: clean up isel codegen on ppc64x
This cleans up the isel code generation in ssa for ppc64x.

Current there is no isel op and the isel code is only
generated from pseudo ops in ppc64/ssa.go, and only using
operands with values 0 or 1. When the isel is generated,
there is always a load of 1 into the temp register before it.

This change implements the isel op so it can be used in PPC64.rules,
and can recognize operand values other than 0 or 1. This also
eliminates the forced load of 1, so it will be loaded only if
needed.

This will make the isel code generation consistent with other ops,
and allow future rule changes that can take advantage of having
a more general purpose isel rule.

Change-Id: I363e1dbd3f7f5dfecb53187ad51cce409a8d1f8d
Reviewed-on: https://go-review.googlesource.com/c/go/+/195057
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
2019-09-18 15:54:32 +00:00
Ruixin Bao
98aa97806b cmd/compile: add math/bits.Mul64 intrinsic on s390x
This change adds an intrinsic for Mul64 on s390x. To achieve that,
a new assembly instruction, MLGR, is introduced in s390x/asmz.go. This assembly
instruction directly uses an existing instruction on Z and supports multiplication
of two 64 bit unsigned integer and stores the result in two separate registers.

In this case, we require the multiplcand to be stored in register R3 and
the output result (the high and low 64 bit of the product) to be stored in
R2 and R3 respectively.

A test case is also added.

Benchmark:
name      old time/op  new time/op  delta
Mul-18    11.1ns ± 0%   1.4ns ± 0%  -87.39%  (p=0.002 n=8+10)
Mul32-18  2.07ns ± 0%  2.07ns ± 0%     ~     (all equal)
Mul64-18  11.1ns ± 1%   1.4ns ± 0%  -87.42%  (p=0.000 n=10+10)

Change-Id: Ieca6ad1f61fff9a48a31d50bbd3f3c6d9e6675c1
Reviewed-on: https://go-review.googlesource.com/c/go/+/194572
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-09-13 09:04:48 +00:00
Michael Munday
5c5f217b63 cmd/compile: improve s390x sign/zero extension removal
This CL gets rid of the MOVDreg and MOVDnop SSA operations on
s390x. They were originally inserted to help avoid situations
where a sign/zero extension was elided but a spill invalidated
the optimization. It's not really clear we need to do this though
(amd64 doesn't have these ops for example) so long as we are
careful when removing sign/zero extensions. Also, the MOVDreg
technique doesn't work if the register is spilled before the
MOVDreg op (I haven't seen that in practice).

Removing these ops reduces the complexity of the rules and also
allows us to unblock optimizations. For example, the compiler can
now merge the loads in binary.{Big,Little}Endian.PutUint16 which
it wasn't able to do before. This CL reduces the size of the .text
section in the go tool by about 4.7KB (0.09%).

Change-Id: Icaddae7f2e4f9b2debb6fabae845adb3f73b41db
Reviewed-on: https://go-review.googlesource.com/c/go/+/173897
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-09-10 13:17:24 +00:00
Brian Kessler
b003afe4fe cmd/compile: intrinsify RotateLeft32 on wasm
wasm has 32-bit versions of all integer operations. This change
lowers RotateLeft32 to i32.rotl on wasm and intrinsifies the math/bits
call.  Benchmarking on amd64 under node.js this is ~25% faster.

node v10.15.3/amd64
name          old time/op  new time/op  delta
RotateLeft    8.37ns ± 1%  8.28ns ± 0%   -1.05%  (p=0.029 n=4+4)
RotateLeft8   11.9ns ± 1%  11.8ns ± 0%     ~     (p=0.167 n=5+5)
RotateLeft16  11.8ns ± 0%  11.8ns ± 0%     ~     (all equal)
RotateLeft32  11.9ns ± 1%   8.7ns ± 0%  -26.32%  (p=0.008 n=5+5)
RotateLeft64  8.31ns ± 1%  8.43ns ± 2%     ~     (p=0.063 n=5+5)

Updates #31265

Change-Id: I5b8e155978faeea536c4f6427ac9564d2f096a46
Reviewed-on: https://go-review.googlesource.com/c/go/+/182359
Run-TryBot: Brian Kessler <brian.m.kessler@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Richard Musiol <neelance@gmail.com>
2019-08-31 17:03:04 +00:00
Meng Zhuo
307544f427 runtime, cmd/compile: implement and use DUFFCOPY on MIPS64
OS: Linux loongson 3.10.84 mips64el
CPU: Loongson 3A3000 quad core

name                   old time/op    new time/op    delta
BinaryTree17              23.5s ± 1%     23.2s ± 0%  -1.12%  (p=0.008 n=5+5)
Fannkuch11                10.2s ± 0%     10.1s ± 0%  -0.19%  (p=0.008 n=5+5)
FmtFprintfEmpty           450ns ± 0%     446ns ± 1%  -0.89%  (p=0.024 n=5+5)
FmtFprintfString          722ns ± 1%     721ns ± 1%    ~     (p=0.762 n=5+5)
FmtFprintfInt             693ns ± 2%     691ns ± 2%    ~     (p=0.889 n=5+5)
FmtFprintfIntInt          912ns ± 1%     911ns ± 0%    ~     (p=0.722 n=5+5)
FmtFprintfPrefixedInt    1.35µs ± 2%    1.35µs ± 2%    ~     (p=1.000 n=5+5)
FmtFprintfFloat          1.79µs ± 0%    1.78µs ± 0%    ~     (p=0.683 n=5+5)
FmtManyArgs              3.46µs ± 1%    3.48µs ± 1%    ~     (p=0.246 n=5+5)
GobDecode                48.8ms ± 1%    48.6ms ± 0%    ~     (p=0.222 n=5+5)
GobEncode                37.7ms ± 1%    37.4ms ± 1%    ~     (p=0.095 n=5+5)
Gzip                      1.72s ± 1%     1.72s ± 0%    ~     (p=0.905 n=5+4)
Gunzip                    342ms ± 0%     342ms ± 0%    ~     (p=0.421 n=5+5)
HTTPClientServer          219µs ± 1%     219µs ± 1%    ~     (p=1.000 n=5+5)
JSONEncode               89.1ms ± 1%    89.4ms ± 1%    ~     (p=0.222 n=5+5)
JSONDecode                292ms ± 1%     291ms ± 0%    ~     (p=0.421 n=5+5)
Mandelbrot200            15.7ms ± 0%    15.6ms ± 0%    ~     (p=0.690 n=5+5)
GoParse                  19.5ms ± 1%    19.6ms ± 1%    ~     (p=0.310 n=5+5)
RegexpMatchEasy0_32       534ns ± 1%     529ns ± 1%    ~     (p=0.056 n=5+5)
RegexpMatchEasy0_1K      2.75µs ± 0%    2.74µs ± 0%  -0.46%  (p=0.008 n=5+5)
RegexpMatchEasy1_32       572ns ± 2%     565ns ± 3%    ~     (p=0.310 n=5+5)
RegexpMatchEasy1_1K      4.15µs ± 0%    4.15µs ± 1%    ~     (p=0.548 n=5+5)
RegexpMatchMedium_32     31.2ns ± 0%    31.1ns ± 0%  -0.45%  (p=0.016 n=5+4)
RegexpMatchMedium_1K      235µs ± 1%     235µs ± 0%    ~     (p=1.000 n=5+5)
RegexpMatchHard_32       13.9µs ± 1%    13.5µs ± 1%  -2.74%  (p=0.008 n=5+5)
RegexpMatchHard_1K        416µs ± 2%     410µs ± 2%    ~     (p=0.056 n=5+5)
Revcomp                   6.36s ± 0%     6.34s ± 0%  -0.31%  (p=0.008 n=5+5)
Template                  352ms ± 1%     353ms ± 0%  +0.45%  (p=0.032 n=5+5)
TimeParse                2.04µs ± 4%    2.01µs ± 0%    ~     (p=0.056 n=5+5)
TimeFormat               2.97µs ± 0%    2.97µs ± 0%    ~     (p=1.000 n=5+5)

name                   old speed      new speed      delta
GobDecode              15.7MB/s ± 1%  15.8MB/s ± 0%    ~     (p=0.206 n=5+5)
GobEncode              20.4MB/s ± 1%  20.5MB/s ± 1%    ~     (p=0.056 n=5+5)
Gzip                   11.3MB/s ± 1%  11.3MB/s ± 0%    ~     (p=0.841 n=5+4)
Gunzip                 56.7MB/s ± 0%  56.8MB/s ± 0%    ~     (p=0.389 n=5+5)
JSONEncode             21.8MB/s ± 1%  21.7MB/s ± 1%    ~     (p=0.246 n=5+5)
JSONDecode             6.66MB/s ± 0%  6.67MB/s ± 0%    ~     (p=0.857 n=4+5)
GoParse                2.97MB/s ± 1%  2.96MB/s ± 1%    ~     (p=0.238 n=5+5)
RegexpMatchEasy0_32    59.9MB/s ± 1%  60.5MB/s ± 1%  +0.92%  (p=0.032 n=5+5)
RegexpMatchEasy0_1K     372MB/s ± 0%   374MB/s ± 0%  +0.46%  (p=0.008 n=5+5)
RegexpMatchEasy1_32    56.0MB/s ± 2%  56.7MB/s ± 3%    ~     (p=0.310 n=5+5)
RegexpMatchEasy1_1K     247MB/s ± 0%   247MB/s ± 1%    ~     (p=0.548 n=5+5)
RegexpMatchMedium_32   32.0MB/s ± 0%  32.1MB/s ± 0%    ~     (p=0.135 n=5+5)
RegexpMatchMedium_1K   4.35MB/s ± 1%  4.35MB/s ± 1%    ~     (p=0.825 n=5+5)
RegexpMatchHard_32     2.30MB/s ± 1%  2.37MB/s ± 1%  +2.78%  (p=0.008 n=5+5)
RegexpMatchHard_1K     2.47MB/s ± 1%  2.50MB/s ± 2%    ~     (p=0.095 n=5+5)
Revcomp                40.0MB/s ± 0%  40.1MB/s ± 0%  +0.31%  (p=0.016 n=5+5)
Template               5.51MB/s ± 1%  5.49MB/s ± 0%    ~     (p=0.190 n=5+5)

Change-Id: I540a2e4e7992376ce04f93b332f64fc3b6071237
Reviewed-on: https://go-review.googlesource.com/c/go/+/185078
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-08-28 15:49:59 +00:00
Ben Shi
3cfd003a8a cmd/compile: optimize ARM's math.bits.RotateLeft32
This CL optimizes math.bits.RotateLeft32 to inline
"MOVW Rx@>Ry, Rd" on ARM.

The benchmark results of math/bits show some improvements.
name               old time/op  new time/op  delta
RotateLeft-4       9.42ns ± 0%  6.91ns ± 0%  -26.66%  (p=0.000 n=40+33)
RotateLeft8-4      8.79ns ± 0%  8.79ns ± 0%   -0.04%  (p=0.000 n=40+31)
RotateLeft16-4     8.79ns ± 0%  8.79ns ± 0%   -0.04%  (p=0.000 n=40+32)
RotateLeft32-4     8.16ns ± 0%  7.54ns ± 0%   -7.68%  (p=0.000 n=40+40)
RotateLeft64-4     15.7ns ± 0%  15.7ns ± 0%     ~     (all equal)

updates #31265

Change-Id: I77bc1c2c702d5323fc7cad5264a8e2d5666bf712
Reviewed-on: https://go-review.googlesource.com/c/go/+/188697
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-08-28 15:41:58 +00:00
Ben Shi
c683ab8128 cmd/compile: optimize ARM's math.Abs
This CL optimizes math.Abs to an inline ABSD instruction on ARM.

The benchmark results of src/math/ show big improvements.
name                   old time/op  new time/op  delta
Acos-4                  181ns ± 0%   182ns ± 0%   +0.30%  (p=0.000 n=40+40)
Acosh-4                 202ns ± 0%   202ns ± 0%     ~     (all equal)
Asin-4                  163ns ± 0%   163ns ± 0%     ~     (all equal)
Asinh-4                 242ns ± 0%   242ns ± 0%     ~     (all equal)
Atan-4                  120ns ± 0%   121ns ± 0%   +0.83%  (p=0.000 n=40+40)
Atanh-4                 202ns ± 0%   202ns ± 0%     ~     (all equal)
Atan2-4                 173ns ± 0%   173ns ± 0%     ~     (all equal)
Cbrt-4                 1.06µs ± 0%  1.06µs ± 0%   +0.09%  (p=0.000 n=39+37)
Ceil-4                 72.9ns ± 0%  72.8ns ± 0%     ~     (p=0.237 n=40+40)
Copysign-4             13.2ns ± 0%  13.2ns ± 0%     ~     (all equal)
Cos-4                   193ns ± 0%   183ns ± 0%   -5.18%  (p=0.000 n=40+40)
Cosh-4                  254ns ± 0%   239ns ± 0%   -5.91%  (p=0.000 n=40+40)
Erf-4                   112ns ± 0%   112ns ± 0%     ~     (all equal)
Erfc-4                  117ns ± 0%   117ns ± 0%     ~     (all equal)
Erfinv-4                127ns ± 0%   127ns ± 1%     ~     (p=0.492 n=40+40)
Erfcinv-4               128ns ± 0%   128ns ± 0%     ~     (all equal)
Exp-4                   212ns ± 0%   206ns ± 0%   -3.05%  (p=0.000 n=40+40)
ExpGo-4                 216ns ± 0%   209ns ± 0%   -3.24%  (p=0.000 n=40+40)
Expm1-4                 142ns ± 0%   142ns ± 0%     ~     (all equal)
Exp2-4                  191ns ± 0%   184ns ± 0%   -3.45%  (p=0.000 n=40+40)
Exp2Go-4                194ns ± 0%   187ns ± 0%   -3.61%  (p=0.000 n=40+40)
Abs-4                  14.4ns ± 0%   6.3ns ± 0%  -56.39%  (p=0.000 n=38+39)
Dim-4                  12.6ns ± 0%  12.6ns ± 0%     ~     (all equal)
Floor-4                49.6ns ± 0%  49.6ns ± 0%     ~     (all equal)
Max-4                  27.6ns ± 0%  27.6ns ± 0%     ~     (all equal)
Min-4                  27.0ns ± 0%  27.0ns ± 0%     ~     (all equal)
Mod-4                   349ns ± 0%   305ns ± 1%  -12.55%  (p=0.000 n=33+40)
Frexp-4                54.0ns ± 0%  47.1ns ± 0%  -12.78%  (p=0.000 n=38+38)
Gamma-4                 242ns ± 0%   234ns ± 0%   -3.16%  (p=0.000 n=36+40)
Hypot-4                84.8ns ± 0%  67.8ns ± 0%  -20.05%  (p=0.000 n=31+35)
HypotGo-4              88.5ns ± 0%  71.6ns ± 0%  -19.12%  (p=0.000 n=40+38)
Ilogb-4                45.8ns ± 0%  38.9ns ± 0%  -15.12%  (p=0.000 n=40+32)
J0-4                    821ns ± 0%   802ns ± 0%   -2.33%  (p=0.000 n=33+40)
J1-4                    816ns ± 0%   807ns ± 0%   -1.05%  (p=0.000 n=40+29)
Jn-4                   1.67µs ± 0%  1.65µs ± 0%   -1.45%  (p=0.000 n=40+39)
Ldexp-4                61.5ns ± 0%  54.6ns ± 0%  -11.27%  (p=0.000 n=40+32)
Lgamma-4                188ns ± 0%   188ns ± 0%     ~     (all equal)
Log-4                   154ns ± 0%   147ns ± 0%   -4.78%  (p=0.000 n=40+40)
Logb-4                 50.9ns ± 0%  42.7ns ± 0%  -16.11%  (p=0.000 n=34+39)
Log1p-4                 160ns ± 0%   159ns ± 0%     ~     (p=0.828 n=40+40)
Log10-4                 173ns ± 0%   166ns ± 0%   -4.05%  (p=0.000 n=40+40)
Log2-4                 65.3ns ± 0%  58.4ns ± 0%  -10.57%  (p=0.000 n=37+37)
Modf-4                 36.4ns ± 0%  36.4ns ± 0%     ~     (all equal)
Nextafter32-4          36.4ns ± 0%  36.4ns ± 0%     ~     (all equal)
Nextafter64-4          32.7ns ± 0%  32.6ns ± 0%     ~     (p=0.375 n=40+40)
PowInt-4                300ns ± 0%   277ns ± 0%   -7.78%  (p=0.000 n=40+40)
PowFrac-4               676ns ± 0%   635ns ± 0%   -6.00%  (p=0.000 n=40+35)
Pow10Pos-4             17.6ns ± 0%  17.6ns ± 0%     ~     (all equal)
Pow10Neg-4             22.0ns ± 0%  22.0ns ± 0%     ~     (all equal)
Round-4                30.1ns ± 0%  30.1ns ± 0%     ~     (all equal)
RoundToEven-4          38.9ns ± 0%  38.9ns ± 0%     ~     (all equal)
Remainder-4             291ns ± 0%   263ns ± 0%   -9.62%  (p=0.000 n=40+40)
Signbit-4              11.3ns ± 0%  11.3ns ± 0%     ~     (all equal)
Sin-4                   185ns ± 0%   185ns ± 0%     ~     (all equal)
Sincos-4                230ns ± 0%   230ns ± 0%     ~     (all equal)
Sinh-4                  253ns ± 0%   246ns ± 0%   -2.77%  (p=0.000 n=39+39)
SqrtIndirect-4         41.4ns ± 0%  41.4ns ± 0%     ~     (all equal)
SqrtLatency-4          13.8ns ± 0%  13.8ns ± 0%     ~     (all equal)
SqrtIndirectLatency-4  37.0ns ± 0%  37.0ns ± 0%     ~     (p=0.632 n=40+40)
SqrtGoLatency-4         911ns ± 0%   911ns ± 0%   +0.08%  (p=0.000 n=40+40)
SqrtPrime-4            13.2µs ± 0%  13.2µs ± 0%   +0.01%  (p=0.038 n=38+40)
Tan-4                   205ns ± 0%   205ns ± 0%     ~     (all equal)
Tanh-4                  264ns ± 0%   247ns ± 0%   -6.44%  (p=0.000 n=39+32)
Trunc-4                45.2ns ± 0%  45.2ns ± 0%     ~     (all equal)
Y0-4                    796ns ± 0%   792ns ± 0%   -0.55%  (p=0.000 n=35+40)
Y1-4                    804ns ± 0%   797ns ± 0%   -0.82%  (p=0.000 n=24+40)
Yn-4                   1.64µs ± 0%  1.62µs ± 0%   -1.27%  (p=0.000 n=40+39)
Float64bits-4          8.16ns ± 0%  8.16ns ± 0%   +0.04%  (p=0.000 n=35+40)
Float64frombits-4      10.7ns ± 0%  10.7ns ± 0%     ~     (all equal)
Float32bits-4          7.53ns ± 0%  7.53ns ± 0%     ~     (p=0.760 n=40+40)
Float32frombits-4      6.91ns ± 0%  6.91ns ± 0%   -0.04%  (p=0.002 n=32+38)
[Geo mean]              111ns        106ns        -3.98%

Change-Id: I54f4fd7f5160db020b430b556bde59cc0fdb996d
Reviewed-on: https://go-review.googlesource.com/c/go/+/188678
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-08-28 15:41:28 +00:00
Cherry Zhang
4ea7aa7cf3 cmd/compile, runtime: use R20, R21 in ARM64's Duff's devices
Currently we use R16 and R17 for ARM64's Duff's devices.
According to ARM64 ABI, R16 and R17 can be used by the (external)
linker as scratch registers in trampolines. So don't use these
registers to pass information across functions.

It seems unlikely that calling Duff's devices would need a
trampoline in normal cases. But it could happen if the call
target is out of the 128 MB direct jump limit.

The choice of R20 and R21 is kind of arbitrary. The register
allocator allocates from low-numbered registers. High numbered
registers are chosen so it is unlikely to hold a live value and
forces a spill.

Fixes #32773.

Change-Id: Id22d555b5afeadd4efcf62797d1580d641c39218
Reviewed-on: https://go-review.googlesource.com/c/go/+/183842
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2019-06-26 16:01:47 +00:00
Michael Munday
ac8dbe7747 cmd/compile, runtime: make atomic loads/stores sequentially consistent on s390x
The z/Architecture does not guarantee that a load following a store
will not be reordered with that store, unless they access the same
address. Therefore if we want to ensure the sequential consistency
of atomic loads and stores we need to perform serialization
operations after atomic stores.

We do not need to serialize in the runtime when using StoreRel[ease]
and LoadAcq[uire]. The z/Architecture already provides sufficient
ordering guarantees for these operations.

name              old time/op  new time/op  delta
AtomicLoad64-16   0.51ns ± 0%  0.51ns ± 0%     ~     (all equal)
AtomicStore64-16  0.51ns ± 0%  0.60ns ± 9%  +16.47%  (p=0.000 n=17+20)
AtomicLoad-16     0.51ns ± 0%  0.51ns ± 0%     ~     (all equal)
AtomicStore-16    0.51ns ± 0%  0.60ns ± 9%  +16.50%  (p=0.000 n=18+20)

Fixes #32428.

Change-Id: I88d19a4010c46070e4fff4b41587efe4c628d4d9
Reviewed-on: https://go-review.googlesource.com/c/go/+/180439
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2019-06-06 16:15:43 +00:00
Austin Clements
4a4e05b0b1 cmd/compile,runtime/internal/atomic: add Load8
Change-Id: Id52a5730cf9207ee7ccebac4ef12791dc5720e7c
Reviewed-on: https://go-review.googlesource.com/c/go/+/172283
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2019-05-03 19:25:37 +00:00
Michael Munday
2c1b5130aa cmd/compile: add math/bits.{Add,Sub}64 intrinsics on s390x
This CL adds intrinsics for the 64-bit addition and subtraction
functions in math/bits. These intrinsics use the condition code
to propagate the carry or borrow bit.

To make the carry chains more efficient I've removed the
'clobberFlags' property from most of the load and store
operations. Originally these ops did clobber flags when using
offsets that didn't fit in a signed 20-bit integer, however
that is no longer true.

As with other platforms the intrinsics are faster when executed
in a chain rather than a loop because currently we need to spill
and restore the carry bit between each loop iteration. We may
be able to reduce the need to do this on s390x (e.g. by using
compare-and-branch instructions that do not clobber flags) in the
future.

name           old time/op  new time/op  delta
Add64          1.21ns ± 2%  2.03ns ± 2%  +67.18%  (p=0.000 n=7+10)
Add64multiple  2.98ns ± 3%  1.03ns ± 0%  -65.39%  (p=0.000 n=10+9)
Sub64          1.23ns ± 4%  2.03ns ± 1%  +64.85%  (p=0.000 n=10+10)
Sub64multiple  3.73ns ± 4%  1.04ns ± 1%  -72.28%  (p=0.000 n=10+8)

Change-Id: I913bbd5e19e6b95bef52f5bc4f14d6fe40119083
Reviewed-on: https://go-review.googlesource.com/c/go/+/174303
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-05-03 10:41:15 +00:00
Carlos Eduardo Seo
50ad09418e cmd/compile: intrinsify math/bits.Add64 for ppc64x
This change creates an intrinsic for Add64 for ppc64x and adds a
testcase for it.

name               old time/op  new time/op  delta
Add64-160          1.90ns ±40%  2.29ns ± 0%     ~     (p=0.119 n=5+5)
Add64multiple-160  6.69ns ± 2%  2.45ns ± 4%  -63.47%  (p=0.016 n=4+5)

Change-Id: I9abe6fb023fdf62eea3c9b46a1820f60bb0a7f97
Reviewed-on: https://go-review.googlesource.com/c/go/+/173758
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
2019-04-28 23:51:04 +00:00
erifan01
f8f265b9cf cmd/compile: intrinsify math/bits.Sub64 for arm64
This CL instrinsifies Sub64 with arm64 instruction sequence NEGS, SBCS,
NGC and NEG, and optimzes the case of borrowing chains.

Benchmarks:
name              old time/op       new time/op       delta
Sub-64            2.500000ns +- 0%  2.048000ns +- 1%  -18.08%  (p=0.000 n=10+10)
Sub32-64          2.500000ns +- 0%  2.500000ns +- 0%     ~     (all equal)
Sub64-64          2.500000ns +- 0%  2.080000ns +- 0%  -16.80%  (p=0.000 n=10+7)
Sub64multiple-64  7.090000ns +- 0%  2.090000ns +- 0%  -70.52%  (p=0.000 n=10+10)

Change-Id: I3d2664e009a9635e13b55d2c4567c7b34c2c0655
Reviewed-on: https://go-review.googlesource.com/c/go/+/159018
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-04-22 14:40:20 +00:00
Richard Musiol
cf8cc7f63c cmd/compile: add saturating conversions on wasm
This change adds the GOWASM option "satconv" to enable the generation
of experimental saturating (non-trapping) float-to-int conversions.
It improves the performance of the conversion by 42%.

Previously the conversions had already been augmented with helper
functions to have saturating behavior. Now Wasm.rules is always using
the new operation names and wasm/ssa.go is falling back to the helpers
if the feature is not enabled.

The feature is in phase 4 of the WebAssembly proposal process:
https://github.com/WebAssembly/meetings/blob/master/process/phases.md

More information on the feature can be found at:
https://github.com/WebAssembly/nontrapping-float-to-int-conversions/blob/master/proposals/nontrapping-float-to-int-conversion/Overview.md

Change-Id: Ic6c3688017054ede804b02b6b0ffd4a02ef33ad7
Reviewed-on: https://go-review.googlesource.com/c/go/+/170119
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-04-04 16:10:12 +00:00
Richard Musiol
4d23cbc671 cmd/compile: add sign-extension operators on wasm
This change adds the GOWASM option "signext" to enable
the generation of experimental sign-extension operators.

The feature is in phase 4 of the WebAssembly proposal process:
https://github.com/WebAssembly/meetings/blob/master/process/phases.md

More information on the feature can be found at:
https://github.com/WebAssembly/sign-extension-ops/blob/master/proposals/sign-extension-ops/Overview.md

Change-Id: I6b30069390a8699fbecd9fb4d1d61e13c59b0333
Reviewed-on: https://go-review.googlesource.com/c/go/+/168882
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-03-28 20:23:05 +00:00
erifan01
d0cbf9bf53 cmd/compile: follow up intrinsifying math/bits.Add64 for arm64
This CL deals with the additional comments of CL 159017.

Change-Id: I4ad3c60c834646d58dc0c544c741b92bfe83fb8b
Reviewed-on: https://go-review.googlesource.com/c/go/+/168857
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-03-22 15:09:47 +00:00