Also rewrite subtraction from zero to NEG/NEGW.
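A sketch of the new rules (the exact riscv64 spellings may differ):

(SUB (MOVDconst [0]) x) -> (NEG x)
(SUBW (MOVDconst [0]) x) -> (NEGW x)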
Change-Id: I216e286d1860055f2a07fe2f772cd50f366ea097
Reviewed-on: https://go-review.googlesource.com/c/go/+/221691
Reviewed-by: Cherry Zhang <cherryyz@google.com>
This commit adds a new cmd/compile flag -spectre,
which accepts a comma-separated list of possible
Spectre mitigations to apply, or the empty string (none),
or "all". The only known mitigation right now is "index",
which uses conditional moves to ensure that x86-64 CPUs
do not speculate past index bounds checks.
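For example, to enable just this mitigation for all packages in a
build:

go build -gcflags=all=-spectre=index ./...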
Speculating past index bounds checks may be problematic
on systems running privileged servers that accept requests
from untrusted users who can execute their own programs
on the same machine. (And some more constraints that
make it even more unlikely in practice.)
The cases this protects against are analogous to the ones
Microsoft explains in the "Array out of bounds load/store feeding ..."
sections here:
https://docs.microsoft.com/en-us/cpp/security/developer-guidance-speculative-execution?view=vs-2019#array-out-of-bounds-load-feeding-an-indirect-branch
Change-Id: Ib7532d7e12466b17e04c4e2075c2a456dc98f610
Reviewed-on: https://go-review.googlesource.com/c/go/+/222660
Reviewed-by: Keith Randall <khr@golang.org>
This allows zero stores to be performed using the zero register,
rather than loading a separate register with zero.
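A sketch of the kind of rewrite this enables, assuming
storezero-style op names as on other ports:

(MOVDstore ptr (MOVDconst [0]) mem) -> (MOVDstorezero ptr mem)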
Change-Id: Ic81d8dbcdacbb2ca2c3f77682ff5ad7cdc33d18d
Reviewed-on: https://go-review.googlesource.com/c/go/+/221684
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Const64 gets lowered to MOVDconst.
Change rules using interior Const64 to use MOVDconst instead,
to be less dependent on rule application order.
As a result of doing this, some of the rules end up being
exact duplicates; remove those.
We had those exact duplicates because of the order dependency;
ppc64 had no way to optimize away shifts by a constant
if the initial lowering didn't catch it.
Add those optimizations as well.
The outcome is the same, but this makes the overall rules more robust.
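A sketch of the shape of the change (not the exact ppc64 rules):

(Lsh64x64 x (Const64 [c])) && uint64(c) < 64 -> (SLDconst x [c]) // before
(Lsh64x64 x (MOVDconst [c])) && uint64(c) < 64 -> (SLDconst x [c]) // after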
Change-Id: Iadd97a9fe73d52358d571d022ace145e506d160b
Reviewed-on: https://go-review.googlesource.com/c/go/+/220877
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
The goal here is improved AuxInt printing in ssa.html.
Instead of displaying an inscrutable encoded integer,
it displays something like
v25 (28) = UBFX <int> [lsb=4,width=8] v52
which is much nicer for debugging.
Change-Id: I40713ff7f4a857c4557486cdf73c2dff137511ca
Reviewed-on: https://go-review.googlesource.com/c/go/+/221420
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
SignExt32to64 can be implemented with a single ADDIW instruction,
rather than the two shifts currently in use.
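The lowering then becomes, roughly (a sketch; the actual rule may
spell types differently):

(SignExt32to64 x) -> (ADDIW [0] x)

ADDIW adds its immediate to the low 32 bits and sign-extends the
result to 64 bits, so adding zero is exactly a sign extension.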
Change-Id: Ie1bbaef4018f1ba5162773fc64fa5a887457cfc9
Reviewed-on: https://go-review.googlesource.com/c/go/+/220922
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Use SUBW to perform a 32-bit subtraction, rather than zero extending from
32 to 64 bits. This reduces Eq32 and Neq32 to two instructions, rather than
the four instructions required previously.
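A sketch of the resulting lowerings, assuming the usual RISC-V
SEQZ/SNEZ pseudo-instructions:

(Eq32 x y) -> (SEQZ (SUBW x y))
(Neq32 x y) -> (SNEZ (SUBW x y))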
Change-Id: Ib2798324881e9db842c864e91a0c1b1e48c4b67b
Reviewed-on: https://go-review.googlesource.com/c/go/+/220921
Reviewed-by: Cherry Zhang <cherryyz@google.com>
The generic Greater and Geq ops can always be replaced with the Less and
Leq ops. This CL therefore removes them. This simplifies the compiler since
it reduces the number of operations that need handling in both code and in
rewrite rules. This will be especially true when adding control flow
optimizations such as the integer-in-range optimizations in CL 165998.
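The replacement is a simple operand swap; for 64-bit signed
integers, for example, the rewrites are:

(Greater64 x y) -> (Less64 y x)
(Geq64 x y) -> (Leq64 y x)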
Change-Id: If0648b2b19998ac1bddccbf251283f3be4ec3040
Reviewed-on: https://go-review.googlesource.com/c/go/+/220417
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
And use this newfound power to more precisely describe some PPC64 ops.
Change-Id: Idb2b669d74fbab5f3508edf19f7e3347306b0daf
Reviewed-on: https://go-review.googlesource.com/c/go/+/217002
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
HMUL is commutative. However, it has asymmetric register requirements.
There are existing rewrite rules to place arguments in preferable slots.
Due to a bug, the existing rulegen commutativity engine doesn't generate
the commuted form of the HMUL rules.
The commuted forms of those rewrite rules cause infinite loops.
In order to fix the rulegen commutativity bug,
we need to choose between eliminating
those rewrite rules and marking HMUL ops as not commutative.
This change chooses the latter, since doing so yields better
optimization results on std+cmd.
Removing the rewrite rules yields only text size regressions:
file                                 before   after    Δ        %
runtime.s                            477257  477269  +12  +0.003%
time.s                                83552   83612  +60  +0.072%
encoding/asn1.s                       57378   57382   +4  +0.007%
cmd/go/internal/modfetch/codehost.s   89822   89829   +7  +0.008%
cmd/internal/test2json.s               9459    9466   +7  +0.074%
cmd/go/internal/test.s                57665   57678  +13  +0.023%
Marking HMUL as not commutative actually yields (mostly) improvements:
file                                 before   after    Δ        %
runtime.s                            477257  477247  -10  -0.002%
math.s                                35985   35992   +7  +0.019%
strconv.s                             53486   53462  -24  -0.045%
syscall.s                             82483   82446  -37  -0.045%
time.s                                83552   83561   +9  +0.011%
os.s                                  52691   52684   -7  -0.013%
archive/zip.s                         42285   42272  -13  -0.031%
encoding/asn1.s                       57378   57329  -49  -0.085%
encoding/base64.s                     12156   12094  -62  -0.510%
net.s                                296286  296276  -10  -0.003%
encoding/base32.s                      9720    9658  -62  -0.638%
net/http.s                           560931  560907  -24  -0.004%
net/smtp.s                            14421   14411  -10  -0.069%
cmd/vendor/golang.org/x/sys/unix.s    74307   74266  -41  -0.055%
The regressions are minor, and are in functions math.cbrt,
time.Time.String, and time.Date.
Change-Id: I9f6d9ee71654e5b70381cac77b0ac26011f4ea12
Reviewed-on: https://go-review.googlesource.com/c/go/+/213701
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
Based on riscv-go port.
Updates #27532
Change-Id: Ia329daa243db63ff334053b8807ea96b97ce3acf
Reviewed-on: https://go-review.googlesource.com/c/go/+/204631
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Count Values with side effects but no use as live, and don't fuse
branches that contain such Values. (This can happen, e.g., when a
side-effecting Value is followed by an infinite loop.) Otherwise this
may lead to miscompilation (a side effect firing under the wrong
condition) or an ICE (two stores live simultaneously).
Fixes #36005.
Change-Id: If202eae4b37cb7f0311d6ca120ffa46609925157
Reviewed-on: https://go-review.googlesource.com/c/go/+/210179
Reviewed-by: Keith Randall <khr@golang.org>
Intrinsify these functions to match other platforms. Update the
sequence of instructions used in the assembly implementations to
match the intrinsics.
Also, add a micro benchmark so we can more easily measure the
performance of these two functions:
name            old time/op  new time/op  delta
And8-8          5.33ns ± 7%  2.55ns ± 8%  -52.12%  (p=0.000 n=20+20)
And8Parallel-8  7.39ns ± 5%  3.74ns ± 4%  -49.34%  (p=0.000 n=20+20)
Or8-8           4.84ns ±15%  2.64ns ±11%  -45.50%  (p=0.000 n=20+20)
Or8Parallel-8   7.27ns ± 3%  3.84ns ± 4%  -47.10%  (p=0.000 n=19+20)
By using a 'rotate then xor selected bits' instruction combined with
either a 'load and and' or a 'load and or' instruction, we can
implement And8 and Or8 with far fewer instructions. Replacing
'compare and swap' with atomic instructions may also improve
performance when there is contention.
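For reference, a non-atomic Go sketch of the semantics being
intrinsified (the real runtime/internal/atomic versions perform
these read-modify-write updates atomically):

package sketch

// And8 clears the bits of *ptr that are clear in val.
func And8(ptr *uint8, val uint8) { *ptr &= val }

// Or8 sets the bits of *ptr that are set in val.
func Or8(ptr *uint8, val uint8) { *ptr |= val }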
Change-Id: I28bb8032052b73ae8ccdf6e4c612d2877085fa01
Reviewed-on: https://go-review.googlesource.com/c/go/+/204277
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
We'll use CTR as a scratch register for call injection. Mark code
sequences that use CTR as unsafe for async preemption. Currently
it is only used in LoweredZero and LoweredMove. It is unfortunate
that they are nonpreemptible. But I think it is still better than
using LR for call injection and marking all leaf functions
nonpreemptible.
Also mark the prologue of large frame functions nonpreemptible,
as we write below SP.
Change-Id: I05a75431499f3f4b2f23651a7b17f7fcf2afbe06
Reviewed-on: https://go-review.googlesource.com/c/go/+/203823
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Mark atomic LL/SC loops as unsafe for async preemption, as they
use REGTMP.
Change-Id: I5be7f93ad3ee337049ec7c3efd6fdc30eef87d97
Reviewed-on: https://go-review.googlesource.com/c/go/+/203719
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This API was added for #25819, where it was discussed as math.FMA.
The commit adding it used math.Fma, presumably for consistency
with the rest of the unusual names in package math
(Sincos, Acosh, Erfcinv, Float32bits, etc).
I believe that using an idiomatic Go name is more important here
than consistency with these other names, most of which are historical
baggage from C's standard library.
Early additions like Float32frombits happened before "uppercase for export"
(so they were originally like "float32frombits") and they were not properly
reconsidered when we uppercased the symbols to export them.
That's a mistake we live with.
The names of functions we have added since then, and even a few
that were legacy, are more properly Go-cased, such as IsNaN, IsInf,
and RoundToEven, rather than Isnan, Isinf, and Roundtoeven.
And also constants like MaxFloat32.
For new API, we should keep using proper Go-cased symbols
instead of minimally-upper-cased-C symbols.
So math.FMA, not math.Fma.
This API has not yet been released, so this change does not break
the compatibility promise.
This CL also modifies cmd/compile, since the compiler knows
the name of the function. I could have stopped at changing the
string constants, but it seemed to make more sense to use a
consistent casing everywhere.
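With the rename, callers write for example:

package main

import (
	"fmt"
	"math"
)

func main() {
	// FMA(x, y, z) computes x*y + z with a single rounding.
	fmt.Println(math.FMA(2, 3, 1)) // 7
}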
Change-Id: I0f6f3407f41e99bfa8239467345c33945088896e
Reviewed-on: https://go-review.googlesource.com/c/go/+/205317
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Introduce a mechanism for marking architecture-specific Ops
unsafe. And mark ones that use REGTMP on ARM64, as for async
preemption we will be using REGTMP as a temporary register in the
injected call.
Change-Id: I8ff22e87d8f9cb10d02a2f0af7c12ad6d7d58f54
Reviewed-on: https://go-review.googlesource.com/c/go/+/203459
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Austin Clements <austin@google.com>
CL 203284 added compiler intrinsics for atomic Load8 and Store8 on
several architectures, but missed the lowering on MIPS. This CL fixes
that.
Updates #10958, #24543.
Change-Id: I82e88971554fe8c33ad2bf195a633c44b9ac4cf7
Reviewed-on: https://go-review.googlesource.com/c/go/+/203977
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
For #10958, #24543, but makes sense on its own.
Change-Id: I2a87dab66b82a1863e4b6512b1f8def51463ce2a
Reviewed-on: https://go-review.googlesource.com/c/go/+/203284
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
This change introduces an arm intrinsic that generates the FMULAD
instruction for the fused-multiply-add operation on systems that
support it. System support is detected via cpu.ARM.HasVFPv4. A rewrite
rule translates the generic intrinsic to FMULAD.
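A sketch of the rewrite (the operand order is an assumption here,
with the addend placed first since FMULAD accumulates into its
first argument):

ARM: (Fma x y z) -> (FMULAD z x y)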
Updates #25819.
Change-Id: I8459e5dd1cdbdca35f88a78dbeb7d387f1e20efa
Reviewed-on: https://go-review.googlesource.com/c/go/+/142117
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
To permit ssa-level optimization, this change introduces an amd64 intrinsic
that generates the VFMADD231SD instruction for the fused-multiply-add
operation on systems that support it. System support is detected via
cpu.X86.HasFMA. A rewrite rule can then translate the generic ssa intrinsic
("Fma") to VFMADD231SD.
The benchmark compares the software implementation (old) with the intrinsic
(new).
name   old time/op  new time/op  delta
Fma-4  27.2ns ± 1%  1.0ns ± 9%   -96.48%  (p=0.008 n=5+5)
Updates #25819.
Change-Id: I966655e5f96817a5d06dff5942418a3915b09584
Reviewed-on: https://go-review.googlesource.com/c/go/+/137156
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
In order to make math.FMA a compiler intrinsic for ISAs like ARM64,
PPC64[le], and S390X, a generic 3-argument opcode "Fma" is provided and
rewritten as
ARM64: (Fma x y z) -> (FMADDD z x y)
PPC64: (Fma x y z) -> (FMADD x y z)
S390X: (Fma x y z) -> (FMADD z x y)
Updates #25819.
Change-Id: Ie5bc628311e6feeb28ddf9adaa6e702c8c291efa
Reviewed-on: https://go-review.googlesource.com/c/go/+/131959
Run-TryBot: Akhil Indurti <aindurti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Avoid an out-of-range error when calling LongString on a generic
block.
Change-Id: I33ca88940d899bc71e3155bc63d2aa925cf83230
Reviewed-on: https://go-review.googlesource.com/c/go/+/200737
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Prior to this CL conditional branches on s390x always used an
extended mnemonic such as BNE, BLT and so on to represent branch
instructions with different condition code masks. This CL adds
support for numeric condition code masks to the s390x SSA backend
so that we can encode the condition under which a Block's
successor is chosen as a field in that Block rather than in its
type.
This change will be useful as we come to add support for combined
compare-and-branch instructions. Rather than trying to add extended
mnemonics for every possible combination of mask and
compare-and-branch instruction, we can instead use a single mnemonic
for each instruction.
Change-Id: Idb7458f187b50906877d683695c291dff5279553
Reviewed-on: https://go-review.googlesource.com/c/go/+/197178
Reviewed-by: Keith Randall <khr@golang.org>
PPC64's ANDCC, ORCC, XORCC SSA ops produce a flags value, which
should not have the register mask of an integer register.
Fixes #34468.
Change-Id: Ic762e423b20275fd9f8118dae7951c258d59738c
Reviewed-on: https://go-review.googlesource.com/c/go/+/196960
Reviewed-by: Keith Randall <khr@golang.org>
Before this change, wasm only used float variables with a size of
64 bits and applied rounding to 32-bit precision where necessary.
This change adds proper 32-bit float variables.
Reduces the size of pkg/js_wasm by 254 bytes.
Change-Id: Ieabe846a8cb283d66def3cdf11e2523b3b31f345
Reviewed-on: https://go-review.googlesource.com/c/go/+/195117
Reviewed-by: Cherry Zhang <cherryyz@google.com>
This cleans up the isel code generation in ssa for ppc64x.
Currently there is no isel op; the isel code is generated only from
pseudo ops in ppc64/ssa.go, and only with operands whose values are
0 or 1. When the isel is generated, there is always a load of 1 into
the temp register before it.
This change implements the isel op so it can be used in PPC64.rules,
and can recognize operand values other than 0 or 1. This also
eliminates the forced load of 1, so it will be loaded only if
needed.
This will make the isel code generation consistent with other ops,
and allow future rule changes that can take advantage of having
a more general purpose isel rule.
Change-Id: I363e1dbd3f7f5dfecb53187ad51cce409a8d1f8d
Reviewed-on: https://go-review.googlesource.com/c/go/+/195057
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
This change adds an intrinsic for Mul64 on s390x. To achieve that, a
new assembly instruction, MLGR, is introduced in s390x/asmz.go. This
assembly instruction corresponds directly to an existing instruction
on Z and multiplies two 64-bit unsigned integers, storing the result
in two separate registers. In this case, we require the multiplicand
to be stored in register R3 and the output result (the high and low
64 bits of the product) to be stored in R2 and R3 respectively.
A test case is also added.
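The Go-level API this accelerates is math/bits.Mul64; for example:

package main

import (
	"fmt"
	"math/bits"
)

func main() {
	// Full 128-bit product of two 64-bit operands, returned as
	// high and low 64-bit halves.
	hi, lo := bits.Mul64(3, 1<<63)
	fmt.Println(hi, lo) // 1 9223372036854775808
}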
Benchmark:
name      old time/op  new time/op  delta
Mul-18    11.1ns ± 0%  1.4ns ± 0%   -87.39%  (p=0.002 n=8+10)
Mul32-18  2.07ns ± 0%  2.07ns ± 0%  ~        (all equal)
Mul64-18  11.1ns ± 1%  1.4ns ± 0%   -87.42%  (p=0.000 n=10+10)
Change-Id: Ieca6ad1f61fff9a48a31d50bbd3f3c6d9e6675c1
Reviewed-on: https://go-review.googlesource.com/c/go/+/194572
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This CL gets rid of the MOVDreg and MOVDnop SSA operations on
s390x. They were originally inserted to help avoid situations
where a sign/zero extension was elided but a spill invalidated
the optimization. It's not really clear we need to do this though
(amd64 doesn't have these ops for example) so long as we are
careful when removing sign/zero extensions. Also, the MOVDreg
technique doesn't work if the register is spilled before the
MOVDreg op (I haven't seen that in practice).
Removing these ops reduces the complexity of the rules and also
allows us to unblock optimizations. For example, the compiler can
now merge the loads in binary.{Big,Little}Endian.PutUint16 which
it wasn't able to do before. This CL reduces the size of the .text
section in the go tool by about 4.7KB (0.09%).
Change-Id: Icaddae7f2e4f9b2debb6fabae845adb3f73b41db
Reviewed-on: https://go-review.googlesource.com/c/go/+/173897
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Currently we use R16 and R17 for ARM64's Duff's devices.
According to the ARM64 ABI, R16 and R17 can be used by the
(external) linker as scratch registers in trampolines. So don't use
these registers to pass information across functions.
It seems unlikely that calling Duff's devices would need a
trampoline in normal cases. But it could happen if the call
target is out of the 128 MB direct jump limit.
The choice of R20 and R21 is somewhat arbitrary. The register
allocator allocates from low-numbered registers, so high-numbered
registers are less likely to hold live values and force spills.
Fixes #32773.
Change-Id: Id22d555b5afeadd4efcf62797d1580d641c39218
Reviewed-on: https://go-review.googlesource.com/c/go/+/183842
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
The z/Architecture does not guarantee that a load following a store
will not be reordered with that store, unless they access the same
address. Therefore if we want to ensure the sequential consistency
of atomic loads and stores we need to perform serialization
operations after atomic stores.
We do not need to serialize in the runtime when using StoreRel[ease]
and LoadAcq[uire]. The z/Architecture already provides sufficient
ordering guarantees for these operations.
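A sketch of the resulting store sequence in Go assembler syntax,
assuming SYNC as the serialization instruction (as used elsewhere
in the runtime's s390x assembly):

MOVD R1, (R2) // atomic 64-bit store
SYNC          // serialize: order the store before later loads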
name              old time/op  new time/op  delta
AtomicLoad64-16   0.51ns ± 0%  0.51ns ± 0%  ~        (all equal)
AtomicStore64-16  0.51ns ± 0%  0.60ns ± 9%  +16.47%  (p=0.000 n=17+20)
AtomicLoad-16     0.51ns ± 0%  0.51ns ± 0%  ~        (all equal)
AtomicStore-16    0.51ns ± 0%  0.60ns ± 9%  +16.50%  (p=0.000 n=18+20)
Fixes #32428.
Change-Id: I88d19a4010c46070e4fff4b41587efe4c628d4d9
Reviewed-on: https://go-review.googlesource.com/c/go/+/180439
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This CL adds intrinsics for the 64-bit addition and subtraction
functions in math/bits. These intrinsics use the condition code
to propagate the carry or borrow bit.
To make the carry chains more efficient, I've removed the
'clobberFlags' property from most of the load and store operations.
Originally these ops did clobber flags when using offsets that
didn't fit in a signed 20-bit integer; however, that is no longer
true.
As with other platforms, the intrinsics are faster when executed in
a chain rather than a loop, because currently we need to spill and
restore the carry bit between each loop iteration.
be able to reduce the need to do this on s390x (e.g. by using
compare-and-branch instructions that do not clobber flags) in the
future.
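For example, a multi-word addition keeps the carry in the condition
code across the whole chain; this is the kind of pattern the
'multiple' benchmarks below measure (add192 is a hypothetical
helper, not part of this CL):

package main

import (
	"fmt"
	"math/bits"
)

// add192 adds two 192-bit integers, each given as three 64-bit
// words in little-endian order, threading the carry through the
// chain.
func add192(x0, x1, x2, y0, y1, y2 uint64) (z0, z1, z2 uint64) {
	var c uint64
	z0, c = bits.Add64(x0, y0, 0)
	z1, c = bits.Add64(x1, y1, c)
	z2, _ = bits.Add64(x2, y2, c)
	return
}

func main() {
	fmt.Println(add192(^uint64(0), 0, 0, 1, 0, 0)) // 0 1 0
}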
name           old time/op  new time/op  delta
Add64          1.21ns ± 2%  2.03ns ± 2%  +67.18%  (p=0.000 n=7+10)
Add64multiple  2.98ns ± 3%  1.03ns ± 0%  -65.39%  (p=0.000 n=10+9)
Sub64          1.23ns ± 4%  2.03ns ± 1%  +64.85%  (p=0.000 n=10+10)
Sub64multiple  3.73ns ± 4%  1.04ns ± 1%  -72.28%  (p=0.000 n=10+8)
Change-Id: I913bbd5e19e6b95bef52f5bc4f14d6fe40119083
Reviewed-on: https://go-review.googlesource.com/c/go/+/174303
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
This change creates an intrinsic for Add64 for ppc64x and adds a
test case for it.
name               old time/op  new time/op  delta
Add64-160          1.90ns ±40%  2.29ns ± 0%  ~        (p=0.119 n=5+5)
Add64multiple-160  6.69ns ± 2%  2.45ns ± 4%  -63.47%  (p=0.016 n=4+5)
Change-Id: I9abe6fb023fdf62eea3c9b46a1820f60bb0a7f97
Reviewed-on: https://go-review.googlesource.com/c/go/+/173758
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>