go/src/math
Lynn Boger 216714e44f math/big: improve performance of mulAddVWW on ppc64x
This changes the assembly implementation on ppc64x
to improve performance by reordering some instructions.
It also eliminates an unnecessary move by changing an
ADDZE to use the correct target register.

Improvement on power9:

name                old time/op    new time/op    delta
MulAddVWW/1         6.89ns ± 0%    7.30ns ± 0%   +5.95%  (p=1.000 n=1+1)
MulAddVWW/2         8.04ns ± 0%    8.06ns ± 0%   +0.25%  (p=1.000 n=1+1)
MulAddVWW/3         9.39ns ± 0%    9.39ns ± 0%     ~     (all equal)
MulAddVWW/4         9.76ns ± 0%    9.48ns ± 0%   -2.87%  (p=1.000 n=1+1)
MulAddVWW/5         10.5ns ± 0%    10.3ns ± 0%   -1.90%  (p=1.000 n=1+1)
MulAddVWW/10        15.4ns ± 0%    14.9ns ± 0%   -3.25%  (p=1.000 n=1+1)
MulAddVWW/100        149ns ± 0%     125ns ± 0%  -16.11%  (p=1.000 n=1+1)
MulAddVWW/1000      1.42µs ± 0%    1.28µs ± 0%   -9.74%  (p=1.000 n=1+1)
MulAddVWW/10000     14.2µs ± 0%    12.8µs ± 0%   -9.73%  (p=1.000 n=1+1)
MulAddVWW/100000     144µs ± 0%     129µs ± 0%  -10.10%  (p=1.000 n=1+1)

Change-Id: I0ae7002a69783ca19d7a4e3e42042ae75dc60069
Reviewed-on: https://go-review.googlesource.com/c/go/+/248721
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
2020-08-18 20:25:26 +00:00
big math/big: improve performance of mulAddVWW on ppc64x 2020-08-18 20:25:26 +00:00
bits cmd/compile: clean up codegen for branch-on-carry on s390x 2020-04-22 20:11:06 +00:00
cmplx math/cmplx: handle special cases 2020-05-01 03:16:37 +00:00
rand math/rand: update comment to avoid use of ^ for exponentiation 2019-12-04 21:14:24 +00:00
abs.go cmd/compile,math: improve code generation for math.Abs 2017-08-25 19:15:01 +00:00
acos_s390x.s math: use s390x mnemonics rather than binary encodings 2018-08-20 17:42:08 +00:00
acosh.go math: use SIMD to accelerate additional scalar math functions on s390x 2017-05-08 19:52:30 +00:00
acosh_s390x.s math: use new mnemonics for 'rotate then insert' on s390x 2019-04-16 15:34:41 +00:00
all_test.go math: correct Atan2(±y,+∞) = ±0 on s390x 2020-03-25 04:06:34 +00:00
arith_s390x.go math: simplify hasVX checking on s390x 2020-04-27 20:06:57 +00:00
arith_s390x_test.go math: use SIMD to accelerate additional scalar math functions on s390x 2017-05-08 19:52:30 +00:00
asin.go build: move package sources from src/pkg to src 2014-09-08 00:08:51 -04:00
asin_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
asin_s390x.s math: use s390x mnemonics rather than binary encodings 2018-08-20 17:42:08 +00:00
asinh.go math: use SIMD to accelerate additional scalar math functions on s390x 2017-05-08 19:52:30 +00:00
asinh_s390x.s math: use new mnemonics for 'rotate then insert' on s390x 2019-04-16 15:34:41 +00:00
atan.go build: move package sources from src/pkg to src 2014-09-08 00:08:51 -04:00
atan2.go build: move package sources from src/pkg to src 2014-09-08 00:08:51 -04:00
atan2_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
atan2_s390x.s math: correct Atan2(±y,+∞) = ±0 on s390x 2020-03-25 04:06:34 +00:00
atan_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
atan_s390x.s math: use new mnemonics for 'rotate then insert' on s390x 2019-04-16 15:34:41 +00:00
atanh.go math: use SIMD to accelerate additional scalar math functions on s390x 2017-05-08 19:52:30 +00:00
atanh_s390x.s math: use new mnemonics for 'rotate then insert' on s390x 2019-04-16 15:34:41 +00:00
bits.go math: add RoundToEven function 2017-10-24 22:33:09 +00:00
cbrt.go math: use SIMD to accelerate additional scalar math functions on s390x 2017-05-08 19:52:30 +00:00
cbrt_s390x.s math: use new mnemonics for 'rotate then insert' on s390x 2019-04-16 15:34:41 +00:00
const.go math: change oeis.org urls to https 2017-08-08 08:56:40 +00:00
copysign.go build: move package sources from src/pkg to src 2014-09-08 00:08:51 -04:00
cosh_s390x.s math: use new mnemonics for 'rotate then insert' on s390x 2019-04-16 15:34:41 +00:00
dim.go math: remove asm version of Dim 2017-11-30 21:00:33 +00:00
dim_amd64.s math: remove asm version of Dim 2017-11-30 21:00:33 +00:00
dim_arm64.s math: remove asm version of Dim 2017-11-30 21:00:33 +00:00
dim_riscv64.s math: implement Min/Max in riscv64 assembly 2020-05-04 17:29:13 +00:00
dim_s390x.s math: optimize dim and remove s390x assembly implementation 2017-10-30 19:05:51 +00:00
erf.go math: use SIMD to accelerate additional scalar math functions on s390x 2017-05-08 19:52:30 +00:00
erf_s390x.s math: use new mnemonics for 'rotate then insert' on s390x 2019-04-16 15:34:41 +00:00
erfc_s390x.s math: use new mnemonics for 'rotate then insert' on s390x 2019-04-16 15:34:41 +00:00
erfinv.go all: update comment URLs from HTTP to HTTPS, where possible 2018-06-01 21:52:00 +00:00
example_test.go math: add function examples. 2020-05-02 20:22:19 +00:00
exp.go math: fix inaccurate result of Exp(1) 2017-08-17 09:01:27 +00:00
exp2_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
exp_amd64.s math: fix dead link to springerlink (now link.springer) 2020-05-29 14:33:50 +00:00
exp_arm64.s math: optimize Exp and Exp2 on arm64 2018-03-27 19:55:02 +00:00
exp_asm.go all: remove nacl (part 3, more amd64p32) 2019-10-10 22:38:38 +00:00
exp_s390x.s math: use new mnemonics for 'rotate then insert' on s390x 2019-04-16 15:34:41 +00:00
expm1.go all: unindent some big chunks of code 2017-08-18 06:59:48 +00:00
expm1_386.s all: this big patch remove whitespace from assembly files 2018-10-03 15:28:51 +00:00
expm1_s390x.s math: use new mnemonics for 'rotate then insert' on s390x 2019-04-16 15:34:41 +00:00
export_s390x_test.go math: use SIMD to accelerate additional scalar math functions on s390x 2017-05-08 19:52:30 +00:00
export_test.go math: use constant rather than variable for exported test threshold 2018-12-13 06:33:18 +00:00
floor.go math: add RoundToEven function 2017-10-24 22:33:09 +00:00
floor_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
floor_amd64.s cmd/compile: intrinsify math.{Trunc/Ceil/Floor} on amd64 2017-10-31 19:30:54 +00:00
floor_arm64.s math: add some assembly implementations on ARM64 2016-09-27 23:52:12 +00:00
floor_ppc64x.s math, cmd/internal/obj/ppc64: improve floor, ceil, trunc with asm 2016-09-23 13:03:08 +00:00
floor_s390x.s math: optimize Ceil, Floor and Trunc on s390x 2016-08-26 17:27:13 +00:00
floor_wasm.s math, math/big: add wasm architecture 2018-05-08 13:29:22 +00:00
fma.go math, cmd/compile: rename Fma to FMA 2019-11-07 14:51:06 +00:00
frexp.go build: move package sources from src/pkg to src 2014-09-08 00:08:51 -04:00
frexp_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
gamma.go math: speed up Gamma(+Inf) 2016-10-18 22:12:03 +00:00
huge_test.go math/cmplx: implement Payne-Hanek range reduction 2020-03-14 04:12:41 +00:00
hypot.go math: use Abs rather than if x < 0 { x = -x } 2018-02-13 20:12:23 +00:00
hypot_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
hypot_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
j0.go all: s/cancelation/cancellation/ 2019-04-16 20:27:15 +00:00
j1.go all: s/cancelation/cancellation/ 2019-04-16 20:27:15 +00:00
jn.go math: use Sincos instead of Sin and Cos in Jn and Yn 2019-03-25 22:41:37 +00:00
ldexp.go math: fix Ldexp when result is below ldexp(2, -1075) 2018-03-29 23:14:13 +00:00
ldexp_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
lgamma.go go/printer, gofmt: tuned table alignment for better results 2018-04-04 13:39:34 -07:00
log.go all: single space after period. 2016-03-02 00:13:47 +00:00
log1p.go math: simplify the code 2020-08-15 02:20:42 +00:00
log1p_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
log1p_s390x.s math: use new mnemonics for 'rotate then insert' on s390x 2019-04-16 15:34:41 +00:00
log10.go math: fix Log2 test failures on ppc64 (and s390) 2015-07-15 05:35:22 +00:00
log10_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
log10_s390x.s math: use new mnemonics for 'rotate then insert' on s390x 2019-04-16 15:34:41 +00:00
log_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
log_amd64.s math: speed up Log on amd64 2017-03-29 20:36:29 +00:00
log_s390x.s math: use new mnemonics for 'rotate then insert' on s390x 2019-04-16 15:34:41 +00:00
logb.go build: move package sources from src/pkg to src 2014-09-08 00:08:51 -04:00
mod.go math: use Abs in Mod rather than if x < 0 { x = -x} 2018-10-04 17:32:44 +00:00
mod_386.s build: move package sources from src/pkg to src 2014-09-08 00:08:51 -04:00
modf.go all: single space after period. 2016-03-02 00:13:47 +00:00
modf_386.s all: fix assembly vet issues 2016-08-25 18:52:31 +00:00
modf_arm64.s all: minor vet fixes 2016-10-24 17:27:37 +00:00
modf_ppc64x.s math: implement asm modf for ppc64x 2017-11-02 13:24:32 +00:00
nextafter.go math: change Nextafter64 to Nextafter in the description of Nextafter 2015-02-17 14:29:18 +00:00
pow.go math: use Abs in Pow rather than if x < 0 { x = -x } 2018-10-04 17:33:04 +00:00
pow10.go math: speed up and improve accuracy of Pow10 2017-02-22 19:17:04 +00:00
pow_s390x.s math: use new mnemonics for 'rotate then insert' on s390x 2019-04-16 15:34:41 +00:00
remainder.go math: fix math.Remainder(-x,x) (for Inf > x > 0) 2019-03-15 14:52:51 +00:00
remainder_386.s build: move package sources from src/pkg to src 2014-09-08 00:08:51 -04:00
signbit.go all: use "reports whether" consistently in the few places that didn't 2018-11-02 22:47:58 +00:00
sin.go src, misc: apply gofmt 2019-02-19 20:38:28 +00:00
sin_s390x.s cmd/asm, math: add s390x floating point test instructions 2018-04-03 16:08:04 +00:00
sincos.go src, misc: apply gofmt 2019-02-19 20:38:28 +00:00
sinh.go math,net: omit explicit true tag expr in switch 2018-08-20 22:15:59 +00:00
sinh_s390x.s math: use new mnemonics for 'rotate then insert' on s390x 2019-04-16 15:34:41 +00:00
sqrt.go math: delete unused function sqrtC 2016-03-03 02:29:09 +00:00
sqrt_386.s all: this big patch remove whitespace from assembly files 2018-10-03 15:28:51 +00:00
sqrt_amd64.s math: make sqrt smaller on AMD64 2016-09-29 15:56:52 +00:00
sqrt_arm.s all: this big patch remove whitespace from assembly files 2018-10-03 15:28:51 +00:00
sqrt_arm64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
sqrt_mipsx.s runtime/cgo, math: don't use FP instructions for soft-float mips{,le} 2017-11-30 17:12:32 +00:00
sqrt_ppc64x.s all: make copyright headers consistent with one space after period 2016-05-02 13:43:18 +00:00
sqrt_riscv64.s math: implement Sqrt in assembly for riscv64 2020-02-25 16:43:26 +00:00
sqrt_s390x.s math: add functions and stubs for s390x 2016-04-06 23:35:56 +00:00
sqrt_wasm.s math, math/big: add wasm architecture 2018-05-08 13:29:22 +00:00
stubs_386.s math: consolidate assembly stub implementations 2019-04-23 14:50:16 +00:00
stubs_amd64.s math: consolidate assembly stub implementations 2019-04-23 14:50:16 +00:00
stubs_arm.s math: consolidate assembly stub implementations 2019-04-23 14:50:16 +00:00
stubs_arm64.s math: consolidate assembly stub implementations 2019-04-23 14:50:16 +00:00
stubs_mips64x.s math: consolidate assembly stub implementations 2019-04-23 14:50:16 +00:00
stubs_mipsx.s math: consolidate assembly stub implementations 2019-04-23 14:50:16 +00:00
stubs_ppc64x.s math: consolidate assembly stub implementations 2019-04-23 14:50:16 +00:00
stubs_riscv64.s math: implement Min/Max in riscv64 assembly 2020-05-04 17:29:13 +00:00
stubs_s390x.s math: simplify hasVX checking on s390x 2020-04-27 20:06:57 +00:00
stubs_wasm.s math: consolidate assembly stub implementations 2019-04-23 14:50:16 +00:00
tan.go src, misc: apply gofmt 2019-02-19 20:38:28 +00:00
tan_s390x.s math: use s390x mnemonics rather than binary encodings 2018-08-20 17:42:08 +00:00
tanh.go src, misc: apply gofmt 2019-02-19 20:38:28 +00:00
tanh_s390x.s math: use new mnemonics for 'rotate then insert' on s390x 2019-04-16 15:34:41 +00:00
trig_reduce.go math/cmplx: implement Payne-Hanek range reduction 2020-03-14 04:12:41 +00:00
unsafe.go math: document sign bit correspondence for floating-point/bits conversions 2018-12-06 22:27:54 +00:00