Martin Möhrmann
5bb59b6d16
Revert "compile: prefer an AND instead of SHR+SHL instructions"
...
This reverts commit 9ec7074a94.
Reason for revert: broke s390x (copysign, abs) and arm64 (bitfield) tests.
Change-Id: I16c1b389c062e8c4aa5de079f1d46c9b25b0db52
Reviewed-on: https://go-review.googlesource.com/c/go/+/193850
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
Reviewed-by: Agniva De Sarker <agniva.quicksilver@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-09-09 07:33:25 +00:00
Martin Möhrmann
9ec7074a94
compile: prefer an AND instead of SHR+SHL instructions
...
On modern 64-bit CPUs, a SHR, SHL, or AND instruction takes 1 cycle to execute.
A pair of shifts that operate on the same register takes 2 cycles
and needs to wait for the input register value to be available.
Large constants used to mask the high bits of a register with an AND
instruction cannot be encoded as an immediate in the AND instruction
on amd64 and therefore need to be loaded into a register with a MOV
instruction.
However that MOV instruction is not dependent on the output register and
on many CPUs does not compete with the AND or shift instructions for
execution ports.
Using a pair of shifts instead of an AND to mask the high bits of a
register has a shorter encoding and uses one fewer general-purpose
register, but it takes one clock cycle longer, provided there is no
register pressure that would force the AND variant to generate a spill.
For example the instructions emitted for (x & 1 << 63) before this CL are:
48c1ea3f SHRQ $0x3f, DX
48c1e23f SHLQ $0x3f, DX
After this CL, the instructions are the same as those GCC and LLVM emit:
48b80000000000000080 MOVQ $0x8000000000000000, AX
4821d0 ANDQ DX, AX
Some platforms such as arm64 already have SSA optimization rules to fuse
two shift instructions back into an AND.
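The equivalence that makes both instruction sequences valid can be checked at the source level. The following is a minimal sketch, not code from this CL; the function names maskShift and maskAnd are hypothetical and exist only for illustration.

```go
package main

import "fmt"

// maskShift keeps only the top bit of x by shifting it down and back up,
// mirroring the SHRQ $63 / SHLQ $63 pair.
func maskShift(x uint64) uint64 { return x >> 63 << 63 }

// maskAnd keeps only the top bit of x with a single AND against a constant,
// mirroring the MOVQ + ANDQ pair.
func maskAnd(x uint64) uint64 { return x & (1 << 63) }

func main() {
	for _, x := range []uint64{0, 1, 1 << 63, 1<<63 | 42, ^uint64(0)} {
		fmt.Printf("%#x shift=%#x and=%#x\n", x, maskShift(x), maskAnd(x))
	}
}
```

Both functions compute the same value for every input; which machine instructions the compiler picks for each depends on the compiler version and target.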
Removing the general rule to rewrite AND to SHR+SHL speeds up this benchmark:
var GlobalU uint
func BenchmarkAndHighBits(b *testing.B) {
	x := uint(0)
	for i := 0; i < b.N; i++ {
		x &= 1 << 63
	}
	GlobalU = x
}
amd64/darwin on Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz:
name old time/op new time/op delta
AndHighBits-4 0.61ns ± 6% 0.42ns ± 6% -31.42% (p=0.000 n=25+25)
Updates #33826
Updates #32781
Change-Id: I862d3587446410c447b9a7265196b57f85358633
Reviewed-on: https://go-review.googlesource.com/c/go/+/191780
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-09-09 06:49:17 +00:00
Ben Shi
c683ab8128
cmd/compile: optimize ARM's math.Abs
...
This CL optimizes math.Abs to an inline ABSD instruction on ARM.
In the src/math benchmarks, Abs improves by 56.39% and the geometric mean by 3.98%.
name old time/op new time/op delta
Acos-4 181ns ± 0% 182ns ± 0% +0.30% (p=0.000 n=40+40)
Acosh-4 202ns ± 0% 202ns ± 0% ~ (all equal)
Asin-4 163ns ± 0% 163ns ± 0% ~ (all equal)
Asinh-4 242ns ± 0% 242ns ± 0% ~ (all equal)
Atan-4 120ns ± 0% 121ns ± 0% +0.83% (p=0.000 n=40+40)
Atanh-4 202ns ± 0% 202ns ± 0% ~ (all equal)
Atan2-4 173ns ± 0% 173ns ± 0% ~ (all equal)
Cbrt-4 1.06µs ± 0% 1.06µs ± 0% +0.09% (p=0.000 n=39+37)
Ceil-4 72.9ns ± 0% 72.8ns ± 0% ~ (p=0.237 n=40+40)
Copysign-4 13.2ns ± 0% 13.2ns ± 0% ~ (all equal)
Cos-4 193ns ± 0% 183ns ± 0% -5.18% (p=0.000 n=40+40)
Cosh-4 254ns ± 0% 239ns ± 0% -5.91% (p=0.000 n=40+40)
Erf-4 112ns ± 0% 112ns ± 0% ~ (all equal)
Erfc-4 117ns ± 0% 117ns ± 0% ~ (all equal)
Erfinv-4 127ns ± 0% 127ns ± 1% ~ (p=0.492 n=40+40)
Erfcinv-4 128ns ± 0% 128ns ± 0% ~ (all equal)
Exp-4 212ns ± 0% 206ns ± 0% -3.05% (p=0.000 n=40+40)
ExpGo-4 216ns ± 0% 209ns ± 0% -3.24% (p=0.000 n=40+40)
Expm1-4 142ns ± 0% 142ns ± 0% ~ (all equal)
Exp2-4 191ns ± 0% 184ns ± 0% -3.45% (p=0.000 n=40+40)
Exp2Go-4 194ns ± 0% 187ns ± 0% -3.61% (p=0.000 n=40+40)
Abs-4 14.4ns ± 0% 6.3ns ± 0% -56.39% (p=0.000 n=38+39)
Dim-4 12.6ns ± 0% 12.6ns ± 0% ~ (all equal)
Floor-4 49.6ns ± 0% 49.6ns ± 0% ~ (all equal)
Max-4 27.6ns ± 0% 27.6ns ± 0% ~ (all equal)
Min-4 27.0ns ± 0% 27.0ns ± 0% ~ (all equal)
Mod-4 349ns ± 0% 305ns ± 1% -12.55% (p=0.000 n=33+40)
Frexp-4 54.0ns ± 0% 47.1ns ± 0% -12.78% (p=0.000 n=38+38)
Gamma-4 242ns ± 0% 234ns ± 0% -3.16% (p=0.000 n=36+40)
Hypot-4 84.8ns ± 0% 67.8ns ± 0% -20.05% (p=0.000 n=31+35)
HypotGo-4 88.5ns ± 0% 71.6ns ± 0% -19.12% (p=0.000 n=40+38)
Ilogb-4 45.8ns ± 0% 38.9ns ± 0% -15.12% (p=0.000 n=40+32)
J0-4 821ns ± 0% 802ns ± 0% -2.33% (p=0.000 n=33+40)
J1-4 816ns ± 0% 807ns ± 0% -1.05% (p=0.000 n=40+29)
Jn-4 1.67µs ± 0% 1.65µs ± 0% -1.45% (p=0.000 n=40+39)
Ldexp-4 61.5ns ± 0% 54.6ns ± 0% -11.27% (p=0.000 n=40+32)
Lgamma-4 188ns ± 0% 188ns ± 0% ~ (all equal)
Log-4 154ns ± 0% 147ns ± 0% -4.78% (p=0.000 n=40+40)
Logb-4 50.9ns ± 0% 42.7ns ± 0% -16.11% (p=0.000 n=34+39)
Log1p-4 160ns ± 0% 159ns ± 0% ~ (p=0.828 n=40+40)
Log10-4 173ns ± 0% 166ns ± 0% -4.05% (p=0.000 n=40+40)
Log2-4 65.3ns ± 0% 58.4ns ± 0% -10.57% (p=0.000 n=37+37)
Modf-4 36.4ns ± 0% 36.4ns ± 0% ~ (all equal)
Nextafter32-4 36.4ns ± 0% 36.4ns ± 0% ~ (all equal)
Nextafter64-4 32.7ns ± 0% 32.6ns ± 0% ~ (p=0.375 n=40+40)
PowInt-4 300ns ± 0% 277ns ± 0% -7.78% (p=0.000 n=40+40)
PowFrac-4 676ns ± 0% 635ns ± 0% -6.00% (p=0.000 n=40+35)
Pow10Pos-4 17.6ns ± 0% 17.6ns ± 0% ~ (all equal)
Pow10Neg-4 22.0ns ± 0% 22.0ns ± 0% ~ (all equal)
Round-4 30.1ns ± 0% 30.1ns ± 0% ~ (all equal)
RoundToEven-4 38.9ns ± 0% 38.9ns ± 0% ~ (all equal)
Remainder-4 291ns ± 0% 263ns ± 0% -9.62% (p=0.000 n=40+40)
Signbit-4 11.3ns ± 0% 11.3ns ± 0% ~ (all equal)
Sin-4 185ns ± 0% 185ns ± 0% ~ (all equal)
Sincos-4 230ns ± 0% 230ns ± 0% ~ (all equal)
Sinh-4 253ns ± 0% 246ns ± 0% -2.77% (p=0.000 n=39+39)
SqrtIndirect-4 41.4ns ± 0% 41.4ns ± 0% ~ (all equal)
SqrtLatency-4 13.8ns ± 0% 13.8ns ± 0% ~ (all equal)
SqrtIndirectLatency-4 37.0ns ± 0% 37.0ns ± 0% ~ (p=0.632 n=40+40)
SqrtGoLatency-4 911ns ± 0% 911ns ± 0% +0.08% (p=0.000 n=40+40)
SqrtPrime-4 13.2µs ± 0% 13.2µs ± 0% +0.01% (p=0.038 n=38+40)
Tan-4 205ns ± 0% 205ns ± 0% ~ (all equal)
Tanh-4 264ns ± 0% 247ns ± 0% -6.44% (p=0.000 n=39+32)
Trunc-4 45.2ns ± 0% 45.2ns ± 0% ~ (all equal)
Y0-4 796ns ± 0% 792ns ± 0% -0.55% (p=0.000 n=35+40)
Y1-4 804ns ± 0% 797ns ± 0% -0.82% (p=0.000 n=24+40)
Yn-4 1.64µs ± 0% 1.62µs ± 0% -1.27% (p=0.000 n=40+39)
Float64bits-4 8.16ns ± 0% 8.16ns ± 0% +0.04% (p=0.000 n=35+40)
Float64frombits-4 10.7ns ± 0% 10.7ns ± 0% ~ (all equal)
Float32bits-4 7.53ns ± 0% 7.53ns ± 0% ~ (p=0.760 n=40+40)
Float32frombits-4 6.91ns ± 0% 6.91ns ± 0% -0.04% (p=0.002 n=32+38)
[Geo mean] 111ns 106ns -3.98%
Change-Id: I54f4fd7f5160db020b430b556bde59cc0fdb996d
Reviewed-on: https://go-review.googlesource.com/c/go/+/188678
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-08-28 15:41:28 +00:00
Richard Musiol
5ee1b84959
math, math/bits: add intrinsics for wasm
...
This commit adds compiler intrinsics for the packages math and
math/bits on the wasm architecture for better performance.
benchmark old ns/op new ns/op delta
BenchmarkCeil 8.31 3.21 -61.37%
BenchmarkCopysign 5.24 3.88 -25.95%
BenchmarkAbs 5.42 3.34 -38.38%
BenchmarkFloor 8.29 3.18 -61.64%
BenchmarkRoundToEven 9.76 3.26 -66.60%
BenchmarkSqrtLatency 8.13 4.88 -39.98%
BenchmarkSqrtPrime 5246 3535 -32.62%
BenchmarkTrunc 8.29 3.15 -62.00%
BenchmarkLeadingZeros 13.0 4.23 -67.46%
BenchmarkLeadingZeros8 4.65 4.42 -4.95%
BenchmarkLeadingZeros16 7.60 4.38 -42.37%
BenchmarkLeadingZeros32 10.7 4.48 -58.13%
BenchmarkLeadingZeros64 12.9 4.31 -66.59%
BenchmarkTrailingZeros 6.52 4.04 -38.04%
BenchmarkTrailingZeros8 4.57 4.14 -9.41%
BenchmarkTrailingZeros16 6.69 4.16 -37.82%
BenchmarkTrailingZeros32 6.97 4.23 -39.31%
BenchmarkTrailingZeros64 6.59 4.00 -39.30%
BenchmarkOnesCount 7.93 3.30 -58.39%
BenchmarkOnesCount8 3.56 3.19 -10.39%
BenchmarkOnesCount16 4.85 3.19 -34.23%
BenchmarkOnesCount32 7.27 3.19 -56.12%
BenchmarkOnesCount64 8.08 3.28 -59.41%
BenchmarkRotateLeft 4.88 3.80 -22.13%
BenchmarkRotateLeft64 5.03 3.63 -27.83%
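For readers unfamiliar with the math/bits functions benchmarked above, here is a small sketch of what they compute. The helper name summarize is hypothetical; whether each call is lowered to a single wasm instruction depends on the toolchain and is not verified here.

```go
package main

import (
	"fmt"
	"math/bits"
)

// summarize returns the leading-zero count, trailing-zero count,
// population count, and a 4-bit left rotation of x.
func summarize(x uint64) (lz, tz, ones int, rot uint64) {
	return bits.LeadingZeros64(x), bits.TrailingZeros64(x),
		bits.OnesCount64(x), bits.RotateLeft64(x, 4)
}

func main() {
	lz, tz, ones, rot := summarize(0xf0)
	fmt.Println(lz, tz, ones, rot)
}
```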
Change-Id: Ic1e0c2984878be8defb6eb7eb6ee63765c793222
Reviewed-on: https://go-review.googlesource.com/c/go/+/165177
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-03-14 19:46:19 +00:00
Lynn Boger
39fa301bdc
test/codegen: enable more tests for ppc64/ppc64le
...
Add cases for ppc64 and ppc64le to the codegen tests
where appropriate.
Change-Id: Idf8cbe88a4ab4406a4ef1ea777bd15a58b68f3ed
Reviewed-on: https://go-review.googlesource.com/c/142557
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-10-16 19:00:53 +00:00
fanzha02
a19a83c8ef
cmd/compile: optimize math.Float64(32)bits and math.Float64(32)frombits on arm64
...
Use float <-> int register moves without conversion instead of stores
and loads to move float <-> int values.
Math package benchmark results.
name old time/op new time/op delta
Acosh 153ns ± 0% 147ns ± 0% -3.92% (p=0.000 n=10+10)
Asinh 183ns ± 0% 177ns ± 0% -3.28% (p=0.000 n=10+10)
Atanh 157ns ± 0% 155ns ± 0% -1.27% (p=0.000 n=10+10)
Atan2 118ns ± 0% 117ns ± 1% -0.59% (p=0.003 n=10+10)
Cbrt 119ns ± 0% 114ns ± 0% -4.20% (p=0.000 n=10+10)
Copysign 7.51ns ± 0% 6.51ns ± 0% -13.32% (p=0.000 n=9+10)
Cos 73.1ns ± 0% 70.6ns ± 0% -3.42% (p=0.000 n=10+10)
Cosh 119ns ± 0% 121ns ± 0% +1.68% (p=0.000 n=10+9)
ExpGo 154ns ± 0% 149ns ± 0% -3.05% (p=0.000 n=9+10)
Expm1 101ns ± 0% 99ns ± 0% -1.88% (p=0.000 n=10+10)
Exp2Go 150ns ± 0% 146ns ± 0% -2.67% (p=0.000 n=10+10)
Abs 7.01ns ± 0% 6.01ns ± 0% -14.27% (p=0.000 n=10+9)
Mod 234ns ± 0% 212ns ± 0% -9.40% (p=0.000 n=9+10)
Frexp 34.5ns ± 0% 30.0ns ± 0% -13.04% (p=0.000 n=10+10)
Gamma 112ns ± 0% 111ns ± 0% -0.89% (p=0.000 n=10+10)
Hypot 73.6ns ± 0% 68.6ns ± 0% -6.79% (p=0.000 n=10+10)
HypotGo 77.1ns ± 0% 72.1ns ± 0% -6.49% (p=0.000 n=10+10)
Ilogb 31.0ns ± 0% 28.0ns ± 0% -9.68% (p=0.000 n=10+10)
J0 437ns ± 0% 434ns ± 0% -0.62% (p=0.000 n=10+10)
J1 433ns ± 0% 431ns ± 0% -0.46% (p=0.000 n=10+10)
Jn 927ns ± 0% 922ns ± 0% -0.54% (p=0.000 n=10+10)
Ldexp 41.5ns ± 0% 37.0ns ± 0% -10.84% (p=0.000 n=9+10)
Log 124ns ± 0% 118ns ± 0% -4.84% (p=0.000 n=10+9)
Logb 34.0ns ± 0% 32.0ns ± 0% -5.88% (p=0.000 n=10+10)
Log1p 110ns ± 0% 108ns ± 0% -1.82% (p=0.000 n=10+10)
Log10 136ns ± 0% 132ns ± 0% -2.94% (p=0.000 n=10+10)
Log2 51.6ns ± 0% 47.1ns ± 0% -8.72% (p=0.000 n=10+10)
Nextafter32 33.0ns ± 0% 30.5ns ± 0% -7.58% (p=0.000 n=10+10)
Nextafter64 29.0ns ± 0% 26.5ns ± 0% -8.62% (p=0.000 n=10+10)
PowInt 169ns ± 0% 160ns ± 0% -5.33% (p=0.000 n=10+10)
PowFrac 375ns ± 0% 361ns ± 0% -3.73% (p=0.000 n=10+10)
RoundToEven 14.0ns ± 0% 12.5ns ± 0% -10.71% (p=0.000 n=10+10)
Remainder 206ns ± 0% 192ns ± 0% -6.80% (p=0.000 n=10+9)
Signbit 6.01ns ± 0% 5.51ns ± 0% -8.32% (p=0.000 n=10+9)
Sin 70.1ns ± 0% 69.6ns ± 0% -0.71% (p=0.000 n=10+10)
Sincos 99.1ns ± 0% 99.6ns ± 0% +0.50% (p=0.000 n=9+10)
SqrtGoLatency 178ns ± 0% 146ns ± 0% -17.70% (p=0.000 n=8+10)
SqrtPrime 9.19µs ± 0% 9.20µs ± 0% +0.01% (p=0.000 n=9+9)
Tanh 125ns ± 1% 127ns ± 0% +1.36% (p=0.000 n=10+10)
Y0 428ns ± 0% 426ns ± 0% -0.47% (p=0.000 n=10+10)
Y1 431ns ± 0% 429ns ± 0% -0.46% (p=0.000 n=10+9)
Yn 906ns ± 0% 901ns ± 0% -0.55% (p=0.000 n=10+10)
Float64bits 4.50ns ± 0% 3.50ns ± 0% -22.22% (p=0.000 n=10+10)
Float64frombits 4.00ns ± 0% 3.50ns ± 0% -12.50% (p=0.000 n=10+9)
Float32bits 4.50ns ± 0% 3.50ns ± 0% -22.22% (p=0.002 n=8+10)
Float32frombits 4.00ns ± 0% 3.50ns ± 0% -12.50% (p=0.000 n=10+10)
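The four functions optimized here only reinterpret bits; they do not convert values. A minimal sketch of that behavior follows. The wrapper names bitsOf64, bitsOf32, and roundTrip64 are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// bitsOf64 returns the IEEE 754 bit pattern of x without numeric conversion.
func bitsOf64(x float64) uint64 { return math.Float64bits(x) }

// bitsOf32 does the same for a float32.
func bitsOf32(x float32) uint32 { return math.Float32bits(x) }

// roundTrip64 moves a value float -> int -> float; the result is bit-identical.
func roundTrip64(x float64) float64 {
	return math.Float64frombits(math.Float64bits(x))
}

func main() {
	fmt.Printf("%#x %#x %v\n", bitsOf64(1), bitsOf32(1), roundTrip64(-0.5))
}
```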
Change-Id: Iba829e15d5624962fe0c699139ea783efeefabc2
Reviewed-on: https://go-review.googlesource.com/129715
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-09-17 20:49:04 +00:00
erifan01
8149db4f64
cmd/compile: intrinsify math.RoundToEven and math.Abs on arm64
...
math.RoundToEven can be done by a single arm64 instruction, FRINTND, so intrinsify it to improve performance.
The current pure Go implementation of Abs is translated into five instructions on arm64:
str, ldr, and, str, ldr. The intrinsic implementation requires only one instruction, so
intrinsifying it is worthwhile for performance.
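The store/load/and sequence comes from computing the absolute value by clearing the sign bit of the float's integer representation. The following sketch shows that bit-mask formulation; absBits is a hypothetical name, and the code is an illustration of the technique rather than a copy of the library source.

```go
package main

import (
	"fmt"
	"math"
)

// absBits clears bit 63 (the IEEE 754 sign bit) of x.
func absBits(x float64) float64 {
	return math.Float64frombits(math.Float64bits(x) &^ (1 << 63))
}

func main() {
	for _, x := range []float64{-2.5, 0, 3, math.Inf(-1)} {
		fmt.Println(x, absBits(x), math.Abs(x))
	}
}
```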
Benchmarks:
name old time/op new time/op delta
Abs-8 3.50ns ± 0% 1.50ns ± 0% -57.14% (p=0.000 n=10+10)
RoundToEven-8 9.26ns ± 0% 1.50ns ± 0% -83.80% (p=0.000 n=10+10)
Change-Id: I9456b26ab282b544dfac0154fc86f17aed96ac3d
Reviewed-on: https://go-review.googlesource.com/116535
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-09-13 14:52:51 +00:00
fanzha02
d5377c2026
test: fix the wrong test of math.Copysign(c, -1) for arm64
...
CL 132915 added an incorrect codegen test for math.Copysign(c, -1):
the test should check that AND is not emitted. This CL fixes the test.
Change-Id: Ida1d3d54ebfc7f238abccbc1f70f914e1b5bfd91
Reviewed-on: https://go-review.googlesource.com/134815
Reviewed-by: Giovanni Bajo <rasky@develer.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-09-12 15:34:20 +00:00
fanzha02
2e5c32518c
cmd/compile: optimize math.Copysign on arm64
...
Add rewrite rules to optimize math.Copysign() when the second
argument is a negative floating-point constant.
For example, for math.Copysign(c, -2), the previous compiler output is
"AND $9223372036854775807, R0, R0; ORR $-9223372036854775808, R0, R0".
The optimized compiler output is "ORR $-9223372036854775808, R0, R0".
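The reason the AND is redundant: clearing the sign bit and then setting it gives the same result as setting it directly. A sketch under that reading follows; the function names setSign and viaCopysign are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// setSign forces the sign bit on with a single OR, which is all that
// Copysign needs to do when the sign source is a negative constant.
func setSign(x float64) float64 {
	return math.Float64frombits(math.Float64bits(x) | 1<<63)
}

// viaCopysign is the library call the rewrite rules target.
func viaCopysign(x float64) float64 { return math.Copysign(x, -2) }

func main() {
	for _, x := range []float64{3, -3, 0.25} {
		fmt.Println(x, setSign(x), viaCopysign(x))
	}
}
```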
Math package benchmark results.
name old time/op new time/op delta
Copysign-8 2.61ns ± 2% 2.49ns ± 0% -4.55% (p=0.000 n=10+10)
Cos-8 43.0ns ± 0% 41.5ns ± 0% -3.49% (p=0.000 n=10+10)
Cosh-8 98.6ns ± 0% 98.1ns ± 0% -0.51% (p=0.000 n=10+10)
ExpGo-8 107ns ± 0% 105ns ± 0% -1.87% (p=0.000 n=10+10)
Exp2Go-8 100ns ± 0% 100ns ± 0% +0.39% (p=0.000 n=10+8)
Max-8 6.56ns ± 2% 6.45ns ± 1% -1.63% (p=0.002 n=10+10)
Min-8 6.66ns ± 3% 6.47ns ± 2% -2.82% (p=0.006 n=10+10)
Mod-8 107ns ± 1% 104ns ± 1% -2.72% (p=0.000 n=10+10)
Frexp-8 11.5ns ± 1% 11.0ns ± 0% -4.56% (p=0.000 n=8+10)
HypotGo-8 19.4ns ± 0% 19.4ns ± 0% +0.36% (p=0.019 n=10+10)
Ilogb-8 8.63ns ± 0% 8.51ns ± 0% -1.36% (p=0.000 n=10+10)
Jn-8 584ns ± 0% 585ns ± 0% +0.17% (p=0.000 n=7+8)
Ldexp-8 13.8ns ± 0% 13.5ns ± 0% -2.17% (p=0.002 n=8+10)
Logb-8 10.2ns ± 0% 9.9ns ± 0% -2.65% (p=0.000 n=10+7)
Nextafter64-8 7.54ns ± 0% 7.51ns ± 0% -0.37% (p=0.000 n=10+10)
Remainder-8 73.5ns ± 1% 70.4ns ± 1% -4.27% (p=0.000 n=10+10)
SqrtGoLatency-8 79.6ns ± 0% 76.2ns ± 0% -4.30% (p=0.000 n=9+10)
Yn-8 582ns ± 0% 579ns ± 0% -0.52% (p=0.000 n=10+10)
Change-Id: I0c9cd1ea87435e7b8bab94b4e79e6e29785f25b1
Reviewed-on: https://go-review.googlesource.com/132915
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-09-06 19:57:25 +00:00
Milan Knezevic
2959128dc5
cmd/compile: add softfloat support to mips64{,le}
...
mips64 softfloat support is based on the mips implementation and introduces
a new environment variable, GOMIPS64.
GOMIPS64 is a GOARCH=mips64{,le} specific option that selects between
hard-float and soft-float. Valid values are 'hardfloat' (default) and
'softfloat'. It is passed to the assembler as
'GOMIPS64_{hardfloat,softfloat}'.
Change-Id: I7f73078627f7cb37c588a38fb5c997fe09c56134
Reviewed-on: https://go-review.googlesource.com/108475
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-27 14:50:17 +00:00
Carlos Eduardo Seo
ebb67d993a
cmd/compile, cmd/internal/obj/ppc64: make math.Round an intrinsic on ppc64x
...
This change implements math.Round as an intrinsic on ppc64x so it can be
done using a single instruction.
benchmark old ns/op new ns/op delta
BenchmarkRound-16 2.60 0.69 -73.46%
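As a reminder of the semantics the intrinsic has to preserve, math.Round rounds to the nearest integer with halfway cases away from zero. The sketch below compares it against a hypothetical reference helper, roundHalfAway, written only for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// round wraps the library function that this change turns into an intrinsic.
func round(x float64) float64 { return math.Round(x) }

// roundHalfAway is a simple reference: truncate, then step one unit
// away from zero when the discarded fraction is at least one half.
func roundHalfAway(x float64) float64 {
	t := math.Trunc(x)
	if math.Abs(x-t) >= 0.5 {
		t += math.Copysign(1, x)
	}
	return t
}

func main() {
	for _, x := range []float64{-2.5, -0.4, 0.5, 1.5, 2.5} {
		fmt.Println(x, round(x), roundHalfAway(x))
	}
}
```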
Change-Id: I9408363e96201abdfc73ced7bcd5f0c29db006a8
Reviewed-on: https://go-review.googlesource.com/109395
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2018-04-26 14:12:09 +00:00
Giovanni Bajo
284ba47b49
test: run codegen tests on all supported architecture variants
...
This CL makes the codegen test suite automatically test all
architecture variants for each architecture specified in tests. For
instance, if a test file specifies an "arm" test, it will be
automatically run on all GOARM variants (5, 6, 7) to increase
coverage.
The CL also introduces a syntax for restricting a test to a specific
variant (e.g. "arm/7") in case the test makes sense only there.
The same syntax also allows specifying the operating system
in case it matters (e.g. "plan9/386/sse2").
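A sketch of how such annotations might look in a test file follows. The "arm/7" and "plan9/386/sse2" prefixes come from the description above; the opcode patterns after them, the function name, and the use of package main (so the snippet runs on its own) are assumptions for illustration.

```go
// asmcheck

package main

import (
	"fmt"
	"math"
)

func sqrt(x float64) float64 {
	// A plain "arm" prefix would be checked on every GOARM variant;
	// the "/7" suffix restricts the check to GOARM=7.
	// arm/7:"SQRTD"
	// An operating system can be given before the architecture.
	// plan9/386/sse2:"SQRTSD"
	return math.Sqrt(x)
}

func main() {
	fmt.Println(sqrt(9))
}
```

The annotations are ordinary comments, so the file still compiles and runs normally outside the test harness.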
Fixes #24658
Change-Id: I2eba8b918f51bb6a77a8431a309f8b71af07ea22
Reviewed-on: https://go-review.googlesource.com/107315
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-04-15 20:02:43 +00:00
Giovanni Bajo
79112707bb
cmd/compile: add patterns for bit set/clear/complement on amd64
...
This patch completes implementation of BT(Q|L), and adds support
for BT(S|R|C)(Q|L).
Example of code changes from time.(*Time).addSec (before, then after):
if t.wall&hasMonotonic != 0 {
0x1073465 488b08 MOVQ 0(AX), CX
0x1073468 4889ca MOVQ CX, DX
0x107346b 48c1e93f SHRQ $0x3f, CX
0x107346f 48c1e13f SHLQ $0x3f, CX
0x1073473 48f7c1ffffffff TESTQ $-0x1, CX
0x107347a 746b JE 0x10734e7
if t.wall&hasMonotonic != 0 {
0x1073435 488b08 MOVQ 0(AX), CX
0x1073438 480fbae13f BTQ $0x3f, CX
0x107343d 7363 JAE 0x10734a2
Another example (before, then after):
t.wall = t.wall&nsecMask | uint64(dsec)<<nsecShift | hasMonotonic
0x10734c8 4881e1ffffff3f ANDQ $0x3fffffff, CX
0x10734cf 48c1e61e SHLQ $0x1e, SI
0x10734d3 4809ce ORQ CX, SI
0x10734d6 48b90000000000000080 MOVQ $0x8000000000000000, CX
0x10734e0 4809f1 ORQ SI, CX
0x10734e3 488908 MOVQ CX, 0(AX)
t.wall = t.wall&nsecMask | uint64(dsec)<<nsecShift | hasMonotonic
0x107348b 4881e2ffffff3f ANDQ $0x3fffffff, DX
0x1073492 48c1e61e SHLQ $0x1e, SI
0x1073496 4809f2 ORQ SI, DX
0x1073499 480fbaea3f BTSQ $0x3f, DX
0x107349e 488910 MOVQ DX, 0(AX)
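At the source level, the four single-bit operations look like the sketch below. On x86, BT tests a bit, BTS sets it, BTR clears it, and BTC complements it; whether a given compiler version emits those instructions for these exact functions is an assumption. The function names are hypothetical, and hasMonotonic is redefined locally as bit 63 to match the listings.

```go
package main

import "fmt"

// hasMonotonic mirrors the bit-63 flag used in the listings above.
const hasMonotonic = 1 << 63

// testBit reports whether the flag is set (candidate for BTQ).
func testBit(x uint64) bool { return x&hasMonotonic != 0 }

// setBit turns the flag on (candidate for BTSQ).
func setBit(x uint64) uint64 { return x | hasMonotonic }

// clearBit turns the flag off (candidate for BTRQ).
func clearBit(x uint64) uint64 { return x &^ hasMonotonic }

// flipBit inverts the flag (candidate for BTCQ).
func flipBit(x uint64) uint64 { return x ^ hasMonotonic }

func main() {
	x := uint64(5)
	fmt.Println(testBit(x), testBit(setBit(x)), clearBit(setBit(x)), flipBit(flipBit(x)))
}
```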
Go1 benchmarks seem unaffected, and I would be surprised
otherwise:
name old time/op new time/op delta
BinaryTree17-4 2.64s ± 4% 2.56s ± 9% -2.92% (p=0.008 n=9+9)
Fannkuch11-4 2.90s ± 1% 2.95s ± 3% +1.76% (p=0.010 n=10+9)
FmtFprintfEmpty-4 35.3ns ± 1% 34.5ns ± 2% -2.34% (p=0.004 n=9+8)
FmtFprintfString-4 57.0ns ± 1% 58.4ns ± 5% +2.52% (p=0.029 n=9+10)
FmtFprintfInt-4 59.8ns ± 3% 59.8ns ± 6% ~ (p=0.565 n=10+10)
FmtFprintfIntInt-4 93.9ns ± 3% 91.2ns ± 5% -2.94% (p=0.014 n=10+9)
FmtFprintfPrefixedInt-4 107ns ± 6% 104ns ± 6% ~ (p=0.099 n=10+10)
FmtFprintfFloat-4 187ns ± 3% 188ns ± 3% ~ (p=0.505 n=10+9)
FmtManyArgs-4 410ns ± 1% 415ns ± 6% ~ (p=0.649 n=8+10)
GobDecode-4 5.30ms ± 3% 5.27ms ± 3% ~ (p=0.436 n=10+10)
GobEncode-4 4.62ms ± 5% 4.47ms ± 2% -3.24% (p=0.001 n=9+10)
Gzip-4 197ms ± 4% 193ms ± 3% ~ (p=0.123 n=10+10)
Gunzip-4 30.4ms ± 3% 30.1ms ± 3% ~ (p=0.481 n=10+10)
HTTPClientServer-4 76.3µs ± 1% 76.0µs ± 1% ~ (p=0.236 n=8+9)
JSONEncode-4 10.5ms ± 9% 10.3ms ± 3% ~ (p=0.280 n=10+10)
JSONDecode-4 42.3ms ±10% 41.3ms ± 2% ~ (p=0.053 n=9+10)
Mandelbrot200-4 3.80ms ± 2% 3.72ms ± 2% -2.15% (p=0.001 n=9+10)
GoParse-4 2.88ms ±10% 2.81ms ± 2% ~ (p=0.247 n=10+10)
RegexpMatchEasy0_32-4 69.5ns ± 4% 68.6ns ± 2% ~ (p=0.171 n=10+10)
RegexpMatchEasy0_1K-4 165ns ± 3% 162ns ± 3% ~ (p=0.137 n=10+10)
RegexpMatchEasy1_32-4 65.7ns ± 6% 64.4ns ± 2% -2.02% (p=0.037 n=10+10)
RegexpMatchEasy1_1K-4 278ns ± 2% 279ns ± 3% ~ (p=0.991 n=8+9)
RegexpMatchMedium_32-4 99.3ns ± 3% 98.5ns ± 4% ~ (p=0.457 n=10+9)
RegexpMatchMedium_1K-4 30.1µs ± 1% 30.4µs ± 2% ~ (p=0.173 n=8+10)
RegexpMatchHard_32-4 1.40µs ± 2% 1.41µs ± 4% ~ (p=0.565 n=10+10)
RegexpMatchHard_1K-4 42.5µs ± 1% 41.5µs ± 3% -2.13% (p=0.002 n=8+9)
Revcomp-4 332ms ± 4% 328ms ± 5% ~ (p=0.720 n=9+10)
Template-4 48.3ms ± 2% 49.6ms ± 3% +2.56% (p=0.002 n=8+10)
TimeParse-4 252ns ± 2% 249ns ± 3% ~ (p=0.116 n=9+10)
TimeFormat-4 262ns ± 4% 252ns ± 3% -4.01% (p=0.000 n=9+10)
name old speed new speed delta
GobDecode-4 145MB/s ± 3% 146MB/s ± 3% ~ (p=0.436 n=10+10)
GobEncode-4 166MB/s ± 5% 172MB/s ± 2% +3.28% (p=0.001 n=9+10)
Gzip-4 98.6MB/s ± 4% 100.4MB/s ± 3% ~ (p=0.123 n=10+10)
Gunzip-4 639MB/s ± 3% 645MB/s ± 3% ~ (p=0.481 n=10+10)
JSONEncode-4 185MB/s ± 8% 189MB/s ± 3% ~ (p=0.280 n=10+10)
JSONDecode-4 46.0MB/s ± 9% 47.0MB/s ± 2% +2.21% (p=0.046 n=9+10)
GoParse-4 20.1MB/s ± 9% 20.6MB/s ± 2% ~ (p=0.239 n=10+10)
RegexpMatchEasy0_32-4 460MB/s ± 4% 467MB/s ± 2% ~ (p=0.165 n=10+10)
RegexpMatchEasy0_1K-4 6.19GB/s ± 3% 6.28GB/s ± 3% ~ (p=0.165 n=10+10)
RegexpMatchEasy1_32-4 487MB/s ± 5% 497MB/s ± 2% +2.00% (p=0.043 n=10+10)
RegexpMatchEasy1_1K-4 3.67GB/s ± 2% 3.67GB/s ± 3% ~ (p=0.963 n=8+9)
RegexpMatchMedium_32-4 10.1MB/s ± 3% 10.1MB/s ± 4% ~ (p=0.435 n=10+9)
RegexpMatchMedium_1K-4 34.0MB/s ± 1% 33.7MB/s ± 2% ~ (p=0.173 n=8+10)
RegexpMatchHard_32-4 22.9MB/s ± 2% 22.7MB/s ± 4% ~ (p=0.565 n=10+10)
RegexpMatchHard_1K-4 24.0MB/s ± 3% 24.7MB/s ± 3% +2.64% (p=0.001 n=9+9)
Revcomp-4 766MB/s ± 4% 775MB/s ± 5% ~ (p=0.720 n=9+10)
Template-4 40.2MB/s ± 2% 39.2MB/s ± 3% -2.47% (p=0.002 n=8+10)
The rules match ~1800 times during all.bash.
Fixes #18943
Change-Id: I64be1ada34e89c486dfd935bf429b35652117ed4
Reviewed-on: https://go-review.googlesource.com/94766
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-24 02:38:50 +00:00
Giovanni Bajo
89ae7045f3
test: convert all math-related tests from asm_test
...
Change-Id: If542f0b5c5754e6eb2f9b302fe5a148ba9a57338
Reviewed-on: https://go-review.googlesource.com/98443
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-04 16:52:33 +00:00