Ben Shi
5aeecc4530
cmd/compile: optimize arm64's code with more shifted operations
...
This CL optimizes arm64's NEG/MVN/TST/CMN with a shifted operand.
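As a rough illustration of the source shapes that could benefit, the sketch below shows Go expressions in which a negation, a bitwise NOT, a bit test, or an add-and-compare takes a shifted value as its operand. The function names are hypothetical and are not taken from the CL, and whether a given expression is actually fused into one instruction depends on the compiler version and surrounding code.

```go
package main

import "fmt"

// negShifted negates a left-shifted value (a candidate for NEG with a shifted operand).
func negShifted(x int64) int64 { return -(x << 3) }

// notShifted inverts a right-shifted value (a candidate for MVN with a shifted operand).
func notShifted(x uint64) uint64 { return ^(x >> 2) }

// testShifted reports whether a and b<<4 share no set bits (a candidate for TST).
func testShifted(a, b uint64) bool { return a&(b<<4) == 0 }

// cmnShifted reports whether a + b<<2 is zero (a candidate for CMN).
func cmnShifted(a, b int64) bool { return a+(b<<2) == 0 }

func main() {
	fmt.Println(negShifted(1))
	fmt.Println(notShifted(0xF) == 0xFFFFFFFFFFFFFFFC)
	fmt.Println(testShifted(0x0F, 1), testShifted(0x10, 1))
	fmt.Println(cmnShifted(-4, 1))
}
```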
1. The total size of pkg/android_arm64 decreases by about 0.2KB, excluding
cmd/compile/.
2. The go1 benchmark shows no regression, excluding noise.
name old time/op new time/op delta
BinaryTree17-4 16.4s ± 1% 16.4s ± 1% ~ (p=0.914 n=29+29)
Fannkuch11-4 8.72s ± 0% 8.72s ± 0% ~ (p=0.274 n=30+29)
FmtFprintfEmpty-4 174ns ± 0% 174ns ± 0% ~ (all equal)
FmtFprintfString-4 370ns ± 0% 370ns ± 0% ~ (all equal)
FmtFprintfInt-4 419ns ± 0% 419ns ± 0% ~ (all equal)
FmtFprintfIntInt-4 672ns ± 1% 675ns ± 2% ~ (p=0.217 n=28+30)
FmtFprintfPrefixedInt-4 806ns ± 0% 806ns ± 0% ~ (p=0.402 n=30+28)
FmtFprintfFloat-4 1.09µs ± 0% 1.09µs ± 0% +0.02% (p=0.011 n=22+27)
FmtManyArgs-4 2.67µs ± 0% 2.68µs ± 0% ~ (p=0.279 n=29+30)
GobDecode-4 33.1ms ± 1% 33.1ms ± 0% ~ (p=0.052 n=28+29)
GobEncode-4 29.6ms ± 0% 29.6ms ± 0% +0.08% (p=0.013 n=28+29)
Gzip-4 1.38s ± 2% 1.39s ± 2% ~ (p=0.071 n=29+29)
Gunzip-4 139ms ± 0% 139ms ± 0% ~ (p=0.265 n=29+29)
HTTPClientServer-4 789µs ± 4% 785µs ± 4% ~ (p=0.206 n=29+28)
JSONEncode-4 49.7ms ± 0% 49.6ms ± 0% -0.24% (p=0.000 n=30+30)
JSONDecode-4 266ms ± 1% 267ms ± 1% +0.34% (p=0.000 n=30+30)
Mandelbrot200-4 16.6ms ± 0% 16.6ms ± 0% ~ (p=0.835 n=28+30)
GoParse-4 15.9ms ± 0% 15.8ms ± 0% -0.29% (p=0.000 n=27+30)
RegexpMatchEasy0_32-4 380ns ± 0% 381ns ± 0% +0.18% (p=0.000 n=30+30)
RegexpMatchEasy0_1K-4 1.18µs ± 0% 1.19µs ± 0% +0.23% (p=0.000 n=30+30)
RegexpMatchEasy1_32-4 357ns ± 0% 358ns ± 0% +0.28% (p=0.000 n=29+29)
RegexpMatchEasy1_1K-4 2.04µs ± 0% 2.04µs ± 0% +0.06% (p=0.006 n=30+30)
RegexpMatchMedium_32-4 589ns ± 0% 590ns ± 0% +0.24% (p=0.000 n=28+30)
RegexpMatchMedium_1K-4 162µs ± 0% 162µs ± 0% -0.01% (p=0.027 n=26+29)
RegexpMatchHard_32-4 9.58µs ± 0% 9.58µs ± 0% ~ (p=0.935 n=30+30)
RegexpMatchHard_1K-4 287µs ± 0% 287µs ± 0% ~ (p=0.387 n=29+30)
Revcomp-4 2.50s ± 0% 2.50s ± 0% -0.10% (p=0.020 n=28+28)
Template-4 310ms ± 0% 310ms ± 1% ~ (p=0.406 n=30+30)
TimeParse-4 1.68µs ± 0% 1.68µs ± 0% +0.03% (p=0.014 n=30+17)
TimeFormat-4 1.65µs ± 0% 1.66µs ± 0% +0.32% (p=0.000 n=27+29)
[Geo mean] 247µs 247µs +0.05%
name old speed new speed delta
GobDecode-4 23.2MB/s ± 0% 23.2MB/s ± 0% -0.08% (p=0.032 n=27+29)
GobEncode-4 26.0MB/s ± 0% 25.9MB/s ± 0% -0.10% (p=0.011 n=29+29)
Gzip-4 14.1MB/s ± 2% 14.0MB/s ± 2% ~ (p=0.081 n=29+29)
Gunzip-4 139MB/s ± 0% 139MB/s ± 0% ~ (p=0.290 n=29+29)
JSONEncode-4 39.0MB/s ± 0% 39.1MB/s ± 0% +0.25% (p=0.000 n=29+30)
JSONDecode-4 7.30MB/s ± 1% 7.28MB/s ± 1% -0.33% (p=0.000 n=30+30)
GoParse-4 3.65MB/s ± 0% 3.66MB/s ± 0% +0.29% (p=0.000 n=27+30)
RegexpMatchEasy0_32-4 84.1MB/s ± 0% 84.0MB/s ± 0% -0.17% (p=0.000 n=30+28)
RegexpMatchEasy0_1K-4 864MB/s ± 0% 862MB/s ± 0% -0.24% (p=0.000 n=30+30)
RegexpMatchEasy1_32-4 89.5MB/s ± 0% 89.3MB/s ± 0% -0.18% (p=0.000 n=28+24)
RegexpMatchEasy1_1K-4 502MB/s ± 0% 502MB/s ± 0% -0.05% (p=0.008 n=30+29)
RegexpMatchMedium_32-4 1.70MB/s ± 0% 1.69MB/s ± 0% -0.59% (p=0.000 n=29+30)
RegexpMatchMedium_1K-4 6.31MB/s ± 0% 6.31MB/s ± 0% +0.05% (p=0.005 n=30+26)
RegexpMatchHard_32-4 3.34MB/s ± 0% 3.34MB/s ± 0% ~ (all equal)
RegexpMatchHard_1K-4 3.57MB/s ± 0% 3.57MB/s ± 0% ~ (all equal)
Revcomp-4 102MB/s ± 0% 102MB/s ± 0% +0.10% (p=0.022 n=28+28)
Template-4 6.26MB/s ± 0% 6.26MB/s ± 1% ~ (p=0.768 n=30+30)
[Geo mean] 24.2MB/s 24.1MB/s -0.08%
Change-Id: I494f9db7f8a568a00e9c74ae25086a58b2221683
Reviewed-on: https://go-review.googlesource.com/137976
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-09-28 15:05:17 +00:00
fanzha02
a19a83c8ef
cmd/compile: optimize math.Float64(32)bits and math.Float64(32)frombits on arm64
...
Use float <-> int register moves without conversion instead of stores
and loads to move float <-> int values.
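For context, a minimal sketch of user code that leans on these functions is shown below. The helper names are hypothetical and not part of the CL. Each call reinterprets the bits of a value without any numeric conversion, which is why a plain register move is sufficient.

```go
package main

import (
	"fmt"
	"math"
)

// biasedExponent extracts the 11-bit biased exponent field of an
// IEEE 754 double by reinterpreting the float as an integer.
func biasedExponent(x float64) int {
	return int(math.Float64bits(x)>>52) & 0x7ff
}

// negate32 flips the sign bit of a float32 by round-tripping
// through its integer bit pattern.
func negate32(x float32) float32 {
	return math.Float32frombits(math.Float32bits(x) ^ (1 << 31))
}

func main() {
	fmt.Println(biasedExponent(1.0)) // 1023
	fmt.Println(negate32(1.5))       // -1.5
}
```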
Math package benchmark results.
name old time/op new time/op delta
Acosh 153ns ± 0% 147ns ± 0% -3.92% (p=0.000 n=10+10)
Asinh 183ns ± 0% 177ns ± 0% -3.28% (p=0.000 n=10+10)
Atanh 157ns ± 0% 155ns ± 0% -1.27% (p=0.000 n=10+10)
Atan2 118ns ± 0% 117ns ± 1% -0.59% (p=0.003 n=10+10)
Cbrt 119ns ± 0% 114ns ± 0% -4.20% (p=0.000 n=10+10)
Copysign 7.51ns ± 0% 6.51ns ± 0% -13.32% (p=0.000 n=9+10)
Cos 73.1ns ± 0% 70.6ns ± 0% -3.42% (p=0.000 n=10+10)
Cosh 119ns ± 0% 121ns ± 0% +1.68% (p=0.000 n=10+9)
ExpGo 154ns ± 0% 149ns ± 0% -3.05% (p=0.000 n=9+10)
Expm1 101ns ± 0% 99ns ± 0% -1.88% (p=0.000 n=10+10)
Exp2Go 150ns ± 0% 146ns ± 0% -2.67% (p=0.000 n=10+10)
Abs 7.01ns ± 0% 6.01ns ± 0% -14.27% (p=0.000 n=10+9)
Mod 234ns ± 0% 212ns ± 0% -9.40% (p=0.000 n=9+10)
Frexp 34.5ns ± 0% 30.0ns ± 0% -13.04% (p=0.000 n=10+10)
Gamma 112ns ± 0% 111ns ± 0% -0.89% (p=0.000 n=10+10)
Hypot 73.6ns ± 0% 68.6ns ± 0% -6.79% (p=0.000 n=10+10)
HypotGo 77.1ns ± 0% 72.1ns ± 0% -6.49% (p=0.000 n=10+10)
Ilogb 31.0ns ± 0% 28.0ns ± 0% -9.68% (p=0.000 n=10+10)
J0 437ns ± 0% 434ns ± 0% -0.62% (p=0.000 n=10+10)
J1 433ns ± 0% 431ns ± 0% -0.46% (p=0.000 n=10+10)
Jn 927ns ± 0% 922ns ± 0% -0.54% (p=0.000 n=10+10)
Ldexp 41.5ns ± 0% 37.0ns ± 0% -10.84% (p=0.000 n=9+10)
Log 124ns ± 0% 118ns ± 0% -4.84% (p=0.000 n=10+9)
Logb 34.0ns ± 0% 32.0ns ± 0% -5.88% (p=0.000 n=10+10)
Log1p 110ns ± 0% 108ns ± 0% -1.82% (p=0.000 n=10+10)
Log10 136ns ± 0% 132ns ± 0% -2.94% (p=0.000 n=10+10)
Log2 51.6ns ± 0% 47.1ns ± 0% -8.72% (p=0.000 n=10+10)
Nextafter32 33.0ns ± 0% 30.5ns ± 0% -7.58% (p=0.000 n=10+10)
Nextafter64 29.0ns ± 0% 26.5ns ± 0% -8.62% (p=0.000 n=10+10)
PowInt 169ns ± 0% 160ns ± 0% -5.33% (p=0.000 n=10+10)
PowFrac 375ns ± 0% 361ns ± 0% -3.73% (p=0.000 n=10+10)
RoundToEven 14.0ns ± 0% 12.5ns ± 0% -10.71% (p=0.000 n=10+10)
Remainder 206ns ± 0% 192ns ± 0% -6.80% (p=0.000 n=10+9)
Signbit 6.01ns ± 0% 5.51ns ± 0% -8.32% (p=0.000 n=10+9)
Sin 70.1ns ± 0% 69.6ns ± 0% -0.71% (p=0.000 n=10+10)
Sincos 99.1ns ± 0% 99.6ns ± 0% +0.50% (p=0.000 n=9+10)
SqrtGoLatency 178ns ± 0% 146ns ± 0% -17.70% (p=0.000 n=8+10)
SqrtPrime 9.19µs ± 0% 9.20µs ± 0% +0.01% (p=0.000 n=9+9)
Tanh 125ns ± 1% 127ns ± 0% +1.36% (p=0.000 n=10+10)
Y0 428ns ± 0% 426ns ± 0% -0.47% (p=0.000 n=10+10)
Y1 431ns ± 0% 429ns ± 0% -0.46% (p=0.000 n=10+9)
Yn 906ns ± 0% 901ns ± 0% -0.55% (p=0.000 n=10+10)
Float64bits 4.50ns ± 0% 3.50ns ± 0% -22.22% (p=0.000 n=10+10)
Float64frombits 4.00ns ± 0% 3.50ns ± 0% -12.50% (p=0.000 n=10+9)
Float32bits 4.50ns ± 0% 3.50ns ± 0% -22.22% (p=0.002 n=8+10)
Float32frombits 4.00ns ± 0% 3.50ns ± 0% -12.50% (p=0.000 n=10+10)
Change-Id: Iba829e15d5624962fe0c699139ea783efeefabc2
Reviewed-on: https://go-review.googlesource.com/129715
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-09-17 20:49:04 +00:00
erifan01
8149db4f64
cmd/compile: intrinsify math.RoundToEven and math.Abs on arm64
...
math.RoundToEven can be done by a single arm64 instruction, FRINTND, so intrinsify it to improve performance.
The current pure Go implementation of Abs is translated into five instructions on arm64:
str, ldr, and, str, ldr. The intrinsic implementation requires only one instruction, so
intrinsifying it is worthwhile for performance.
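To make the Abs case concrete, here is a sketch of a sign-clearing implementation written in the same spirit as a pure Go Abs. The name absViaBits is hypothetical, and the body is an assumption about the general shape of such code rather than a copy of the library source. Without float <-> int register moves, the round trip through the integer bit pattern goes through memory, which would explain the store/load pairs listed above.

```go
package main

import (
	"fmt"
	"math"
)

// absViaBits clears the sign bit of x by going through its
// integer bit pattern and back.
func absViaBits(x float64) float64 {
	return math.Float64frombits(math.Float64bits(x) &^ (1 << 63))
}

func main() {
	fmt.Println(absViaBits(-3.5) == math.Abs(-3.5))
	// RoundToEven rounds halfway cases to the nearest even integer.
	fmt.Println(math.RoundToEven(2.5), math.RoundToEven(3.5))
}
```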
Benchmarks:
name old time/op new time/op delta
Abs-8 3.50ns ± 0% 1.50ns ± 0% -57.14% (p=0.000 n=10+10)
RoundToEven-8 9.26ns ± 0% 1.50ns ± 0% -83.80% (p=0.000 n=10+10)
Change-Id: I9456b26ab282b544dfac0154fc86f17aed96ac3d
Reviewed-on: https://go-review.googlesource.com/116535
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-09-13 14:52:51 +00:00
erifan01
204cc14bdd
cmd/compile: implement non-constant rotates using ROR on arm64
...
Add some rules to match the Go code like:
y &= 63
x << y | x >> (64-y)
or
y &= 63
x >> y | x << (64-y)
as a ROR instruction. Make math/bits.RotateLeft faster on arm64.
Extends CL 132435 to arm64.
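The two patterns above can be written as complete functions and checked against math/bits. This is only a sketch: the names rotl64 and rotr64 are hypothetical, and the code shows the source shape the rules are described as matching, not the rules themselves.

```go
package main

import (
	"fmt"
	"math/bits"
)

// rotl64 rotates x left by y bits using the masked shift pattern.
// When y is 0, x>>(64-y) shifts by 64, which Go defines as 0.
func rotl64(x uint64, y uint) uint64 {
	y &= 63
	return x<<y | x>>(64-y)
}

// rotr64 rotates x right by y bits using the mirrored pattern.
func rotr64(x uint64, y uint) uint64 {
	y &= 63
	return x>>y | x<<(64-y)
}

func main() {
	x := uint64(0x8000000000000001)
	for _, y := range []uint{0, 1, 13, 63} {
		fmt.Println(rotl64(x, y) == bits.RotateLeft64(x, int(y)),
			rotr64(x, y) == bits.RotateLeft64(x, -int(y)))
	}
}
```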
Benchmarks of math/bits.RotateLeftxxN:
name old time/op new time/op delta
RotateLeft-8 3.548750ns ± 1% 2.003750ns ± 0% -43.54% (p=0.000 n=8+8)
RotateLeft8-8 3.925000ns ± 0% 3.925000ns ± 0% ~ (p=1.000 n=8+8)
RotateLeft16-8 3.925000ns ± 0% 3.927500ns ± 0% ~ (p=0.608 n=8+8)
RotateLeft32-8 3.925000ns ± 0% 2.002500ns ± 0% -48.98% (p=0.000 n=8+8)
RotateLeft64-8 3.536250ns ± 0% 2.003750ns ± 0% -43.34% (p=0.000 n=8+8)
Change-Id: I77622cd7f39b917427e060647321f5513973232c
Reviewed-on: https://go-review.googlesource.com/122542
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-09-07 14:52:02 +00:00
Ben Shi
0e9f1de0b7
cmd/compile: optimize arm64's comparison
...
Add more optimization with TST/CMN.
1. A tiny benchmark shows more than 12% improvement.
TSTCMN-4 378µs ± 0% 332µs ± 0% -12.15% (p=0.000 n=30+27)
(https://github.com/benshi001/ugo1/blob/master/tstcmn_test.go)
2. There is little regression in the go1 benchmark, excluding noise.
name old time/op new time/op delta
BinaryTree17-4 19.1s ± 0% 19.1s ± 0% ~ (p=0.994 n=28+29)
Fannkuch11-4 10.0s ± 0% 10.0s ± 0% ~ (p=0.198 n=30+25)
FmtFprintfEmpty-4 233ns ± 0% 233ns ± 0% +0.14% (p=0.002 n=24+30)
FmtFprintfString-4 428ns ± 0% 428ns ± 0% ~ (all equal)
FmtFprintfInt-4 472ns ± 0% 472ns ± 0% ~ (all equal)
FmtFprintfIntInt-4 725ns ± 0% 725ns ± 0% ~ (all equal)
FmtFprintfPrefixedInt-4 889ns ± 0% 888ns ± 0% ~ (p=0.632 n=28+30)
FmtFprintfFloat-4 1.20µs ± 0% 1.20µs ± 0% +0.05% (p=0.001 n=18+30)
FmtManyArgs-4 3.00µs ± 0% 2.99µs ± 0% -0.07% (p=0.001 n=27+30)
GobDecode-4 42.1ms ± 0% 42.2ms ± 0% +0.29% (p=0.000 n=28+28)
GobEncode-4 38.6ms ± 9% 38.8ms ± 9% ~ (p=0.912 n=30+30)
Gzip-4 2.07s ± 1% 2.05s ± 1% -0.64% (p=0.000 n=29+30)
Gunzip-4 175ms ± 0% 175ms ± 0% -0.15% (p=0.001 n=30+30)
HTTPClientServer-4 872µs ± 5% 880µs ± 6% ~ (p=0.196 n=30+29)
JSONEncode-4 88.5ms ± 1% 89.8ms ± 1% +1.49% (p=0.000 n=23+24)
JSONDecode-4 393ms ± 1% 390ms ± 1% -0.89% (p=0.000 n=28+30)
Mandelbrot200-4 19.5ms ± 0% 19.5ms ± 0% ~ (p=0.405 n=29+28)
GoParse-4 19.9ms ± 0% 20.0ms ± 0% +0.27% (p=0.000 n=30+30)
RegexpMatchEasy0_32-4 431ns ± 0% 431ns ± 0% ~ (p=1.000 n=30+30)
RegexpMatchEasy0_1K-4 1.61µs ± 0% 1.61µs ± 0% ~ (p=0.527 n=26+26)
RegexpMatchEasy1_32-4 443ns ± 0% 443ns ± 0% ~ (all equal)
RegexpMatchEasy1_1K-4 2.58µs ± 1% 2.58µs ± 1% ~ (p=0.578 n=27+25)
RegexpMatchMedium_32-4 740ns ± 0% 740ns ± 0% ~ (p=0.357 n=30+30)
RegexpMatchMedium_1K-4 223µs ± 0% 223µs ± 0% +0.16% (p=0.000 n=30+29)
RegexpMatchHard_32-4 12.3µs ± 0% 12.3µs ± 0% ~ (p=0.236 n=27+27)
RegexpMatchHard_1K-4 371µs ± 0% 371µs ± 0% +0.09% (p=0.000 n=30+27)
Revcomp-4 2.85s ± 0% 2.85s ± 0% ~ (p=0.057 n=28+25)
Template-4 408ms ± 1% 409ms ± 1% ~ (p=0.117 n=29+29)
TimeParse-4 1.93µs ± 0% 1.93µs ± 0% ~ (p=0.535 n=29+28)
TimeFormat-4 1.99µs ± 0% 1.99µs ± 0% ~ (p=0.168 n=29+28)
[Geo mean] 306µs 307µs +0.07%
name old speed new speed delta
GobDecode-4 18.3MB/s ± 0% 18.2MB/s ± 0% -0.31% (p=0.000 n=28+29)
GobEncode-4 19.9MB/s ± 8% 19.8MB/s ± 9% ~ (p=0.923 n=30+30)
Gzip-4 9.39MB/s ± 1% 9.45MB/s ± 1% +0.65% (p=0.000 n=29+30)
Gunzip-4 111MB/s ± 0% 111MB/s ± 0% +0.15% (p=0.001 n=30+30)
JSONEncode-4 21.9MB/s ± 1% 21.6MB/s ± 1% -1.45% (p=0.000 n=23+23)
JSONDecode-4 4.94MB/s ± 1% 4.98MB/s ± 1% +0.84% (p=0.000 n=27+30)
GoParse-4 2.91MB/s ± 0% 2.90MB/s ± 0% -0.34% (p=0.000 n=21+22)
RegexpMatchEasy0_32-4 74.1MB/s ± 0% 74.1MB/s ± 0% ~ (p=0.469 n=29+28)
RegexpMatchEasy0_1K-4 634MB/s ± 0% 634MB/s ± 0% ~ (p=0.978 n=24+28)
RegexpMatchEasy1_32-4 72.2MB/s ± 0% 72.2MB/s ± 0% ~ (p=0.064 n=27+29)
RegexpMatchEasy1_1K-4 396MB/s ± 1% 396MB/s ± 1% ~ (p=0.583 n=27+25)
RegexpMatchMedium_32-4 1.35MB/s ± 0% 1.35MB/s ± 0% ~ (all equal)
RegexpMatchMedium_1K-4 4.60MB/s ± 0% 4.59MB/s ± 0% -0.14% (p=0.000 n=30+26)
RegexpMatchHard_32-4 2.61MB/s ± 0% 2.61MB/s ± 0% ~ (all equal)
RegexpMatchHard_1K-4 2.76MB/s ± 0% 2.76MB/s ± 0% ~ (all equal)
Revcomp-4 89.1MB/s ± 0% 89.1MB/s ± 0% ~ (p=0.059 n=28+25)
Template-4 4.75MB/s ± 1% 4.75MB/s ± 1% ~ (p=0.106 n=29+29)
[Geo mean] 18.3MB/s 18.3MB/s -0.07%
Change-Id: I3cd76ce63e84b0c3cebabf9fa3573b76a7343899
Reviewed-on: https://go-review.googlesource.com/124935
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-09-05 02:51:28 +00:00
Ben Shi
b444215116
cmd/compile: optimize ARM64's code with MADD/MSUB
...
MADD does MUL-ADD in a single instruction, and MSUB does the
similar simplification for MUL-SUB.
This CL implements the optimization with MADD/MSUB.
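A sketch of Go code with the multiply-then-add and multiply-then-subtract shapes follows. The function names are hypothetical, and whether a particular expression is fused into MADD or MSUB depends on the compiler's rewrite rules for that release.

```go
package main

import "fmt"

// mulAdd computes a + x*y, the shape a MADD instruction covers.
func mulAdd(a, x, y int64) int64 { return a + x*y }

// mulSub computes a - x*y, the shape an MSUB instruction covers.
func mulSub(a, x, y int64) int64 { return a - x*y }

// horner evaluates a polynomial with coefficients given from the
// highest degree down; each step is a multiply followed by an add.
func horner(coeffs []int64, x int64) int64 {
	var acc int64
	for _, c := range coeffs {
		acc = acc*x + c
	}
	return acc
}

func main() {
	fmt.Println(mulAdd(1, 2, 3), mulSub(10, 2, 3))
	fmt.Println(horner([]int64{2, 0, 1}, 3)) // 2*3*3 + 0*3 + 1
}
```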
1. The total size of pkg/android_arm64/ decreases by about 20KB,
excluding cmd/compile/.
2. The go1 benchmark shows a little improvement for RegexpMatchHard_32-4
and Template-4, excluding noise.
name old time/op new time/op delta
BinaryTree17-4 16.3s ± 1% 16.5s ± 1% +1.41% (p=0.000 n=26+28)
Fannkuch11-4 8.79s ± 1% 8.76s ± 0% -0.36% (p=0.000 n=26+28)
FmtFprintfEmpty-4 172ns ± 0% 172ns ± 0% ~ (all equal)
FmtFprintfString-4 362ns ± 1% 364ns ± 0% +0.55% (p=0.000 n=30+30)
FmtFprintfInt-4 416ns ± 0% 416ns ± 0% ~ (p=0.099 n=22+30)
FmtFprintfIntInt-4 655ns ± 1% 660ns ± 1% +0.76% (p=0.000 n=30+30)
FmtFprintfPrefixedInt-4 810ns ± 0% 809ns ± 0% -0.08% (p=0.009 n=29+29)
FmtFprintfFloat-4 1.08µs ± 0% 1.09µs ± 0% +0.61% (p=0.000 n=30+29)
FmtManyArgs-4 2.70µs ± 0% 2.69µs ± 0% -0.23% (p=0.000 n=29+28)
GobDecode-4 32.2ms ± 1% 32.1ms ± 1% -0.39% (p=0.000 n=27+26)
GobEncode-4 27.4ms ± 2% 27.4ms ± 1% ~ (p=0.864 n=28+28)
Gzip-4 1.53s ± 1% 1.52s ± 1% -0.30% (p=0.031 n=29+29)
Gunzip-4 146ms ± 0% 146ms ± 0% -0.14% (p=0.001 n=25+30)
HTTPClientServer-4 1.00ms ± 4% 0.98ms ± 6% -1.65% (p=0.001 n=29+30)
JSONEncode-4 67.3ms ± 1% 67.2ms ± 1% ~ (p=0.520 n=28+28)
JSONDecode-4 329ms ± 5% 330ms ± 4% ~ (p=0.142 n=30+30)
Mandelbrot200-4 17.3ms ± 0% 17.3ms ± 0% ~ (p=0.055 n=26+29)
GoParse-4 16.9ms ± 1% 17.0ms ± 1% +0.82% (p=0.000 n=30+30)
RegexpMatchEasy0_32-4 382ns ± 0% 382ns ± 0% ~ (all equal)
RegexpMatchEasy0_1K-4 1.33µs ± 0% 1.33µs ± 0% -0.25% (p=0.000 n=30+27)
RegexpMatchEasy1_32-4 361ns ± 0% 361ns ± 0% -0.08% (p=0.002 n=30+28)
RegexpMatchEasy1_1K-4 2.11µs ± 0% 2.09µs ± 0% -0.54% (p=0.000 n=30+29)
RegexpMatchMedium_32-4 594ns ± 0% 592ns ± 0% -0.32% (p=0.000 n=30+30)
RegexpMatchMedium_1K-4 173µs ± 0% 172µs ± 0% -0.77% (p=0.000 n=29+27)
RegexpMatchHard_32-4 10.4µs ± 0% 10.1µs ± 0% -3.63% (p=0.000 n=28+27)
RegexpMatchHard_1K-4 306µs ± 0% 301µs ± 0% -1.64% (p=0.000 n=29+30)
Revcomp-4 2.51s ± 1% 2.52s ± 0% +0.18% (p=0.017 n=26+27)
Template-4 394ms ± 3% 382ms ± 3% -3.22% (p=0.000 n=28+28)
TimeParse-4 1.67µs ± 0% 1.67µs ± 0% +0.05% (p=0.030 n=27+30)
TimeFormat-4 1.72µs ± 0% 1.70µs ± 0% -0.79% (p=0.000 n=28+26)
[Geo mean] 259µs 259µs -0.33%
name old speed new speed delta
GobDecode-4 23.8MB/s ± 1% 23.9MB/s ± 1% +0.40% (p=0.001 n=27+26)
GobEncode-4 28.0MB/s ± 2% 28.0MB/s ± 1% ~ (p=0.863 n=28+28)
Gzip-4 12.7MB/s ± 1% 12.7MB/s ± 1% +0.32% (p=0.026 n=29+29)
Gunzip-4 133MB/s ± 0% 133MB/s ± 0% +0.15% (p=0.001 n=24+30)
JSONEncode-4 28.8MB/s ± 1% 28.9MB/s ± 1% ~ (p=0.475 n=28+28)
JSONDecode-4 5.89MB/s ± 4% 5.87MB/s ± 5% ~ (p=0.174 n=29+30)
GoParse-4 3.43MB/s ± 0% 3.40MB/s ± 1% -0.83% (p=0.000 n=28+30)
RegexpMatchEasy0_32-4 83.6MB/s ± 0% 83.6MB/s ± 0% ~ (p=0.848 n=28+29)
RegexpMatchEasy0_1K-4 768MB/s ± 0% 770MB/s ± 0% +0.25% (p=0.000 n=30+27)
RegexpMatchEasy1_32-4 88.5MB/s ± 0% 88.5MB/s ± 0% ~ (p=0.086 n=29+29)
RegexpMatchEasy1_1K-4 486MB/s ± 0% 489MB/s ± 0% +0.54% (p=0.000 n=30+29)
RegexpMatchMedium_32-4 1.68MB/s ± 0% 1.69MB/s ± 0% +0.60% (p=0.000 n=30+23)
RegexpMatchMedium_1K-4 5.90MB/s ± 0% 5.95MB/s ± 0% +0.85% (p=0.000 n=18+20)
RegexpMatchHard_32-4 3.07MB/s ± 0% 3.18MB/s ± 0% +3.72% (p=0.000 n=29+26)
RegexpMatchHard_1K-4 3.35MB/s ± 0% 3.40MB/s ± 0% +1.69% (p=0.000 n=30+30)
Revcomp-4 101MB/s ± 0% 101MB/s ± 0% -0.18% (p=0.018 n=26+27)
Template-4 4.92MB/s ± 4% 5.09MB/s ± 3% +3.31% (p=0.000 n=28+28)
[Geo mean] 22.4MB/s 22.6MB/s +0.62%
Change-Id: I8f304b272785739f57b3c8f736316f658f8c1b2a
Reviewed-on: https://go-review.googlesource.com/129119
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-09-04 20:41:58 +00:00
Ben Shi
3ca3e89bb6
cmd/compile: optimize arm64 with indexed FP load/store
...
FP loads and stores on arm64 have register indexed forms, and this
CL makes the compiler use them.
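For illustration, a loop that reads floating-point slice elements by index is the kind of code where a base-plus-index FP load can apply. This is a sketch with a hypothetical function name. The instruction selection itself is up to the compiler and is not guaranteed for any particular loop.

```go
package main

import "fmt"

// dot returns the dot product of a and b over their common length.
// Each a[i] and b[i] is a floating-point load addressed by a base
// pointer plus an index.
func dot(a, b []float64) float64 {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	var sum float64
	for i := 0; i < n; i++ {
		sum += a[i] * b[i]
	}
	return sum
}

func main() {
	fmt.Println(dot([]float64{1, 2, 3}, []float64{4, 5, 6}))
}
```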
1. The total size of pkg/android_arm64 (excluding cmd/compile)
decreases by about 400 bytes.
2. There is no regression in the go1 benchmark; the test case
GobEncode even shows a slight improvement, excluding noise.
name old time/op new time/op delta
BinaryTree17-4 19.0s ± 0% 19.0s ± 1% ~ (p=0.817 n=29+29)
Fannkuch11-4 9.94s ± 0% 9.95s ± 0% +0.03% (p=0.010 n=24+30)
FmtFprintfEmpty-4 233ns ± 0% 233ns ± 0% ~ (all equal)
FmtFprintfString-4 427ns ± 0% 427ns ± 0% ~ (p=0.649 n=30+30)
FmtFprintfInt-4 471ns ± 0% 471ns ± 0% ~ (all equal)
FmtFprintfIntInt-4 730ns ± 0% 730ns ± 0% ~ (all equal)
FmtFprintfPrefixedInt-4 889ns ± 0% 889ns ± 0% ~ (all equal)
FmtFprintfFloat-4 1.21µs ± 0% 1.21µs ± 0% +0.04% (p=0.012 n=20+30)
FmtManyArgs-4 2.99µs ± 0% 2.99µs ± 0% ~ (p=0.651 n=29+29)
GobDecode-4 42.4ms ± 1% 42.3ms ± 1% -0.27% (p=0.001 n=29+28)
GobEncode-4 37.8ms ±11% 36.0ms ± 0% -4.67% (p=0.000 n=30+26)
Gzip-4 1.98s ± 1% 1.96s ± 1% -1.26% (p=0.000 n=30+30)
Gunzip-4 175ms ± 0% 175ms ± 0% ~ (p=0.988 n=29+29)
HTTPClientServer-4 854µs ± 5% 860µs ± 5% ~ (p=0.236 n=28+29)
JSONEncode-4 88.8ms ± 0% 87.9ms ± 0% -1.00% (p=0.000 n=24+26)
JSONDecode-4 390ms ± 1% 392ms ± 2% +0.48% (p=0.025 n=30+30)
Mandelbrot200-4 19.5ms ± 0% 19.5ms ± 0% ~ (p=0.894 n=24+29)
GoParse-4 20.3ms ± 0% 20.1ms ± 1% -0.94% (p=0.000 n=27+26)
RegexpMatchEasy0_32-4 451ns ± 0% 451ns ± 0% ~ (p=0.578 n=30+30)
RegexpMatchEasy0_1K-4 1.63µs ± 0% 1.63µs ± 0% ~ (p=0.298 n=30+28)
RegexpMatchEasy1_32-4 431ns ± 0% 434ns ± 0% +0.67% (p=0.000 n=30+29)
RegexpMatchEasy1_1K-4 2.60µs ± 0% 2.64µs ± 0% +1.36% (p=0.000 n=28+26)
RegexpMatchMedium_32-4 744ns ± 0% 744ns ± 0% ~ (p=0.474 n=29+29)
RegexpMatchMedium_1K-4 223µs ± 0% 223µs ± 0% -0.08% (p=0.038 n=26+30)
RegexpMatchHard_32-4 12.2µs ± 0% 12.3µs ± 0% +0.27% (p=0.000 n=29+30)
RegexpMatchHard_1K-4 373µs ± 0% 373µs ± 0% ~ (p=0.219 n=29+28)
Revcomp-4 2.84s ± 0% 2.84s ± 0% ~ (p=0.130 n=28+28)
Template-4 394ms ± 1% 392ms ± 1% -0.52% (p=0.001 n=30+30)
TimeParse-4 1.93µs ± 0% 1.93µs ± 0% ~ (p=0.587 n=29+30)
TimeFormat-4 2.00µs ± 0% 2.00µs ± 0% +0.07% (p=0.001 n=28+27)
[Geo mean] 306µs 305µs -0.17%
name old speed new speed delta
GobDecode-4 18.1MB/s ± 1% 18.2MB/s ± 1% +0.27% (p=0.001 n=29+28)
GobEncode-4 20.3MB/s ±10% 21.3MB/s ± 0% +4.64% (p=0.000 n=30+26)
Gzip-4 9.79MB/s ± 1% 9.91MB/s ± 1% +1.28% (p=0.000 n=30+30)
Gunzip-4 111MB/s ± 0% 111MB/s ± 0% ~ (p=0.988 n=29+29)
JSONEncode-4 21.8MB/s ± 0% 22.1MB/s ± 0% +1.02% (p=0.000 n=24+26)
JSONDecode-4 4.97MB/s ± 1% 4.95MB/s ± 2% -0.45% (p=0.031 n=30+30)
GoParse-4 2.85MB/s ± 1% 2.88MB/s ± 1% +1.03% (p=0.000 n=30+26)
RegexpMatchEasy0_32-4 70.9MB/s ± 0% 70.9MB/s ± 0% ~ (p=0.904 n=29+28)
RegexpMatchEasy0_1K-4 627MB/s ± 0% 627MB/s ± 0% ~ (p=0.156 n=30+30)
RegexpMatchEasy1_32-4 74.2MB/s ± 0% 73.7MB/s ± 0% -0.67% (p=0.000 n=30+29)
RegexpMatchEasy1_1K-4 393MB/s ± 0% 388MB/s ± 0% -1.34% (p=0.000 n=28+26)
RegexpMatchMedium_32-4 1.34MB/s ± 0% 1.34MB/s ± 0% ~ (all equal)
RegexpMatchMedium_1K-4 4.59MB/s ± 0% 4.59MB/s ± 0% +0.07% (p=0.035 n=25+30)
RegexpMatchHard_32-4 2.61MB/s ± 0% 2.61MB/s ± 0% -0.11% (p=0.002 n=28+30)
RegexpMatchHard_1K-4 2.75MB/s ± 0% 2.75MB/s ± 0% +0.15% (p=0.001 n=30+24)
Revcomp-4 89.4MB/s ± 0% 89.4MB/s ± 0% ~ (p=0.140 n=28+28)
Template-4 4.93MB/s ± 1% 4.95MB/s ± 1% +0.51% (p=0.001 n=30+30)
[Geo mean] 18.4MB/s 18.4MB/s +0.37%
Change-Id: I9a6b521a971b21cfb51064e8e9b853cef8a1d071
Reviewed-on: https://go-review.googlesource.com/124636
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-08-28 02:37:18 +00:00
Ben Shi
096229b2ec
cmd/compile: add missing type information for some arm/arm64 rules
...
Some indexed load/store rules lack type information; this
CL adds it.
Change-Id: Icac315ccb83a2f5bf30b056d4667d5b59eb4e5e2
Reviewed-on: https://go-review.googlesource.com/128455
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-08-27 15:22:45 +00:00
Wei Xiao
0a7ac93c27
cmd/compile: improve atomic add intrinsics with ARMv8.1 new instruction
...
ARMv8.1 adds a new instruction (LDADDAL) for atomic memory operations. This
CL improves the existing atomic add intrinsics with the new instruction. Since the
new instruction is only guaranteed to be present from ARMv8.1 onward, we guard its
use with a conditional on the CPU feature.
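The benchmarks in this commit exercise the runtime's internal Xadd and Xadd64. As a user-level sketch, the contended counter below uses sync/atomic instead, on the assumption that its add functions are lowered through the same intrinsic on arm64. The function name countConcurrently is hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countConcurrently has each worker atomically add 1 to a shared
// counter perWorker times and returns the final total.
func countConcurrently(workers, perWorker int) int64 {
	var total int64
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				atomic.AddInt64(&total, 1)
			}
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&total)
}

func main() {
	fmt.Println(countConcurrently(8, 1000))
}
```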
Performance result on ARMv8.1 machine:
name old time/op new time/op delta
Xadd-224 1.05µs ± 6% 0.02µs ± 4% -98.06% (p=0.000 n=10+8)
Xadd64-224 1.05µs ± 3% 0.02µs ±13% -98.10% (p=0.000 n=9+10)
[Geo mean] 1.05µs 0.02µs -98.08%
Performance result on ARMv8.0 machine:
name old time/op new time/op delta
Xadd-46 538ns ± 1% 541ns ± 1% +0.62% (p=0.000 n=9+9)
Xadd64-46 505ns ± 1% 508ns ± 0% +0.48% (p=0.003 n=9+8)
[Geo mean] 521ns 524ns +0.55%
Change-Id: If4b5d8d0e2d6f84fe1492a4f5de0789910ad0ee9
Reviewed-on: https://go-review.googlesource.com/81877
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-06-21 14:52:43 +00:00
Cherry Zhang
44b826bb28
cmd/compile: use a different register for updated value in AtomicAnd8/Or8 on ARM64
...
The ARM64 manual says it is "constrained unpredictable" if the src
and dst registers of STLXRB are the same, although it doesn't seem
to cause any problem on real hardware so far. Fix by allocating
a different register to hold the updated value for
AtomicAnd8/Or8. We do this by making the ops return <val,mem>
like AtomicAdd, although val will not be used elsewhere.
Fixes #25823.
Change-Id: I735b9822f99877b3c7aee67a65e62b7278dc40df
Reviewed-on: https://go-review.googlesource.com/117976
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Wei Xiao <Wei.Xiao@arm.com>
2018-06-12 20:22:50 +00:00
Wei Xiao
bd8a88729c
cmd/compile: intrinsify runtime.getcallerpc on arm64
...
Add a compiler intrinsic for getcallerpc on arm64 for better code generation.
Change-Id: I897e670a2b8ffa1a8c2fdc638f5b2c44bda26318
Reviewed-on: https://go-review.googlesource.com/109276
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-30 13:29:14 +00:00
Ben Shi
aaf73c6d1e
cmd/compile: optimize ARM64 with shifted register indexed load/store
...
ARM64 supports efficient instructions that combine a shift, an addition, and a load/store,
such as "MOVD (R0)(R1<<3), R2" and "MOVWU R6, (R4)(R1<<2)".
This CL optimizes the compiler to emit such efficient instructions. Below
is some test data.
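As a sketch of where the shifted form arises in Go source, indexing a slice of 8-byte elements computes the address as base + index<<3, which is the addressing shape of "MOVD (R0)(R1<<3), R2". The function name is hypothetical, and the generated code depends on the compiler version.

```go
package main

import "fmt"

// sumIndexed adds up the elements of s selected by idx. For 8-byte
// elements, the address of s[i] is the base pointer plus i<<3.
func sumIndexed(s []uint64, idx []int) uint64 {
	var total uint64
	for _, i := range idx {
		total += s[i]
	}
	return total
}

func main() {
	fmt.Println(sumIndexed([]uint64{10, 20, 30}, []int{2, 0}))
}
```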
1. binary size before/after
binary size change
pkg/linux_arm64 +80.1KB
pkg/tool/linux_arm64 +121.9KB
go -4.3KB
gofmt -64KB
2. go1 benchmark
There is a big improvement for the test case Fannkuch11, and a slight
improvement for some others, excluding noise.
name old time/op new time/op delta
BinaryTree17-4 43.9s ± 2% 44.0s ± 2% ~ (p=0.820 n=30+30)
Fannkuch11-4 30.6s ± 2% 24.5s ± 3% -19.93% (p=0.000 n=25+30)
FmtFprintfEmpty-4 500ns ± 0% 499ns ± 0% -0.11% (p=0.000 n=23+25)
FmtFprintfString-4 1.03µs ± 0% 1.04µs ± 3% ~ (p=0.065 n=29+30)
FmtFprintfInt-4 1.15µs ± 3% 1.15µs ± 4% -0.56% (p=0.000 n=30+30)
FmtFprintfIntInt-4 1.80µs ± 5% 1.82µs ± 0% ~ (p=0.094 n=30+24)
FmtFprintfPrefixedInt-4 2.17µs ± 5% 2.20µs ± 0% ~ (p=0.100 n=30+23)
FmtFprintfFloat-4 3.08µs ± 3% 3.09µs ± 4% ~ (p=0.123 n=30+30)
FmtManyArgs-4 7.41µs ± 4% 7.17µs ± 1% -3.26% (p=0.000 n=30+23)
GobDecode-4 93.7ms ± 0% 94.7ms ± 4% ~ (p=0.685 n=24+30)
GobEncode-4 78.7ms ± 7% 77.1ms ± 0% ~ (p=0.729 n=30+23)
Gzip-4 4.01s ± 0% 3.97s ± 5% -1.11% (p=0.037 n=24+30)
Gunzip-4 389ms ± 4% 384ms ± 0% ~ (p=0.155 n=30+23)
HTTPClientServer-4 536µs ± 1% 537µs ± 1% ~ (p=0.236 n=30+30)
JSONEncode-4 179ms ± 1% 182ms ± 6% ~ (p=0.763 n=24+30)
JSONDecode-4 843ms ± 0% 839ms ± 6% -0.42% (p=0.003 n=25+30)
Mandelbrot200-4 46.5ms ± 0% 46.5ms ± 0% +0.02% (p=0.000 n=26+26)
GoParse-4 44.3ms ± 6% 43.3ms ± 0% ~ (p=0.067 n=30+27)
RegexpMatchEasy0_32-4 1.07µs ± 7% 1.07µs ± 4% ~ (p=0.835 n=30+30)
RegexpMatchEasy0_1K-4 5.51µs ± 0% 5.49µs ± 0% -0.35% (p=0.000 n=23+26)
RegexpMatchEasy1_32-4 1.01µs ± 0% 1.02µs ± 4% +0.96% (p=0.014 n=24+30)
RegexpMatchEasy1_1K-4 7.43µs ± 0% 7.18µs ± 0% -3.41% (p=0.000 n=23+24)
RegexpMatchMedium_32-4 1.78µs ± 0% 1.81µs ± 4% +1.47% (p=0.012 n=23+30)
RegexpMatchMedium_1K-4 547µs ± 1% 542µs ± 3% -0.90% (p=0.003 n=24+30)
RegexpMatchHard_32-4 30.4µs ± 0% 29.7µs ± 0% -2.15% (p=0.000 n=19+23)
RegexpMatchHard_1K-4 913µs ± 0% 915µs ± 6% +0.25% (p=0.012 n=24+30)
Revcomp-4 6.32s ± 1% 6.42s ± 4% ~ (p=0.342 n=25+30)
Template-4 868ms ± 6% 878ms ± 6% +1.15% (p=0.000 n=30+30)
TimeParse-4 4.57µs ± 4% 4.59µs ± 3% +0.65% (p=0.010 n=29+30)
TimeFormat-4 4.51µs ± 0% 4.50µs ± 0% -0.27% (p=0.000 n=27+24)
[Geo mean] 695µs 689µs -0.92%
name old speed new speed delta
GobDecode-4 8.19MB/s ± 0% 8.12MB/s ± 4% ~ (p=0.680 n=24+30)
GobEncode-4 9.76MB/s ± 7% 9.96MB/s ± 0% ~ (p=0.616 n=30+23)
Gzip-4 4.84MB/s ± 0% 4.89MB/s ± 4% +1.16% (p=0.030 n=24+30)
Gunzip-4 49.9MB/s ± 4% 50.6MB/s ± 0% ~ (p=0.162 n=30+23)
JSONEncode-4 10.9MB/s ± 1% 10.7MB/s ± 6% ~ (p=0.575 n=24+30)
JSONDecode-4 2.30MB/s ± 0% 2.32MB/s ± 5% +0.72% (p=0.003 n=22+30)
GoParse-4 1.31MB/s ± 6% 1.34MB/s ± 0% +2.26% (p=0.002 n=30+27)
RegexpMatchEasy0_32-4 30.0MB/s ± 6% 30.0MB/s ± 4% ~ (p=1.000 n=30+30)
RegexpMatchEasy0_1K-4 186MB/s ± 0% 187MB/s ± 0% +0.35% (p=0.000 n=23+26)
RegexpMatchEasy1_32-4 31.8MB/s ± 0% 31.5MB/s ± 4% -0.92% (p=0.012 n=25+30)
RegexpMatchEasy1_1K-4 138MB/s ± 0% 143MB/s ± 0% +3.53% (p=0.000 n=23+24)
RegexpMatchMedium_32-4 560kB/s ± 0% 553kB/s ± 4% -1.19% (p=0.005 n=23+30)
RegexpMatchMedium_1K-4 1.87MB/s ± 0% 1.89MB/s ± 3% +1.04% (p=0.002 n=24+30)
RegexpMatchHard_32-4 1.05MB/s ± 0% 1.08MB/s ± 0% +2.40% (p=0.000 n=19+23)
RegexpMatchHard_1K-4 1.12MB/s ± 0% 1.12MB/s ± 5% +0.12% (p=0.006 n=25+30)
Revcomp-4 40.2MB/s ± 1% 39.6MB/s ± 4% ~ (p=0.242 n=25+30)
Template-4 2.24MB/s ± 6% 2.21MB/s ± 6% -1.15% (p=0.000 n=30+30)
[Geo mean] 7.87MB/s 7.91MB/s +0.44%
Change-Id: If374cb7abf83537aa0a176f73c0f736f7800db03
Reviewed-on: https://go-review.googlesource.com/108735
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-27 20:02:05 +00:00
Balaram Makam
f524268c40
cmd/compile: optimize ARM64 code with CMN/TST
...
Use CMN/TST to simplify comparisons. This can reduce
register pressure by removing single def/use registers, for example:
ADDW R0, R1, R8 -> CMNW R1, R0 ; CMN is an alias of ADDS.
CBZW R8, label -> BEQ label ; single def/use of R8 removed.
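At the source level, the assembly above corresponds to comparisons whose only use of a sum or a bitwise AND is the test against zero. A sketch with hypothetical function names:

```go
package main

import "fmt"

// sumIsZero compares a+b against zero; the sum itself is not needed
// afterwards, so a flag-setting add (CMN) can replace ADD plus CBZ.
func sumIsZero(a, b int32) bool { return a+b == 0 }

// noCommonBits compares a&b against zero; the AND result is not
// needed afterwards, so a flag-setting test (TST) is enough.
func noCommonBits(a, b uint64) bool { return a&b == 0 }

func main() {
	fmt.Println(sumIsZero(5, -5), sumIsZero(5, 4))
	fmt.Println(noCommonBits(0xF0, 0x0F), noCommonBits(0xF0, 0x10))
}
```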
Little change in performance of go1 benchmark on Amberwing:
name old time/op new time/op delta
RegexpMatchEasy0_32 247ns ± 0% 246ns ± 0% -0.40% (p=0.008 n=5+5)
RegexpMatchEasy0_1K 581ns ± 0% 580ns ± 0% ~ (p=0.079 n=4+5)
RegexpMatchEasy1_32 244ns ± 0% 243ns ± 0% -0.41% (p=0.008 n=5+5)
RegexpMatchEasy1_1K 804ns ± 0% 806ns ± 0% +0.25% (p=0.016 n=5+4)
RegexpMatchMedium_32 313ns ± 0% 311ns ± 0% -0.64% (p=0.008 n=5+5)
RegexpMatchMedium_1K 52.2µs ± 0% 51.9µs ± 0% -0.51% (p=0.008 n=5+5)
RegexpMatchHard_32 2.76µs ± 3% 2.74µs ± 0% ~ (p=0.683 n=5+5)
RegexpMatchHard_1K 78.8µs ± 0% 78.9µs ± 0% +0.04% (p=0.008 n=5+5)
FmtFprintfEmpty 58.6ns ± 0% 57.7ns ± 0% -1.54% (p=0.008 n=5+5)
FmtFprintfString 118ns ± 0% 115ns ± 0% -2.54% (p=0.008 n=5+5)
FmtFprintfInt 119ns ± 0% 119ns ± 0% ~ (all equal)
FmtFprintfIntInt 192ns ± 0% 192ns ± 0% ~ (all equal)
FmtFprintfPrefixedInt 224ns ± 0% 205ns ± 0% -8.48% (p=0.008 n=5+5)
FmtFprintfFloat 336ns ± 0% 333ns ± 1% ~ (p=0.683 n=5+5)
FmtManyArgs 779ns ± 1% 760ns ± 1% -2.41% (p=0.008 n=5+5)
Gzip 437ms ± 0% 436ms ± 0% -0.27% (p=0.008 n=5+5)
HTTPClientServer 90.1µs ± 1% 91.1µs ± 0% +1.19% (p=0.008 n=5+5)
JSONEncode 20.1ms ± 0% 20.2ms ± 1% ~ (p=0.690 n=5+5)
JSONDecode 94.5ms ± 1% 94.1ms ± 1% ~ (p=0.095 n=5+5)
Mandelbrot200 5.37ms ± 0% 5.37ms ± 0% ~ (p=0.421 n=5+5)
TimeParse 450ns ± 0% 446ns ± 0% -0.89% (p=0.000 n=5+4)
TimeFormat 483ns ± 1% 473ns ± 0% -2.19% (p=0.008 n=5+5)
Template 90.6ms ± 0% 89.7ms ± 0% -0.93% (p=0.008 n=5+5)
GoParse 5.97ms ± 0% 6.01ms ± 0% +0.65% (p=0.008 n=5+5)
BinaryTree17 11.8s ± 0% 11.7s ± 0% -0.28% (p=0.016 n=5+5)
Revcomp 669ms ± 0% 669ms ± 0% ~ (p=0.222 n=5+5)
Fannkuch11 3.28s ± 0% 3.34s ± 0% +1.72% (p=0.016 n=4+5)
[Geo mean] 46.6µs 46.3µs -0.74%
name old speed new speed delta
RegexpMatchEasy0_32 129MB/s ± 0% 130MB/s ± 0% +0.32% (p=0.016 n=5+4)
RegexpMatchEasy0_1K 1.76GB/s ± 0% 1.76GB/s ± 0% +0.13% (p=0.016 n=4+5)
RegexpMatchEasy1_32 131MB/s ± 0% 132MB/s ± 0% +0.32% (p=0.008 n=5+5)
RegexpMatchEasy1_1K 1.27GB/s ± 0% 1.27GB/s ± 0% -0.24% (p=0.016 n=5+4)
RegexpMatchMedium_32 3.19MB/s ± 0% 3.21MB/s ± 0% +0.63% (p=0.008 n=5+5)
RegexpMatchMedium_1K 19.6MB/s ± 0% 19.7MB/s ± 0% +0.51% (p=0.029 n=4+4)
RegexpMatchHard_32 11.6MB/s ± 2% 11.7MB/s ± 0% ~ (p=1.000 n=5+5)
RegexpMatchHard_1K 13.0MB/s ± 0% 13.0MB/s ± 0% ~ (p=0.079 n=4+5)
Gzip 44.4MB/s ± 0% 44.5MB/s ± 0% +0.27% (p=0.008 n=5+5)
JSONEncode 96.4MB/s ± 0% 96.2MB/s ± 1% ~ (p=0.579 n=5+5)
JSONDecode 20.5MB/s ± 1% 20.6MB/s ± 1% ~ (p=0.111 n=5+5)
Template 21.4MB/s ± 0% 21.6MB/s ± 0% +0.94% (p=0.008 n=5+5)
GoParse 9.70MB/s ± 0% 9.63MB/s ± 0% -0.68% (p=0.016 n=4+5)
Revcomp 380MB/s ± 0% 380MB/s ± 0% ~ (p=0.222 n=5+5)
[Geo mean] 55.3MB/s 55.4MB/s +0.23%
Change-Id: I2e5338138991d9bc984e67b51212aa5d1b0f2a6b
Reviewed-on: https://go-review.googlesource.com/97335
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
2018-04-26 14:13:12 +00:00
Austin Clements
8871c930be
cmd/compile: don't lower OpConvert
...
Currently, each architecture lowers OpConvert to an arch-specific
OpXXXconvert. This is silly because OpConvert means the same thing on
all architectures and is logically a no-op that exists only to keep
track of conversions to and from unsafe.Pointer. Furthermore, lowering
it makes it harder to recognize in other analyses, particularly
liveness analysis.
This CL eliminates the lowering of OpConvert, leaving it as the
generic op until code generation time.
The main complexity here is that we still need to register-allocate
OpConvert operations. Currently, each arch's lowered OpConvert
specifies all GP registers in its register mask. Ideally, OpConvert
wouldn't affect value homing at all, and we could just copy the home
of OpConvert's source, but this can potentially home an OpConvert in a
LocalSlot, which neither regalloc nor stackalloc expect. Rather than
try to disentangle this assumption from regalloc and stackalloc, we
continue to register-allocate OpConvert, but teach regalloc that
OpConvert can be allocated to any allocatable GP register.
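For readers unfamiliar with where such conversions come from, the sketch below shows Go source that converts between unsafe.Pointer and uintptr in a single expression. The function name is hypothetical, and the mapping from this source to OpConvert values is an assumption based on the description above.

```go
package main

import (
	"fmt"
	"unsafe"
)

// second returns arr[1] by converting the address of arr[0] to a
// uintptr, adding the element size, and converting back, all in a
// single expression as the unsafe package rules require.
func second(arr *[4]int32) int32 {
	p := unsafe.Pointer(uintptr(unsafe.Pointer(&arr[0])) + unsafe.Sizeof(arr[0]))
	return *(*int32)(p)
}

func main() {
	arr := [4]int32{7, 8, 9, 10}
	fmt.Println(second(&arr))
}
```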
For #24543.
Change-Id: I795a6aee5fd94d4444a7bafac3838a400c9f7bb6
Reviewed-on: https://go-review.googlesource.com/108496
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-04-20 18:46:39 +00:00
Ben Shi
34f5f8a580
cmd/compile: optimize ARM64 with register indexed load/store
...
ARM64 supports load/store instructions with a memory operand whose
address is calculated as base register + index register.
In this CL,
1. Some rules are added to the compiler's ARM64 backend to emit
such efficient instructions.
2. A wrong load combination rule is fixed.
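A byte-slice loop is a simple case of the base + index addressing described above, since one-byte elements need no scaling. The sketch below uses a hypothetical function name and only illustrates the source shape, not the rules added by the CL.

```go
package main

import "fmt"

// reverseBytes reverses b in place. Each b[i] and b[j] access is a
// one-byte load or store at base pointer + index.
func reverseBytes(b []byte) {
	for i, j := 0, len(b)-1; i < j; i, j = i+1, j-1 {
		b[i], b[j] = b[j], b[i]
	}
}

func main() {
	b := []byte("ACGT")
	reverseBytes(b)
	fmt.Println(string(b))
}
```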
The go1 benchmark does show improvement.
name old time/op new time/op delta
BinaryTree17-4 44.5s ± 2% 44.1s ± 1% -0.81% (p=0.000 n=28+29)
Fannkuch11-4 32.7s ± 3% 30.5s ± 0% -6.79% (p=0.000 n=30+26)
FmtFprintfEmpty-4 499ns ± 0% 506ns ± 5% +1.39% (p=0.003 n=25+30)
FmtFprintfString-4 1.07µs ± 0% 1.04µs ± 4% -3.17% (p=0.000 n=23+30)
FmtFprintfInt-4 1.15µs ± 4% 1.13µs ± 0% -1.55% (p=0.000 n=30+23)
FmtFprintfIntInt-4 1.77µs ± 4% 1.74µs ± 0% -1.71% (p=0.000 n=30+24)
FmtFprintfPrefixedInt-4 2.37µs ± 5% 2.12µs ± 0% -10.56% (p=0.000 n=30+23)
FmtFprintfFloat-4 3.03µs ± 1% 3.03µs ± 4% -0.13% (p=0.003 n=25+30)
FmtManyArgs-4 7.38µs ± 1% 7.43µs ± 4% +0.59% (p=0.003 n=25+30)
GobDecode-4 101ms ± 6% 95ms ± 5% -5.55% (p=0.000 n=30+30)
GobEncode-4 78.0ms ± 4% 78.8ms ± 6% +1.05% (p=0.000 n=30+30)
Gzip-4 4.25s ± 0% 4.27s ± 4% +0.45% (p=0.003 n=24+30)
Gunzip-4 428ms ± 1% 420ms ± 0% -1.88% (p=0.000 n=23+23)
HTTPClientServer-4 549µs ± 1% 541µs ± 1% -1.56% (p=0.000 n=29+29)
JSONEncode-4 194ms ± 0% 188ms ± 4% ~ (p=0.417 n=23+30)
JSONDecode-4 890ms ± 5% 831ms ± 0% -6.55% (p=0.000 n=30+23)
Mandelbrot200-4 47.3ms ± 2% 46.5ms ± 0% ~ (p=0.980 n=30+26)
GoParse-4 43.1ms ± 6% 43.8ms ± 6% +1.65% (p=0.000 n=30+30)
RegexpMatchEasy0_32-4 1.06µs ± 0% 1.07µs ± 3% ~ (p=0.092 n=23+30)
RegexpMatchEasy0_1K-4 5.53µs ± 0% 5.51µs ± 0% -0.24% (p=0.000 n=25+25)
RegexpMatchEasy1_32-4 1.02µs ± 3% 1.01µs ± 0% -1.27% (p=0.000 n=30+24)
RegexpMatchEasy1_1K-4 7.26µs ± 0% 7.33µs ± 0% +0.95% (p=0.000 n=23+26)
RegexpMatchMedium_32-4 1.84µs ± 7% 1.79µs ± 1% ~ (p=0.333 n=30+23)
RegexpMatchMedium_1K-4 553µs ± 0% 547µs ± 0% -1.14% (p=0.000 n=24+22)
RegexpMatchHard_32-4 30.8µs ± 1% 30.3µs ± 0% -1.40% (p=0.000 n=24+24)
RegexpMatchHard_1K-4 928µs ± 0% 929µs ± 5% +0.12% (p=0.013 n=23+30)
Revcomp-4 8.13s ± 4% 6.32s ± 1% -22.23% (p=0.000 n=30+23)
Template-4 899ms ± 6% 854ms ± 1% -5.01% (p=0.000 n=30+24)
TimeParse-4 4.66µs ± 4% 4.59µs ± 1% -1.57% (p=0.000 n=30+23)
TimeFormat-4 4.58µs ± 0% 4.61µs ± 0% +0.57% (p=0.000 n=26+24)
[Geo mean] 717µs 698µs -2.55%
name old speed new speed delta
GobDecode-4 7.63MB/s ± 6% 8.08MB/s ± 5% +5.88% (p=0.000 n=30+30)
GobEncode-4 9.85MB/s ± 4% 9.75MB/s ± 6% -1.04% (p=0.000 n=30+30)
Gzip-4 4.56MB/s ± 0% 4.55MB/s ± 4% -0.36% (p=0.003 n=24+30)
Gunzip-4 45.3MB/s ± 1% 46.2MB/s ± 0% +1.92% (p=0.000 n=23+23)
JSONEncode-4 10.0MB/s ± 0% 10.4MB/s ± 4% ~ (p=0.403 n=23+30)
JSONDecode-4 2.18MB/s ± 5% 2.33MB/s ± 0% +6.91% (p=0.000 n=30+23)
GoParse-4 1.34MB/s ± 5% 1.32MB/s ± 5% -1.66% (p=0.000 n=30+30)
RegexpMatchEasy0_32-4 30.2MB/s ± 0% 29.8MB/s ± 3% ~ (p=0.099 n=23+30)
RegexpMatchEasy0_1K-4 185MB/s ± 0% 186MB/s ± 0% +0.24% (p=0.000 n=25+25)
RegexpMatchEasy1_32-4 31.4MB/s ± 3% 31.8MB/s ± 0% +1.24% (p=0.000 n=30+24)
RegexpMatchEasy1_1K-4 141MB/s ± 0% 140MB/s ± 0% -0.94% (p=0.000 n=23+26)
RegexpMatchMedium_32-4 541kB/s ± 6% 560kB/s ± 0% +3.45% (p=0.000 n=30+23)
RegexpMatchMedium_1K-4 1.85MB/s ± 0% 1.87MB/s ± 0% +1.08% (p=0.000 n=24+23)
RegexpMatchHard_32-4 1.04MB/s ± 1% 1.06MB/s ± 1% +1.48% (p=0.000 n=24+24)
RegexpMatchHard_1K-4 1.10MB/s ± 0% 1.10MB/s ± 5% +0.15% (p=0.004 n=23+30)
Revcomp-4 31.3MB/s ± 4% 40.2MB/s ± 1% +28.52% (p=0.000 n=30+23)
Template-4 2.16MB/s ± 6% 2.27MB/s ± 1% +5.18% (p=0.000 n=30+24)
[Geo mean] 7.57MB/s 7.79MB/s +2.98%
fixes #24907
Change-Id: I94afd0e3f53d62a1cf5e452f3dd6daf61be21785
Reviewed-on: https://go-review.googlesource.com/107376
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-04-19 15:08:10 +00:00
Balaram Makam
d7c7d88b2c
cmd/compile: intrinsify math/big.mulWW on ARM64
...
Performance numbers on amberwing:
pkg: math/big
name old time/op new time/op delta
QuoRem 3.08µs ± 0% 2.93µs ± 1% -4.89% (p=0.008 n=5+5)
ModSqrt225_Tonelli 721µs ± 0% 718µs ± 0% -0.46% (p=0.008 n=5+5)
ModSqrt224_3Mod4 218µs ± 0% 217µs ± 0% -0.27% (p=0.008 n=5+5)
ModSqrt5430_Tonelli 2.91s ± 0% 2.91s ± 0% ~ (p=0.222 n=5+5)
ModSqrt5430_3Mod4 970ms ± 0% 970ms ± 0% ~ (p=0.151 n=5+5)
Sqrt 45.9µs ± 0% 43.8µs ± 0% -4.63% (p=0.008 n=5+5)
IntSqr/1 19.9ns ± 0% 17.3ns ± 0% -13.07% (p=0.008 n=5+5)
IntSqr/2 52.6ns ± 0% 50.8ns ± 0% -3.35% (p=0.008 n=5+5)
IntSqr/3 70.4ns ± 0% 69.4ns ± 0% ~ (p=0.079 n=4+5)
IntSqr/5 103ns ± 0% 99ns ± 0% -3.98% (p=0.008 n=5+5)
IntSqr/8 179ns ± 0% 178ns ± 0% -0.56% (p=0.008 n=5+5)
IntSqr/10 272ns ± 0% 272ns ± 0% ~ (all equal)
IntSqr/20 763ns ± 0% 787ns ± 0% +3.15% (p=0.016 n=5+4)
IntSqr/30 1.25µs ± 1% 1.29µs ± 1% +3.27% (p=0.008 n=5+5)
IntSqr/50 2.64µs ± 0% 2.71µs ± 0% +2.61% (p=0.008 n=5+5)
IntSqr/80 5.67µs ± 0% 5.72µs ± 0% +0.88% (p=0.008 n=5+5)
IntSqr/100 8.05µs ± 0% 8.09µs ± 0% +0.45% (p=0.008 n=5+5)
IntSqr/200 28.0µs ± 0% 28.1µs ± 0% ~ (p=0.151 n=5+5)
IntSqr/300 59.4µs ± 0% 59.6µs ± 0% +0.36% (p=0.008 n=5+5)
IntSqr/500 141µs ± 0% 141µs ± 0% +0.08% (p=0.008 n=5+5)
IntSqr/800 280µs ± 0% 280µs ± 0% -0.12% (p=0.008 n=5+5)
IntSqr/1000 429µs ± 0% 428µs ± 0% -0.27% (p=0.008 n=5+5)
pkg: crypto-ecdsa
name old time/op new time/op delta
SignP384 7.85ms ± 1% 7.61ms ± 1% -3.12% (p=0.008 n=5+5)
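math/big.mulWW is unexported, so the sketch below uses the exported math/bits.Mul64 (available in later Go releases) to show the same kind of operation: a full-width 64x64 to 128-bit multiply. The wrapper name `mulWide` is an assumption made for illustration.

```go
package main

import (
	"fmt"
	"math/bits"
)

// mulWide returns the 128-bit product of x and y as (hi, lo) halves.
// It stands in for a word-by-word multiply on a 64-bit platform.
func mulWide(x, y uint64) (hi, lo uint64) {
	return bits.Mul64(x, y)
}

func main() {
	hi, lo := mulWide(1<<63, 4)
	fmt.Println(hi, lo) // 2 0
}
```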
Change-Id: I1ab30856cc0e570f6312f0bd8914779b55adbc16
Reviewed-on: https://go-review.googlesource.com/104135
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-04 18:37:24 +00:00
Geoff Berry
e244a7a7d3
cmd/compile/internal/ssa: add patterns for arm64 bitfield opcodes
...
Add patterns to match common idioms for EXTR, BFI, BFXIL, SBFIZ, SBFX,
UBFIZ and UBFX opcodes.
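The kind of source idiom such patterns target can be sketched as below. The function names and the field position (8 bits starting at bit 4) are chosen purely for illustration, and the instruction named in each comment is what the idiom could map to, not a guarantee about any particular compiler version.

```go
package main

import "fmt"

// extractField returns the unsigned 8-bit field at bits 4..11 of x.
// Shift right then mask: a UBFX-style extract.
func extractField(x uint64) uint64 {
	return (x >> 4) & 0xff
}

// placeField moves the low 8 bits of v to bits 4..11, zeroing the rest.
// Mask then shift left: a UBFIZ-style insert-in-zero.
func placeField(v uint64) uint64 {
	return (v & 0xff) << 4
}

// extractSigned returns bits 4..11 of x as a sign-extended value.
// Shift left then arithmetic shift right: an SBFX-style extract.
func extractSigned(x int64) int64 {
	return (x << 52) >> 56
}

func main() {
	fmt.Println(extractField(0xABC), placeField(0x1AB), extractSigned(0xFF0))
}
```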
go1 benchmarks results on Amberwing:
name old time/op new time/op delta
FmtManyArgs 786ns ± 2% 714ns ± 1% -9.20% (p=0.000 n=10+10)
Gzip 437ms ± 0% 402ms ± 0% -7.99% (p=0.000 n=10+10)
FmtFprintfIntInt 196ns ± 0% 182ns ± 0% -7.28% (p=0.000 n=10+9)
FmtFprintfPrefixedInt 207ns ± 0% 199ns ± 0% -3.86% (p=0.000 n=10+10)
FmtFprintfFloat 324ns ± 0% 316ns ± 0% -2.47% (p=0.000 n=10+8)
FmtFprintfInt 119ns ± 0% 117ns ± 0% -1.68% (p=0.000 n=10+9)
GobDecode 12.8ms ± 2% 12.6ms ± 1% -1.62% (p=0.002 n=10+10)
JSONDecode 94.4ms ± 1% 93.4ms ± 0% -1.10% (p=0.000 n=10+10)
RegexpMatchEasy0_32 247ns ± 0% 245ns ± 0% -0.65% (p=0.000 n=10+10)
RegexpMatchMedium_32 314ns ± 0% 312ns ± 0% -0.64% (p=0.000 n=10+10)
RegexpMatchEasy0_1K 541ns ± 0% 538ns ± 0% -0.55% (p=0.000 n=10+9)
TimeParse 450ns ± 1% 448ns ± 1% -0.42% (p=0.035 n=9+9)
RegexpMatchEasy1_32 244ns ± 0% 243ns ± 0% -0.41% (p=0.000 n=10+10)
GoParse 6.03ms ± 0% 6.00ms ± 0% -0.40% (p=0.002 n=10+10)
RegexpMatchEasy1_1K 779ns ± 0% 777ns ± 0% -0.26% (p=0.000 n=10+10)
RegexpMatchHard_32 2.75µs ± 0% 2.74µs ± 1% -0.06% (p=0.026 n=9+9)
BinaryTree17 11.7s ± 0% 11.6s ± 0% ~ (p=0.089 n=10+10)
HTTPClientServer 89.1µs ± 1% 89.5µs ± 2% ~ (p=0.436 n=10+10)
RegexpMatchHard_1K 78.9µs ± 0% 79.5µs ± 2% ~ (p=0.469 n=10+10)
FmtFprintfEmpty 58.5ns ± 0% 58.5ns ± 0% ~ (all equal)
GobEncode 12.0ms ± 1% 12.1ms ± 0% ~ (p=0.075 n=10+10)
Revcomp 669ms ± 0% 668ms ± 0% ~ (p=0.091 n=7+9)
Mandelbrot200 5.35ms ± 0% 5.36ms ± 0% +0.07% (p=0.000 n=9+9)
RegexpMatchMedium_1K 52.1µs ± 0% 52.1µs ± 0% +0.10% (p=0.000 n=9+9)
Fannkuch11 3.25s ± 0% 3.26s ± 0% +0.36% (p=0.000 n=9+10)
FmtFprintfString 114ns ± 1% 115ns ± 0% +0.52% (p=0.011 n=10+10)
JSONEncode 20.2ms ± 0% 20.3ms ± 0% +0.65% (p=0.000 n=10+10)
Template 91.3ms ± 0% 92.3ms ± 0% +1.08% (p=0.000 n=10+10)
TimeFormat 484ns ± 0% 495ns ± 1% +2.30% (p=0.000 n=9+10)
There are some opportunities to improve this change further by adding
patterns to match the "extended register" versions of ADD/SUB/CMP, but I
think that should be evaluated on its own. The regressions in Template
and TimeFormat would likely be recovered by this, as they seem to be due
to generating:
ubfiz x0, x0, #3, #8
add x1, x2, x0
instead of
add x1, x2, x0, lsl #3
Change-Id: I5644a8d70ac7a98e784a377a2b76ab47a3415a4b
Reviewed-on: https://go-review.googlesource.com/88355
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-03-15 14:10:41 +00:00
Meng Zhuo
8916773a3d
runtime, cmd/compile: use ldp for DUFFCOPY on ARM64
...
name old time/op new time/op delta
CopyFat8 2.15ns ± 1% 2.19ns ± 6% ~ (p=0.171 n=8+9)
CopyFat12 2.15ns ± 0% 2.17ns ± 2% ~ (p=0.137 n=8+10)
CopyFat16 2.17ns ± 3% 2.15ns ± 0% ~ (p=0.211 n=10+10)
CopyFat24 2.16ns ± 1% 2.15ns ± 0% ~ (p=0.087 n=10+10)
CopyFat32 11.5ns ± 0% 12.8ns ± 2% +10.87% (p=0.000 n=8+10)
CopyFat64 20.2ns ± 2% 12.9ns ± 0% -36.11% (p=0.000 n=10+10)
CopyFat128 37.2ns ± 0% 21.5ns ± 0% -42.20% (p=0.000 n=10+10)
CopyFat256 71.6ns ± 0% 38.7ns ± 0% -45.95% (p=0.000 n=10+10)
CopyFat512 140ns ± 0% 73ns ± 0% -47.86% (p=0.000 n=10+9)
CopyFat520 142ns ± 0% 74ns ± 0% -47.54% (p=0.000 n=10+10)
CopyFat1024 277ns ± 0% 141ns ± 0% -49.10% (p=0.000 n=10+10)
Change-Id: If54bc571add5db674d5e081579c87e80153d0a5a
Reviewed-on: https://go-review.googlesource.com/97395
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-06 04:14:59 +00:00
Heschi Kreinick
caa1b4afbd
cmd/compile/internal/ssa: note zero-width Ops
...
Add a bool to opInfo to indicate if an Op never results in any
instructions. This is a conservative approximation: some operations,
like Copy, may or may not generate code depending on their arguments.
I built the list by reading each arch's ssaGenValue function. Hopefully
I got them all.
Change-Id: I130b251b65f18208294e129bb7ddc3f91d57d31d
Reviewed-on: https://go-review.googlesource.com/97957
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-02 18:55:45 +00:00
Ben Shi
1057624985
cmd/compile: optimize ARM64 code with EON/ORN
...
EON and ORN are efficient ARM64 instructions. EON combines (x ^ ^y)
into a single operation, and ORN does the same for (x | ^y).
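In Go source, the two expressions look like the sketch below. The function names `eon` and `orn` are made up to mirror the instructions; whether each body compiles to a single instruction depends on the compiler version.

```go
package main

import "fmt"

// eon computes x XOR (NOT y), the operation the EON instruction performs.
func eon(x, y uint64) uint64 {
	return x ^ ^y
}

// orn computes x OR (NOT y), the operation the ORN instruction performs.
func orn(x, y uint64) uint64 {
	return x | ^y
}

func main() {
	fmt.Printf("%#x %#x\n", eon(0xF0, 0x0F), orn(0xF0, 0xFF))
}
```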
This CL implements that optimization. And here are benchmark results
with RaspberryPi3/ArchLinux.
1. A specific test gets about 13% improvement.
EONORN 181µs ± 0% 157µs ± 0% -13.26% (p=0.000 n=26+23)
(https://github.com/benshi001/ugo1/blob/master/eonorn_test.go)
2. There is little change in the go1 benchmark, excluding noise.
name old time/op new time/op delta
BinaryTree17-4 44.1s ± 2% 44.0s ± 2% ~ (p=0.513 n=30+30)
Fannkuch11-4 32.9s ± 3% 32.8s ± 3% -0.12% (p=0.024 n=30+30)
FmtFprintfEmpty-4 561ns ± 9% 558ns ± 9% ~ (p=0.654 n=30+30)
FmtFprintfString-4 1.09µs ± 4% 1.09µs ± 3% ~ (p=0.158 n=30+30)
FmtFprintfInt-4 1.12µs ± 0% 1.12µs ± 0% ~ (p=0.917 n=23+28)
FmtFprintfIntInt-4 1.73µs ± 0% 1.76µs ± 4% ~ (p=0.665 n=23+30)
FmtFprintfPrefixedInt-4 2.15µs ± 1% 2.15µs ± 0% ~ (p=0.389 n=27+26)
FmtFprintfFloat-4 3.18µs ± 4% 3.13µs ± 0% -1.50% (p=0.003 n=30+23)
FmtManyArgs-4 7.32µs ± 4% 7.21µs ± 0% ~ (p=0.220 n=30+25)
GobDecode-4 99.1ms ± 9% 97.0ms ± 0% -2.07% (p=0.000 n=30+23)
GobEncode-4 83.3ms ± 3% 82.4ms ± 4% ~ (p=0.321 n=30+30)
Gzip-4 4.39s ± 4% 4.32s ± 2% -1.42% (p=0.017 n=30+23)
Gunzip-4 440ms ± 0% 447ms ± 4% +1.54% (p=0.006 n=24+30)
HTTPClientServer-4 547µs ± 1% 537µs ± 1% -1.91% (p=0.000 n=30+30)
JSONEncode-4 211ms ± 0% 211ms ± 0% +0.04% (p=0.000 n=23+24)
JSONDecode-4 847ms ± 0% 847ms ± 0% ~ (p=0.158 n=25+25)
Mandelbrot200-4 46.5ms ± 0% 46.5ms ± 0% -0.04% (p=0.000 n=25+24)
GoParse-4 43.4ms ± 0% 43.4ms ± 0% ~ (p=0.494 n=24+25)
RegexpMatchEasy0_32-4 1.03µs ± 0% 1.03µs ± 0% ~ (all equal)
RegexpMatchEasy0_1K-4 4.02µs ± 3% 3.98µs ± 0% -0.95% (p=0.003 n=30+24)
RegexpMatchEasy1_32-4 1.01µs ± 3% 1.01µs ± 2% ~ (p=0.629 n=30+30)
RegexpMatchEasy1_1K-4 6.39µs ± 0% 6.39µs ± 0% ~ (p=0.564 n=24+23)
RegexpMatchMedium_32-4 1.80µs ± 3% 1.78µs ± 0% ~ (p=0.155 n=30+24)
RegexpMatchMedium_1K-4 555µs ± 0% 563µs ± 3% +1.55% (p=0.004 n=27+30)
RegexpMatchHard_32-4 31.0µs ± 4% 30.5µs ± 1% -1.58% (p=0.000 n=30+23)
RegexpMatchHard_1K-4 947µs ± 4% 931µs ± 0% -1.66% (p=0.009 n=30+24)
Revcomp-4 7.71s ± 4% 7.71s ± 4% ~ (p=0.196 n=29+30)
Template-4 877ms ± 0% 878ms ± 0% +0.16% (p=0.018 n=23+27)
TimeParse-4 4.75µs ± 1% 4.74µs ± 0% ~ (p=0.895 n=24+23)
TimeFormat-4 4.83µs ± 4% 4.83µs ± 4% ~ (p=0.767 n=30+30)
[Geo mean] 709µs 707µs -0.35%
name old speed new speed delta
GobDecode-4 7.75MB/s ± 8% 7.91MB/s ± 0% +2.03% (p=0.001 n=30+23)
GobEncode-4 9.22MB/s ± 3% 9.32MB/s ± 4% ~ (p=0.389 n=30+30)
Gzip-4 4.43MB/s ± 4% 4.43MB/s ± 4% ~ (p=0.888 n=30+30)
Gunzip-4 44.1MB/s ± 0% 43.4MB/s ± 4% -1.46% (p=0.009 n=24+30)
JSONEncode-4 9.18MB/s ± 0% 9.18MB/s ± 0% ~ (p=0.308 n=16+24)
JSONDecode-4 2.29MB/s ± 0% 2.29MB/s ± 0% ~ (all equal)
GoParse-4 1.33MB/s ± 0% 1.33MB/s ± 0% ~ (all equal)
RegexpMatchEasy0_32-4 30.9MB/s ± 0% 30.9MB/s ± 0% ~ (p=1.000 n=23+24)
RegexpMatchEasy0_1K-4 255MB/s ± 3% 257MB/s ± 0% +0.92% (p=0.004 n=30+24)
RegexpMatchEasy1_32-4 31.7MB/s ± 3% 31.6MB/s ± 2% ~ (p=0.603 n=30+30)
RegexpMatchEasy1_1K-4 160MB/s ± 0% 160MB/s ± 0% ~ (p=0.435 n=24+23)
RegexpMatchMedium_32-4 554kB/s ± 3% 560kB/s ± 0% +1.08% (p=0.004 n=30+24)
RegexpMatchMedium_1K-4 1.85MB/s ± 0% 1.82MB/s ± 3% -1.48% (p=0.001 n=27+30)
RegexpMatchHard_32-4 1.03MB/s ± 4% 1.05MB/s ± 1% +1.51% (p=0.027 n=30+23)
RegexpMatchHard_1K-4 1.08MB/s ± 4% 1.10MB/s ± 0% +1.69% (p=0.002 n=30+25)
Revcomp-4 33.0MB/s ± 4% 33.0MB/s ± 4% ~ (p=0.272 n=29+30)
Template-4 2.21MB/s ± 0% 2.21MB/s ± 0% ~ (all equal)
[Geo mean] 7.75MB/s 7.77MB/s +0.29%
3. There is little regression in the compilecmp benchmark.
name old time/op new time/op delta
Template 2.28s ± 3% 2.28s ± 4% ~ (p=0.739 n=10+10)
Unicode 1.34s ± 4% 1.32s ± 3% ~ (p=0.113 n=10+9)
GoTypes 8.10s ± 3% 8.18s ± 3% ~ (p=0.393 n=10+10)
Compiler 39.0s ± 3% 39.2s ± 3% ~ (p=0.393 n=10+10)
SSA 114s ± 3% 115s ± 2% ~ (p=0.631 n=10+10)
Flate 1.41s ± 2% 1.42s ± 3% ~ (p=0.353 n=10+10)
GoParser 1.81s ± 1% 1.83s ± 2% ~ (p=0.211 n=10+9)
Reflect 5.06s ± 2% 5.06s ± 2% ~ (p=0.912 n=10+10)
Tar 2.19s ± 3% 2.20s ± 3% ~ (p=0.247 n=10+10)
XML 2.65s ± 2% 2.67s ± 5% ~ (p=0.796 n=10+10)
[Geo mean] 4.92s 4.93s +0.27%
name old user-time/op new user-time/op delta
Template 2.81s ± 2% 2.81s ± 3% ~ (p=0.971 n=10+10)
Unicode 1.70s ± 3% 1.67s ± 5% ~ (p=0.315 n=10+10)
GoTypes 9.71s ± 1% 9.78s ± 1% +0.71% (p=0.023 n=10+10)
Compiler 47.3s ± 1% 47.1s ± 3% ~ (p=0.579 n=10+10)
SSA 143s ± 2% 143s ± 2% ~ (p=0.280 n=10+10)
Flate 1.70s ± 3% 1.71s ± 3% ~ (p=0.481 n=10+10)
GoParser 2.21s ± 3% 2.21s ± 1% ~ (p=0.549 n=10+9)
Reflect 5.89s ± 1% 5.87s ± 2% ~ (p=0.739 n=10+10)
Tar 2.66s ± 2% 2.63s ± 2% ~ (p=0.105 n=10+10)
XML 3.16s ± 3% 3.18s ± 2% ~ (p=0.143 n=10+10)
[Geo mean] 5.97s 5.97s -0.06%
name old text-bytes new text-bytes delta
HelloSize 637kB ± 0% 637kB ± 0% ~ (all equal)
name old data-bytes new data-bytes delta
HelloSize 9.46kB ± 0% 9.46kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 125kB ± 0% 125kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.24MB ± 0% 1.24MB ± 0% ~ (all equal)
Change-Id: Ie27357d65c5ce9d07afdffebe1e2daadcaa3369f
Reviewed-on: https://go-review.googlesource.com/97036
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-02-28 23:42:40 +00:00
Ben Shi
7113d3a512
cmd/compile: fix FP accuracy issue introduced by FMA optimization on ARM64
...
Two ARM64 rules are added to avoid an FP accuracy issue, which caused
a build failure.
https://build.golang.org/log/1360f5c9ef3f37968216350283c1013e9681725d
fixes #24033
Change-Id: I9b74b584ab5cc53fa49476de275dc549adf97610
Reviewed-on: https://go-review.googlesource.com/96355
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-02-22 15:28:08 +00:00
Ben Shi
f4c3072cf5
cmd/compile: improve FP performance on ARM64
...
FMADD/FMSUB/FNMADD/FNMSUB are efficient FP instructions, which can
be used by the compiler to improve FP performance. This CL implements
this optimization.
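A minimal sketch of the source shape that is a candidate for fusion follows. The Go specification permits an implementation to fuse x*y + z into one operation with a single rounding, and states that an explicit floating-point conversion rounds the intermediate product and so prevents fusion. The function names are made up for illustration.

```go
package main

import "fmt"

// mulAdd may be compiled to a fused multiply-add, so the product
// x*y is not necessarily rounded before z is added.
func mulAdd(x, y, z float64) float64 {
	return x*y + z
}

// mulAddRounded forces the product to be rounded to float64 first;
// the explicit conversion prevents fusion.
func mulAddRounded(x, y, z float64) float64 {
	return float64(x*y) + z
}

func main() {
	// With exactly representable inputs both forms agree.
	fmt.Println(mulAdd(2, 3, 4), mulAddRounded(2, 3, 4)) // 10 10
}
```

For inputs whose product is not exactly representable, the two functions can differ in the last bit on hardware where fusion happens.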
1. The compilecmp benchmark shows little change.
name old time/op new time/op delta
Template 2.35s ± 4% 2.38s ± 4% ~ (p=0.161 n=15+15)
Unicode 1.36s ± 5% 1.36s ± 4% ~ (p=0.685 n=14+13)
GoTypes 8.11s ± 3% 8.13s ± 2% ~ (p=0.624 n=15+15)
Compiler 40.5s ± 2% 40.7s ± 2% ~ (p=0.137 n=15+15)
SSA 115s ± 3% 116s ± 1% ~ (p=0.270 n=15+14)
Flate 1.46s ± 4% 1.45s ± 5% ~ (p=0.870 n=15+15)
GoParser 1.85s ± 2% 1.87s ± 3% ~ (p=0.477 n=14+15)
Reflect 5.11s ± 4% 5.10s ± 2% ~ (p=0.624 n=15+15)
Tar 2.23s ± 3% 2.23s ± 5% ~ (p=0.624 n=15+15)
XML 2.72s ± 5% 2.74s ± 3% ~ (p=0.290 n=15+14)
[Geo mean] 5.02s 5.03s +0.29%
name old user-time/op new user-time/op delta
Template 2.90s ± 2% 2.90s ± 3% ~ (p=0.780 n=14+15)
Unicode 1.71s ± 5% 1.70s ± 3% ~ (p=0.458 n=14+13)
GoTypes 9.77s ± 2% 9.76s ± 2% ~ (p=0.838 n=15+15)
Compiler 49.1s ± 2% 49.1s ± 2% ~ (p=0.902 n=15+15)
SSA 144s ± 1% 144s ± 2% ~ (p=0.567 n=15+15)
Flate 1.75s ± 5% 1.74s ± 3% ~ (p=0.461 n=15+15)
GoParser 2.22s ± 2% 2.21s ± 3% ~ (p=0.233 n=15+15)
Reflect 5.99s ± 2% 5.95s ± 1% ~ (p=0.093 n=14+15)
Tar 2.68s ± 2% 2.67s ± 3% ~ (p=0.310 n=14+15)
XML 3.22s ± 2% 3.24s ± 3% ~ (p=0.512 n=15+15)
[Geo mean] 6.08s 6.07s -0.19%
name old text-bytes new text-bytes delta
HelloSize 641kB ± 0% 641kB ± 0% ~ (all equal)
name old data-bytes new data-bytes delta
HelloSize 9.46kB ± 0% 9.46kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 125kB ± 0% 125kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.24MB ± 0% 1.24MB ± 0% ~ (all equal)
2. The go1 benchmark shows little improvement in total (excluding noise),
but some improvement in the Mandelbrot200 and FmtFprintfFloat test cases.
name old time/op new time/op delta
BinaryTree17-4 42.1s ± 2% 42.0s ± 2% ~ (p=0.453 n=30+28)
Fannkuch11-4 33.5s ± 3% 33.3s ± 3% -0.38% (p=0.045 n=30+30)
FmtFprintfEmpty-4 534ns ± 0% 534ns ± 0% ~ (all equal)
FmtFprintfString-4 1.09µs ± 0% 1.09µs ± 0% -0.27% (p=0.000 n=23+17)
FmtFprintfInt-4 1.16µs ± 3% 1.16µs ± 3% ~ (p=0.714 n=30+30)
FmtFprintfIntInt-4 1.76µs ± 1% 1.77µs ± 0% +0.15% (p=0.002 n=23+23)
FmtFprintfPrefixedInt-4 2.21µs ± 3% 2.20µs ± 3% ~ (p=0.390 n=30+30)
FmtFprintfFloat-4 3.28µs ± 0% 3.11µs ± 0% -5.01% (p=0.000 n=25+26)
FmtManyArgs-4 7.18µs ± 0% 7.19µs ± 0% +0.13% (p=0.000 n=24+25)
GobDecode-4 94.9ms ± 0% 95.6ms ± 5% +0.83% (p=0.002 n=23+29)
GobEncode-4 80.7ms ± 4% 79.8ms ± 0% -1.11% (p=0.003 n=30+24)
Gzip-4 4.58s ± 4% 4.59s ± 3% +0.26% (p=0.002 n=30+26)
Gunzip-4 449ms ± 4% 443ms ± 0% ~ (p=0.096 n=30+26)
HTTPClientServer-4 553µs ± 1% 548µs ± 1% -0.96% (p=0.000 n=30+30)
JSONEncode-4 215ms ± 4% 214ms ± 4% -0.29% (p=0.000 n=30+30)
JSONDecode-4 868ms ± 4% 875ms ± 5% +0.79% (p=0.008 n=30+30)
Mandelbrot200-4 51.4ms ± 0% 46.7ms ± 3% -9.09% (p=0.000 n=25+26)
GoParse-4 42.1ms ± 0% 41.8ms ± 0% -0.61% (p=0.000 n=25+24)
RegexpMatchEasy0_32-4 1.02µs ± 4% 1.02µs ± 4% -0.17% (p=0.000 n=30+30)
RegexpMatchEasy0_1K-4 3.90µs ± 0% 3.95µs ± 4% ~ (p=0.516 n=23+30)
RegexpMatchEasy1_32-4 970ns ± 3% 973ns ± 3% ~ (p=0.951 n=30+30)
RegexpMatchEasy1_1K-4 6.43µs ± 3% 6.33µs ± 0% -1.62% (p=0.000 n=30+25)
RegexpMatchMedium_32-4 1.75µs ± 0% 1.75µs ± 0% ~ (p=0.422 n=25+24)
RegexpMatchMedium_1K-4 568µs ± 3% 562µs ± 0% ~ (p=0.079 n=30+24)
RegexpMatchHard_32-4 30.8µs ± 0% 31.2µs ± 4% +1.46% (p=0.018 n=23+30)
RegexpMatchHard_1K-4 932µs ± 0% 946µs ± 3% +1.49% (p=0.000 n=24+30)
Revcomp-4 7.69s ± 3% 7.69s ± 2% +0.04% (p=0.032 n=24+25)
Template-4 893ms ± 5% 880ms ± 6% -1.53% (p=0.000 n=30+30)
TimeParse-4 4.90µs ± 3% 4.84µs ± 0% ~ (p=0.080 n=30+25)
TimeFormat-4 4.70µs ± 1% 4.76µs ± 0% +1.21% (p=0.000 n=23+26)
[Geo mean] 710µs 706µs -0.63%
name old speed new speed delta
GobDecode-4 8.09MB/s ± 0% 8.03MB/s ± 5% -0.77% (p=0.002 n=23+29)
GobEncode-4 9.52MB/s ± 4% 9.62MB/s ± 0% +1.07% (p=0.003 n=30+24)
Gzip-4 4.24MB/s ± 4% 4.23MB/s ± 3% -0.35% (p=0.002 n=30+26)
Gunzip-4 43.2MB/s ± 4% 43.8MB/s ± 0% ~ (p=0.123 n=30+26)
JSONEncode-4 9.03MB/s ± 4% 9.06MB/s ± 4% +0.28% (p=0.000 n=30+30)
JSONDecode-4 2.24MB/s ± 4% 2.22MB/s ± 5% -0.79% (p=0.008 n=30+30)
GoParse-4 1.38MB/s ± 1% 1.38MB/s ± 0% ~ (p=0.401 n=25+17)
RegexpMatchEasy0_32-4 31.4MB/s ± 4% 31.5MB/s ± 3% +0.16% (p=0.000 n=30+30)
RegexpMatchEasy0_1K-4 262MB/s ± 0% 259MB/s ± 4% ~ (p=0.693 n=23+30)
RegexpMatchEasy1_32-4 33.0MB/s ± 3% 32.9MB/s ± 3% ~ (p=0.139 n=30+30)
RegexpMatchEasy1_1K-4 159MB/s ± 3% 162MB/s ± 0% +1.60% (p=0.000 n=30+25)
RegexpMatchMedium_32-4 570kB/s ± 0% 570kB/s ± 0% ~ (all equal)
RegexpMatchMedium_1K-4 1.80MB/s ± 3% 1.82MB/s ± 0% +1.09% (p=0.007 n=30+24)
RegexpMatchHard_32-4 1.04MB/s ± 0% 1.03MB/s ± 3% -1.38% (p=0.003 n=23+30)
RegexpMatchHard_1K-4 1.10MB/s ± 0% 1.08MB/s ± 3% -1.52% (p=0.000 n=24+30)
Revcomp-4 33.0MB/s ± 3% 33.0MB/s ± 2% ~ (p=0.128 n=24+25)
Template-4 2.17MB/s ± 5% 2.21MB/s ± 6% +1.61% (p=0.000 n=30+30)
[Geo mean] 7.79MB/s 7.79MB/s +0.05%
Change-Id: Ied3dbdb5ba8e386168629cba06fcd4263bbb83e1
Reviewed-on: https://go-review.googlesource.com/94901
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-02-22 04:10:07 +00:00
Ben Shi
3c8b824453
cmd/compile: optimize ARM64 code with MNEG
...
A pair of MUL/NEG instructions can be combined into a single MNEG on ARM64.
This CL implements this optimization.
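The source pattern in question is simply a negated integer product, as in this sketch. The name `negMul` is made up, and the instruction selection described in the comment is what the rules aim for, not a guarantee.

```go
package main

import "fmt"

// negMul returns -(x*y). Written naively this is a multiply followed
// by a negate, the pair that a single MNEG can replace.
func negMul(x, y int64) int64 {
	return -(x * y)
}

func main() {
	fmt.Println(negMul(6, 7), negMul(-3, 5)) // -42 15
}
```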
1. A special test case gets big improvement.
(https://github.com/benshi001/ugo1/blob/master/mneg_test.go)
name old time/op new time/op delta
MNEG-4 315µs ± 0% 260µs ± 0% -17.39% (p=0.000 n=24+25)
2. There is little change in the go1 benchmark, excluding noise.
name old time/op new time/op delta
BinaryTree17-4 42.2s ± 2% 41.9s ± 2% -0.82% (p=0.001 n=30+26)
Fannkuch11-4 32.9s ± 0% 32.9s ± 0% -0.01% (p=0.006 n=20+26)
FmtFprintfEmpty-4 541ns ± 3% 534ns ± 0% -1.24% (p=0.003 n=30+26)
FmtFprintfString-4 1.09µs ± 0% 1.10µs ± 3% ~ (p=0.142 n=23+30)
FmtFprintfInt-4 1.14µs ± 0% 1.14µs ± 0% ~ (p=0.435 n=24+24)
FmtFprintfIntInt-4 1.76µs ± 0% 1.76µs ± 0% ~ (p=0.508 n=24+26)
FmtFprintfPrefixedInt-4 2.20µs ± 3% 2.17µs ± 0% -1.10% (p=0.017 n=30+24)
FmtFprintfFloat-4 3.28µs ± 0% 3.28µs ± 0% ~ (p=0.579 n=24+24)
FmtManyArgs-4 7.30µs ± 0% 7.30µs ± 0% ~ (p=0.662 n=26+27)
GobDecode-4 94.8ms ± 0% 94.8ms ± 0% +0.07% (p=0.010 n=25+23)
GobEncode-4 80.9ms ± 4% 80.6ms ± 4% ~ (p=0.901 n=30+30)
Gzip-4 4.45s ± 0% 4.49s ± 0% +0.98% (p=0.000 n=25+24)
Gunzip-4 450ms ± 3% 443ms ± 0% ~ (p=0.942 n=30+26)
HTTPClientServer-4 548µs ± 1% 551µs ± 1% +0.60% (p=0.000 n=29+30)
JSONEncode-4 210ms ± 0% 211ms ± 0% +0.03% (p=0.000 n=23+25)
JSONDecode-4 866ms ± 5% 877ms ± 5% ~ (p=0.187 n=30+30)
Mandelbrot200-4 51.4ms ± 0% 52.0ms ± 3% +1.15% (p=0.001 n=24+30)
GoParse-4 42.9ms ± 5% 41.9ms ± 0% -2.24% (p=0.000 n=30+26)
RegexpMatchEasy0_32-4 1.02µs ± 3% 1.01µs ± 0% ~ (p=0.247 n=30+26)
RegexpMatchEasy0_1K-4 3.90µs ± 0% 3.90µs ± 0% ~ (p=0.062 n=24+24)
RegexpMatchEasy1_32-4 955ns ± 0% 956ns ± 0% +0.16% (p=0.000 n=25+23)
RegexpMatchEasy1_1K-4 6.42µs ± 3% 6.37µs ± 0% -0.81% (p=0.012 n=30+24)
RegexpMatchMedium_32-4 1.77µs ± 3% 1.79µs ± 0% +1.28% (p=0.003 n=30+24)
RegexpMatchMedium_1K-4 561µs ± 0% 569µs ± 3% +1.50% (p=0.000 n=25+30)
RegexpMatchHard_32-4 31.0µs ± 4% 30.8µs ± 0% ~ (p=1.000 n=26+26)
RegexpMatchHard_1K-4 945µs ± 3% 945µs ± 3% ~ (p=0.513 n=30+30)
Revcomp-4 7.76s ± 4% 7.68s ± 0% ~ (p=0.464 n=29+23)
Template-4 903ms ± 5% 904ms ± 5% ~ (p=0.248 n=30+30)
TimeParse-4 4.80µs ± 0% 4.80µs ± 0% ~ (p=0.081 n=25+26)
TimeFormat-4 4.70µs ± 1% 4.70µs ± 1% ~ (p=0.763 n=24+26)
[Geo mean] 709µs 708µs -0.09%
name old speed new speed delta
GobDecode-4 8.10MB/s ± 0% 8.09MB/s ± 0% ~ (p=0.160 n=25+23)
GobEncode-4 9.49MB/s ± 4% 9.53MB/s ± 4% ~ (p=0.360 n=30+30)
Gzip-4 4.36MB/s ± 0% 4.32MB/s ± 0% -0.92% (p=0.000 n=25+24)
Gunzip-4 43.2MB/s ± 3% 43.8MB/s ± 0% ~ (p=0.980 n=30+26)
JSONEncode-4 9.22MB/s ± 0% 9.22MB/s ± 0% -0.04% (p=0.005 n=23+25)
JSONDecode-4 2.24MB/s ± 5% 2.21MB/s ± 4% ~ (p=0.252 n=30+30)
GoParse-4 1.35MB/s ± 5% 1.38MB/s ± 0% +2.00% (p=0.003 n=30+26)
RegexpMatchEasy0_32-4 31.5MB/s ± 3% 31.8MB/s ± 0% ~ (p=0.110 n=30+26)
RegexpMatchEasy0_1K-4 263MB/s ± 0% 263MB/s ± 0% ~ (p=0.111 n=24+24)
RegexpMatchEasy1_32-4 33.5MB/s ± 0% 33.4MB/s ± 0% -0.16% (p=0.003 n=25+23)
RegexpMatchEasy1_1K-4 160MB/s ± 3% 161MB/s ± 0% +0.78% (p=0.012 n=30+24)
RegexpMatchMedium_32-4 565kB/s ± 3% 560kB/s ± 0% -0.83% (p=0.001 n=30+24)
RegexpMatchMedium_1K-4 1.83MB/s ± 0% 1.80MB/s ± 3% -1.56% (p=0.000 n=25+30)
RegexpMatchHard_32-4 1.03MB/s ± 3% 1.04MB/s ± 0% +1.46% (p=0.000 n=30+26)
RegexpMatchHard_1K-4 1.08MB/s ± 3% 1.09MB/s ± 3% ~ (p=0.444 n=30+30)
Revcomp-4 32.8MB/s ± 4% 33.1MB/s ± 0% ~ (p=0.858 n=29+23)
Template-4 2.15MB/s ± 5% 2.15MB/s ± 5% ~ (p=0.646 n=30+30)
[Geo mean] 7.79MB/s 7.81MB/s +0.21%
3. There is no regression in the compilecmp benchmark.
name old time/op new time/op delta
Template 2.35s ± 4% 2.33s ± 3% ~ (p=0.796 n=10+10)
Unicode 1.35s ± 6% 1.35s ± 5% ~ (p=1.000 n=9+10)
GoTypes 8.10s ± 3% 8.14s ± 3% ~ (p=0.604 n=9+10)
Compiler 40.5s ± 2% 40.2s ± 2% ~ (p=0.065 n=10+9)
SSA 115s ± 2% 115s ± 2% ~ (p=0.447 n=9+10)
Flate 1.45s ± 3% 1.45s ± 4% ~ (p=0.739 n=10+10)
GoParser 1.85s ± 3% 1.86s ± 2% ~ (p=0.853 n=10+10)
Reflect 5.11s ± 2% 5.10s ± 2% ~ (p=0.971 n=10+10)
Tar 2.23s ± 5% 2.23s ± 3% ~ (p=0.796 n=10+10)
XML 2.67s ± 2% 2.69s ± 2% ~ (p=0.549 n=9+10)
[Geo mean] 5.00s 5.00s +0.02%
name old user-time/op new user-time/op delta
Template 2.88s ± 2% 2.86s ± 2% ~ (p=0.529 n=10+10)
Unicode 1.70s ± 7% 1.69s ± 5% ~ (p=0.853 n=10+10)
GoTypes 9.72s ± 1% 9.73s ± 1% ~ (p=0.684 n=10+10)
Compiler 49.0s ± 1% 48.9s ± 1% ~ (p=0.631 n=10+10)
SSA 144s ± 1% 144s ± 2% ~ (p=0.684 n=10+10)
Flate 1.71s ± 4% 1.72s ± 4% ~ (p=0.853 n=10+10)
GoParser 2.23s ± 2% 2.23s ± 2% ~ (p=0.971 n=10+10)
Reflect 5.98s ± 2% 5.96s ± 2% ~ (p=0.481 n=10+10)
Tar 2.68s ± 3% 2.67s ± 2% ~ (p=0.393 n=10+10)
XML 3.21s ± 3% 3.22s ± 1% ~ (p=0.604 n=10+9)
[Geo mean] 6.05s 6.05s -0.04%
name old text-bytes new text-bytes delta
HelloSize 641kB ± 0% 641kB ± 0% ~ (all equal)
name old data-bytes new data-bytes delta
HelloSize 9.46kB ± 0% 9.46kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 125kB ± 0% 125kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.24MB ± 0% 1.24MB ± 0% ~ (all equal)
Change-Id: I9ed9128f0114e0f1ebb08ca2d042c90fcb2b1dcd
Reviewed-on: https://go-review.googlesource.com/95075
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-02-20 15:23:23 +00:00
philhofer
2d0172c3a7
cmd/compile/internal/ssa: emit csel on arm64
...
Introduce a new SSA pass to generate CondSelect instructions,
and add CondSelect lowering rules for arm64.
In order to make the CSEL instruction easier to optimize,
and to simplify the introduction of CSNEG, CSINC, and CSINV
in the future, modify the CSEL instruction to accept a condition
code in the aux field.
Notably, this change makes the go1 Gzip benchmark
more than 10% faster.
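A small branch that only picks between two values is the typical candidate for a conditional select. The sketch below shows such a shape; the name `maxInt` is made up, and whether the compiler actually emits CSEL for it depends on its version and heuristics.

```go
package main

import "fmt"

// maxInt returns the larger of a and b. The if statement only chooses
// which value ends up in m, so it can be expressed as a compare plus a
// conditional select instead of a compare plus a branch.
func maxInt(a, b int) int {
	m := a
	if b > a {
		m = b
	}
	return m
}

func main() {
	fmt.Println(maxInt(3, 7), maxInt(9, 2)) // 7 9
}
```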
Benchmarks on a Cavium ThunderX:
name old time/op new time/op delta
BinaryTree17-96 15.9s ± 6% 16.0s ± 4% ~ (p=0.968 n=10+9)
Fannkuch11-96 7.17s ± 0% 7.00s ± 0% -2.43% (p=0.000 n=8+9)
FmtFprintfEmpty-96 208ns ± 1% 207ns ± 0% ~ (p=0.152 n=10+8)
FmtFprintfString-96 379ns ± 0% 375ns ± 0% -0.95% (p=0.000 n=10+9)
FmtFprintfInt-96 385ns ± 0% 383ns ± 0% -0.52% (p=0.000 n=9+10)
FmtFprintfIntInt-96 591ns ± 0% 586ns ± 0% -0.85% (p=0.006 n=7+9)
FmtFprintfPrefixedInt-96 656ns ± 0% 667ns ± 0% +1.71% (p=0.000 n=10+10)
FmtFprintfFloat-96 967ns ± 0% 984ns ± 0% +1.78% (p=0.000 n=10+10)
FmtManyArgs-96 2.35µs ± 0% 2.25µs ± 0% -4.63% (p=0.000 n=9+8)
GobDecode-96 31.0ms ± 0% 30.8ms ± 0% -0.36% (p=0.006 n=9+9)
GobEncode-96 24.4ms ± 0% 24.5ms ± 0% +0.30% (p=0.000 n=9+9)
Gzip-96 1.60s ± 0% 1.43s ± 0% -10.58% (p=0.000 n=9+10)
Gunzip-96 167ms ± 0% 169ms ± 0% +0.83% (p=0.000 n=8+9)
HTTPClientServer-96 311µs ± 1% 308µs ± 0% -0.75% (p=0.000 n=10+10)
JSONEncode-96 65.0ms ± 0% 64.8ms ± 0% -0.25% (p=0.000 n=9+8)
JSONDecode-96 262ms ± 1% 261ms ± 1% ~ (p=0.579 n=10+10)
Mandelbrot200-96 18.0ms ± 0% 18.1ms ± 0% +0.17% (p=0.000 n=8+10)
GoParse-96 14.0ms ± 0% 14.1ms ± 1% +0.42% (p=0.003 n=9+10)
RegexpMatchEasy0_32-96 644ns ± 2% 645ns ± 2% ~ (p=0.836 n=10+10)
RegexpMatchEasy0_1K-96 3.70µs ± 0% 3.49µs ± 0% -5.58% (p=0.000 n=10+10)
RegexpMatchEasy1_32-96 662ns ± 2% 657ns ± 2% ~ (p=0.137 n=10+10)
RegexpMatchEasy1_1K-96 4.47µs ± 0% 4.31µs ± 0% -3.48% (p=0.000 n=10+10)
RegexpMatchMedium_32-96 844ns ± 2% 849ns ± 1% ~ (p=0.208 n=10+10)
RegexpMatchMedium_1K-96 179µs ± 0% 182µs ± 0% +1.20% (p=0.000 n=10+10)
RegexpMatchHard_32-96 10.0µs ± 0% 10.1µs ± 0% +0.48% (p=0.000 n=10+9)
RegexpMatchHard_1K-96 297µs ± 0% 297µs ± 0% -0.14% (p=0.000 n=10+10)
Revcomp-96 3.08s ± 0% 3.13s ± 0% +1.56% (p=0.000 n=9+9)
Template-96 276ms ± 2% 275ms ± 1% ~ (p=0.393 n=10+10)
TimeParse-96 1.37µs ± 0% 1.36µs ± 0% -0.53% (p=0.000 n=10+7)
TimeFormat-96 1.40µs ± 0% 1.42µs ± 0% +0.97% (p=0.000 n=10+10)
[Geo mean] 264µs 262µs -0.77%
Change-Id: Ie54eee4b3092af53e6da3baa6d1755098f57f3a2
Reviewed-on: https://go-review.googlesource.com/55670
Run-TryBot: Philip Hofer <phofer@umich.edu>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2018-02-20 06:00:54 +00:00
Chad Rosier
07f0f09563
cmd/compile: make math.Ceil/Floor/Round/Trunc intrinsics on arm64
...
name old time/op new time/op delta
Ceil 550ns ± 0% 486ns ± 7% -11.64% (p=0.000 n=13+18)
Floor 495ns ±19% 512ns ±12% ~ (p=0.164 n=20+20)
Round 550ns ± 0% 487ns ± 8% -11.49% (p=0.000 n=12+19)
Trunc 563ns ± 7% 488ns ±13% -13.44% (p=0.000 n=15+2)
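For reference, the four functions benchmarked above differ only in rounding direction, as this sketch shows. The helper name `roundings` is made up for illustration; the math functions themselves are the standard library's.

```go
package main

import (
	"fmt"
	"math"
)

// roundings returns x rounded four ways: up, down, to nearest
// (halves away from zero), and toward zero.
func roundings(x float64) (ceil, floor, round, trunc float64) {
	return math.Ceil(x), math.Floor(x), math.Round(x), math.Trunc(x)
}

func main() {
	fmt.Println(roundings(-2.5)) // -2 -3 -3 -2
}
```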
Change-Id: I53f234b160b3c026a277506e2cf977d150379464
Reviewed-on: https://go-review.googlesource.com/88295
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-02-16 15:37:57 +00:00
Balaram Makam
fcba05148f
cmd/compile: arm64 intrinsics for math/bits.OnesCount
...
This adds math/bits intrinsics for OnesCount on arm64.
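A typical use of OnesCount is a Hamming distance, sketched below. The function name `hamming` is made up for illustration; with the intrinsic in place, the call to bits.OnesCount64 no longer needs to be a regular function call on arm64.

```go
package main

import (
	"fmt"
	"math/bits"
)

// hamming returns the number of bit positions in which a and b differ.
func hamming(a, b uint64) int {
	return bits.OnesCount64(a ^ b)
}

func main() {
	fmt.Println(hamming(0xB, 0x2)) // 2
}
```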
name old time/op new time/op delta
OnesCount 3.81ns ± 0% 1.60ns ± 0% -57.96% (p=0.000 n=7+8)
OnesCount8 1.60ns ± 0% 1.60ns ± 0% ~ (all equal)
OnesCount16 2.41ns ± 0% 1.60ns ± 0% -33.61% (p=0.000 n=8+8)
OnesCount32 4.17ns ± 0% 1.60ns ± 0% -61.58% (p=0.000 n=8+8)
OnesCount64 3.80ns ± 0% 1.60ns ± 0% -57.84% (p=0.000 n=8+8)
Update #18616
Conflicts:
src/cmd/compile/internal/gc/asm_test.go
Change-Id: I63ac2f63acafdb1f60656ab8a56be0b326eec5cb
Reviewed-on: https://go-review.googlesource.com/90835
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-02-15 23:00:20 +00:00
Ben Shi
ebb77aa867
cmd/compile/internal/ssa: optimize arm64 with FNMULS/FNMULD
...
FNMULS and FNMULD are efficient arm64 instructions, which can be used
to improve FP performance. This CL uses them to optimize pairs of neg-mul
operations.
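The neg-mul pair corresponds to source like the sketch below, one function per precision. The function names are made up, and the instruction named in each comment is the intended target rather than a guaranteed result.

```go
package main

import "fmt"

// negMul64 returns -(x*y) in double precision (FNMULD candidate).
func negMul64(x, y float64) float64 {
	return -(x * y)
}

// negMul32 returns -(x*y) in single precision (FNMULS candidate).
func negMul32(x, y float32) float32 {
	return -(x * y)
}

func main() {
	fmt.Println(negMul64(1.5, 2), negMul32(0.5, 4)) // -3 -2
}
```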
Here are benchmark test results on Raspberry Pi 3 with ArchLinux.
1. A special test case gets about 15% improvement.
(https://github.com/benshi001/ugo1/blob/master/fpmul_test.go)
FPMul-4 485µs ± 0% 410µs ± 0% -15.49% (p=0.000 n=26+23)
2. There is little regression in the go1 benchmark (excluding noise).
name old time/op new time/op delta
BinaryTree17-4 42.0s ± 3% 42.1s ± 2% ~ (p=0.542 n=39+40)
Fannkuch11-4 33.3s ± 3% 32.9s ± 1% ~ (p=0.200 n=40+32)
FmtFprintfEmpty-4 534ns ± 0% 534ns ± 0% ~ (all equal)
FmtFprintfString-4 1.09µs ± 1% 1.09µs ± 0% ~ (p=0.950 n=32+32)
FmtFprintfInt-4 1.14µs ± 0% 1.14µs ± 1% ~ (p=0.571 n=32+31)
FmtFprintfIntInt-4 1.79µs ± 3% 1.76µs ± 0% -1.42% (p=0.004 n=40+34)
FmtFprintfPrefixedInt-4 2.17µs ± 0% 2.17µs ± 0% ~ (p=0.073 n=31+34)
FmtFprintfFloat-4 3.33µs ± 3% 3.28µs ± 0% -1.46% (p=0.001 n=40+34)
FmtManyArgs-4 7.28µs ± 6% 7.19µs ± 0% ~ (p=0.641 n=40+33)
GobDecode-4 96.5ms ± 4% 96.5ms ± 9% ~ (p=0.214 n=40+40)
GobEncode-4 79.5ms ± 0% 80.7ms ± 4% +1.51% (p=0.000 n=34+40)
Gzip-4 4.53s ± 4% 4.56s ± 4% +0.60% (p=0.000 n=40+40)
Gunzip-4 451ms ± 3% 442ms ± 0% -1.93% (p=0.000 n=40+32)
HTTPClientServer-4 530µs ± 1% 535µs ± 1% +0.88% (p=0.000 n=39+39)
JSONEncode-4 214ms ± 4% 211ms ± 0% ~ (p=0.059 n=40+31)
JSONDecode-4 865ms ± 5% 864ms ± 4% -0.06% (p=0.003 n=40+40)
Mandelbrot200-4 52.0ms ± 3% 52.1ms ± 3% ~ (p=0.556 n=40+40)
GoParse-4 43.1ms ± 8% 42.1ms ± 0% ~ (p=0.083 n=40+33)
RegexpMatchEasy0_32-4 1.02µs ± 3% 1.02µs ± 4% +0.06% (p=0.020 n=40+40)
RegexpMatchEasy0_1K-4 3.90µs ± 0% 3.96µs ± 3% +1.58% (p=0.000 n=31+40)
RegexpMatchEasy1_32-4 967ns ± 4% 981ns ± 3% +1.40% (p=0.000 n=40+40)
RegexpMatchEasy1_1K-4 6.41µs ± 4% 6.43µs ± 3% ~ (p=0.386 n=40+40)
RegexpMatchMedium_32-4 1.76µs ± 3% 1.78µs ± 3% +1.08% (p=0.000 n=40+40)
RegexpMatchMedium_1K-4 561µs ± 0% 562µs ± 0% +0.09% (p=0.003 n=34+31)
RegexpMatchHard_32-4 31.5µs ± 2% 31.1µs ± 4% -1.17% (p=0.000 n=30+40)
RegexpMatchHard_1K-4 960µs ± 3% 950µs ± 4% -1.02% (p=0.016 n=40+40)
Revcomp-4 7.79s ± 7% 7.79s ± 4% ~ (p=0.859 n=40+40)
Template-4 889ms ± 6% 872ms ± 3% -1.86% (p=0.025 n=40+31)
TimeParse-4 4.80µs ± 0% 4.89µs ± 3% +1.71% (p=0.001 n=31+40)
TimeFormat-4 4.70µs ± 1% 4.78µs ± 3% +1.57% (p=0.000 n=33+40)
[Geo mean] 710µs 709µs -0.13%
name old speed new speed delta
GobDecode-4 7.96MB/s ± 4% 7.96MB/s ± 9% ~ (p=0.174 n=40+40)
GobEncode-4 9.65MB/s ± 0% 9.51MB/s ± 4% -1.45% (p=0.000 n=34+40)
Gzip-4 4.29MB/s ± 4% 4.26MB/s ± 4% -0.59% (p=0.000 n=40+40)
Gunzip-4 43.0MB/s ± 3% 43.9MB/s ± 0% +1.90% (p=0.000 n=40+32)
JSONEncode-4 9.09MB/s ± 4% 9.22MB/s ± 0% ~ (p=0.429 n=40+31)
JSONDecode-4 2.25MB/s ± 5% 2.25MB/s ± 4% ~ (p=0.278 n=40+40)
GoParse-4 1.35MB/s ± 7% 1.37MB/s ± 0% ~ (p=0.071 n=40+25)
RegexpMatchEasy0_32-4 31.5MB/s ± 3% 31.5MB/s ± 4% -0.08% (p=0.018 n=40+40)
RegexpMatchEasy0_1K-4 263MB/s ± 0% 259MB/s ± 3% -1.51% (p=0.000 n=31+40)
RegexpMatchEasy1_32-4 33.1MB/s ± 4% 32.6MB/s ± 3% -1.38% (p=0.000 n=40+40)
RegexpMatchEasy1_1K-4 160MB/s ± 4% 159MB/s ± 3% ~ (p=0.364 n=40+40)
RegexpMatchMedium_32-4 565kB/s ± 3% 562kB/s ± 2% ~ (p=0.208 n=40+40)
RegexpMatchMedium_1K-4 1.82MB/s ± 0% 1.82MB/s ± 0% -0.27% (p=0.000 n=34+31)
RegexpMatchHard_32-4 1.02MB/s ± 3% 1.03MB/s ± 4% +1.04% (p=0.000 n=32+40)
RegexpMatchHard_1K-4 1.07MB/s ± 4% 1.08MB/s ± 4% +0.94% (p=0.003 n=40+40)
Revcomp-4 32.6MB/s ± 7% 32.6MB/s ± 4% ~ (p=0.965 n=40+40)
Template-4 2.18MB/s ± 6% 2.22MB/s ± 3% +1.83% (p=0.020 n=40+31)
[Geo mean] 7.77MB/s 7.78MB/s +0.16%
3. There is little change in the compilecmp benchmark (excluding noise).
name old time/op new time/op delta
Template 2.37s ± 3% 2.35s ± 4% ~ (p=0.529 n=10+10)
Unicode 1.38s ± 8% 1.36s ± 5% ~ (p=0.247 n=10+10)
GoTypes 8.10s ± 2% 8.10s ± 2% ~ (p=0.971 n=10+10)
Compiler 40.5s ± 4% 40.8s ± 1% ~ (p=0.529 n=10+10)
SSA 115s ± 2% 115s ± 3% ~ (p=0.684 n=10+10)
Flate 1.45s ± 5% 1.46s ± 3% ~ (p=0.796 n=10+10)
GoParser 1.86s ± 4% 1.84s ± 2% ~ (p=0.095 n=9+10)
Reflect 5.11s ± 2% 5.13s ± 2% ~ (p=0.315 n=10+10)
Tar 2.22s ± 3% 2.23s ± 1% ~ (p=0.299 n=9+7)
XML 2.72s ± 3% 2.72s ± 3% ~ (p=0.912 n=10+10)
[Geo mean] 5.03s 5.02s -0.21%
name old user-time/op new user-time/op delta
Template 2.92s ± 2% 2.89s ± 1% ~ (p=0.247 n=10+10)
Unicode 1.71s ± 5% 1.69s ± 4% ~ (p=0.393 n=10+10)
GoTypes 9.78s ± 2% 9.76s ± 2% ~ (p=0.631 n=10+10)
Compiler 49.1s ± 2% 49.1s ± 1% ~ (p=0.796 n=10+10)
SSA 144s ± 1% 144s ± 2% ~ (p=0.796 n=10+10)
Flate 1.74s ± 2% 1.73s ± 3% ~ (p=0.842 n=10+9)
GoParser 2.23s ± 3% 2.25s ± 2% ~ (p=0.143 n=10+10)
Reflect 5.93s ± 3% 5.98s ± 2% ~ (p=0.211 n=10+9)
Tar 2.65s ± 2% 2.69s ± 3% +1.51% (p=0.010 n=9+10)
XML 3.25s ± 2% 3.21s ± 1% -1.24% (p=0.035 n=10+9)
[Geo mean] 6.07s 6.07s -0.08%
name old text-bytes new text-bytes delta
HelloSize 641kB ± 0% 641kB ± 0% ~ (all equal)
name old data-bytes new data-bytes delta
HelloSize 9.46kB ± 0% 9.46kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 125kB ± 0% 125kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.24MB ± 0% 1.24MB ± 0% ~ (all equal)
Change-Id: Id095d998c380eef929755124084df02446a6b7c1
Reviewed-on: https://go-review.googlesource.com/92555
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-02-14 15:22:05 +00:00
Austin Clements
79594ee95a
runtime: buffered write barrier for arm64
...
Updates #22460.
Change-Id: I5f8fbece9545840f5fc4c9834e2050b0920776f0
Reviewed-on: https://go-review.googlesource.com/92699
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-02-13 16:34:19 +00:00
Cherry Zhang
6f3e5e637c
cmd/compile: intrinsify runtime.getcallersp
...
Add a compiler intrinsic for getcallersp, so that we are able to get
rid of its argument (not done in this CL).
Change-Id: Ic38fda1c694f918328659ab44654198fb116668d
Reviewed-on: https://go-review.googlesource.com/69350
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: David Chase <drchase@google.com>
2017-10-10 15:15:21 +00:00
Wei Xiao
c02fc1605a
cmd/compile: memory clearing optimization for arm64
...
Use "STP (ZR, ZR), O(R)" instead of "MOVD ZR, O(R)" to implement memory clearing.
Also improve assembler support for STP/LDP.
Results (A57@2GHzx8):
benchmark old ns/op new ns/op delta
BenchmarkClearFat8-8 1.00 1.00 +0.00%
BenchmarkClearFat12-8 1.01 1.01 +0.00%
BenchmarkClearFat16-8 1.01 1.01 +0.00%
BenchmarkClearFat24-8 1.52 1.52 +0.00%
BenchmarkClearFat32-8 3.00 2.02 -32.67%
BenchmarkClearFat40-8 3.50 2.52 -28.00%
BenchmarkClearFat48-8 3.50 3.03 -13.43%
BenchmarkClearFat56-8 4.00 3.50 -12.50%
BenchmarkClearFat64-8 4.25 4.00 -5.88%
BenchmarkClearFat128-8 8.01 8.01 +0.00%
BenchmarkClearFat256-8 16.1 16.0 -0.62%
BenchmarkClearFat512-8 32.1 32.0 -0.31%
BenchmarkClearFat1024-8 64.1 64.1 +0.00%
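As a rough illustration of the kind of code these benchmarks exercise, the sketch below zeroes a 32-byte value, the smallest size in the table that shows a speedup. The type `fat32` and the function `clearFat32` are hypothetical names invented for this example and are not taken from the CL; whether the compiler actually emits paired zero-register stores here depends on the toolchain version.

```go
package main

import "fmt"

// fat32 is a hypothetical 32-byte value (four 8-byte words).
type fat32 struct {
	a, b, c, d uint64
}

// clearFat32 zeroes the whole value. On arm64 the compiler is free to
// lower this to paired stores of the zero register instead of four
// single-word stores; this sketch does not verify which form is chosen.
//
//go:noinline
func clearFat32(p *fat32) {
	*p = fat32{}
}

func main() {
	v := fat32{1, 2, 3, 4}
	clearFat32(&v)
	fmt.Println(v == fat32{}) // true
}
```

To see which instructions are chosen on a given toolchain, the function can be built with GOARCH=arm64 and the assembly listing inspected.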
Change-Id: Ie5f5eac271ff685884775005825f206167a5c146
Reviewed-on: https://go-review.googlesource.com/55610
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-08-25 20:09:06 +00:00
philhofer
c59b495963
cmd/compile: add support for arm64 bit-test instructions
...
Add support for generating TBZ/TBNZ instructions.
The bit-test-and-branch pattern shows up in a number of
important places, including the runtime (gc bitmaps).
Before this change, there were 3 TB[N]?Z instructions in the Go tool,
all of which were in hand-written assembly. After this change, there
are 285. Also, the go1 benchmark binary gets about 4.5kB smaller.
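A minimal sketch of the bit-test-and-branch pattern described above: a single constant bit is tested and the result decides a branch. The constant `flagDirty` and the function `countDirty` are hypothetical names for illustration only; the compiler may or may not select TBZ/TBNZ for this exact loop.

```go
package main

import "fmt"

// flagDirty is a hypothetical flag occupying bit 5.
const flagDirty = 1 << 5

// countDirty counts the words that have the flagDirty bit set.
// The condition tests one constant bit and then branches, which is the
// shape that a test-bit-and-branch instruction can implement directly.
func countDirty(flags []uint64) int {
	n := 0
	for _, f := range flags {
		if f&flagDirty != 0 {
			n++
		}
	}
	return n
}

func main() {
	fmt.Println(countDirty([]uint64{0, 32, 33, 31})) // 2
}
```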
Fixes #21361
Change-Id: I170c138b852754b9b8df149966ca5e62e6dfa771
Reviewed-on: https://go-review.googlesource.com/54470
Run-TryBot: Philip Hofer <phofer@umich.edu>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-08-15 13:39:11 +00:00
Keith Randall
1e72bf6218
cmd/compile: experiment which clobbers all dead pointer fields
...
The experiment "clobberdead" clobbers all pointer fields that the
compiler thinks are dead, just before and after every safepoint.
Useful for debugging the generation of live pointer bitmaps.
Helped find the following issues:
Update #15936
Update #16026
Update #16095
Update #18860
Change-Id: Id1d12f86845e3d93bae903d968b1eac61fc461f9
Reviewed-on: https://go-review.googlesource.com/23924
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-04-21 20:19:50 +00:00
Matthew Dempsky
691755304c
cmd/compile/internal/ssa: populate SymEffects for SSA Ops
...
Changes to ${GOARCH}Ops.go files were mechanically produced using
github.com/mdempsky/ssa-symops, a one-off tool that inserts
"SymEffect: X" elements by pattern matching against the Op names.
Change-Id: Ibf3e481ffd588647f2a31662d72114b740ccbfcf
Reviewed-on: https://go-review.googlesource.com/38084
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-14 18:34:45 +00:00
Matthew Dempsky
08d8d5c986
cmd/compile/internal/ssa: replace {Defer,Go}Call with StaticCall
...
Passes toolstash-check -all.
Change-Id: Icf8b75364e4761a5e56567f503b2c1cb17382ed2
Reviewed-on: https://go-review.googlesource.com/38080
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-13 19:44:36 +00:00
David Chase
11b283092a
cmd/compile: add opcode flag hasSideEffects for do-not-remove
...
Added a flag to generic and various architectures' atomic
operations that are judged to have observable side effects
and thus cannot be dead-code-eliminated.
Test requires GOMAXPROCS > 1 without preemption in loop.
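To illustrate why such operations cannot be treated as dead code, here is a minimal sketch in which the result of an atomic add is discarded but the update itself is still observable. The names `counter` and `bump` are hypothetical, and this is not the test added by the CL.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// counter is a hypothetical shared variable used only for this illustration.
var counter int32

// bump increments counter and deliberately ignores the value returned by
// atomic.AddInt32. The write remains visible through counter, so the
// operation must be kept even though its result is unused.
func bump() {
	atomic.AddInt32(&counter, 1)
}

func main() {
	for i := 0; i < 3; i++ {
		bump()
	}
	fmt.Println(atomic.LoadInt32(&counter)) // 3
}
```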
Fixes #19182.
Change-Id: Id2230031abd2cca0bbb32fd68fc8a58fb912070f
Reviewed-on: https://go-review.googlesource.com/37333
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-02-22 15:15:47 +00:00
shawnps
067bab00a8
all: fix misspellings
...
Change-Id: I429637ca91f7db4144f17621de851a548dc1ce76
Reviewed-on: https://go-review.googlesource.com/34923
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-01-07 16:53:25 +00:00
Cherry Zhang
5c59cb4aa3
cmd/compile: make LR allocatable in non-leaf functions on ARM64
...
The mechanism was initially introduced (and reviewed) in CL 30597
on S390X.
Change-Id: I12fbe6e9269b2936690e0ec896cb6b5aa40ad7da
Reviewed-on: https://go-review.googlesource.com/32180
Reviewed-by: David Chase <drchase@google.com>
2016-10-27 15:35:06 +00:00
Michael Munday
15817e409b
cmd/compile: make link register allocatable in non-leaf functions
...
We save and restore the link register in non-leaf functions because
it is clobbered by CALLs. It is therefore available for general
purpose use.
Only enabled on s390x currently. The RC4 benchmarks in particular
benefit from the extra register:
name old speed new speed delta
RC4_128 243MB/s ± 2% 341MB/s ± 2% +40.46% (p=0.008 n=5+5)
RC4_1K 267MB/s ± 0% 359MB/s ± 1% +34.32% (p=0.008 n=5+5)
RC4_8K 271MB/s ± 0% 362MB/s ± 0% +33.61% (p=0.008 n=5+5)
Change-Id: Id23bff95e771da9425353da2f32668b8e34ba09f
Reviewed-on: https://go-review.googlesource.com/30597
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-10-11 18:52:35 +00:00
Cherry Zhang
b662e524e4
cmd/compile: use CBZ/CBNZ instructions on ARM64
...
These are conditional branches that take a register instead of
flags as the control value.
Reduce binary size by 0.7%, text size by 2.4% (cmd/go as an
example).
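A small sketch of source code where the branch condition is a comparison of a register value against zero, which is the case these instructions cover. The type `node` and the function `length` are hypothetical and chosen only for illustration; the instructions actually emitted depend on the compiler version.

```go
package main

import "fmt"

// node is a hypothetical singly linked list element.
type node struct {
	next *node
}

// length walks the list until it reaches a nil pointer. The loop
// condition compares a pointer held in a register with zero, so the
// branch can take the register directly instead of going through flags.
func length(head *node) int {
	n := 0
	for p := head; p != nil; p = p.next {
		n++
	}
	return n
}

func main() {
	list := &node{next: &node{next: &node{}}}
	fmt.Println(length(list)) // 3
}
```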
Change-Id: I0020cfde745f9eab680b8b949ad28c87fe183afd
Reviewed-on: https://go-review.googlesource.com/30030
Reviewed-by: David Chase <drchase@google.com>
2016-10-05 18:22:56 +00:00
Keith Randall
98938189a1
cmd/compile: remove duplicate nilchecks
...
Mark nil check operations as faulting if their arg is zero.
This lets the late nilcheck pass remove duplicates.
Fixes #17242.
Change-Id: I4c9938d8a5a1e43edd85b4a66f0b34004860bcd9
Reviewed-on: https://go-review.googlesource.com/29952
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2016-09-27 23:54:01 +00:00
Cherry Zhang
9d4b40f55d
runtime, cmd/compile: implement and use DUFFCOPY on ARM64
...
Change-Id: I8984eac30e5df78d4b94f19412135d3cc36969f8
Reviewed-on: https://go-review.googlesource.com/29910
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-09-27 15:07:31 +00:00
Keith Randall
3134ab3c2d
cmd/compile: redo nil checks
...
Get rid of BlockCheck. Josh goaded me into it, and I went
down a rabbithole making it happen.
NilCheck now panics if the pointer is nil and returns void, as before.
BlockCheck is gone, and NilCheck is no longer a Control value for
any block. It just exists (and deadcode knows not to throw it away).
I rewrote the nilcheckelim pass to handle this case. In particular,
there can now be multiple NilCheck ops per block.
I moved all of the arch-dependent nil check elimination done as
part of ssaGenValue into its own proper pass, so we don't have to
duplicate that code for every architecture.
Making the arch-dependent nil check its own pass means I needed
to add a bunch of flags to the opcode table so I could write
the code without arch-dependent ops everywhere.
Change-Id: I419f891ac9b0de313033ff09115c374163416a9f
Reviewed-on: https://go-review.googlesource.com/29120
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-09-15 02:42:13 +00:00
Cherry Zhang
38d35e714a
cmd/compile, runtime/internal/atomic: intrinsify And8, Or8 on ARM64
...
Also add an assembly implementation, in case intrinsics are disabled.
Change-Id: Iff0a8a8ce326651bd29f6c403f5ec08dd3629993
Reviewed-on: https://go-review.googlesource.com/28979
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-09-13 02:09:15 +00:00
Keith Randall
c345a3913f
cmd/compile: get rid of BlockCall
...
No need for it, we can treat calls as (mostly) normal values
that take a memory and return a memory.
Lowers the number of basic blocks needed to represent a function.
"go test -c net/http" uses 27% fewer basic blocks.
Probably doesn't affect generated code much, but should help
various passes whose running time and/or space depends on
the number of basic blocks.
Fixes #15631
Change-Id: I0bf21e123f835e2cfa382753955a4f8bce03dfa6
Reviewed-on: https://go-review.googlesource.com/28950
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2016-09-12 23:27:02 +00:00
Cherry Zhang
4354ffd38b
cmd/compile: intrinsify Ctz, Bswap, and some atomics on ARM64
...
Change-Id: Ia5bf72b70e6f6522d6fb8cd050e78f862d37b5ae
Reviewed-on: https://go-review.googlesource.com/27936
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-09-08 19:45:25 +00:00
Cherry Zhang
f9dafc742d
cmd/compile, runtime, etc: get rid of constant FP registers
...
On ARM64, MIPS64, and PPC64, some floating point registers were
reserved for constants 0, 1, 2, 0.5, etc. This CL removes them.
On ARM64, they are never used. On MIPS64 and PPC64, the only use
case is a multiplication-by-2 in the old backend of the compiler,
which is replaced with an addition.
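The replacement of a multiplication by 2 with an addition is safe because doubling a floating-point value gives the same result either way. The sketch below checks this in Go for a few inputs; the function names are hypothetical and the code is an illustration of the arithmetic, not the compiler change itself.

```go
package main

import (
	"fmt"
	"math"
)

// mulByTwo doubles x with a multiplication, which needs the constant 2.
func mulByTwo(x float64) float64 { return 2 * x }

// addToSelf doubles x with an addition, which needs no constant.
func addToSelf(x float64) float64 { return x + x }

// sameDouble reports whether both forms give bit-identical results for x.
func sameDouble(x float64) bool {
	return math.Float64bits(mulByTwo(x)) == math.Float64bits(addToSelf(x))
}

func main() {
	inputs := []float64{
		0, math.Copysign(0, -1), 1.5, -3.25,
		math.SmallestNonzeroFloat64, math.MaxFloat64, math.Inf(1),
	}
	for _, x := range inputs {
		fmt.Println(x, sameDouble(x))
	}
}
```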
Change-Id: I737cbf43283756e3408964fc88c567a938c57036
Reviewed-on: https://go-review.googlesource.com/28095
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-08-30 23:16:17 +00:00
Cherry Zhang
659dd4f1d7
cmd/compile: add more ARM64 optimizations
...
- Use machine instructions for uint64<->float conversions
- Do not enforce alignment on Zero/Move
ARM64 supports unaligned loads/stores, but only aligned or small
offsets can be encoded into instructions.
- Do combined loads
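As an illustration of the last item, the sketch below assembles a little-endian 32-bit value from four byte loads, a shape that a compiler can combine into one wider load on hardware that tolerates unaligned access. The function name `load32le` is hypothetical, and whether the loads are actually merged depends on the compiler version.

```go
package main

import "fmt"

// load32le reads a little-endian uint32 from the first four bytes of b.
// Written as four byte loads joined with shifts and ORs, it is a
// candidate for being combined into a single 32-bit load.
func load32le(b []byte) uint32 {
	_ = b[3] // one bounds check up front instead of four
	return uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 | uint32(b[3])<<24
}

func main() {
	fmt.Printf("%#x\n", load32le([]byte{0x78, 0x56, 0x34, 0x12})) // 0x12345678
}
```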
Change-Id: Iffca7dd0f13070b17b784861ce5a30af584680eb
Reviewed-on: https://go-review.googlesource.com/27086
Reviewed-by: David Chase <drchase@google.com>
2016-08-17 18:44:39 +00:00
Cherry Zhang
d99cee79b9
[dev.ssa] cmd/compile, etc.: more ARM64 optimizations, and enable SSA by default
...
Add more ARM64 optimizations:
- use hardware zero register when it is possible.
- use shifted ops.
The assembler supports shifted ops, but they are not documented and
it does not know how to print them. This CL adds both.
- enable fast division.
This was disabled because it makes the old backend generate slower
code. But with SSA it generates faster code.
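As a sketch of what "shifted ops" refers to: an arithmetic operation whose second operand is shifted by a constant can be expressed as one instruction on ARM64 rather than a shift followed by the operation. The function `addShifted` below is a hypothetical example; the code the compiler actually generates depends on its version.

```go
package main

import "fmt"

// addShifted computes a + b*8 as an add with a left-shifted operand.
// On ARM64 this shape can map to a single add that takes a shifted
// register operand.
func addShifted(a, b uint64) uint64 {
	return a + b<<3
}

func main() {
	fmt.Println(addShifted(1, 2)) // 17
}
```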
Turn on SSA by default, also adjust tests.
Change-Id: I7794479954c83bb65008dcb457bc1e21d7496da6
Reviewed-on: https://go-review.googlesource.com/26950
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-08-15 03:37:34 +00:00
Cherry Zhang
ed1ad8f56c
[dev.ssa] cmd/compile: add some ARM64 optimizations
...
Mostly mirrors ARM, includes:
- constant folding
- simplification of load, store, extension, and arithmetic
- nilcheck removal
Change-Id: Iffaa5fcdce100fe327429ecab316cb395e543469
Reviewed-on: https://go-review.googlesource.com/26710
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-08-11 18:08:47 +00:00
Cherry Zhang
0484052358
[dev.ssa] cmd/compile: remove flags from regMask
...
Reg allocator skips flag-typed values. Flag allocator uses the type
and whether the op has "clobberFlags" set.
Tested on AMD64, ARM, ARM64, 386. Passed 'toolstash -cmp' on AMD64.
PPC64 is coded blindly.
Change-Id: Ib1cc27efecef6a1bb27f7d7ed035a582660d244f
Reviewed-on: https://go-review.googlesource.com/25480
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-08-07 03:08:03 +00:00