117 commits

Author SHA1 Message Date
Giovanni Bajo
26085fcea3 cmd/compile: remove asmtest infrastructure
Not used anymore: all tests have been migrated to
the top-level testsuite.

Change-Id: I536e6c14f62153c01e4966ad41e1501b38494c7f
Reviewed-on: https://go-review.googlesource.com/107336
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-04-15 20:57:00 +00:00
Giovanni Bajo
01aa1d7dbe test: migrate plan9 tests to codegen
And remove them from asmtest. The next CL will remove the whole
asmtest infrastructure.

Change-Id: I5851bf7c617456d62a3c6cffacf70252df7b056b
Reviewed-on: https://go-review.googlesource.com/107335
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-04-15 20:02:30 +00:00
Alberto Donizetti
467eca6076 test/codegen: port last stack and memcombining tests
And delete them from asm_test.

Also delete an arm64 cmov test that has already been ported to the new
test harness.

Change-Id: I4458721e1f512bc9ecbbe1c22a2c9c7109ad68fe
Reviewed-on: https://go-review.googlesource.com/106335
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-04-11 16:08:04 +00:00
Alberto Donizetti
188e2bf897 test/codegen: port arm64 BIC/EON/ORN and masking tests
And delete them from asm_test.

Change-Id: I24f421b87e8cb4770c887a6dfd58eacd0088947d
Reviewed-on: https://go-review.googlesource.com/106056
Reviewed-by: Keith Randall <khr@golang.org>
2018-04-10 10:57:50 +00:00
Alberto Donizetti
d5ff631e6b test/codegen: port last remaining misc bit/arithmetic tests
And delete them from asm_test.

Change-Id: I9a75efe9858ef9d7ac86065f860c2ae3f25b0941
Reviewed-on: https://go-review.googlesource.com/105597
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
2018-04-10 07:58:35 +00:00
Alberto Donizetti
54c3f56ee0 test/codegen: port various mem-combining tests
And delete them from asm_test.

Change-Id: I0e33d58274951ab5acb67b0117b60ef617ea887a
Reviewed-on: https://go-review.googlesource.com/105735
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
2018-04-09 12:00:06 +00:00
Alberto Donizetti
3e31eb6b84 test/codegen: port arm64 slice zeroing tests
Finish porting arm64 slice zeroing codegen tests; delete them from
asm_test.

Change-Id: Id2532df8ba1c340fa662a6b5238daa3de30548be
Reviewed-on: https://go-review.googlesource.com/105136
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-04-07 09:55:51 +00:00
Alberto Donizetti
f2abca90a2 test/codegen: port arm64 byte slice zeroing tests
And delete them from asm_test.

Change-Id: Id533130470da9176a401cb94972f626f43a62148
Reviewed-on: https://go-review.googlesource.com/103656
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-04-04 13:18:15 +00:00
Alberto Donizetti
3b0b8bcd68 test/codegen: port stack-related tests to codegen
And delete them from asm_test.

Change-Id: Idfe1249052d82d15b9c30b292c78656a0bf7b48d
Reviewed-on: https://go-review.googlesource.com/103315
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-30 08:08:06 +00:00
Alberto Donizetti
a27cd4fd31 test/codegen: port tbz/tbnz arm64 tests
And delete them from asm_test.

Change-Id: I34fcf85ae8ce09cd146fe4ce6a0ae7616bd97e2d
Reviewed-on: https://go-review.googlesource.com/102296
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-03-24 09:35:53 +00:00
Giovanni Bajo
79112707bb cmd/compile: add patterns for bit set/clear/complement on amd64
This patch completes implementation of BT(Q|L), and adds support
for BT(S|R|C)(Q|L).

Example of code changes from time.(*Time).addSec (before, then after):

        if t.wall&hasMonotonic != 0 {
  0x1073465               488b08                  MOVQ 0(AX), CX
  0x1073468               4889ca                  MOVQ CX, DX
  0x107346b               48c1e93f                SHRQ $0x3f, CX
  0x107346f               48c1e13f                SHLQ $0x3f, CX
  0x1073473               48f7c1ffffffff          TESTQ $-0x1, CX
  0x107347a               746b                    JE 0x10734e7

        if t.wall&hasMonotonic != 0 {
  0x1073435               488b08                  MOVQ 0(AX), CX
  0x1073438               480fbae13f              BTQ $0x3f, CX
  0x107343d               7363                    JAE 0x10734a2

Another example (before, then after):

                        t.wall = t.wall&nsecMask | uint64(dsec)<<nsecShift | hasMonotonic
  0x10734c8               4881e1ffffff3f          ANDQ $0x3fffffff, CX
  0x10734cf               48c1e61e                SHLQ $0x1e, SI
  0x10734d3               4809ce                  ORQ CX, SI
  0x10734d6               48b90000000000000080    MOVQ $0x8000000000000000, CX
  0x10734e0               4809f1                  ORQ SI, CX
  0x10734e3               488908                  MOVQ CX, 0(AX)

                        t.wall = t.wall&nsecMask | uint64(dsec)<<nsecShift | hasMonotonic
  0x107348b               4881e2ffffff3f          ANDQ $0x3fffffff, DX
  0x1073492               48c1e61e                SHLQ $0x1e, SI
  0x1073496               4809f2                  ORQ SI, DX
  0x1073499               480fbaea3f              BTSQ $0x3f, DX
  0x107349e               488910                  MOVQ DX, 0(AX)
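
For reference, the source shapes involved can be sketched in Go as follows. This is a minimal illustration, not code from the CL: the function names are hypothetical, and whether a given compiler version actually selects BTQ/BTSQ/BTRQ/BTCQ for them is an assumption that can be checked with `go build -gcflags=-S`.

```go
package main

import "fmt"

// hasBit tests a single bit: the shape the BT(Q|L) patterns aim at.
func hasBit(x uint64, n uint) bool { return x&(1<<(n&63)) != 0 }

// setBit ORs in a single bit: the shape the BTS(Q|L) patterns aim at.
func setBit(x uint64, n uint) uint64 { return x | 1<<(n&63) }

// clearBit masks out a single bit: the shape the BTR(Q|L) patterns aim at.
func clearBit(x uint64, n uint) uint64 { return x &^ (1 << (n & 63)) }

// flipBit XORs a single bit: the shape the BTC(Q|L) patterns aim at.
func flipBit(x uint64, n uint) uint64 { return x ^ 1<<(n&63) }

func main() {
	var wall uint64 = 0x1234
	wall = setBit(wall, 63) // same idea as "| hasMonotonic" above
	fmt.Println(hasBit(wall, 63))
	fmt.Printf("%#x\n", clearBit(wall, 63))
	fmt.Printf("%#x\n", flipBit(wall, 0))
}
```

The shift count is masked with 63 only to keep the sketch well defined for any n; the constant-bit cases in the disassembly above need no masking.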

Go1 benchmarks seem unaffected, and I would be surprised
otherwise:

name                     old time/op    new time/op     delta
BinaryTree17-4              2.64s ± 4%      2.56s ± 9%  -2.92%  (p=0.008 n=9+9)
Fannkuch11-4                2.90s ± 1%      2.95s ± 3%  +1.76%  (p=0.010 n=10+9)
FmtFprintfEmpty-4          35.3ns ± 1%     34.5ns ± 2%  -2.34%  (p=0.004 n=9+8)
FmtFprintfString-4         57.0ns ± 1%     58.4ns ± 5%  +2.52%  (p=0.029 n=9+10)
FmtFprintfInt-4            59.8ns ± 3%     59.8ns ± 6%    ~     (p=0.565 n=10+10)
FmtFprintfIntInt-4         93.9ns ± 3%     91.2ns ± 5%  -2.94%  (p=0.014 n=10+9)
FmtFprintfPrefixedInt-4     107ns ± 6%      104ns ± 6%    ~     (p=0.099 n=10+10)
FmtFprintfFloat-4           187ns ± 3%      188ns ± 3%    ~     (p=0.505 n=10+9)
FmtManyArgs-4               410ns ± 1%      415ns ± 6%    ~     (p=0.649 n=8+10)
GobDecode-4                5.30ms ± 3%     5.27ms ± 3%    ~     (p=0.436 n=10+10)
GobEncode-4                4.62ms ± 5%     4.47ms ± 2%  -3.24%  (p=0.001 n=9+10)
Gzip-4                      197ms ± 4%      193ms ± 3%    ~     (p=0.123 n=10+10)
Gunzip-4                   30.4ms ± 3%     30.1ms ± 3%    ~     (p=0.481 n=10+10)
HTTPClientServer-4         76.3µs ± 1%     76.0µs ± 1%    ~     (p=0.236 n=8+9)
JSONEncode-4               10.5ms ± 9%     10.3ms ± 3%    ~     (p=0.280 n=10+10)
JSONDecode-4               42.3ms ±10%     41.3ms ± 2%    ~     (p=0.053 n=9+10)
Mandelbrot200-4            3.80ms ± 2%     3.72ms ± 2%  -2.15%  (p=0.001 n=9+10)
GoParse-4                  2.88ms ±10%     2.81ms ± 2%    ~     (p=0.247 n=10+10)
RegexpMatchEasy0_32-4      69.5ns ± 4%     68.6ns ± 2%    ~     (p=0.171 n=10+10)
RegexpMatchEasy0_1K-4       165ns ± 3%      162ns ± 3%    ~     (p=0.137 n=10+10)
RegexpMatchEasy1_32-4      65.7ns ± 6%     64.4ns ± 2%  -2.02%  (p=0.037 n=10+10)
RegexpMatchEasy1_1K-4       278ns ± 2%      279ns ± 3%    ~     (p=0.991 n=8+9)
RegexpMatchMedium_32-4     99.3ns ± 3%     98.5ns ± 4%    ~     (p=0.457 n=10+9)
RegexpMatchMedium_1K-4     30.1µs ± 1%     30.4µs ± 2%    ~     (p=0.173 n=8+10)
RegexpMatchHard_32-4       1.40µs ± 2%     1.41µs ± 4%    ~     (p=0.565 n=10+10)
RegexpMatchHard_1K-4       42.5µs ± 1%     41.5µs ± 3%  -2.13%  (p=0.002 n=8+9)
Revcomp-4                   332ms ± 4%      328ms ± 5%    ~     (p=0.720 n=9+10)
Template-4                 48.3ms ± 2%     49.6ms ± 3%  +2.56%  (p=0.002 n=8+10)
TimeParse-4                 252ns ± 2%      249ns ± 3%    ~     (p=0.116 n=9+10)
TimeFormat-4                262ns ± 4%      252ns ± 3%  -4.01%  (p=0.000 n=9+10)

name                     old speed      new speed       delta
GobDecode-4               145MB/s ± 3%    146MB/s ± 3%    ~     (p=0.436 n=10+10)
GobEncode-4               166MB/s ± 5%    172MB/s ± 2%  +3.28%  (p=0.001 n=9+10)
Gzip-4                   98.6MB/s ± 4%  100.4MB/s ± 3%    ~     (p=0.123 n=10+10)
Gunzip-4                  639MB/s ± 3%    645MB/s ± 3%    ~     (p=0.481 n=10+10)
JSONEncode-4              185MB/s ± 8%    189MB/s ± 3%    ~     (p=0.280 n=10+10)
JSONDecode-4             46.0MB/s ± 9%   47.0MB/s ± 2%  +2.21%  (p=0.046 n=9+10)
GoParse-4                20.1MB/s ± 9%   20.6MB/s ± 2%    ~     (p=0.239 n=10+10)
RegexpMatchEasy0_32-4     460MB/s ± 4%    467MB/s ± 2%    ~     (p=0.165 n=10+10)
RegexpMatchEasy0_1K-4    6.19GB/s ± 3%   6.28GB/s ± 3%    ~     (p=0.165 n=10+10)
RegexpMatchEasy1_32-4     487MB/s ± 5%    497MB/s ± 2%  +2.00%  (p=0.043 n=10+10)
RegexpMatchEasy1_1K-4    3.67GB/s ± 2%   3.67GB/s ± 3%    ~     (p=0.963 n=8+9)
RegexpMatchMedium_32-4   10.1MB/s ± 3%   10.1MB/s ± 4%    ~     (p=0.435 n=10+9)
RegexpMatchMedium_1K-4   34.0MB/s ± 1%   33.7MB/s ± 2%    ~     (p=0.173 n=8+10)
RegexpMatchHard_32-4     22.9MB/s ± 2%   22.7MB/s ± 4%    ~     (p=0.565 n=10+10)
RegexpMatchHard_1K-4     24.0MB/s ± 3%   24.7MB/s ± 3%  +2.64%  (p=0.001 n=9+9)
Revcomp-4                 766MB/s ± 4%    775MB/s ± 5%    ~     (p=0.720 n=9+10)
Template-4               40.2MB/s ± 2%   39.2MB/s ± 3%  -2.47%  (p=0.002 n=8+10)

The rules match ~1800 times during all.bash.

Fixes #18943

Change-Id: I64be1ada34e89c486dfd935bf429b35652117ed4
Reviewed-on: https://go-review.googlesource.com/94766
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-24 02:38:50 +00:00
Alberto Donizetti
fc6280d4b0 test/codegen: port direct comparisons with memory tests
And remove them from asm_test.

Change-Id: I1ca29b40546d6de06f20bfd550ed8ff87f495454
Reviewed-on: https://go-review.googlesource.com/102115
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-22 17:20:09 +00:00
Alberto Donizetti
be371edd67 test/codegen: port comparisons tests to codegen
And delete them from asm_test.

Change-Id: I64c512bfef3b3da6db5c5d29277675dade28b8ab
Reviewed-on: https://go-review.googlesource.com/101595
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-03-20 19:38:06 +00:00
Alberto Donizetti
5a4e09837c test/codegen: port maps test to codegen
And delete them from asm_test.

Change-Id: I3cf0934706a640136cb0f646509174f8c1bf3363
Reviewed-on: https://go-review.googlesource.com/101395
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-03-19 13:39:34 +00:00
Alberto Donizetti
b61b1d2c57 test/codegen: port structs test to codegen
And delete them from asm_test.

Change-Id: Ia286239a3d8f3915f2ca25dbcb39f3354a4f8aea
Reviewed-on: https://go-review.googlesource.com/101138
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-18 16:53:53 +00:00
Alberto Donizetti
cceee685be test/codegen: port floats tests to codegen
And delete them from asm_test.

Change-Id: Ibdaca3496eefc73c731b511ddb9636a1f3dff68c
Reviewed-on: https://go-review.googlesource.com/100915
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-15 18:05:59 +00:00
Alberto Donizetti
ded9a1b372 test/codegen: port len/cap pow2 div tests to codegen
And delete them from asm_test.

Change-Id: I29c8d098a8893e6b669b6272a2f508985ac9d618
Reviewed-on: https://go-review.googlesource.com/100876
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-15 13:34:01 +00:00
Alberto Donizetti
cd3aae9b81 test/codegen: port all small memmove tests to codegen
This change ports all the remaining tests checking that small memmoves
are replaced with MOVs to the new codegen test harness, and deletes
them from the asm_test file.

Change-Id: I01c94b441e27a5d61518035af62d62779dafeb56
Reviewed-on: https://go-review.googlesource.com/100476
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-14 15:57:07 +00:00
Giovanni Bajo
f7ac70a566 test: move rotate tests to top-level testsuite.
Remove old tests from asm_test.

Change-Id: Ib408ec7faa60068bddecf709b93ce308e0ef665a
Reviewed-on: https://go-review.googlesource.com/100075
Reviewed-by: Alberto Donizetti <alb.donizetti@gmail.com>
2018-03-11 10:08:18 +00:00
Alberto Donizetti
5f541b11aa test/codegen: port MULs merging tests to codegen
And delete them from asm_test.

Change-Id: I0057cbd90ca55fa51c596e32406e190f3866f93e
Reviewed-on: https://go-review.googlesource.com/99815
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-09 17:01:56 +00:00
Alberto Donizetti
cde34780b7 test/codegen: port math/bits.RotateLeft tests to codegen
Only RotateLeft{64,32} were tested, and just for ppc64. This CL adds
tests for RotateLeft{64,32,16,8} on arm64 and amd64/386, for the cases
where the calls are actually intrinsified.
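
As a rough illustration of what is being tested, the sketch below puts a math/bits rotate next to the equivalent shift-and-or form. The wrapper names are hypothetical and not taken from the test files; which instruction each architecture emits is left unstated on purpose.

```go
package main

import (
	"fmt"
	"math/bits"
)

// rot32 rotates through the math/bits helper, the call that may be
// intrinsified on some architectures.
func rot32(x uint32, k int) uint32 { return bits.RotateLeft32(x, k) }

// rot8 does the same for the 8-bit variant.
func rot8(x uint8, k int) uint8 { return bits.RotateLeft8(x, k) }

// rot32Shifts is the hand-written equivalent using two shifts and an OR.
func rot32Shifts(x uint32, k uint) uint32 {
	k &= 31
	return x<<k | x>>(32-k)
}

func main() {
	fmt.Printf("%#x %#x\n", rot32(0x80000001, 1), rot32Shifts(0x80000001, 1))
	fmt.Printf("%#x\n", rot8(0x81, 4))
}
```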

RotateLeft tests (the last ones for math/bits functions) are deleted
from asm_test.

This CL also adds a space between the "//" and the arch name in the
comments, to make this file consistent with the style used in all the
other files.

Change-Id: Ifc2a27261d70bcc294b4ec64490d8367f62d2b89
Reviewed-on: https://go-review.googlesource.com/99596
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-03-09 10:53:38 +00:00
Alberto Donizetti
3772b2e1d5 test/codegen: port 2^n muls tests to codegen harness
And delete them from the asm_test.go file.

Change-Id: I124c8c352299646ec7db0968cdb0fe59a3b5d83d
Reviewed-on: https://go-review.googlesource.com/99475
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-03-08 16:30:14 +00:00
Alberto Donizetti
8516ecd05f test/codegen: port math/bits.ReverseBytes tests to codegen
And remove them from ssa_test.

Change-Id: If767af662801219774d1bdb787c77edfa6067770
Reviewed-on: https://go-review.googlesource.com/98976
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-03-06 20:34:33 +00:00
Alberto Donizetti
18ae5eca3b test/codegen: port math/bits.OnesCount tests to codegen
And remove them from ssa_test.

Change-Id: I3efac5fea529bb0efa2dae32124530482ba5058e
Reviewed-on: https://go-review.googlesource.com/98815
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-06 17:53:00 +00:00
Alberto Donizetti
85dcc709a8 test/codegen: port math/bits.TrailingZeros tests to codegen
And remove them from ssa_test.

Change-Id: Ib5de5c0d908f23915e0847eca338cacf2fa5325b
Reviewed-on: https://go-review.googlesource.com/98795
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-03-06 11:48:37 +00:00
Alberto Donizetti
83e41b3e76 test/codegen: port math/bits.LeadingZeros tests to codegen
Change-Id: Ic21d25db5d56ce77516c53082dfbc010e5875b81
Reviewed-on: https://go-review.googlesource.com/98655
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-05 19:52:04 +00:00
Alberto Donizetti
c1806906d8 test: port bits.Len intrinsics tests to the new codegen harness
This change moves the bits.Len* intrinsification tests to the new codegen
test harness, removing them from the old ssa_test file. Five different
test functions (one for each bits.Len function tested) were used, to
avoid possible unwanted interactions between multiple calls inside one
function.
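
The one-call-per-function layout can be sketched as below. The function names are hypothetical, and the per-architecture assembly annotations used by the real harness are omitted here rather than guessed.

```go
package main

import (
	"fmt"
	"math/bits"
)

// Each function wraps exactly one bits.Len variant, so the code generated
// for one call cannot be influenced by another call in the same function.
func lenUint(n uint) int { return bits.Len(n) }
func len64(n uint64) int { return bits.Len64(n) }
func len32(n uint32) int { return bits.Len32(n) }
func len16(n uint16) int { return bits.Len16(n) }
func len8(n uint8) int   { return bits.Len8(n) }

func main() {
	fmt.Println(lenUint(1), len64(1<<40), len32(255), len16(256), len8(0))
}
```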

Change-Id: Iffd5be55b58e88597fa30a562a28dacb01236d8b
Reviewed-on: https://go-review.googlesource.com/98156
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-03-05 18:01:19 +00:00
Giovanni Bajo
89ae7045f3 test: convert all math-related tests from asm_test
Change-Id: If542f0b5c5754e6eb2f9b302fe5a148ba9a57338
Reviewed-on: https://go-review.googlesource.com/98443
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-04 16:52:33 +00:00
Giovanni Bajo
fad31e513d test: move load/store combines into asmcheck
This CL moves the load/store combining tests into asmcheck.
In addition to being more compact, the tests now make it easier
to spot what is missing on each architecture.

While doing so, I think I uncovered a bug in ppc64le and arm64
rules, because they fail to load/store combine in non-trivial
functions. Not sure why, I'll open an issue.

Change-Id: Ia1572d53c0553d9104f3e52b95e4d1768a8440a3
Reviewed-on: https://go-review.googlesource.com/98441
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-04 16:52:03 +00:00
Giovanni Bajo
8ce74b7d11 test: port a nil-check interface test from asm_test
Change-Id: I69c1688506d1aeca655047acf35d1bff966fc01e
Reviewed-on: https://go-review.googlesource.com/98442
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-03 20:20:54 +00:00
Chad Rosier
39fefa0709 cmd/compile/internal/ssa: combine consecutive BigEndian stores on arm64
This optimization mirrors that which is already implemented for AMD64.  The
optimization specifically targets the binary.BigEndian.PutUint* functions.
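
For context, the kind of byte-by-byte big-endian store sequence such a rule would combine looks roughly like this. putUint32BE is a hypothetical stand-in written for illustration; it mirrors the general shape of binary.BigEndian.PutUint32 but is not copied from the standard library.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// putUint32BE writes v into b most-significant byte first, as four
// consecutive single-byte stores.
func putUint32BE(b []byte, v uint32) {
	_ = b[3] // one bounds check up front instead of four
	b[0] = byte(v >> 24)
	b[1] = byte(v >> 16)
	b[2] = byte(v >> 8)
	b[3] = byte(v)
}

func main() {
	var manual, std [4]byte
	putUint32BE(manual[:], 0xdeadbeef)
	binary.BigEndian.PutUint32(std[:], 0xdeadbeef)
	fmt.Printf("% x\n% x\n", manual[:], std[:])
}
```

Whether the four stores end up as a single wider store (plus a byte reversal) depends on the compiler version and target, so treat that part as an assumption to verify with `-gcflags=-S`.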

encoding/binary results on Amberwing:
name                   old time/op    new time/op    delta
ReadSlice1000Int32s      9.83µs ± 2%    9.78µs ± 1%     ~     (p=0.362 n=9+10)
ReadStruct               5.24µs ± 3%    5.19µs ± 2%     ~     (p=0.285 n=10+10)
ReadInts                 8.35µs ± 8%    8.44µs ± 3%     ~     (p=0.323 n=10+10)
WriteInts                3.38µs ± 3%    3.44µs ±15%     ~     (p=0.921 n=9+10)
WriteSlice1000Int32s     11.4µs ± 6%    10.2µs ± 4%   -9.94%  (p=0.000 n=10+10)
PutUint16                 510ns ±12%     500ns ± 0%     ~     (p=0.586 n=10+7)
PutUint32                 530ns ±15%     490ns ±12%     ~     (p=0.086 n=10+10)
PutUint64                 550ns ± 0%     470ns ± 6%  -14.52%  (p=0.000 n=7+10)
LittleEndianPutUint16     500ns ± 0%     475ns ±16%     ~     (p=0.120 n=7+10)
LittleEndianPutUint32     450ns ± 0%     517ns ±16%  +14.81%  (p=0.004 n=8+9)
LittleEndianPutUint64     550ns ± 0%     485ns ±13%  -11.82%  (p=0.000 n=8+10)
PutUvarint32              685ns ±12%     622ns ± 4%   -9.17%  (p=0.005 n=10+9)
PutUvarint64              735ns ± 9%     711ns ± 9%     ~     (p=0.272 n=10+9)
[Geo mean]               1.47µs         1.42µs        -3.87%

name                   old speed      new speed      delta
ReadSlice1000Int32s     407MB/s ± 2%   409MB/s ± 1%     ~     (p=0.362 n=9+10)
ReadStruct             14.3MB/s ± 3%  14.4MB/s ± 2%     ~     (p=0.250 n=10+10)
ReadInts               3.59MB/s ± 7%  3.56MB/s ± 4%     ~     (p=0.340 n=10+10)
WriteInts              8.87MB/s ± 3%  8.74MB/s ±13%     ~     (p=0.890 n=9+10)
WriteSlice1000Int32s    352MB/s ± 6%   391MB/s ± 4%  +11.03%  (p=0.000 n=10+10)
PutUint16              3.95MB/s ±13%  4.00MB/s ± 0%     ~     (p=0.312 n=10+7)
PutUint32              7.62MB/s ±17%  8.21MB/s ±11%     ~     (p=0.086 n=10+10)
PutUint64              14.6MB/s ± 0%  17.1MB/s ± 6%  +17.28%  (p=0.000 n=7+10)
LittleEndianPutUint16  4.00MB/s ± 0%  4.23MB/s ±18%     ~     (p=0.176 n=7+10)
LittleEndianPutUint32  8.89MB/s ± 0%  7.64MB/s ±20%  -14.05%  (p=0.001 n=8+10)
LittleEndianPutUint64  14.6MB/s ± 0%  16.6MB/s ±12%  +13.86%  (p=0.000 n=8+10)
PutUvarint32           5.86MB/s ±14%  6.44MB/s ± 5%   +9.84%  (p=0.006 n=10+9)
PutUvarint64           10.9MB/s ± 8%  11.3MB/s ± 9%     ~     (p=0.373 n=10+9)
[Geo mean]             14.2MB/s       14.8MB/s        +3.93%

go1 results on Amberwing:
name                   old time/op    new time/op    delta
RegexpMatchEasy0_32       254ns ± 0%     254ns ± 0%    ~     (all equal)
RegexpMatchEasy0_1K       547ns ± 0%     547ns ± 0%    ~     (all equal)
RegexpMatchEasy1_32       252ns ± 0%     253ns ± 1%    ~     (p=0.294 n=8+10)
RegexpMatchEasy1_1K       782ns ± 0%     783ns ± 1%    ~     (p=0.529 n=8+9)
RegexpMatchMedium_32      316ns ± 0%     316ns ± 0%    ~     (all equal)
RegexpMatchMedium_1K     51.5µs ± 0%    51.5µs ± 0%    ~     (p=0.645 n=10+9)
RegexpMatchHard_32       2.75µs ± 0%    2.75µs ± 0%    ~     (all equal)
RegexpMatchHard_1K       78.7µs ± 0%    78.7µs ± 0%    ~     (p=0.754 n=10+10)
FmtFprintfEmpty          57.0ns ± 0%    57.0ns ± 0%    ~     (all equal)
FmtFprintfString          111ns ± 0%     111ns ± 0%    ~     (all equal)
FmtFprintfInt             114ns ± 0%     114ns ± 1%    ~     (p=0.065 n=9+10)
FmtFprintfIntInt          182ns ± 0%     178ns ± 0%  -2.20%  (p=0.000 n=10+10)
FmtFprintfPrefixedInt     225ns ± 0%     227ns ± 0%  +0.89%  (p=0.000 n=10+10)
FmtFprintfFloat           307ns ± 0%     307ns ± 0%    ~     (p=1.000 n=9+9)
FmtManyArgs               697ns ± 0%     701ns ± 2%    ~     (p=0.108 n=9+10)
Gzip                      436ms ± 0%     437ms ± 0%  +0.23%  (p=0.000 n=10+8)
HTTPClientServer         88.8µs ± 2%    89.6µs ± 1%  +0.98%  (p=0.019 n=10+10)
JSONEncode               20.1ms ± 1%    20.2ms ± 1%  +0.48%  (p=0.007 n=10+10)
JSONDecode               94.7ms ± 1%    94.1ms ± 0%  -0.62%  (p=0.000 n=10+9)
GobDecode                12.6ms ± 2%    12.6ms ± 1%    ~     (p=0.360 n=10+8)
GobEncode                12.0ms ± 1%    11.9ms ± 1%  -1.34%  (p=0.000 n=10+10)
Mandelbrot200            5.05ms ± 0%    5.05ms ± 0%  +0.12%  (p=0.000 n=10+10)
TimeParse                 448ns ± 0%     448ns ± 0%    ~     (p=0.529 n=8+9)
TimeFormat                501ns ± 1%     501ns ± 1%    ~     (p=1.000 n=10+9)
Template                 90.6ms ± 0%    89.1ms ± 0%  -1.67%  (p=0.000 n=9+9)
GoParse                  6.01ms ± 0%    5.96ms ± 0%  -0.83%  (p=0.000 n=10+9)
BinaryTree17              11.7s ± 0%     11.7s ± 0%    ~     (p=0.481 n=10+10)
Revcomp                   675ms ± 0%     675ms ± 0%    ~     (p=0.436 n=9+9)
Fannkuch11                3.26s ± 0%     3.27s ± 1%  +0.57%  (p=0.000 n=10+10)
[Geo mean]               67.4µs         67.3µs       -0.10%

name                   old speed      new speed      delta
RegexpMatchEasy0_32     126MB/s ± 0%   126MB/s ± 0%    ~     (p=0.353 n=10+7)
RegexpMatchEasy0_1K    1.87GB/s ± 0%  1.87GB/s ± 0%    ~     (p=0.275 n=8+10)
RegexpMatchEasy1_32     127MB/s ± 0%   126MB/s ± 1%    ~     (p=0.110 n=8+10)
RegexpMatchEasy1_1K    1.31GB/s ± 0%  1.31GB/s ± 1%    ~     (p=0.079 n=8+10)
RegexpMatchMedium_32   3.16MB/s ± 0%  3.16MB/s ± 0%    ~     (all equal)
RegexpMatchMedium_1K   19.9MB/s ± 0%  19.9MB/s ± 0%    ~     (p=0.889 n=10+9)
RegexpMatchHard_32     11.7MB/s ± 0%  11.7MB/s ± 0%    ~     (all equal)
RegexpMatchHard_1K     13.0MB/s ± 0%  13.0MB/s ± 0%    ~     (p=1.000 n=10+10)
Gzip                   44.5MB/s ± 0%  44.4MB/s ± 0%  -0.22%  (p=0.000 n=10+8)
JSONEncode             96.6MB/s ± 1%  96.1MB/s ± 1%  -0.48%  (p=0.007 n=10+10)
JSONDecode             20.5MB/s ± 1%  20.6MB/s ± 0%  +0.63%  (p=0.000 n=10+9)
GobDecode              61.0MB/s ± 2%  61.1MB/s ± 1%    ~     (p=0.372 n=10+8)
GobEncode              63.8MB/s ± 1%  64.7MB/s ± 1%  +1.36%  (p=0.000 n=10+10)
Template               21.4MB/s ± 0%  21.8MB/s ± 0%  +1.69%  (p=0.000 n=9+9)
GoParse                9.63MB/s ± 0%  9.71MB/s ± 0%  +0.84%  (p=0.000 n=9+8)
Revcomp                 377MB/s ± 0%   376MB/s ± 0%    ~     (p=0.399 n=9+9)
[Geo mean]             56.2MB/s       56.3MB/s       +0.20%

Change-Id: Ic915373f5ef512f9fbc45745860e5db7f6de6286
Reviewed-on: https://go-review.googlesource.com/97755
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-01 20:29:22 +00:00
Chad Rosier
77ba071ec6 cmd/compile/internal/ssa: combine consecutive LittleEndian stores on arm64
This optimization mirrors that which is already implemented for AMD64.  The
optimization specifically targets the binary.LittleEndian.PutUint* functions.
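
The little-endian counterpart, sketched for the 64-bit case. putUint64LE and uint64LE are hypothetical helpers for illustration only; the round trip through binary.LittleEndian.Uint64 just confirms that the byte order matches.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// putUint64LE writes v into b least-significant byte first, as eight
// consecutive single-byte stores.
func putUint64LE(b []byte, v uint64) {
	_ = b[7] // one bounds check up front
	b[0] = byte(v)
	b[1] = byte(v >> 8)
	b[2] = byte(v >> 16)
	b[3] = byte(v >> 24)
	b[4] = byte(v >> 32)
	b[5] = byte(v >> 40)
	b[6] = byte(v >> 48)
	b[7] = byte(v >> 56)
}

// uint64LE reads the value back with the standard library decoder.
func uint64LE(b []byte) uint64 { return binary.LittleEndian.Uint64(b) }

func main() {
	var buf [8]byte
	putUint64LE(buf[:], 0x0102030405060708)
	fmt.Printf("% x\n%#x\n", buf[:], uint64LE(buf[:]))
}
```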

encoding/binary results on Amberwing:
name                   old time/op    new time/op    delta
ReadSlice1000Int32s      9.67µs ± 1%    9.64µs ± 1%     ~     (p=0.185 n=9+9)
ReadStruct               5.24µs ± 2%    5.36µs ± 2%   +2.24%  (p=0.002 n=10+8)
ReadInts                 8.69µs ± 5%    8.88µs ± 5%     ~     (p=0.083 n=10+10)
WriteInts                3.90µs ±10%    3.71µs ± 9%     ~     (p=0.077 n=10+10)
WriteSlice1000Int32s     10.9µs ± 1%    10.9µs ± 1%     ~     (p=0.701 n=9+9)
PutUint16                 572ns ±14%     505ns ±11%  -11.75%  (p=0.006 n=9+10)
PutUint32                 550ns ±18%     540ns ±11%     ~     (p=0.692 n=10+10)
PutUint64                 565ns ±15%     540ns ±17%     ~     (p=0.248 n=10+10)
LittleEndianPutUint16     540ns ±11%     500ns ±10%     ~     (p=0.094 n=10+10)
LittleEndianPutUint32     520ns ±15%     480ns ±15%     ~     (p=0.087 n=10+10)
LittleEndianPutUint64     505ns ±29%     470ns ±17%     ~     (p=0.208 n=10+10)
PutUvarint32              700ns ±21%     635ns ±10%   -9.29%  (p=0.028 n=10+10)
PutUvarint64              740ns ± 8%     740ns ± 8%     ~     (p=0.713 n=10+10)
[Geo mean]               1.53µs         1.47µs        -3.93%

name                   old speed      new speed      delta
ReadSlice1000Int32s     414MB/s ± 1%   415MB/s ± 1%     ~     (p=0.185 n=9+9)
ReadStruct             14.3MB/s ± 2%  14.0MB/s ± 2%   -2.21%  (p=0.000 n=10+8)
ReadInts               3.45MB/s ± 4%  3.38MB/s ± 6%     ~     (p=0.085 n=10+10)
WriteInts              7.71MB/s ± 9%  8.09MB/s ± 8%   +4.93%  (p=0.048 n=10+10)
WriteSlice1000Int32s    367MB/s ± 1%   366MB/s ± 1%     ~     (p=0.701 n=9+9)
PutUint16              3.51MB/s ±14%  3.99MB/s ±11%  +13.47%  (p=0.009 n=9+10)
PutUint32              7.35MB/s ±21%  7.44MB/s ±10%     ~     (p=0.692 n=10+10)
PutUint64              14.3MB/s ±14%  15.0MB/s ±19%     ~     (p=0.248 n=10+10)
LittleEndianPutUint16  3.72MB/s ±11%  4.03MB/s ±10%     ~     (p=0.094 n=10+10)
LittleEndianPutUint32  7.75MB/s ±15%  8.39MB/s ±13%     ~     (p=0.087 n=10+10)
LittleEndianPutUint64  16.1MB/s ±23%  17.2MB/s ±16%     ~     (p=0.208 n=10+10)
PutUvarint32           5.76MB/s ±18%  6.32MB/s ±10%   +9.72%  (p=0.028 n=10+10)
PutUvarint64           10.8MB/s ± 8%  10.8MB/s ± 8%     ~     (p=0.713 n=10+10)
[Geo mean]             13.7MB/s       14.3MB/s        +4.02%

go1 results on Amberwing:
name                   old time/op    new time/op    delta
RegexpMatchEasy0_32       249ns ± 0%     249ns ± 0%    ~     (p=0.087 n=10+10)
RegexpMatchEasy0_1K       584ns ± 0%     584ns ± 0%    ~     (all equal)
RegexpMatchEasy1_32       246ns ± 0%     246ns ± 0%    ~     (p=1.000 n=10+10)
RegexpMatchEasy1_1K       806ns ± 0%     806ns ± 0%    ~     (p=0.706 n=10+9)
RegexpMatchMedium_32      314ns ± 0%     314ns ± 0%    ~     (all equal)
RegexpMatchMedium_1K     52.1µs ± 0%    52.1µs ± 0%    ~     (p=0.245 n=10+8)
RegexpMatchHard_32       2.75µs ± 1%    2.75µs ± 1%    ~     (p=0.690 n=10+10)
RegexpMatchHard_1K       78.9µs ± 0%    78.9µs ± 1%    ~     (p=0.295 n=9+9)
FmtFprintfEmpty          58.5ns ± 0%    58.5ns ± 0%    ~     (all equal)
FmtFprintfString          112ns ± 0%     112ns ± 0%    ~     (all equal)
FmtFprintfInt             117ns ± 0%     116ns ± 0%  -0.85%  (p=0.000 n=10+10)
FmtFprintfIntInt          181ns ± 0%     181ns ± 0%    ~     (all equal)
FmtFprintfPrefixedInt     222ns ± 0%     224ns ± 0%  +0.90%  (p=0.000 n=9+10)
FmtFprintfFloat           318ns ± 1%     322ns ± 0%    ~     (p=0.059 n=10+8)
FmtManyArgs               736ns ± 1%     735ns ± 0%    ~     (p=0.206 n=9+9)
Gzip                      437ms ± 0%     436ms ± 0%  -0.25%  (p=0.000 n=10+10)
HTTPClientServer         89.8µs ± 1%    90.2µs ± 2%    ~     (p=0.393 n=10+10)
JSONEncode               20.1ms ± 1%    20.2ms ± 1%    ~     (p=0.065 n=9+10)
JSONDecode               94.2ms ± 1%    93.9ms ± 1%  -0.42%  (p=0.043 n=10+10)
GobDecode                12.7ms ± 1%    12.8ms ± 2%  +0.94%  (p=0.019 n=10+10)
GobEncode                12.1ms ± 0%    12.1ms ± 0%    ~     (p=0.052 n=10+10)
Mandelbrot200            5.06ms ± 0%    5.05ms ± 0%  -0.04%  (p=0.000 n=9+10)
TimeParse                 450ns ± 3%     446ns ± 0%    ~     (p=0.238 n=10+9)
TimeFormat                485ns ± 1%     483ns ± 1%    ~     (p=0.073 n=10+10)
Template                 90.4ms ± 0%    90.7ms ± 0%  +0.29%  (p=0.000 n=8+10)
GoParse                  6.01ms ± 0%    6.03ms ± 0%  +0.35%  (p=0.000 n=10+10)
BinaryTree17              11.7s ± 0%     11.7s ± 0%    ~     (p=0.481 n=10+10)
Revcomp                   669ms ± 0%     669ms ± 0%    ~     (p=0.315 n=10+10)
Fannkuch11                3.40s ± 0%     3.37s ± 0%  -0.92%  (p=0.000 n=10+10)
[Geo mean]               67.9µs         67.9µs       +0.02%

name                   old speed      new speed      delta
RegexpMatchEasy0_32     128MB/s ± 0%   128MB/s ± 0%  -0.08%  (p=0.003 n=8+10)
RegexpMatchEasy0_1K    1.75GB/s ± 0%  1.75GB/s ± 0%    ~     (p=0.642 n=8+10)
RegexpMatchEasy1_32     130MB/s ± 0%   130MB/s ± 0%    ~     (p=0.690 n=10+9)
RegexpMatchEasy1_1K    1.27GB/s ± 0%  1.27GB/s ± 0%    ~     (p=0.661 n=10+9)
RegexpMatchMedium_32   3.18MB/s ± 0%  3.18MB/s ± 0%    ~     (all equal)
RegexpMatchMedium_1K   19.7MB/s ± 0%  19.6MB/s ± 0%    ~     (p=0.190 n=10+9)
RegexpMatchHard_32     11.6MB/s ± 0%  11.6MB/s ± 1%    ~     (p=0.669 n=10+10)
RegexpMatchHard_1K     13.0MB/s ± 0%  13.0MB/s ± 0%    ~     (p=0.718 n=9+9)
Gzip                   44.4MB/s ± 0%  44.5MB/s ± 0%  +0.24%  (p=0.000 n=10+10)
JSONEncode             96.5MB/s ± 1%  96.1MB/s ± 1%    ~     (p=0.065 n=9+10)
JSONDecode             20.6MB/s ± 1%  20.7MB/s ± 1%  +0.42%  (p=0.041 n=10+10)
GobDecode              60.6MB/s ± 1%  60.0MB/s ± 2%  -0.92%  (p=0.016 n=10+10)
GobEncode              63.4MB/s ± 0%  63.6MB/s ± 0%    ~     (p=0.055 n=10+10)
Template               21.5MB/s ± 0%  21.4MB/s ± 0%  -0.30%  (p=0.000 n=9+10)
GoParse                9.64MB/s ± 0%  9.61MB/s ± 0%  -0.36%  (p=0.000 n=10+10)
Revcomp                 380MB/s ± 0%   380MB/s ± 0%    ~     (p=0.323 n=10+10)
[Geo mean]             56.0MB/s       55.9MB/s       -0.07%

Change-Id: I79a4978d42d01a5f72ed5ceec07f5e78ac6b3859
Reviewed-on: https://go-review.googlesource.com/97175
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-01 16:40:19 +00:00
Ben Shi
1057624985 cmd/compile: optimize ARM64 code with EON/ORN
EON and ORN are efficient ARM64 instructions. EON combines (x ^ ^y)
into a single operation, and ORN does the same for (x | ^y).
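
In Go source, the two expression shapes look like this. The function names are hypothetical, and whether a particular toolchain emits EON/ORN for them on arm64 is an assumption to confirm with `GOARCH=arm64 go build -gcflags=-S`.

```go
package main

import "fmt"

// eon computes x XOR (NOT y), the shape EON is meant to cover.
func eon(x, y uint64) uint64 { return x ^ ^y }

// orn computes x OR (NOT y), the shape ORN is meant to cover.
func orn(x, y uint64) uint64 { return x | ^y }

func main() {
	fmt.Printf("%#x\n", eon(0xf0, 0x0f))
	fmt.Printf("%#x\n", orn(0xf0, 0xff))
}
```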

This CL implements that optimization. Benchmark results on a
Raspberry Pi 3 running Arch Linux follow.

1. A specific test gets about 13% improvement.
EONORN                      181µs ± 0%     157µs ± 0%  -13.26%  (p=0.000 n=26+23)
(https://github.com/benshi001/ugo1/blob/master/eonorn_test.go)

2. There is little change in the go1 benchmark, excluding noise.
name                     old time/op    new time/op    delta
BinaryTree17-4              44.1s ± 2%     44.0s ± 2%    ~     (p=0.513 n=30+30)
Fannkuch11-4                32.9s ± 3%     32.8s ± 3%  -0.12%  (p=0.024 n=30+30)
FmtFprintfEmpty-4           561ns ± 9%     558ns ± 9%    ~     (p=0.654 n=30+30)
FmtFprintfString-4         1.09µs ± 4%    1.09µs ± 3%    ~     (p=0.158 n=30+30)
FmtFprintfInt-4            1.12µs ± 0%    1.12µs ± 0%    ~     (p=0.917 n=23+28)
FmtFprintfIntInt-4         1.73µs ± 0%    1.76µs ± 4%    ~     (p=0.665 n=23+30)
FmtFprintfPrefixedInt-4    2.15µs ± 1%    2.15µs ± 0%    ~     (p=0.389 n=27+26)
FmtFprintfFloat-4          3.18µs ± 4%    3.13µs ± 0%  -1.50%  (p=0.003 n=30+23)
FmtManyArgs-4              7.32µs ± 4%    7.21µs ± 0%    ~     (p=0.220 n=30+25)
GobDecode-4                99.1ms ± 9%    97.0ms ± 0%  -2.07%  (p=0.000 n=30+23)
GobEncode-4                83.3ms ± 3%    82.4ms ± 4%    ~     (p=0.321 n=30+30)
Gzip-4                      4.39s ± 4%     4.32s ± 2%  -1.42%  (p=0.017 n=30+23)
Gunzip-4                    440ms ± 0%     447ms ± 4%  +1.54%  (p=0.006 n=24+30)
HTTPClientServer-4          547µs ± 1%     537µs ± 1%  -1.91%  (p=0.000 n=30+30)
JSONEncode-4                211ms ± 0%     211ms ± 0%  +0.04%  (p=0.000 n=23+24)
JSONDecode-4                847ms ± 0%     847ms ± 0%    ~     (p=0.158 n=25+25)
Mandelbrot200-4            46.5ms ± 0%    46.5ms ± 0%  -0.04%  (p=0.000 n=25+24)
GoParse-4                  43.4ms ± 0%    43.4ms ± 0%    ~     (p=0.494 n=24+25)
RegexpMatchEasy0_32-4      1.03µs ± 0%    1.03µs ± 0%    ~     (all equal)
RegexpMatchEasy0_1K-4      4.02µs ± 3%    3.98µs ± 0%  -0.95%  (p=0.003 n=30+24)
RegexpMatchEasy1_32-4      1.01µs ± 3%    1.01µs ± 2%    ~     (p=0.629 n=30+30)
RegexpMatchEasy1_1K-4      6.39µs ± 0%    6.39µs ± 0%    ~     (p=0.564 n=24+23)
RegexpMatchMedium_32-4     1.80µs ± 3%    1.78µs ± 0%    ~     (p=0.155 n=30+24)
RegexpMatchMedium_1K-4      555µs ± 0%     563µs ± 3%  +1.55%  (p=0.004 n=27+30)
RegexpMatchHard_32-4       31.0µs ± 4%    30.5µs ± 1%  -1.58%  (p=0.000 n=30+23)
RegexpMatchHard_1K-4        947µs ± 4%     931µs ± 0%  -1.66%  (p=0.009 n=30+24)
Revcomp-4                   7.71s ± 4%     7.71s ± 4%    ~     (p=0.196 n=29+30)
Template-4                  877ms ± 0%     878ms ± 0%  +0.16%  (p=0.018 n=23+27)
TimeParse-4                4.75µs ± 1%    4.74µs ± 0%    ~     (p=0.895 n=24+23)
TimeFormat-4               4.83µs ± 4%    4.83µs ± 4%    ~     (p=0.767 n=30+30)
[Geo mean]                  709µs          707µs       -0.35%

name                     old speed      new speed      delta
GobDecode-4              7.75MB/s ± 8%  7.91MB/s ± 0%  +2.03%  (p=0.001 n=30+23)
GobEncode-4              9.22MB/s ± 3%  9.32MB/s ± 4%    ~     (p=0.389 n=30+30)
Gzip-4                   4.43MB/s ± 4%  4.43MB/s ± 4%    ~     (p=0.888 n=30+30)
Gunzip-4                 44.1MB/s ± 0%  43.4MB/s ± 4%  -1.46%  (p=0.009 n=24+30)
JSONEncode-4             9.18MB/s ± 0%  9.18MB/s ± 0%    ~     (p=0.308 n=16+24)
JSONDecode-4             2.29MB/s ± 0%  2.29MB/s ± 0%    ~     (all equal)
GoParse-4                1.33MB/s ± 0%  1.33MB/s ± 0%    ~     (all equal)
RegexpMatchEasy0_32-4    30.9MB/s ± 0%  30.9MB/s ± 0%    ~     (p=1.000 n=23+24)
RegexpMatchEasy0_1K-4     255MB/s ± 3%   257MB/s ± 0%  +0.92%  (p=0.004 n=30+24)
RegexpMatchEasy1_32-4    31.7MB/s ± 3%  31.6MB/s ± 2%    ~     (p=0.603 n=30+30)
RegexpMatchEasy1_1K-4     160MB/s ± 0%   160MB/s ± 0%    ~     (p=0.435 n=24+23)
RegexpMatchMedium_32-4    554kB/s ± 3%   560kB/s ± 0%  +1.08%  (p=0.004 n=30+24)
RegexpMatchMedium_1K-4   1.85MB/s ± 0%  1.82MB/s ± 3%  -1.48%  (p=0.001 n=27+30)
RegexpMatchHard_32-4     1.03MB/s ± 4%  1.05MB/s ± 1%  +1.51%  (p=0.027 n=30+23)
RegexpMatchHard_1K-4     1.08MB/s ± 4%  1.10MB/s ± 0%  +1.69%  (p=0.002 n=30+25)
Revcomp-4                33.0MB/s ± 4%  33.0MB/s ± 4%    ~     (p=0.272 n=29+30)
Template-4               2.21MB/s ± 0%  2.21MB/s ± 0%    ~     (all equal)
[Geo mean]               7.75MB/s       7.77MB/s       +0.29%

3. There is little regression in the compilecmp benchmark.
name        old time/op       new time/op       delta
Template          2.28s ± 3%        2.28s ± 4%    ~     (p=0.739 n=10+10)
Unicode           1.34s ± 4%        1.32s ± 3%    ~     (p=0.113 n=10+9)
GoTypes           8.10s ± 3%        8.18s ± 3%    ~     (p=0.393 n=10+10)
Compiler          39.0s ± 3%        39.2s ± 3%    ~     (p=0.393 n=10+10)
SSA                114s ± 3%         115s ± 2%    ~     (p=0.631 n=10+10)
Flate             1.41s ± 2%        1.42s ± 3%    ~     (p=0.353 n=10+10)
GoParser          1.81s ± 1%        1.83s ± 2%    ~     (p=0.211 n=10+9)
Reflect           5.06s ± 2%        5.06s ± 2%    ~     (p=0.912 n=10+10)
Tar               2.19s ± 3%        2.20s ± 3%    ~     (p=0.247 n=10+10)
XML               2.65s ± 2%        2.67s ± 5%    ~     (p=0.796 n=10+10)
[Geo mean]        4.92s             4.93s       +0.27%

name        old user-time/op  new user-time/op  delta
Template          2.81s ± 2%        2.81s ± 3%    ~     (p=0.971 n=10+10)
Unicode           1.70s ± 3%        1.67s ± 5%    ~     (p=0.315 n=10+10)
GoTypes           9.71s ± 1%        9.78s ± 1%  +0.71%  (p=0.023 n=10+10)
Compiler          47.3s ± 1%        47.1s ± 3%    ~     (p=0.579 n=10+10)
SSA                143s ± 2%         143s ± 2%    ~     (p=0.280 n=10+10)
Flate             1.70s ± 3%        1.71s ± 3%    ~     (p=0.481 n=10+10)
GoParser          2.21s ± 3%        2.21s ± 1%    ~     (p=0.549 n=10+9)
Reflect           5.89s ± 1%        5.87s ± 2%    ~     (p=0.739 n=10+10)
Tar               2.66s ± 2%        2.63s ± 2%    ~     (p=0.105 n=10+10)
XML               3.16s ± 3%        3.18s ± 2%    ~     (p=0.143 n=10+10)
[Geo mean]        5.97s             5.97s       -0.06%

name        old text-bytes    new text-bytes    delta
HelloSize         637kB ± 0%        637kB ± 0%    ~     (all equal)

name        old data-bytes    new data-bytes    delta
HelloSize        9.46kB ± 0%       9.46kB ± 0%    ~     (all equal)

name        old bss-bytes     new bss-bytes     delta
HelloSize         125kB ± 0%        125kB ± 0%    ~     (all equal)

name        old exe-bytes     new exe-bytes     delta
HelloSize        1.24MB ± 0%       1.24MB ± 0%    ~     (all equal)

Change-Id: Ie27357d65c5ce9d07afdffebe1e2daadcaa3369f
Reviewed-on: https://go-review.googlesource.com/97036
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-02-28 23:42:40 +00:00
Balaram Makam
094258408d cmd/compile: improve fractional word zeroing
This change improves fractional word zeroing by
using overlapping MOVDs for the fractions.
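
The idea of overlapping stores can be sketched at the source level. This is
not the compiler change itself (the CL alters arm64 code generation); zero15
is a hypothetical helper that clears 15 bytes with two 8-byte stores, the
second starting at offset 7 so that it overlaps the first by one byte.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// zero15 clears a 15-byte buffer with two overlapping 8-byte stores
// instead of an 8+4+2+1 sequence of progressively narrower stores.
func zero15(b *[15]byte) {
	binary.LittleEndian.PutUint64(b[0:8], 0)  // bytes 0..7
	binary.LittleEndian.PutUint64(b[7:15], 0) // bytes 7..14, overlapping byte 7
}

func main() {
	var buf [15]byte
	for i := range buf {
		buf[i] = 0xFF
	}
	zero15(&buf)
	fmt.Println(buf == [15]byte{})
}
```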

Performance of go1 benchmarks on Amberwing was all noise:
name                   old time/op    new time/op    delta
RegexpMatchEasy0_32       247ns ± 0%     246ns ± 0%  -0.40%  (p=0.008 n=5+5)
RegexpMatchEasy0_1K       581ns ± 0%     579ns ± 0%  -0.34%  (p=0.000 n=5+4)
RegexpMatchEasy1_32       244ns ± 0%     242ns ± 0%    ~     (p=0.079 n=4+5)
RegexpMatchEasy1_1K       804ns ± 0%     805ns ± 0%    ~     (p=0.238 n=5+4)
RegexpMatchMedium_32      313ns ± 0%     311ns ± 0%  -0.64%  (p=0.008 n=5+5)
RegexpMatchMedium_1K     52.2µs ± 0%    51.9µs ± 0%  -0.52%  (p=0.016 n=5+4)
RegexpMatchHard_32       2.75µs ± 0%    2.74µs ± 0%    ~     (p=0.603 n=5+5)
RegexpMatchHard_1K       78.8µs ± 0%    78.9µs ± 0%  +0.05%  (p=0.008 n=5+5)
FmtFprintfEmpty          58.6ns ± 0%    58.6ns ± 0%    ~     (p=0.159 n=5+5)
FmtFprintfString          118ns ± 0%     119ns ± 0%  +0.85%  (p=0.008 n=5+5)
FmtFprintfInt             119ns ± 0%     123ns ± 0%  +3.36%  (p=0.016 n=5+4)
FmtFprintfIntInt          192ns ± 0%     200ns ± 0%  +4.17%  (p=0.008 n=5+5)
FmtFprintfPrefixedInt     224ns ± 0%     209ns ± 0%  -6.70%  (p=0.008 n=5+5)
FmtFprintfFloat           335ns ± 0%     335ns ± 0%    ~     (all equal)
FmtManyArgs               775ns ± 0%     811ns ± 1%  +4.67%  (p=0.016 n=4+5)
Gzip                      437ms ± 0%     438ms ± 0%  +0.19%  (p=0.008 n=5+5)
HTTPClientServer         88.7µs ± 1%    90.3µs ± 1%  +1.75%  (p=0.016 n=5+5)
JSONEncode               20.1ms ± 1%    20.1ms ± 0%    ~     (p=1.000 n=5+5)
JSONDecode               94.7ms ± 1%    94.8ms ± 1%    ~     (p=0.548 n=5+5)
GobDecode                12.8ms ± 1%    12.8ms ± 1%    ~     (p=0.548 n=5+5)
GobEncode                12.1ms ± 0%    12.1ms ± 0%    ~     (p=0.151 n=5+5)
Mandelbrot200            5.37ms ± 0%    5.37ms ± 0%  -0.03%  (p=0.008 n=5+5)
TimeParse                 450ns ± 0%     451ns ± 1%    ~     (p=0.635 n=4+5)
TimeFormat                485ns ± 0%     484ns ± 0%    ~     (p=0.508 n=5+5)
Template                 90.4ms ± 0%    90.2ms ± 0%  -0.24%  (p=0.016 n=5+5)
GoParse                  5.98ms ± 0%    5.98ms ± 0%    ~     (p=1.000 n=5+5)
BinaryTree17              11.8s ± 0%     11.8s ± 0%    ~     (p=0.841 n=5+5)
Revcomp                   669ms ± 0%     669ms ± 0%    ~     (p=0.310 n=5+5)
Fannkuch11                3.28s ± 0%     3.34s ± 0%  +1.64%  (p=0.008 n=5+5)

name                   old speed      new speed      delta
RegexpMatchEasy0_32     129MB/s ± 0%   130MB/s ± 0%  +0.30%  (p=0.016 n=4+5)
RegexpMatchEasy0_1K    1.76GB/s ± 0%  1.77GB/s ± 0%  +0.27%  (p=0.016 n=5+4)
RegexpMatchEasy1_32     131MB/s ± 0%   132MB/s ± 0%  +0.71%  (p=0.016 n=4+5)
RegexpMatchEasy1_1K    1.27GB/s ± 0%  1.27GB/s ± 0%  -0.17%  (p=0.016 n=5+4)
RegexpMatchMedium_32   3.19MB/s ± 0%  3.21MB/s ± 0%  +0.63%  (p=0.008 n=5+5)
RegexpMatchMedium_1K   19.6MB/s ± 0%  19.7MB/s ± 0%  +0.52%  (p=0.016 n=5+4)
RegexpMatchHard_32     11.7MB/s ± 0%  11.7MB/s ± 0%    ~     (p=0.643 n=5+5)
RegexpMatchHard_1K     13.0MB/s ± 0%  13.0MB/s ± 0%    ~     (p=0.079 n=4+5)
Gzip                   44.4MB/s ± 0%  44.3MB/s ± 0%  -0.19%  (p=0.008 n=5+5)
JSONEncode             96.3MB/s ± 1%  96.4MB/s ± 0%    ~     (p=1.000 n=5+5)
JSONDecode             20.5MB/s ± 1%  20.5MB/s ± 1%    ~     (p=0.460 n=5+5)
GobDecode              60.1MB/s ± 1%  59.9MB/s ± 1%    ~     (p=0.548 n=5+5)
GobEncode              63.5MB/s ± 0%  63.7MB/s ± 0%    ~     (p=0.135 n=5+5)
Template               21.5MB/s ± 0%  21.5MB/s ± 0%  +0.24%  (p=0.016 n=5+5)
GoParse                9.68MB/s ± 0%  9.69MB/s ± 0%    ~     (p=0.786 n=5+5)
Revcomp                 380MB/s ± 0%   380MB/s ± 0%    ~     (p=0.310 n=5+5)
Change-Id: I596eee6421cdbad1a0189cdb9fe0628bba534eaf
Reviewed-on: https://go-review.googlesource.com/96775
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-02-28 23:28:39 +00:00
Ilya Tocar
0f2ef0ad44 cmd/compile/internal/ssa: combine byte stores on amd64
On amd64 we optimize encoding/binary.BigEndian.PutUint{16,32,64}
into bswap + single store, but strangely enough not LittleEndian.PutUint{16,32}.
We have similar rules, but they use 64-bit shifts everywhere
and fail for the 16/32-bit cases. Add rules that match LittleEndian.PutUint,
and relevant tests. Performance results:

LittleEndianPutUint16-6    1.43ns ± 0%    1.07ns ± 0%   -25.17%  (p=0.000 n=9+9)
LittleEndianPutUint32-6    2.14ns ± 0%    0.94ns ± 0%   -56.07%  (p=0.019 n=6+8)

LittleEndianPutUint16-6  1.40GB/s ± 0%  1.87GB/s ± 0%   +33.24%  (p=0.000 n=9+9)
LittleEndianPutUint32-6  1.87GB/s ± 0%  4.26GB/s ± 0%  +128.54%  (p=0.000 n=8+8)

Discovered while looking at ethereum_ethash from the community benchmarks.
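
For reference, the byte-by-byte store pattern in question looks roughly like
the sketch below. putUint32LE is a hypothetical stand-in written in the same
shape as LittleEndian.PutUint32; whether the four stores are merged into one
depends on the compiler version and architecture.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// putUint32LE writes v into b in little-endian order, one byte at a time.
// Four adjacent byte stores of shifted pieces of the same value are the
// kind of sequence that store-combining rules try to recognize.
func putUint32LE(b []byte, v uint32) {
	_ = b[3] // early bounds check so the stores below need none
	b[0] = byte(v)
	b[1] = byte(v >> 8)
	b[2] = byte(v >> 16)
	b[3] = byte(v >> 24)
}

func main() {
	got := make([]byte, 4)
	want := make([]byte, 4)
	putUint32LE(got, 0x12345678)
	binary.LittleEndian.PutUint32(want, 0x12345678)
	fmt.Println(bytes.Equal(got, want))
}
```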

Change-Id: Id86d5443687ecddd2803edf3203dbdd1246f61fe
Reviewed-on: https://go-review.googlesource.com/95475
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-02-27 19:38:50 +00:00
Chad Rosier
ecd9e8a2fe cmd/compile/internal/ssa: combine zero stores into larger stores on arm64
This reduces the go tool binary on arm64 by 12k.

go1 results on Amberwing:
name                   old time/op    new time/op    delta
RegexpMatchEasy0_32       249ns ± 0%     249ns ± 0%    ~     (p=0.087 n=10+10)
RegexpMatchEasy0_1K       584ns ± 0%     584ns ± 0%    ~     (all equal)
RegexpMatchEasy1_32       246ns ± 0%     246ns ± 0%    ~     (p=1.000 n=10+10)
RegexpMatchEasy1_1K       806ns ± 0%     806ns ± 0%    ~     (p=0.706 n=10+9)
RegexpMatchMedium_32      314ns ± 0%     314ns ± 0%    ~     (all equal)
RegexpMatchMedium_1K     52.1µs ± 0%    52.1µs ± 0%    ~     (p=0.245 n=10+8)
RegexpMatchHard_32       2.75µs ± 1%    2.75µs ± 1%    ~     (p=0.690 n=10+10)
RegexpMatchHard_1K       78.9µs ± 0%    78.9µs ± 1%    ~     (p=0.295 n=9+9)
FmtFprintfEmpty          58.5ns ± 0%    58.5ns ± 0%    ~     (all equal)
FmtFprintfString          112ns ± 0%     112ns ± 0%    ~     (all equal)
FmtFprintfInt             117ns ± 0%     116ns ± 0%  -0.85%  (p=0.000 n=10+10)
FmtFprintfIntInt          181ns ± 0%     181ns ± 0%    ~     (all equal)
FmtFprintfPrefixedInt     222ns ± 0%     224ns ± 0%  +0.90%  (p=0.000 n=9+10)
FmtFprintfFloat           318ns ± 1%     322ns ± 0%    ~     (p=0.059 n=10+8)
FmtManyArgs               736ns ± 1%     735ns ± 0%    ~     (p=0.206 n=9+9)
Gzip                      437ms ± 0%     436ms ± 0%  -0.25%  (p=0.000 n=10+10)
HTTPClientServer         89.8µs ± 1%    90.2µs ± 2%    ~     (p=0.393 n=10+10)
JSONEncode               20.1ms ± 1%    20.2ms ± 1%    ~     (p=0.065 n=9+10)
JSONDecode               94.2ms ± 1%    93.9ms ± 1%  -0.42%  (p=0.043 n=10+10)
GobDecode                12.7ms ± 1%    12.8ms ± 2%  +0.94%  (p=0.019 n=10+10)
GobEncode                12.1ms ± 0%    12.1ms ± 0%    ~     (p=0.052 n=10+10)
Mandelbrot200            5.06ms ± 0%    5.05ms ± 0%  -0.04%  (p=0.000 n=9+10)
TimeParse                 450ns ± 3%     446ns ± 0%    ~     (p=0.238 n=10+9)
TimeFormat                485ns ± 1%     483ns ± 1%    ~     (p=0.073 n=10+10)
Template                 90.4ms ± 0%    90.7ms ± 0%  +0.29%  (p=0.000 n=8+10)
GoParse                  6.01ms ± 0%    6.03ms ± 0%  +0.35%  (p=0.000 n=10+10)
BinaryTree17              11.7s ± 0%     11.7s ± 0%    ~     (p=0.481 n=10+10)
Revcomp                   669ms ± 0%     669ms ± 0%    ~     (p=0.315 n=10+10)
Fannkuch11                3.40s ± 0%     3.37s ± 0%  -0.92%  (p=0.000 n=10+10)
[Geo mean]               67.9µs         67.9µs       +0.02%

name                   old speed      new speed      delta
RegexpMatchEasy0_32     128MB/s ± 0%   128MB/s ± 0%  -0.08%  (p=0.003 n=8+10)
RegexpMatchEasy0_1K    1.75GB/s ± 0%  1.75GB/s ± 0%    ~     (p=0.642 n=8+10)
RegexpMatchEasy1_32     130MB/s ± 0%   130MB/s ± 0%    ~     (p=0.690 n=10+9)
RegexpMatchEasy1_1K    1.27GB/s ± 0%  1.27GB/s ± 0%    ~     (p=0.661 n=10+9)
RegexpMatchMedium_32   3.18MB/s ± 0%  3.18MB/s ± 0%    ~     (all equal)
RegexpMatchMedium_1K   19.7MB/s ± 0%  19.6MB/s ± 0%    ~     (p=0.190 n=10+9)
RegexpMatchHard_32     11.6MB/s ± 0%  11.6MB/s ± 1%    ~     (p=0.669 n=10+10)
RegexpMatchHard_1K     13.0MB/s ± 0%  13.0MB/s ± 0%    ~     (p=0.718 n=9+9)
Gzip                   44.4MB/s ± 0%  44.5MB/s ± 0%  +0.24%  (p=0.000 n=10+10)
JSONEncode             96.5MB/s ± 1%  96.1MB/s ± 1%    ~     (p=0.065 n=9+10)
JSONDecode             20.6MB/s ± 1%  20.7MB/s ± 1%  +0.42%  (p=0.041 n=10+10)
GobDecode              60.6MB/s ± 1%  60.0MB/s ± 2%  -0.92%  (p=0.016 n=10+10)
GobEncode              63.4MB/s ± 0%  63.6MB/s ± 0%    ~     (p=0.055 n=10+10)
Template               21.5MB/s ± 0%  21.4MB/s ± 0%  -0.30%  (p=0.000 n=9+10)
GoParse                9.64MB/s ± 0%  9.61MB/s ± 0%  -0.36%  (p=0.000 n=10+10)
Revcomp                 380MB/s ± 0%   380MB/s ± 0%    ~     (p=0.323 n=10+10)
[Geo mean]             56.0MB/s       55.9MB/s       -0.07%

Change-Id: Ia732fa57fbcf4767d72382516d9f16705d177736
Reviewed-on: https://go-review.googlesource.com/96435
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-02-27 00:07:25 +00:00
Keith Randall
4b00d3f4a2 cmd/compile: implement comparisons directly with memory
Allow the compiler to generate code like CMPQ 16(AX), $7

It's tricky because it's difficult to spill such a comparison during
flagalloc, because the same memory state might not be available at
the restore locations.

Solve this problem by decomposing the compare+load back into its parts
if it needs to be spilled.

The big win is that the write barrier test goes from:

MOVL	runtime.writeBarrier(SB), CX
TESTL	CX, CX
JNE	60

to

CMPL	runtime.writeBarrier(SB), $0
JNE	59

It's one instruction and one byte smaller.
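
A source-level shape that could benefit is a comparison of an in-memory
value against a constant. The sketch below is only an assumption about what
such code looks like: isSeven is a made-up function, and the element at
index 2 of an int64 array sits at byte offset 16, matching the CMPQ example
above. The instruction actually chosen depends on the compiler version.

```go
package main

import "fmt"

// isSeven loads p[2] (byte offset 16) only to compare it with a constant,
// so the load and the compare are candidates for a single instruction.
func isSeven(p *[4]int64) bool {
	return p[2] == 7
}

func main() {
	a := [4]int64{0, 0, 7, 0}
	fmt.Println(isSeven(&a))
}
```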

Fixes #19485
Fixes #15245
Update #22460

Binaries are about 0.15% smaller.

Change-Id: I4fd8d1111b6b9924d52f9a0901ca1b2e5cce0836
Reviewed-on: https://go-review.googlesource.com/86035
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2018-02-26 23:49:44 +00:00
Alberto Donizetti
37a038a3dc cmd/compile: add code generation tests for sqrt intrinsics
Add "sqrt-intrinsified" code generation tests for mips64 and 386, where
we weren't intrinsifying math.Sqrt (see CL 96615 and CL 95916), and for
mips and amd64, which lacked sqrt intrinsics tests.

Change-Id: I0cfc08aec6eefd47f3cd7a5995a89393e8b7ed9e
Reviewed-on: https://go-review.googlesource.com/96716
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-02-23 16:48:53 +00:00
Giovanni Bajo
0cacc4d0e2 cmd/compile: fold LEAQ/ADDQconst into SETx ops
This saves an instruction and a register. The new rules
match ~4900 times during all.bash.

Change-Id: I2f867c5e70262004e31f545f3bb89e939c45b718
Reviewed-on: https://go-review.googlesource.com/94767
Reviewed-by: Keith Randall <khr@golang.org>
2018-02-20 22:32:35 +00:00
philhofer
2d0172c3a7 cmd/compile/internal/ssa: emit csel on arm64
Introduce a new SSA pass to generate CondSelect instructions,
and add CondSelect lowering rules for arm64.

In order to make the CSEL instruction easier to optimize,
and to simplify the introduction of CSNEG, CSINC, and CSINV
in the future, modify the CSEL instruction to accept a condition
code in the aux field.
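
As a rough illustration, a small branch that only selects between two
already-computed values is the kind of code a conditional select can
replace. minInt is a hypothetical example; whether it compiles to CSEL
depends on the compiler version and its heuristics.

```go
package main

import "fmt"

// minInt picks one of two values based on a comparison. Both candidates
// are available without side effects, so the branch can be expressed as
// a compare followed by a conditional select.
func minInt(a, b int) int {
	r := a
	if b < a {
		r = b
	}
	return r
}

func main() {
	fmt.Println(minInt(3, 5), minInt(5, 3))
}
```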

Notably, this change makes the go1 Gzip benchmark
more than 10% faster.

Benchmarks on a Cavium ThunderX:

name                      old time/op    new time/op    delta
BinaryTree17-96              15.9s ± 6%     16.0s ± 4%     ~     (p=0.968 n=10+9)
Fannkuch11-96                7.17s ± 0%     7.00s ± 0%   -2.43%  (p=0.000 n=8+9)
FmtFprintfEmpty-96           208ns ± 1%     207ns ± 0%     ~     (p=0.152 n=10+8)
FmtFprintfString-96          379ns ± 0%     375ns ± 0%   -0.95%  (p=0.000 n=10+9)
FmtFprintfInt-96             385ns ± 0%     383ns ± 0%   -0.52%  (p=0.000 n=9+10)
FmtFprintfIntInt-96          591ns ± 0%     586ns ± 0%   -0.85%  (p=0.006 n=7+9)
FmtFprintfPrefixedInt-96     656ns ± 0%     667ns ± 0%   +1.71%  (p=0.000 n=10+10)
FmtFprintfFloat-96           967ns ± 0%     984ns ± 0%   +1.78%  (p=0.000 n=10+10)
FmtManyArgs-96              2.35µs ± 0%    2.25µs ± 0%   -4.63%  (p=0.000 n=9+8)
GobDecode-96                31.0ms ± 0%    30.8ms ± 0%   -0.36%  (p=0.006 n=9+9)
GobEncode-96                24.4ms ± 0%    24.5ms ± 0%   +0.30%  (p=0.000 n=9+9)
Gzip-96                      1.60s ± 0%     1.43s ± 0%  -10.58%  (p=0.000 n=9+10)
Gunzip-96                    167ms ± 0%     169ms ± 0%   +0.83%  (p=0.000 n=8+9)
HTTPClientServer-96          311µs ± 1%     308µs ± 0%   -0.75%  (p=0.000 n=10+10)
JSONEncode-96               65.0ms ± 0%    64.8ms ± 0%   -0.25%  (p=0.000 n=9+8)
JSONDecode-96                262ms ± 1%     261ms ± 1%     ~     (p=0.579 n=10+10)
Mandelbrot200-96            18.0ms ± 0%    18.1ms ± 0%   +0.17%  (p=0.000 n=8+10)
GoParse-96                  14.0ms ± 0%    14.1ms ± 1%   +0.42%  (p=0.003 n=9+10)
RegexpMatchEasy0_32-96       644ns ± 2%     645ns ± 2%     ~     (p=0.836 n=10+10)
RegexpMatchEasy0_1K-96      3.70µs ± 0%    3.49µs ± 0%   -5.58%  (p=0.000 n=10+10)
RegexpMatchEasy1_32-96       662ns ± 2%     657ns ± 2%     ~     (p=0.137 n=10+10)
RegexpMatchEasy1_1K-96      4.47µs ± 0%    4.31µs ± 0%   -3.48%  (p=0.000 n=10+10)
RegexpMatchMedium_32-96      844ns ± 2%     849ns ± 1%     ~     (p=0.208 n=10+10)
RegexpMatchMedium_1K-96      179µs ± 0%     182µs ± 0%   +1.20%  (p=0.000 n=10+10)
RegexpMatchHard_32-96       10.0µs ± 0%    10.1µs ± 0%   +0.48%  (p=0.000 n=10+9)
RegexpMatchHard_1K-96        297µs ± 0%     297µs ± 0%   -0.14%  (p=0.000 n=10+10)
Revcomp-96                   3.08s ± 0%     3.13s ± 0%   +1.56%  (p=0.000 n=9+9)
Template-96                  276ms ± 2%     275ms ± 1%     ~     (p=0.393 n=10+10)
TimeParse-96                1.37µs ± 0%    1.36µs ± 0%   -0.53%  (p=0.000 n=10+7)
TimeFormat-96               1.40µs ± 0%    1.42µs ± 0%   +0.97%  (p=0.000 n=10+10)
[Geo mean]                   264µs          262µs        -0.77%

Change-Id: Ie54eee4b3092af53e6da3baa6d1755098f57f3a2
Reviewed-on: https://go-review.googlesource.com/55670
Run-TryBot: Philip Hofer <phofer@umich.edu>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2018-02-20 06:00:54 +00:00
Chad Rosier
07f0f09563 cmd/compile: make math.Ceil/Floor/Round/Trunc intrinsics on arm64
name       old time/op  new time/op  delta
Ceil        550ns ± 0%   486ns ± 7%  -11.64%  (p=0.000 n=13+18)
Floor       495ns ±19%   512ns ±12%     ~     (p=0.164 n=20+20)
Round       550ns ± 0%   487ns ± 8%  -11.49%  (p=0.000 n=12+19)
Trunc       563ns ± 7%   488ns ±13%  -13.44%  (p=0.000 n=15+2)

Change-Id: I53f234b160b3c026a277506e2cf977d150379464
Reviewed-on: https://go-review.googlesource.com/88295
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-02-16 15:37:57 +00:00
Balaram Makam
fcba05148f cmd/compile: arm64 intrinsics for math/bits.OnesCount
This adds math/bits intrinsics for OnesCount on arm64.

name         old time/op  new time/op  delta
OnesCount    3.81ns ± 0%  1.60ns ± 0%  -57.96%  (p=0.000 n=7+8)
OnesCount8   1.60ns ± 0%  1.60ns ± 0%     ~     (all equal)
OnesCount16  2.41ns ± 0%  1.60ns ± 0%  -33.61%  (p=0.000 n=8+8)
OnesCount32  4.17ns ± 0%  1.60ns ± 0%  -61.58%  (p=0.000 n=8+8)
OnesCount64  3.80ns ± 0%  1.60ns ± 0%  -57.84%  (p=0.000 n=8+8)
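
For context, the sketch below calls the intrinsified function next to a
portable bit-clearing loop. popcountLoop is a hypothetical reference
implementation written for this example; it is not the code the standard
library uses as its fallback.

```go
package main

import (
	"fmt"
	"math/bits"
)

// popcountLoop counts set bits by clearing the lowest set bit until
// none remain. It serves only as a reference for comparison.
func popcountLoop(x uint64) int {
	n := 0
	for x != 0 {
		x &= x - 1 // clear the lowest set bit
		n++
	}
	return n
}

// popcount uses math/bits, which the compiler may replace with
// dedicated instructions on targets that have an intrinsic for it.
func popcount(x uint64) int {
	return bits.OnesCount64(x)
}

func main() {
	fmt.Println(popcount(0xF0F0), popcountLoop(0xF0F0))
}
```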

Update #18616

Conflicts:
	src/cmd/compile/internal/gc/asm_test.go

Change-Id: I63ac2f63acafdb1f60656ab8a56be0b326eec5cb
Reviewed-on: https://go-review.googlesource.com/90835
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-02-15 23:00:20 +00:00
Chad Rosier
51932c326f cmd/compile: improve absorb shifts optimization for arm64
The current absorb shifts optimization can generate dead Value nodes which
increase the use count of other live nodes. This affects other optimizations
(such as combined loads) which are enabled based on a specific use count. This
patch fixes the issue by decreasing the use count of nodes referenced by the
dead Value nodes that the absorb shifts optimization generates.

Performance impacts on go1 benchmarks (data collected on A57@2GHzx8):

name                     old time/op    new time/op    delta
BinaryTree17-8              6.28s ± 2%     6.24s ± 1%     ~     (p=0.065 n=10+9)
Fannkuch11-8                6.32s ± 0%     6.33s ± 0%   +0.17%  (p=0.000 n=10+10)
FmtFprintfEmpty-8          98.9ns ± 0%    99.2ns ± 0%   +0.34%  (p=0.000 n=9+7)
FmtFprintfString-8          183ns ± 1%     182ns ± 1%   -1.01%  (p=0.005 n=9+10)
FmtFprintfInt-8             199ns ± 1%     202ns ± 1%   +1.41%  (p=0.000 n=10+9)
FmtFprintfIntInt-8          272ns ± 1%     276ns ± 3%   +1.36%  (p=0.015 n=10+10)
FmtFprintfPrefixedInt-8     367ns ± 1%     369ns ± 1%   +0.68%  (p=0.042 n=10+10)
FmtFprintfFloat-8           491ns ± 1%     493ns ± 1%     ~     (p=0.064 n=10+10)
FmtManyArgs-8              1.31µs ± 1%    1.32µs ± 1%   +0.39%  (p=0.042 n=8+9)
GobDecode-8                17.0ms ± 2%    16.2ms ± 2%   -4.74%  (p=0.000 n=10+10)
GobEncode-8                13.7ms ± 2%    13.4ms ± 1%   -2.40%  (p=0.000 n=10+9)
Gzip-8                      844ms ± 0%     737ms ± 0%  -12.70%  (p=0.000 n=10+10)
Gunzip-8                   84.4ms ± 1%    83.9ms ± 0%   -0.55%  (p=0.000 n=10+8)
HTTPClientServer-8          122µs ± 1%     124µs ± 1%   +1.75%  (p=0.000 n=10+9)
JSONEncode-8               34.9ms ± 1%    32.4ms ± 0%   -7.11%  (p=0.000 n=10+9)
JSONDecode-8                150ms ± 0%     146ms ± 1%   -2.84%  (p=0.000 n=7+10)
Mandelbrot200-8            10.0ms ± 0%    10.0ms ± 0%     ~     (p=0.529 n=10+10)
GoParse-8                  8.18ms ± 1%    8.03ms ± 0%   -1.93%  (p=0.000 n=10+10)
RegexpMatchEasy0_32-8       209ns ± 0%     209ns ± 0%     ~     (p=0.248 n=10+9)
RegexpMatchEasy0_1K-8       789ns ± 1%     790ns ± 0%     ~     (p=0.361 n=10+10)
RegexpMatchEasy1_32-8       202ns ± 0%     202ns ± 1%     ~     (p=0.137 n=8+10)
RegexpMatchEasy1_1K-8      1.12µs ± 2%    1.12µs ± 1%     ~     (p=0.810 n=10+10)
RegexpMatchMedium_32-8      298ns ± 0%     298ns ± 0%     ~     (p=0.443 n=10+9)
RegexpMatchMedium_1K-8     83.0µs ± 5%    78.6µs ± 0%   -5.37%  (p=0.000 n=10+10)
RegexpMatchHard_32-8       4.32µs ± 0%    4.26µs ± 0%   -1.47%  (p=0.000 n=10+10)
RegexpMatchHard_1K-8        132µs ± 4%     126µs ± 0%   -4.41%  (p=0.000 n=10+9)
Revcomp-8                   1.11s ± 0%     1.11s ± 0%   +0.14%  (p=0.017 n=10+9)
Template-8                  155ms ± 1%     155ms ± 1%     ~     (p=0.796 n=10+10)
TimeParse-8                 774ns ± 1%     785ns ± 1%   +1.41%  (p=0.001 n=10+10)
TimeFormat-8                788ns ± 1%     806ns ± 1%   +2.24%  (p=0.000 n=10+9)

name                     old speed      new speed      delta
GobDecode-8              45.2MB/s ± 2%  47.5MB/s ± 2%   +4.96%  (p=0.000 n=10+10)
GobEncode-8              56.0MB/s ± 2%  57.4MB/s ± 1%   +2.44%  (p=0.000 n=10+9)
Gzip-8                   23.0MB/s ± 0%  26.3MB/s ± 0%  +14.55%  (p=0.000 n=10+10)
Gunzip-8                  230MB/s ± 1%   231MB/s ± 0%   +0.55%  (p=0.000 n=10+8)
JSONEncode-8             55.6MB/s ± 1%  59.9MB/s ± 0%   +7.65%  (p=0.000 n=10+9)
JSONDecode-8             12.9MB/s ± 0%  13.3MB/s ± 1%   +2.94%  (p=0.000 n=7+10)
GoParse-8                7.08MB/s ± 1%  7.22MB/s ± 0%   +1.95%  (p=0.000 n=10+10)
RegexpMatchEasy0_32-8     153MB/s ± 0%   153MB/s ± 0%   -0.16%  (p=0.023 n=10+10)
RegexpMatchEasy0_1K-8    1.30GB/s ± 1%  1.30GB/s ± 0%     ~     (p=0.393 n=10+10)
RegexpMatchEasy1_32-8     158MB/s ± 0%   158MB/s ± 0%     ~     (p=0.684 n=10+10)
RegexpMatchEasy1_1K-8     915MB/s ± 2%   918MB/s ± 1%     ~     (p=0.796 n=10+10)
RegexpMatchMedium_32-8   3.35MB/s ± 0%  3.35MB/s ± 0%     ~     (p=1.000 n=10+9)
RegexpMatchMedium_1K-8   12.3MB/s ± 5%  13.0MB/s ± 0%   +5.56%  (p=0.000 n=10+10)
RegexpMatchHard_32-8     7.40MB/s ± 0%  7.51MB/s ± 0%   +1.50%  (p=0.000 n=10+10)
RegexpMatchHard_1K-8     7.75MB/s ± 4%  8.10MB/s ± 0%   +4.52%  (p=0.000 n=10+8)
Revcomp-8                 229MB/s ± 0%   228MB/s ± 0%   -0.14%  (p=0.017 n=10+9)
Template-8               12.5MB/s ± 1%  12.5MB/s ± 1%     ~     (p=0.780 n=10+10)

Change-Id: I103389f168eac79f6af44e8fef93acc2a7a4ac96
Reviewed-on: https://go-review.googlesource.com/88415
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-02-15 20:54:50 +00:00
Chad Rosier
cdd961630c cmd/compile: generate tbz/tbnz when comparing against zero on arm64
The tbz/tbnz instructions test the sign bit to determine whether the value is >= 0 or < 0.
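
The equivalence behind this can be checked in plain Go: for a 64-bit signed
integer, "x < 0" holds exactly when bit 63 is set. The function names below
are made up for the sketch.

```go
package main

import (
	"fmt"
	"math"
)

// isNegative is the ordinary comparison against zero.
func isNegative(x int64) bool { return x < 0 }

// signBitSet tests bit 63 directly, which is what a test-bit-and-branch
// instruction inspects.
func signBitSet(x int64) bool { return uint64(x)>>63 != 0 }

func main() {
	for _, x := range []int64{math.MinInt64, -1, 0, 1, math.MaxInt64} {
		fmt.Println(x, isNegative(x) == signBitSet(x))
	}
}
```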

go1 benchmark results:
name                   old speed      new speed      delta
JSONEncode             94.4MB/s ± 1%  95.7MB/s ± 0%  +1.36%  (p=0.000 n=10+9)
JSONDecode             19.7MB/s ± 1%  19.9MB/s ± 1%  +1.08%  (p=0.000 n=9+10)
Gzip                   45.5MB/s ± 0%  46.0MB/s ± 0%  +1.06%  (p=0.000 n=10+10)
Revcomp                 376MB/s ± 0%   379MB/s ± 0%  +0.69%  (p=0.000 n=10+10)
RegexpMatchHard_1K     12.6MB/s ± 0%  12.7MB/s ± 0%  +0.57%  (p=0.000 n=10+8)
RegexpMatchMedium_32   3.21MB/s ± 0%  3.22MB/s ± 0%  +0.31%  (p=0.000 n=9+10)
RegexpMatchEasy1_1K    1.27GB/s ± 0%  1.27GB/s ± 0%  +0.23%  (p=0.000 n=9+9)
RegexpMatchHard_32     11.4MB/s ± 0%  11.4MB/s ± 1%  +0.19%  (p=0.036 n=10+8)
RegexpMatchEasy0_1K    1.77GB/s ± 0%  1.77GB/s ± 0%  +0.13%  (p=0.000 n=9+10)
RegexpMatchMedium_1K   19.3MB/s ± 0%  19.3MB/s ± 0%  +0.04%  (p=0.008 n=10+8)
RegexpMatchEasy0_32     131MB/s ± 0%   131MB/s ± 0%    ~     (p=0.211 n=10+10)
GobDecode              57.5MB/s ± 1%  57.6MB/s ± 2%    ~     (p=0.469 n=10+10)
GobEncode              58.6MB/s ± 1%  58.5MB/s ± 2%    ~     (p=0.781 n=10+10)
GoParse                9.40MB/s ± 0%  9.39MB/s ± 0%  -0.19%  (p=0.005 n=10+9)
RegexpMatchEasy1_32     133MB/s ± 0%   133MB/s ± 0%  -0.48%  (p=0.000 n=10+10)
Template               20.9MB/s ± 0%  20.6MB/s ± 0%  -1.54%  (p=0.000 n=8+10)

Change-Id: I411efe44db35c3962445618d5a47c12e31b3925b
Reviewed-on: https://go-review.googlesource.com/92715
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-02-14 15:52:41 +00:00
Austin Clements
2010189407 runtime: remove legacy eager write barrier
Now that the buffered write barrier is implemented for all
architectures, we can remove the old eager write barrier
implementation. This CL removes the implementation from the runtime,
support in the compiler for calling it, and updates some compiler
tests that relied on the old eager barrier support. It also makes sure
that all of the useful comments from the old write barrier
implementation still have a place to live.

Fixes #22460.

Updates #21640 since this fixes the layering concerns of the write
barrier (but not the other things in that issue).

Change-Id: I580f93c152e89607e0a72fe43370237ba97bae74
Reviewed-on: https://go-review.googlesource.com/92705
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-02-13 16:34:46 +00:00
Keith Randall
23e8e197b0 cmd/compile: use unsigned loads for multi-element comparisons
When loading multiple elements of an array into a single register,
make sure we treat them as unsigned.  When treated as signed, the
upper bits might all be set, causing the shift-or combo to clobber
the values higher in the register.
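
The failure mode can be reproduced by hand. In the sketch below (both
function names are hypothetical), sign-extending the low byte before the
shift-or sets all upper bits, so the high byte is lost; zero-extending it
keeps both bytes intact.

```go
package main

import "fmt"

// combineSigned treats the low byte as signed. If its top bit is set,
// sign extension fills the upper bits with ones and the OR clobbers hi.
func combineSigned(lo, hi byte) uint16 {
	return uint16(int16(int8(lo))) | uint16(hi)<<8
}

// combineUnsigned zero-extends the low byte, so hi survives the OR.
func combineUnsigned(lo, hi byte) uint16 {
	return uint16(lo) | uint16(hi)<<8
}

func main() {
	fmt.Printf("%#04x\n", combineSigned(0x80, 0x01))   // 0xff80
	fmt.Printf("%#04x\n", combineUnsigned(0x80, 0x01)) // 0x0180
}
```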

Fixes #23719.

Change-Id: Ic87da03e9bd0fe2c60bb214b99f846e4e9446052
Reviewed-on: https://go-review.googlesource.com/92335
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2018-02-06 18:24:33 +00:00
Caleb Spare
67fdf587dc cmd/compile: don't combine 64-bit loads/stores on amd64
This causes a performance regression for some calls.

Fixes #23424.
Updates #6853.

Change-Id: Id1db652d5aca0ce631a3417c0c056d6637fefa9e
Reviewed-on: https://go-review.googlesource.com/88135
Run-TryBot: Caleb Spare <cespare@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-01-17 22:05:33 +00:00
Keith Randall
fa1f52c5f6 cmd/compile: always nil check before interface call
Fixes #22703

The fix was already done by Cherry for defer/go of an interface call (CL 23820).
We just need to do it everywhere.

Change-Id: I0115d22e443931fe1bcce44c93c4d0770b5fd268
Reviewed-on: https://go-review.googlesource.com/77450
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-11-14 05:39:45 +00:00
Alberto Donizetti
33a9f01729 cmd/compile: add mul by ±2ⁿ code-generation tests for arm/arm64
This change adds code generation tests for multiplication by ±2ⁿ for
arm and arm64, in preparation for a future CL which will remove the
relevant architecture-specific SSA rules (the reduction is already
performed by rules in generic.rules added in CL 36323).
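
The reduction being tested rests on a simple identity: multiplying by 2ⁿ is
a left shift by n, and multiplying by -2ⁿ is the negation of that shift.
The sketch below checks it for n = 3 with made-up function names; Go's
wrapping integer arithmetic makes the identity hold even on overflow.

```go
package main

import (
	"fmt"
	"math"
)

func mul8(x int64) int64    { return x * 8 }
func shl3(x int64) int64    { return x << 3 }
func mulNeg8(x int64) int64 { return x * -8 }
func negShl3(x int64) int64 { return -(x << 3) }

func main() {
	for _, x := range []int64{0, 1, -1, 12345, math.MaxInt64, math.MinInt64} {
		fmt.Println(mul8(x) == shl3(x), mulNeg8(x) == negShl3(x))
	}
}
```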

Change-Id: Iebdd5c3bb2fc632c85888569ff0c49f78569a862
Reviewed-on: https://go-review.googlesource.com/75752
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-11-04 10:28:27 +00:00
Lynn Boger
bb1fd3b5ff cmd/compile: add rules to improve consecutive byte loads and stores on ppc64le
This adds new rules that recognize consecutive byte loads and
stores and lower them to wider loads and stores such as lhz, lwz, ld,
sth, stw, std. This change only covers the little endian cases
on little endian machines, such as those found in encoding/binary
UintXX or PutUintXX for little endian. Big endian will be done
later.
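
On the load side, the pattern has the shape sketched below. uint32LE is a
hypothetical stand-in written like LittleEndian.Uint32: four adjacent byte
loads shifted and OR-ed together, which a little-endian machine can satisfy
with one 32-bit load. Whether the compiler merges them depends on its
version and the target.

```go
package main

import "fmt"

// uint32LE assembles a little-endian uint32 from four consecutive bytes.
func uint32LE(b []byte) uint32 {
	_ = b[3] // early bounds check so the loads below need none
	return uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 | uint32(b[3])<<24
}

func main() {
	fmt.Printf("%#x\n", uint32LE([]byte{0x78, 0x56, 0x34, 0x12})) // 0x12345678
}
```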

binary_test.go was also updated so that the benchmarks for Uint
and PutUint actually exercise those functions; as previously
written, the calls were being optimized out.
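
One common way to keep a benchmarked call from being eliminated is to store
its result in a package-level variable. The sketch below shows that general
technique only; it is not the actual change made to binary_test.go, and the
names sink and benchmarkUint64 are made up for this example.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"testing"
)

// sink is package-level so the compiler cannot prove the result unused.
var sink uint64

// benchmarkUint64 assigns every result to sink, which keeps the call
// from being discarded as dead code.
func benchmarkUint64(b *testing.B) {
	buf := []byte{1, 2, 3, 4, 5, 6, 7, 8}
	for i := 0; i < b.N; i++ {
		sink = binary.LittleEndian.Uint64(buf)
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside of "go test".
	res := testing.Benchmark(benchmarkUint64)
	fmt.Println(res.N > 0, sink == 0x0807060504030201)
}
```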

Testcases were also added to cmd/compile/internal/gc/asm_test.go.

Updates #22496

The following improvement can be found in golang.org/x/crypto

poly1305:

Benchmark64-16              142           114           -19.72%
Benchmark1K-16              1717          1424          -17.06%
Benchmark64Unaligned-16     142           113           -20.42%
Benchmark1KUnaligned-16     1721          1428          -17.02%

chacha20poly1305:

BenchmarkChacha20Poly1305Open_64-16     1012       885   -12.55%
BenchmarkChacha20Poly1305Seal_64-16     971        836   -13.90%
BenchmarkChacha20Poly1305Open_1350-16   11113      9539  -14.16%
BenchmarkChacha20Poly1305Seal_1350-16   11013      9392  -14.72%
BenchmarkChacha20Poly1305Open_8K-16     61074      53431 -12.51%
BenchmarkChacha20Poly1305Seal_8K-16     61214      54806 -10.47%

Other improvements of around 10% found in crypto/tls.

Results after updating encoding/binary/binary_test.go:

BenchmarkLittleEndianPutUint64-16     1.87      0.93      -50.27%
BenchmarkLittleEndianPutUint32-16     1.19      0.93      -21.85%
BenchmarkLittleEndianPutUint16-16     1.16      1.03      -11.21%

Change-Id: I7bbe2fbcbd11362d58662fecd907a0c07e6ca2fb
Reviewed-on: https://go-review.googlesource.com/74410
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Munday <mike.munday@ibm.com>
2017-11-03 18:46:59 +00:00