Commit graph

79 commits

Giovanni Bajo
0cacc4d0e2 cmd/compile: fold LEAQ/ADDQconst into SETx ops
This saves an instruction and a register. The new rules
match ~4900 times during all.bash.

Change-Id: I2f867c5e70262004e31f545f3bb89e939c45b718
Reviewed-on: https://go-review.googlesource.com/94767
Reviewed-by: Keith Randall <khr@golang.org>
2018-02-20 22:32:35 +00:00
philhofer
2d0172c3a7 cmd/compile/internal/ssa: emit csel on arm64
Introduce a new SSA pass to generate CondSelect instructions,
and add CondSelect lowering rules for arm64.

In order to make the CSEL instruction easier to optimize,
and to simplify the introduction of CSNEG, CSINC, and CSINV
in the future, modify the CSEL instruction to accept a condition
code in the aux field.
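The CondSelect pass targets Go code in which a branch only picks one of two values. The sketch below shows that shape. The function names are hypothetical, and whether a given function is actually lowered to CMP plus CSEL on arm64 depends on the compiler version and the pass's heuristics.

```go
package main

import "fmt"

// maxInt is a branchy two-way select. On arm64, a compiler with the
// CondSelect pass may emit CMP + CSEL here instead of a conditional branch.
func maxInt(a, b int) int {
	if a > b {
		return a
	}
	return b
}

// clampNonNeg has the same select shape: r is either x or 0.
func clampNonNeg(x int) int {
	r := x
	if x < 0 {
		r = 0
	}
	return r
}

func main() {
	fmt.Println(maxInt(3, 7), clampNonNeg(-5)) // 7 0
}
```

To see what is actually generated, inspect the output of `GOARCH=arm64 go build -gcflags=-S`.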

Notably, this change makes the go1 Gzip benchmark
more than 10% faster.

Benchmarks on a Cavium ThunderX:

name                      old time/op    new time/op    delta
BinaryTree17-96              15.9s ± 6%     16.0s ± 4%     ~     (p=0.968 n=10+9)
Fannkuch11-96                7.17s ± 0%     7.00s ± 0%   -2.43%  (p=0.000 n=8+9)
FmtFprintfEmpty-96           208ns ± 1%     207ns ± 0%     ~     (p=0.152 n=10+8)
FmtFprintfString-96          379ns ± 0%     375ns ± 0%   -0.95%  (p=0.000 n=10+9)
FmtFprintfInt-96             385ns ± 0%     383ns ± 0%   -0.52%  (p=0.000 n=9+10)
FmtFprintfIntInt-96          591ns ± 0%     586ns ± 0%   -0.85%  (p=0.006 n=7+9)
FmtFprintfPrefixedInt-96     656ns ± 0%     667ns ± 0%   +1.71%  (p=0.000 n=10+10)
FmtFprintfFloat-96           967ns ± 0%     984ns ± 0%   +1.78%  (p=0.000 n=10+10)
FmtManyArgs-96              2.35µs ± 0%    2.25µs ± 0%   -4.63%  (p=0.000 n=9+8)
GobDecode-96                31.0ms ± 0%    30.8ms ± 0%   -0.36%  (p=0.006 n=9+9)
GobEncode-96                24.4ms ± 0%    24.5ms ± 0%   +0.30%  (p=0.000 n=9+9)
Gzip-96                      1.60s ± 0%     1.43s ± 0%  -10.58%  (p=0.000 n=9+10)
Gunzip-96                    167ms ± 0%     169ms ± 0%   +0.83%  (p=0.000 n=8+9)
HTTPClientServer-96          311µs ± 1%     308µs ± 0%   -0.75%  (p=0.000 n=10+10)
JSONEncode-96               65.0ms ± 0%    64.8ms ± 0%   -0.25%  (p=0.000 n=9+8)
JSONDecode-96                262ms ± 1%     261ms ± 1%     ~     (p=0.579 n=10+10)
Mandelbrot200-96            18.0ms ± 0%    18.1ms ± 0%   +0.17%  (p=0.000 n=8+10)
GoParse-96                  14.0ms ± 0%    14.1ms ± 1%   +0.42%  (p=0.003 n=9+10)
RegexpMatchEasy0_32-96       644ns ± 2%     645ns ± 2%     ~     (p=0.836 n=10+10)
RegexpMatchEasy0_1K-96      3.70µs ± 0%    3.49µs ± 0%   -5.58%  (p=0.000 n=10+10)
RegexpMatchEasy1_32-96       662ns ± 2%     657ns ± 2%     ~     (p=0.137 n=10+10)
RegexpMatchEasy1_1K-96      4.47µs ± 0%    4.31µs ± 0%   -3.48%  (p=0.000 n=10+10)
RegexpMatchMedium_32-96      844ns ± 2%     849ns ± 1%     ~     (p=0.208 n=10+10)
RegexpMatchMedium_1K-96      179µs ± 0%     182µs ± 0%   +1.20%  (p=0.000 n=10+10)
RegexpMatchHard_32-96       10.0µs ± 0%    10.1µs ± 0%   +0.48%  (p=0.000 n=10+9)
RegexpMatchHard_1K-96        297µs ± 0%     297µs ± 0%   -0.14%  (p=0.000 n=10+10)
Revcomp-96                   3.08s ± 0%     3.13s ± 0%   +1.56%  (p=0.000 n=9+9)
Template-96                  276ms ± 2%     275ms ± 1%     ~     (p=0.393 n=10+10)
TimeParse-96                1.37µs ± 0%    1.36µs ± 0%   -0.53%  (p=0.000 n=10+7)
TimeFormat-96               1.40µs ± 0%    1.42µs ± 0%   +0.97%  (p=0.000 n=10+10)
[Geo mean]                   264µs          262µs        -0.77%

Change-Id: Ie54eee4b3092af53e6da3baa6d1755098f57f3a2
Reviewed-on: https://go-review.googlesource.com/55670
Run-TryBot: Philip Hofer <phofer@umich.edu>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2018-02-20 06:00:54 +00:00
Chad Rosier
07f0f09563 cmd/compile: make math.Ceil/Floor/Round/Trunc intrinsics on arm64
name       old time/op  new time/op  delta
Ceil        550ns ± 0%   486ns ± 7%  -11.64%  (p=0.000 n=13+18)
Floor       495ns ±19%   512ns ±12%     ~     (p=0.164 n=20+20)
Round       550ns ± 0%   487ns ± 8%  -11.49%  (p=0.000 n=12+19)
Trunc       563ns ± 7%   488ns ±13%  -13.44%  (p=0.000 n=15+2)

Change-Id: I53f234b160b3c026a277506e2cf977d150379464
Reviewed-on: https://go-review.googlesource.com/88295
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-02-16 15:37:57 +00:00
Balaram Makam
fcba05148f cmd/compile: arm64 intrinsics for math/bits.OnesCount
This adds math/bits intrinsics for OnesCount on arm64.

name         old time/op  new time/op  delta
OnesCount    3.81ns ± 0%  1.60ns ± 0%  -57.96%  (p=0.000 n=7+8)
OnesCount8   1.60ns ± 0%  1.60ns ± 0%     ~     (all equal)
OnesCount16  2.41ns ± 0%  1.60ns ± 0%  -33.61%  (p=0.000 n=8+8)
OnesCount32  4.17ns ± 0%  1.60ns ± 0%  -61.58%  (p=0.000 n=8+8)
OnesCount64  3.80ns ± 0%  1.60ns ± 0%  -57.84%  (p=0.000 n=8+8)

Update #18616

Conflicts:
	src/cmd/compile/internal/gc/asm_test.go

Change-Id: I63ac2f63acafdb1f60656ab8a56be0b326eec5cb
Reviewed-on: https://go-review.googlesource.com/90835
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-02-15 23:00:20 +00:00
Chad Rosier
51932c326f cmd/compile: improve absorb shifts optimization for arm64
The current absorb-shifts optimization can generate dead Value nodes, which
increase the use count of other live nodes. This interferes with other
optimizations (such as combined loads) that are enabled only for specific
use counts. This patch fixes the issue by decrementing the use count of nodes
referenced by dead Value nodes generated by the absorb-shifts optimization.

Performance impacts on go1 benchmarks (data collected on A57@2GHzx8):

name                     old time/op    new time/op    delta
BinaryTree17-8              6.28s ± 2%     6.24s ± 1%     ~     (p=0.065 n=10+9)
Fannkuch11-8                6.32s ± 0%     6.33s ± 0%   +0.17%  (p=0.000 n=10+10)
FmtFprintfEmpty-8          98.9ns ± 0%    99.2ns ± 0%   +0.34%  (p=0.000 n=9+7)
FmtFprintfString-8          183ns ± 1%     182ns ± 1%   -1.01%  (p=0.005 n=9+10)
FmtFprintfInt-8             199ns ± 1%     202ns ± 1%   +1.41%  (p=0.000 n=10+9)
FmtFprintfIntInt-8          272ns ± 1%     276ns ± 3%   +1.36%  (p=0.015 n=10+10)
FmtFprintfPrefixedInt-8     367ns ± 1%     369ns ± 1%   +0.68%  (p=0.042 n=10+10)
FmtFprintfFloat-8           491ns ± 1%     493ns ± 1%     ~     (p=0.064 n=10+10)
FmtManyArgs-8              1.31µs ± 1%    1.32µs ± 1%   +0.39%  (p=0.042 n=8+9)
GobDecode-8                17.0ms ± 2%    16.2ms ± 2%   -4.74%  (p=0.000 n=10+10)
GobEncode-8                13.7ms ± 2%    13.4ms ± 1%   -2.40%  (p=0.000 n=10+9)
Gzip-8                      844ms ± 0%     737ms ± 0%  -12.70%  (p=0.000 n=10+10)
Gunzip-8                   84.4ms ± 1%    83.9ms ± 0%   -0.55%  (p=0.000 n=10+8)
HTTPClientServer-8          122µs ± 1%     124µs ± 1%   +1.75%  (p=0.000 n=10+9)
JSONEncode-8               34.9ms ± 1%    32.4ms ± 0%   -7.11%  (p=0.000 n=10+9)
JSONDecode-8                150ms ± 0%     146ms ± 1%   -2.84%  (p=0.000 n=7+10)
Mandelbrot200-8            10.0ms ± 0%    10.0ms ± 0%     ~     (p=0.529 n=10+10)
GoParse-8                  8.18ms ± 1%    8.03ms ± 0%   -1.93%  (p=0.000 n=10+10)
RegexpMatchEasy0_32-8       209ns ± 0%     209ns ± 0%     ~     (p=0.248 n=10+9)
RegexpMatchEasy0_1K-8       789ns ± 1%     790ns ± 0%     ~     (p=0.361 n=10+10)
RegexpMatchEasy1_32-8       202ns ± 0%     202ns ± 1%     ~     (p=0.137 n=8+10)
RegexpMatchEasy1_1K-8      1.12µs ± 2%    1.12µs ± 1%     ~     (p=0.810 n=10+10)
RegexpMatchMedium_32-8      298ns ± 0%     298ns ± 0%     ~     (p=0.443 n=10+9)
RegexpMatchMedium_1K-8     83.0µs ± 5%    78.6µs ± 0%   -5.37%  (p=0.000 n=10+10)
RegexpMatchHard_32-8       4.32µs ± 0%    4.26µs ± 0%   -1.47%  (p=0.000 n=10+10)
RegexpMatchHard_1K-8        132µs ± 4%     126µs ± 0%   -4.41%  (p=0.000 n=10+9)
Revcomp-8                   1.11s ± 0%     1.11s ± 0%   +0.14%  (p=0.017 n=10+9)
Template-8                  155ms ± 1%     155ms ± 1%     ~     (p=0.796 n=10+10)
TimeParse-8                 774ns ± 1%     785ns ± 1%   +1.41%  (p=0.001 n=10+10)
TimeFormat-8                788ns ± 1%     806ns ± 1%   +2.24%  (p=0.000 n=10+9)

name                     old speed      new speed      delta
GobDecode-8              45.2MB/s ± 2%  47.5MB/s ± 2%   +4.96%  (p=0.000 n=10+10)
GobEncode-8              56.0MB/s ± 2%  57.4MB/s ± 1%   +2.44%  (p=0.000 n=10+9)
Gzip-8                   23.0MB/s ± 0%  26.3MB/s ± 0%  +14.55%  (p=0.000 n=10+10)
Gunzip-8                  230MB/s ± 1%   231MB/s ± 0%   +0.55%  (p=0.000 n=10+8)
JSONEncode-8             55.6MB/s ± 1%  59.9MB/s ± 0%   +7.65%  (p=0.000 n=10+9)
JSONDecode-8             12.9MB/s ± 0%  13.3MB/s ± 1%   +2.94%  (p=0.000 n=7+10)
GoParse-8                7.08MB/s ± 1%  7.22MB/s ± 0%   +1.95%  (p=0.000 n=10+10)
RegexpMatchEasy0_32-8     153MB/s ± 0%   153MB/s ± 0%   -0.16%  (p=0.023 n=10+10)
RegexpMatchEasy0_1K-8    1.30GB/s ± 1%  1.30GB/s ± 0%     ~     (p=0.393 n=10+10)
RegexpMatchEasy1_32-8     158MB/s ± 0%   158MB/s ± 0%     ~     (p=0.684 n=10+10)
RegexpMatchEasy1_1K-8     915MB/s ± 2%   918MB/s ± 1%     ~     (p=0.796 n=10+10)
RegexpMatchMedium_32-8   3.35MB/s ± 0%  3.35MB/s ± 0%     ~     (p=1.000 n=10+9)
RegexpMatchMedium_1K-8   12.3MB/s ± 5%  13.0MB/s ± 0%   +5.56%  (p=0.000 n=10+10)
RegexpMatchHard_32-8     7.40MB/s ± 0%  7.51MB/s ± 0%   +1.50%  (p=0.000 n=10+10)
RegexpMatchHard_1K-8     7.75MB/s ± 4%  8.10MB/s ± 0%   +4.52%  (p=0.000 n=10+8)
Revcomp-8                 229MB/s ± 0%   228MB/s ± 0%   -0.14%  (p=0.017 n=10+9)
Template-8               12.5MB/s ± 1%  12.5MB/s ± 1%     ~     (p=0.780 n=10+10)

Change-Id: I103389f168eac79f6af44e8fef93acc2a7a4ac96
Reviewed-on: https://go-review.googlesource.com/88415
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-02-15 20:54:50 +00:00
Chad Rosier
cdd961630c cmd/compile: generate tbz/tbnz when comparing against zero on arm64
The tbz/tbnz instructions test the sign bit to determine whether the value is >= 0 or < 0.

go1 benchmark results:
name                   old speed      new speed      delta
JSONEncode             94.4MB/s ± 1%  95.7MB/s ± 0%  +1.36%  (p=0.000 n=10+9)
JSONDecode             19.7MB/s ± 1%  19.9MB/s ± 1%  +1.08%  (p=0.000 n=9+10)
Gzip                   45.5MB/s ± 0%  46.0MB/s ± 0%  +1.06%  (p=0.000 n=10+10)
Revcomp                 376MB/s ± 0%   379MB/s ± 0%  +0.69%  (p=0.000 n=10+10)
RegexpMatchHard_1K     12.6MB/s ± 0%  12.7MB/s ± 0%  +0.57%  (p=0.000 n=10+8)
RegexpMatchMedium_32   3.21MB/s ± 0%  3.22MB/s ± 0%  +0.31%  (p=0.000 n=9+10)
RegexpMatchEasy1_1K    1.27GB/s ± 0%  1.27GB/s ± 0%  +0.23%  (p=0.000 n=9+9)
RegexpMatchHard_32     11.4MB/s ± 0%  11.4MB/s ± 1%  +0.19%  (p=0.036 n=10+8)
RegexpMatchEasy0_1K    1.77GB/s ± 0%  1.77GB/s ± 0%  +0.13%  (p=0.000 n=9+10)
RegexpMatchMedium_1K   19.3MB/s ± 0%  19.3MB/s ± 0%  +0.04%  (p=0.008 n=10+8)
RegexpMatchEasy0_32     131MB/s ± 0%   131MB/s ± 0%    ~     (p=0.211 n=10+10)
GobDecode              57.5MB/s ± 1%  57.6MB/s ± 2%    ~     (p=0.469 n=10+10)
GobEncode              58.6MB/s ± 1%  58.5MB/s ± 2%    ~     (p=0.781 n=10+10)
GoParse                9.40MB/s ± 0%  9.39MB/s ± 0%  -0.19%  (p=0.005 n=10+9)
RegexpMatchEasy1_32     133MB/s ± 0%   133MB/s ± 0%  -0.48%  (p=0.000 n=10+10)
Template               20.9MB/s ± 0%  20.6MB/s ± 0%  -1.54%  (p=0.000 n=8+10)

Change-Id: I411efe44db35c3962445618d5a47c12e31b3925b
Reviewed-on: https://go-review.googlesource.com/92715
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-02-14 15:52:41 +00:00
Austin Clements
2010189407 runtime: remove legacy eager write barrier
Now that the buffered write barrier is implemented for all
architectures, we can remove the old eager write barrier
implementation. This CL removes the implementation from the runtime,
support in the compiler for calling it, and updates some compiler
tests that relied on the old eager barrier support. It also makes sure
that all of the useful comments from the old write barrier
implementation still have a place to live.

Fixes #22460.

Updates #21640 since this fixes the layering concerns of the write
barrier (but not the other things in that issue).

Change-Id: I580f93c152e89607e0a72fe43370237ba97bae74
Reviewed-on: https://go-review.googlesource.com/92705
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-02-13 16:34:46 +00:00
Keith Randall
23e8e197b0 cmd/compile: use unsigned loads for multi-element comparisons
When loading multiple elements of an array into a single register,
make sure we treat them as unsigned.  When treated as signed, the
upper bits might all be set, causing the shift-or combo to clobber
the values higher in the register.
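The bug was in the compiler's load-combining rules, but the same arithmetic can be reproduced in Go source. In the sketch below (hypothetical function names), sign-extending a negative element sets every upper bit and overwrites the byte that should sit above it. Zero-extending first keeps each byte in its own lane.

```go
package main

import "fmt"

// combineSigned widens each element with sign extension before OR-ing,
// so a negative low element clobbers the high byte.
func combineSigned(a [2]int8) uint16 {
	return uint16(a[0]) | uint16(a[1])<<8
}

// combineUnsigned zero-extends each element first (via uint8), which is
// the behavior the fix enforces for multi-element loads.
func combineUnsigned(a [2]int8) uint16 {
	return uint16(uint8(a[0])) | uint16(uint8(a[1]))<<8
}

func main() {
	a := [2]int8{-1, 0x12}
	fmt.Printf("%#x %#x\n", combineSigned(a), combineUnsigned(a)) // 0xffff 0x12ff
}
```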

Fixes #23719.

Change-Id: Ic87da03e9bd0fe2c60bb214b99f846e4e9446052
Reviewed-on: https://go-review.googlesource.com/92335
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2018-02-06 18:24:33 +00:00
Caleb Spare
67fdf587dc cmd/compile: don't combine 64-bit loads/stores on amd64
Combining them causes a performance regression for some calls.

Fixes #23424.
Updates #6853.

Change-Id: Id1db652d5aca0ce631a3417c0c056d6637fefa9e
Reviewed-on: https://go-review.googlesource.com/88135
Run-TryBot: Caleb Spare <cespare@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-01-17 22:05:33 +00:00
Keith Randall
fa1f52c5f6 cmd/compile: always nil check before interface call
Fixes #22703

Cherry already made this fix for defer/go of an interface call (CL 23820).
This CL applies it everywhere.
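The observable behavior that the nil check guarantees is that a method call through a nil interface value fails with a nil-pointer panic. The sketch below only demonstrates that behavior and says nothing about the generated code. greeter, hello, and callGreet are hypothetical names.

```go
package main

import "fmt"

type greeter interface {
	Greet() string
}

type hello struct{}

func (hello) Greet() string { return "hi" }

// callGreet makes an interface method call and reports whether it panicked.
// With a nil interface the call must panic instead of jumping through
// a garbage method table.
func callGreet(g greeter) (msg string, panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	return g.Greet(), false
}

func main() {
	_, p := callGreet(nil)
	msg, _ := callGreet(hello{})
	fmt.Println(p, msg) // true hi
}
```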

Change-Id: I0115d22e443931fe1bcce44c93c4d0770b5fd268
Reviewed-on: https://go-review.googlesource.com/77450
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-11-14 05:39:45 +00:00
Alberto Donizetti
33a9f01729 cmd/compile: add mul by ±2ⁿ code-generation tests for arm/arm64
This change adds code generation tests for multiplication by ±2ⁿ for
arm and arm64, in preparation for a future CL which will remove the
relevant architecture-specific SSA rules (the reduction is already
performed by rules in generic.rules added in CL 36323).
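The generic reduction works because multiplying by -2ⁿ is the same as shifting left by n and negating, even under wraparound, since Go's signed arithmetic is modulo 2⁶⁴. The sketch below checks this for n = 3. The function names are hypothetical.

```go
package main

import "fmt"

// mulNeg8 multiplies by -2³.
func mulNeg8(x int64) int64 { return x * -8 }

// shiftNeg is the reduced form: shift by 3, then negate.
func shiftNeg(x int64) int64 { return -(x << 3) }

func main() {
	for _, x := range []int64{0, 1, -3, 1 << 60, -1 << 62} {
		fmt.Println(x, mulNeg8(x) == shiftNeg(x))
	}
}
```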

Change-Id: Iebdd5c3bb2fc632c85888569ff0c49f78569a862
Reviewed-on: https://go-review.googlesource.com/75752
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-11-04 10:28:27 +00:00
Lynn Boger
bb1fd3b5ff cmd/compile: add rules to improve consecutive byte loads and stores on ppc64le
This adds new rules to recognize consecutive byte loads and
stores and lowers them to loads and stores such as lhz, lwz, ld,
sth, stw, std. This change only covers the little endian cases
on little endian machines, such as is found in encoding/binary
UintXX or PutUintXX for little endian. Big endian will be done
later.

binary_test.go was also updated so that the benchmarks for Uint and
PutUint actually exercise those functions; as previously written,
the calls were being optimized out.

Testcases were also added to cmd/compile/internal/gc/asm_test.go.

Updates #22496

The following improvements can be found in golang.org/x/crypto:

poly1305:

Benchmark64-16              142           114           -19.72%
Benchmark1K-16              1717          1424          -17.06%
Benchmark64Unaligned-16     142           113           -20.42%
Benchmark1KUnaligned-16     1721          1428          -17.02%

chacha20poly1305:

BenchmarkChacha20Poly1305Open_64-16     1012       885   -12.55%
BenchmarkChacha20Poly1305Seal_64-16     971        836   -13.90%
BenchmarkChacha20Poly1305Open_1350-16   11113      9539  -14.16%
BenchmarkChacha20Poly1305Seal_1350-16   11013      9392  -14.72%
BenchmarkChacha20Poly1305Open_8K-16     61074      53431 -12.51%
BenchmarkChacha20Poly1305Seal_8K-16     61214      54806 -10.47%

Improvements of around 10% were also found in crypto/tls.

Results after updating encoding/binary/binary_test.go:

BenchmarkLittleEndianPutUint64-16     1.87      0.93      -50.27%
BenchmarkLittleEndianPutUint32-16     1.19      0.93      -21.85%
BenchmarkLittleEndianPutUint16-16     1.16      1.03      -11.21%

Change-Id: I7bbe2fbcbd11362d58662fecd907a0c07e6ca2fb
Reviewed-on: https://go-review.googlesource.com/74410
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Munday <mike.munday@ibm.com>
2017-11-03 18:46:59 +00:00
Ilya Tocar
f3884680fc cmd/compile/internal/ssa: inline memmove with known size
Replace calls to memmove with a known (constant) size with OpMove.
Do this only when it is safe from an aliasing point of view.
This helps with code like:

    append(buf, "const str"...)

In strconv this provides a nice benefit:
name                                      old time/op  new time/op  delta
Quote-6                                   731ns ± 2%   647ns ± 3%  -11.41%  (p=0.000 n=10+10)
QuoteRune-6                               117ns ± 5%   111ns ± 1%   -4.54%  (p=0.000 n=10+10)
AppendQuote-6                             475ns ± 0%   396ns ± 0%  -16.59%  (p=0.000 n=9+10)
AppendQuoteRune-6                        32.0ns ± 0%  27.4ns ± 0%  -14.41%  (p=0.000 n=8+9)
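A minimal version of the pattern above is sketched here. appendTag is a hypothetical helper. Because the string has a constant length, the copy that append performs is a candidate for an inline OpMove instead of a runtime memmove call.

```go
package main

import "fmt"

// appendTag appends a constant string. The copy size is known at compile
// time, which is what makes the inline move possible.
func appendTag(buf []byte) []byte {
	return append(buf, "const str"...)
}

func main() {
	buf := make([]byte, 0, 32)
	buf = appendTag(append(buf, '['))
	buf = append(buf, ']')
	fmt.Println(string(buf)) // [const str]
}
```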

Change-Id: I7704f5c51b46aed2d8f033de74c75140fc35036c
Reviewed-on: https://go-review.googlesource.com/54394
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-11-02 20:30:25 +00:00
Michael Munday
4745604bcb cmd/compile: intrinsify math.RoundToEven on s390x
The new RoundToEven function can be implemented as a single FIDBR
instruction on s390x.

name         old time/op  new time/op  delta
RoundToEven  5.32ns ± 1%  0.86ns ± 1%  -83.86%  (p=0.000 n=10+10)

Change-Id: Iaf597e57a0d1085961701e3c75ff4f6f6dcebb5f
Reviewed-on: https://go-review.googlesource.com/74350
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-10-31 18:04:27 +00:00
Michael Munday
96cdacb971 cmd/asm, cmd/compile: optimize math.Abs and math.Copysign on s390x
This change adds three new instructions:

- LPDFR: load positive (math.Abs(x))
- LNDFR: load negative (-math.Abs(x))
- CPSDR: copy sign (math.Copysign(x, y))

By making use of GPR <-> FPR moves we can now compile math.Abs and
math.Copysign to these instructions using SSA rules.

This CL also adds new rules to merge address generation into combined
load operations. This makes GPR <-> FPR move matching more reliable.

name                 old time/op  new time/op  delta
Copysign             1.85ns ± 0%  1.40ns ± 1%  -24.65%  (p=0.000 n=8+10)
Abs                  1.58ns ± 1%  0.73ns ± 1%  -53.64%  (p=0.000 n=10+10)

The geo mean improvement for all math package benchmarks was 4.6%.

Change-Id: I0cec35c5c1b3fb45243bf666b56b57faca981bc9
Reviewed-on: https://go-review.googlesource.com/73950
Run-TryBot: Michael Munday <mike.munday@ibm.com>
Reviewed-by: Keith Randall <khr@golang.org>
2017-10-30 23:42:51 +00:00
Austin Clements
7e343134d3 cmd/compile: compiler support for buffered write barrier
This CL implements the compiler support for calling the buffered write
barrier added by the previous CL.

Since the buffered write barrier is only implemented on amd64 right
now, this still supports the old, eager write barrier as well. There's
little overhead to supporting both and this way a few tests in
test/fixedbugs that expect to have liveness maps at write barrier
calls can easily opt-in to the old, eager barrier.
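The difference between the two barriers is batching. An eager barrier does its work on every pointer write. A buffered barrier's fast path only appends to a buffer and processes the whole batch when the buffer fills. The toy model below illustrates only that batching idea. The type, fields, and buffer size are hypothetical and do not mirror the runtime's data structures.

```go
package main

import "fmt"

// wbBuffer is a toy buffered barrier. record is the cheap fast path, and
// flush stands in for handing a batch of pointers to the collector.
type wbBuffer struct {
	buf     []uintptr
	size    int
	flushes int
	shaded  int // total pointers handed to the simulated collector
}

func newWBBuffer(size int) *wbBuffer {
	return &wbBuffer{buf: make([]uintptr, 0, size), size: size}
}

// record runs on each pointer write and flushes only when the buffer is full.
func (b *wbBuffer) record(p uintptr) {
	b.buf = append(b.buf, p)
	if len(b.buf) == b.size {
		b.flush()
	}
}

// flush processes the batch and resets the buffer.
func (b *wbBuffer) flush() {
	b.shaded += len(b.buf)
	b.buf = b.buf[:0]
	b.flushes++
}

func main() {
	b := newWBBuffer(4)
	for i := 0; i < 10; i++ {
		b.record(uintptr(i))
	}
	fmt.Println(b.flushes, len(b.buf), b.shaded) // 2 2 8
}
```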

This significantly improves the performance of the write barrier:

name             old time/op  new time/op  delta
WriteBarrier-12  73.5ns ±20%  19.2ns ±27%  -73.90%  (p=0.000 n=19+18)

It also reduces the size of binaries because the write barrier call is
more compact:

name        old object-bytes  new object-bytes  delta
Template           398k ± 0%         393k ± 0%  -1.14%  (p=0.008 n=5+5)
Unicode            208k ± 0%         206k ± 0%  -1.00%  (p=0.008 n=5+5)
GoTypes           1.18M ± 0%        1.15M ± 0%  -2.00%  (p=0.008 n=5+5)
Compiler          4.05M ± 0%        3.88M ± 0%  -4.26%  (p=0.008 n=5+5)
SSA               8.25M ± 0%        8.11M ± 0%  -1.59%  (p=0.008 n=5+5)
Flate              228k ± 0%         224k ± 0%  -1.83%  (p=0.008 n=5+5)
GoParser           295k ± 0%         284k ± 0%  -3.62%  (p=0.008 n=5+5)
Reflect           1.00M ± 0%        0.99M ± 0%  -0.70%  (p=0.008 n=5+5)
Tar                339k ± 0%         333k ± 0%  -1.67%  (p=0.008 n=5+5)
XML                404k ± 0%         395k ± 0%  -2.10%  (p=0.008 n=5+5)
[Geo mean]         704k              690k       -2.00%

name        old exe-bytes     new exe-bytes     delta
HelloSize         1.05M ± 0%        1.04M ± 0%  -1.55%  (p=0.008 n=5+5)

https://perf.golang.org/search?q=upload:20171027.1

(Amusingly, this also reduces compiler allocations by 0.75%, which,
combined with the better write barrier, speeds up the compiler overall
by 2.10%. See the perf link.)

It slightly improves the performance of most of the go1 benchmarks and
improves the performance of the x/benchmarks:

name                      old time/op    new time/op    delta
BinaryTree17-12              2.40s ± 1%     2.47s ± 1%  +2.69%  (p=0.000 n=19+19)
Fannkuch11-12                2.95s ± 0%     2.95s ± 0%  +0.21%  (p=0.000 n=20+19)
FmtFprintfEmpty-12          41.8ns ± 4%    41.4ns ± 2%  -1.03%  (p=0.014 n=20+20)
FmtFprintfString-12         68.7ns ± 2%    67.5ns ± 1%  -1.75%  (p=0.000 n=20+17)
FmtFprintfInt-12            79.0ns ± 3%    77.1ns ± 1%  -2.40%  (p=0.000 n=19+17)
FmtFprintfIntInt-12          127ns ± 1%     123ns ± 3%  -3.42%  (p=0.000 n=20+20)
FmtFprintfPrefixedInt-12     152ns ± 1%     150ns ± 1%  -1.02%  (p=0.000 n=18+17)
FmtFprintfFloat-12           211ns ± 1%     209ns ± 0%  -0.99%  (p=0.000 n=20+16)
FmtManyArgs-12               500ns ± 0%     496ns ± 0%  -0.73%  (p=0.000 n=17+20)
GobDecode-12                6.44ms ± 1%    6.53ms ± 0%  +1.28%  (p=0.000 n=20+19)
GobEncode-12                5.46ms ± 0%    5.46ms ± 1%    ~     (p=0.550 n=19+20)
Gzip-12                      220ms ± 1%     216ms ± 0%  -1.75%  (p=0.000 n=19+19)
Gunzip-12                   38.8ms ± 0%    38.6ms ± 0%  -0.30%  (p=0.000 n=18+19)
HTTPClientServer-12         79.0µs ± 1%    78.2µs ± 1%  -1.01%  (p=0.000 n=20+20)
JSONEncode-12               11.9ms ± 0%    11.9ms ± 0%  -0.29%  (p=0.000 n=20+19)
JSONDecode-12               52.6ms ± 0%    52.2ms ± 0%  -0.68%  (p=0.000 n=19+20)
Mandelbrot200-12            3.69ms ± 0%    3.68ms ± 0%  -0.36%  (p=0.000 n=20+20)
GoParse-12                  3.13ms ± 1%    3.18ms ± 1%  +1.67%  (p=0.000 n=19+20)
RegexpMatchEasy0_32-12      73.2ns ± 1%    72.3ns ± 1%  -1.19%  (p=0.000 n=19+18)
RegexpMatchEasy0_1K-12       241ns ± 0%     239ns ± 0%  -0.83%  (p=0.000 n=17+16)
RegexpMatchEasy1_32-12      68.6ns ± 1%    69.0ns ± 1%  +0.47%  (p=0.015 n=18+16)
RegexpMatchEasy1_1K-12       364ns ± 0%     361ns ± 0%  -0.67%  (p=0.000 n=16+17)
RegexpMatchMedium_32-12      104ns ± 1%     103ns ± 1%  -0.79%  (p=0.001 n=20+15)
RegexpMatchMedium_1K-12     33.8µs ± 3%    34.0µs ± 2%    ~     (p=0.267 n=20+19)
RegexpMatchHard_32-12       1.64µs ± 1%    1.62µs ± 2%  -1.25%  (p=0.000 n=19+18)
RegexpMatchHard_1K-12       49.2µs ± 0%    48.7µs ± 1%  -0.93%  (p=0.000 n=19+18)
Revcomp-12                   391ms ± 5%     396ms ± 7%    ~     (p=0.154 n=19+19)
Template-12                 63.1ms ± 0%    59.5ms ± 0%  -5.76%  (p=0.000 n=18+19)
TimeParse-12                 307ns ± 0%     306ns ± 0%  -0.39%  (p=0.000 n=19+17)
TimeFormat-12                325ns ± 0%     323ns ± 0%  -0.50%  (p=0.000 n=19+19)
[Geo mean]                  47.3µs         46.9µs       -0.67%

https://perf.golang.org/search?q=upload:20171026.1

name                       old time/op  new time/op  delta
Garbage/benchmem-MB=64-12  2.25ms ± 1%  2.20ms ± 1%  -2.31%  (p=0.000 n=18+18)
HTTP-12                    12.6µs ± 0%  12.6µs ± 0%  -0.72%  (p=0.000 n=18+17)
JSON-12                    11.0ms ± 0%  11.0ms ± 1%  -0.68%  (p=0.000 n=17+19)

https://perf.golang.org/search?q=upload:20171026.2

Updates #14951.
Updates #22460.

Change-Id: Id4c0932890a1d41020071bec73b8522b1367d3e7
Reviewed-on: https://go-review.googlesource.com/73712
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-10-30 18:12:46 +00:00
Lynn Boger
4d0151ede5 cmd/compile,cmd/internal/obj/ppc64: make math.Abs,math.Copysign intrinsics on ppc64x
This adds support for math.Abs and math.Copysign as intrinsics on ppc64x.

A new instruction, FCPSGN, is added to generate fcpsgn. Some new
rules are added to improve the int<->float conversions that are
generated mainly due to the Float64bits and Float64frombits in
the math package. PPC64.rules is also modified as suggested
in the review for CL 63290.

Improvements:
benchmark                           old ns/op     new ns/op     delta
BenchmarkAbs-16                   1.12          0.69          -38.39%
BenchmarkCopysign-16              1.30          0.93          -28.46%
BenchmarkNextafter32-16           9.34          8.05          -13.81%
BenchmarkFrexp-16                 8.81          7.60          -13.73%

Others that used Copysign also saw smaller improvements.

I attempted to make this work using rules since that
seems to be preferred, but due to the use of Float64bits and
Float64frombits in these functions, several rules had to be added and
even then not all cases were matched. Using rules became too
complicated and seemed too fragile for these.

Updates #21390

Change-Id: Ia265da9a18355e08000818a4fba1a40e9e031995
Reviewed-on: https://go-review.googlesource.com/67130
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Keith Randall <khr@golang.org>
2017-10-30 13:56:39 +00:00
Hugues Bruant
3c46f49f94 cmd/compile: fix incorrect go:noinline usage
This pragma is not actually honored by the compiler.
The tests implicitly relied on the inliner being unable
to inline closures with captured variables, which will
soon change.

Fixes #22208

Change-Id: I13abc9c930b9156d43ec216f8efb768952a29439
Reviewed-on: https://go-review.googlesource.com/73211
Reviewed-by: Michael Munday <mike.munday@ibm.com>
2017-10-30 07:48:21 +00:00
Aliaksandr Valialkin
0011cfbe2b cmd/compile: optimize signed non-negative div/mod by a power of 2
This CL optimizes the assembly generated when len() or cap() is
divided by a power-of-2 constant:

    func lenDiv(s []int) int {
        return len(s) / 16
    }

amd64 assembly before the CL:

    MOVQ    "".s+16(SP), AX
    MOVQ    AX, CX
    SARQ    $63, AX
    SHRQ    $60, AX
    ADDQ    CX, AX
    SARQ    $4, AX
    MOVQ    AX, "".~r1+32(SP)
    RET

amd64 assembly after the CL:

    MOVQ    "".s+16(SP), AX
    SHRQ    $4, AX
    MOVQ    AX, "".~r1+32(SP)
    RET

The CL relies on the fact that the results of len() and cap() cannot
be negative.
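The correction sequence in the "before" assembly is needed because signed division truncates toward zero, while an arithmetic shift rounds toward negative infinity. For non-negative inputs the two agree, so a plain logical shift suffices. The sketch below shows this with hypothetical helper names.

```go
package main

import "fmt"

// divSigned is ordinary signed division, which must be correct for any int.
func divSigned(x int) int { return x / 16 }

// divShift is the cheaper form (a logical shift, like SHRQ), valid only
// for x >= 0, which always holds for len() and cap().
func divShift(x int) int { return int(uint(x) >> 4) }

func main() {
	s := make([]int, 100)
	fmt.Println(divSigned(len(s)), divShift(len(s))) // 6 6

	x := -17
	fmt.Println(divSigned(x), x>>4) // -1 -2: why negative inputs need a fixup
}
```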

Trigger stats for the added SSA rules on linux/amd64 when running
make.bash:

     46 Div64
     12 Mod64

The added SSA rules may trigger in more cases in the future,
once SSA values are populated with information about their
lower bounds.

For instance:

    func f(i int16) int16 {
        if i < 3 {
            return -1
        }

        // Lower bound of i is 3 here -> i is non-negative,
        // so unsigned arithmetic may be used here.
        return i % 16
    }

Change-Id: I8bc6be5a03e71157ced533c01416451ff6f1a7f0
Reviewed-on: https://go-review.googlesource.com/65530
Reviewed-by: Keith Randall <khr@golang.org>
2017-10-06 15:15:39 +00:00
Alberto Donizetti
03614562ca cmd/compile: remove x86 arch-specific rules for +2ⁿ multiplication
amd64 and 386 have rules to reduce multiplication by a positive power
of two, but a more general reduction (both for positive and negative
powers of two) is already performed by generic rules that were added
in CL 36323 to replace walkmul (see lines 166:173 in generic.rules).

The 386 and amd64 rules are never triggered during all.bash and can be
removed, reducing rules duplication.

The change also adds a few code generation tests for amd64 and 386.

Change-Id: I566d48186643bd722a4c0137fe94e513b8b20e36
Reviewed-on: https://go-review.googlesource.com/68450
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-10-06 09:30:57 +00:00
Ilya Tocar
6b8a3c8889 cmd/compile/internal/amd64: add SETccmem
Combine a SETcc and the store of its result into a single SETcc that
writes directly to memory. Triggers 200+ times in the go tool.
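A typical candidate is a comparison result stored straight into a struct field. The sketch below uses a hypothetical flags type. On amd64, each assignment may become one SETcc with a memory operand instead of a SETcc into a register followed by a separate byte store.

```go
package main

import "fmt"

// flags is hypothetical; each field sits at a constant offset from f.
type flags struct {
	lt, eq, gt bool
}

// compareInto writes comparison results directly to memory.
func compareInto(f *flags, a, b int) {
	f.lt = a < b
	f.eq = a == b
	f.gt = a > b
}

func main() {
	var f flags
	compareInto(&f, 2, 5)
	fmt.Printf("%+v\n", f) // {lt:true eq:false gt:false}
}
```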

Fixes #21630

Change-Id: Iafa22607426f4120140c88fae4b9aecb46e0bba8
Reviewed-on: https://go-review.googlesource.com/67950
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-10-05 20:53:28 +00:00
Michael Munday
7582494e06 cmd/compile: add s390x intrinsics for Ceil, Floor, Round and Trunc
Ceil, Floor and Trunc are pre-existing intrinsics. Round is a new
function and has been added as an intrinsic in this CL. All of the
functions can be implemented as a single 'LOAD FP INTEGER'
instruction, FIDBR, on s390x.

name   old time/op  new time/op  delta
Ceil   2.34ns ± 0%  0.85ns ± 0%  -63.74%  (p=0.000 n=5+4)
Floor  2.33ns ± 0%  0.85ns ± 1%  -63.35%  (p=0.008 n=5+5)
Round  4.23ns ± 0%  0.85ns ± 0%  -79.89%  (p=0.000 n=5+4)
Trunc  2.35ns ± 0%  0.85ns ± 0%  -63.83%  (p=0.029 n=4+4)

Change-Id: Idee7ba24a2899d12bf9afee4eedd6b4aaad3c510
Reviewed-on: https://go-review.googlesource.com/63890
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-09-20 10:01:35 +00:00
Michael Munday
95b146e8eb cmd/compile: improve floating point constant propagation
Add generic rules to propagate floating point constants through
comparisons and integer conversions. These new rules seldom trigger
in the standard library, so there is no performance change; however,
I think they are worth adding anyway for completeness.

Change-Id: I9db5222746508a2996f1cafb72f4e0cf2541de07
Reviewed-on: https://go-review.googlesource.com/63795
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-09-14 23:08:33 +00:00
Lynn Boger
fa3fe2e3c6 cmd/compile, math/bits: add rotate rules to PPC64.rules
This adds rules to match the code in math/bits RotateLeft,
RotateLeft32, and RotateLeft64 to allow them to be inlined.

The rules are complicated because the code in these functions
uses different types, and the non-const versions of these
shifts generate Mask and Carry instructions that become
subexpressions during the match process.

Also adds a testcase to asm_test.go.

Improvement in math/bits:

BenchmarkRotateLeft-16       1.57     1.32      -15.92%
BenchmarkRotateLeft32-16     1.60     1.37      -14.37%
BenchmarkRotateLeft64-16     1.57     1.32      -15.92%

Updates #21390

Change-Id: Ib6f17669ecc9cab54f18d690be27e2225ca654a4
Reviewed-on: https://go-review.googlesource.com/59932
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-09-11 20:44:22 +00:00
Michael Munday
9da29b687f cmd/compile: propagate constants through math.Float{32,64}{,from}bits
This CL adds generic SSA rules to propagate constants through raw bits
conversions between floats and integers. This allows constants to
propagate through some math functions. For example, math.Copysign(0, -1)
is now constant folded to a load of -0.0.

Requires a fix to the ARM assembler which loaded -0.0 as +0.0.

Change-Id: I52649a4691077c7414f19d17bb599a6743c23ac2
Reviewed-on: https://go-review.googlesource.com/62250
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-09-08 17:24:03 +00:00
Keith Randall
aed1c119fd cmd/compile: fix assembly test
Bad merge, missed changing to keyed literal structs.

Bug introduced in CL 56252

Change-Id: I55cccff4990bd25e6387f6c90919ee5866900d7f
Reviewed-on: https://go-review.googlesource.com/61290
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-09-03 16:24:24 +00:00
Cholerae Hu
fb165eaffd cmd/compile: combine x*n - y*n into (x-y)*n
Do the same thing as CL 55143 to reduce the number of IMULs.
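The rewrite relies on distributivity, which still holds when the products overflow because Go integer arithmetic wraps modulo 2^64. A quick check with hypothetical function names:

```go
package main

import "fmt"

// before uses two multiplies; after is the rewritten form with one.
func before(x, y, n int64) int64 { return x*n - y*n }
func after(x, y, n int64) int64 { return (x - y) * n }

func main() {
	fmt.Println(before(7, 3, 5), after(7, 3, 5)) // 20 20
}
```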

Change-Id: I1bd38f618058e3cd74fac181f003610ea13f2294
Reviewed-on: https://go-review.googlesource.com/56252
Run-TryBot: Emmanuel Odeke <emm.odeke@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-09-03 14:29:38 +00:00
Cherry Zhang
7846500a5a cmd/compile: remove redundant constant shift rules
Normal shift rules plus constant folding are enough to generate
efficient shift-by-constant instructions.

Add test to make sure we don't generate comparisons for constant
shifts.
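The reason non-constant shifts can need comparisons is that Go defines `x << s` as 0 once `s` reaches the operand width, while hardware such as x86 SHLQ uses only the low 6 bits of the count. With an in-range constant count, that check disappears. A sketch of the semantics, with hypothetical function names:

```go
package main

import "fmt"

// shl shifts by a variable count: Go requires the result to be 0 once
// s >= 64, so compiled code needs extra handling for large counts.
func shl(x uint64, s uint) uint64 { return x << s }

// shl3 shifts by a constant known to be in range, so a plain shift
// instruction is enough.
func shl3(x uint64) uint64 { return x << 3 }

func main() {
	fmt.Println(shl(1, 3), shl3(1)) // 8 8
	fmt.Println(shl(1, 64))         // 0
}
```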

TODO: there are still constant shift rules on PPC64. If they
are removed, the constant folding rules are not enough to remove
all the test and mask stuff for constant shifts. Leave them in
for now.

Fixes #20663.

Change-Id: I724cc324aa8607762d0c8aacf9bfa641bda5c2a1
Reviewed-on: https://go-review.googlesource.com/60330
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-08-31 02:08:48 +00:00
Keith Randall
2b079c3c04 cmd/compile: use keyed struct for asm tests
Just to make it clearer which regexps are positive and which
regexps are negative.

Change-Id: Ia190e89be28048fcae2491506f552afad90a5f85
Reviewed-on: https://go-review.googlesource.com/59490
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-08-28 17:34:25 +00:00
David du Colombier
adbfdfe377 cmd/compile: don't use MOVOstore for move on plan9/amd64
The SSA compiler currently generates MOVOstore instructions
to optimize 16-byte moves on the AMD64 architecture.

However, we can't use the MOVOstore instruction on Plan 9,
because floating point operations are not allowed in the
note handler.

We rely on the useSSE flag to disable the use of the
MOVOstore instruction on Plan 9 and replace it with two
MOVQstore instructions.

Fixes #21625

Change-Id: Idfefcceadccafe1752b059b5fe113ce566c0e71c
Reviewed-on: https://go-review.googlesource.com/59171
Run-TryBot: David du Colombier <0intro@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2017-08-28 16:21:28 +00:00
Ilya Tocar
9c99512d18 cmd/compile/internal/ssa: combine consecutive loads and stores on amd64
Sometimes (often for calls) we generate code like this:

MOVQ  (addr),AX
MOVQ  8(addr),BX
MOVQ  AX,(otheraddr)
MOVQ  BX,8(otheraddr)

Replace it with

MOVUPS (addr),X0
MOVUPS X0,(otheraddr)

For completeness, do the same for 8-, 16- and 32-bit loads/stores.
Shaves 1% from the code sections of the go tool.

/localdisk/itocar/golang/bin/go 10293917
go_old 10334877 [40960 bytes]

read-only data = 682 bytes (0.040769%)
global text (code) = 38961 bytes (1.036503%)
Total difference 39643 bytes (0.674628%)

Updates #6853

Change-Id: I1f0d2f60273a63a079b58927cd1c4e3429d2e7ae
Reviewed-on: https://go-review.googlesource.com/57130
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-08-25 20:05:17 +00:00
Keith Randall
fb05948d9e cmd/compile,math: improve code generation for math.Abs
Implement int reg <-> fp reg moves on amd64.
If we see a load to int reg followed by an int->fp move, then we can just
load to the fp reg instead.  Same for stores.

math.Abs is now:

MOVQ	"".x+8(SP), AX
SHLQ	$1, AX
SHRQ	$1, AX
MOVQ	AX, "".~r1+16(SP)

math.Copysign is now:

MOVQ	"".x+8(SP), AX
SHLQ	$1, AX
SHRQ	$1, AX
MOVQ	"".y+16(SP), CX
SHRQ	$63, CX
SHLQ	$63, CX
ORQ	CX, AX
MOVQ	AX, "".~r2+24(SP)

math.Float64bits is now:

MOVSD	"".x+8(SP), X0
MOVSD	X0, "".~r1+16(SP)
(it would be nicer to use a non-SSE reg for this, nothing is perfect)

And due to the fix for #21440, the inlined versions of these improve as well.

name      old time/op  new time/op  delta
Abs       1.38ns ± 5%  0.89ns ±10%  -35.54%  (p=0.000 n=10+10)
Copysign  1.56ns ± 7%  1.35ns ± 6%  -13.77%  (p=0.000 n=9+10)

Fixes #13095

Change-Id: Ibd7f2792412a6668608780b0688a77062e1f1499
Reviewed-on: https://go-review.googlesource.com/58732
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2017-08-25 19:15:01 +00:00
Michael Munday
744ebfde04 cmd/compile: eliminate stores to unread auto variables
This is a crude compiler pass to eliminate stores to auto variables
that are only ever written to.

Eliminates an unnecessary store to x from the following code:

func f() int {
	x := 1
	return *(&x)
}

Fixes #19765.

Change-Id: If2c63a8ae67b8c590b6e0cc98a9610939a3eeffa
Reviewed-on: https://go-review.googlesource.com/38746
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-08-24 16:53:56 +00:00
Alberto Donizetti
8bca7ef607 cmd/compile: support placeholder name '$' in code generation tests
This change adds to the code-generation harness in asm_test.go support
for the use of a '$' placeholder name for test functions.

A few uninformative function names are also changed to use the
placeholder, to confirm that the change works as expected.

Fixes #21500

Change-Id: Iba168bd85efc9822253305d003b06682cf8a6c5c
Reviewed-on: https://go-review.googlesource.com/57292
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-08-22 19:42:32 +00:00
Ilya Tocar
da34ddf24b cmd/compile/internal/ssa: combine more const stores
We already combine const stores up to MOVQstoreconst.
Combine two 64-bit stores of const zero into one SSE store of 128-bit zero.

Shaves a significant amount (>1%) of code from the go tool:
/localdisk/itocar/golang/bin/go 10334877
go_old 10388125 [53248 bytes]

global text (code) = 51041 bytes (1.343944%)
read-only data = 663 bytes (0.039617%)
Total difference 51704 bytes (0.873981%)

Change-Id: I7bc40968023c3a69f379b10fbb433cdb11364f1b
Reviewed-on: https://go-review.googlesource.com/56250
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
Reviewed-by: Keith Randall <khr@golang.org>
2017-08-17 17:40:40 +00:00
Alberto Donizetti
a0453a180f cmd/compile: combine x*n + y*n into (x+y)*n
There are a few cases where this can be useful. Apart from the obvious
(and silly)

  100*n + 200*n

where we generate one IMUL instead of two, consider:

  15*n + 31*n

Currently, the compiler strength-reduces both imuls, generating:

	0x0000 00000	MOVQ	"".n+8(SP), AX
	0x0005 00005 	MOVQ	AX, CX
	0x0008 00008 	SHLQ	$4, AX
	0x000c 00012 	SUBQ	CX, AX
	0x000f 00015 	MOVQ	CX, DX
	0x0012 00018 	SHLQ	$5, CX
	0x0016 00022 	SUBQ	DX, CX
	0x0019 00025 	ADDQ	CX, AX
	0x001c 00028 	MOVQ	AX, "".~r1+16(SP)
	0x0021 00033 	RET

But combining the imuls is both faster and shorter:

	0x0000 00000	MOVQ	"".n+8(SP), AX
	0x0005 00005 	IMULQ	$46, AX
	0x0009 00009	MOVQ	AX, "".~r1+16(SP)
	0x000e 00014 	RET

even without strength-reduction.

Moreover, consider:

  5*n + 7*(n+1) + 11*(n+2)

We already have a rule that rewrites 7*(n+1) into 7*n+7, so the
generated code (without imuls merging) looks like this:

	0x0000 00000 	MOVQ	"".n+8(SP), AX
	0x0005 00005 	LEAQ	(AX)(AX*4), CX
	0x0009 00009 	MOVQ	AX, DX
	0x000c 00012 	NEGQ	AX
	0x000f 00015 	LEAQ	(AX)(DX*8), AX
	0x0013 00019 	ADDQ	CX, AX
	0x0016 00022 	LEAQ	(DX)(CX*2), CX
	0x001a 00026 	LEAQ	29(AX)(CX*1), AX
	0x001f 00031 	MOVQ	AX, "".~r1+16(SP)

But with imuls merging, the 5n, 7n and 11n factors get merged, and the
generated code looks like this:

	0x0000 00000 	MOVQ	"".n+8(SP), AX
	0x0005 00005 	IMULQ	$23, AX
	0x0009 00009 	ADDQ	$29, AX
	0x000d 00013 	MOVQ	AX, "".~r1+16(SP)
	0x0012 00018 	RET

Which is both faster and shorter; it is also exactly the same code that
clang and the Intel C compiler generate for the above expression.
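The merged constants follow from collecting coefficients: 15n + 31n = 46n, and 5n + 7(n+1) + 11(n+2) = (5+7+11)n + (7+22) = 23n + 29, matching IMULQ $23 / ADDQ $29. A quick check that the merged forms agree with the originals, including under overflow (function names are hypothetical):

```go
package main

import "fmt"

func sum2(n int64) int64 { return 15*n + 31*n }
func sum2Merged(n int64) int64 { return 46 * n }

func sum3(n int64) int64 { return 5*n + 7*(n+1) + 11*(n+2) }
func sum3Merged(n int64) int64 { return 23*n + 29 }

func main() {
	fmt.Println(sum3(1), sum3Merged(1)) // 52 52
}
```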

Change-Id: Ib4d5503f05d2f2efe31a1be14e2fe6cac33730a9
Reviewed-on: https://go-review.googlesource.com/55143
Reviewed-by: Keith Randall <khr@golang.org>
2017-08-16 16:51:59 +00:00
Cherry Zhang
f20944de78 cmd/compile: set/unset base register for better assembly print
For address of an auto or arg, on all non-x86 architectures
the assembler backend encodes the actual SP offset in the
instruction but leaves the offset in Prog unchanged. When the
assembly is printed in compile -S, it shows an offset
relative to pseudo FP/SP with an actual hardware SP base
register (e.g. R13 on ARM). This is confusing. Unset the
base register if it is indeed SP, so the assembly output is
consistent. If the base register isn't SP, it should be an
error and the error output contains the actual base register.

For address loading instructions, the base register isn't set
in the compiler on non-x86 architectures. Set it. Normally it
is SP and will be unset in the change mentioned above for
printing. If it is not, it will be an error and the error
output contains the actual base register.

No change in generated binary, only printed assembly. Passes
"go build -a -toolexec 'toolstash -cmp' std cmd" on all
architectures.

Fixes #21064.

Change-Id: Ifafe8d5f9b437efbe824b63b3cbc2f5f6cdc1fd5
Reviewed-on: https://go-review.googlesource.com/49432
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-08-02 12:24:02 +00:00
Ilya Tocar
3bdc2f3abf cmd/compile/internal/gc: speed-up small array comparison
Currently we inline array comparisons for arrays with at most 4 elements.
For arrays that are small but have more than 4 elements (e.g. [16]byte),
compare them using larger compares. This produces very slightly smaller
binaries and faster code.
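For a [16]byte, two 8-byte word compares can replace sixteen byte compares. A minimal sketch of the idea, assuming encoding/binary to form the words (the compiler emits wide loads directly; `eq16` is a hypothetical name):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// eq16 compares two 16-byte arrays as two 64-bit words. Byte order does
// not matter for equality as long as both sides use the same one.
func eq16(a, b *[16]byte) bool {
	return binary.LittleEndian.Uint64(a[0:8]) == binary.LittleEndian.Uint64(b[0:8]) &&
		binary.LittleEndian.Uint64(a[8:16]) == binary.LittleEndian.Uint64(b[8:16])
}

func main() {
	var x, y [16]byte
	y[15] = 1
	fmt.Println(eq16(&x, &x), eq16(&x, &y)) // true false
}
```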

ArrayEqual-6  7.41ns ± 0%  3.17ns ± 0%  -57.15%  (p=0.000 n=10+10)

For go tool:
global text (code) = -559 bytes (-0.014566%)

This also helps mapaccess1_faststr, and maps in general:

MapDelete/Str/1-6               195ns ± 1%     186ns ± 2%   -4.47%  (p=0.000 n=10+10)
MapDelete/Str/2-6               211ns ± 1%     177ns ± 1%  -16.01%  (p=0.000 n=10+10)
MapDelete/Str/4-6               225ns ± 1%     183ns ± 1%  -18.49%  (p=0.000 n=8+10)
MapStringKeysEight_16-6        31.3ns ± 0%    28.6ns ± 0%   -8.63%  (p=0.000 n=6+9)
MapStringKeysEight_32-6        29.2ns ± 0%    27.6ns ± 0%   -5.45%  (p=0.000 n=10+10)
MapStringKeysEight_64-6        29.1ns ± 1%    27.5ns ± 0%   -5.46%  (p=0.000 n=10+10)
MapStringKeysEight_1M-6        29.1ns ± 1%    27.6ns ± 0%   -5.49%  (p=0.000 n=10+10)

Change-Id: I9ec98e41b233031e0e96c4e13d86a324f628ed4a
Reviewed-on: https://go-review.googlesource.com/40771
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-06-01 15:46:16 +00:00
Josh Bleecher Snyder
ee69c21747 cmd/compile: don't use statictmps for SSA-able composite literals
The writebarrier test has to change.
Now that T23 composite literals are passed to the backend,
they get SSA'd and writes to their fields are treated separately,
so the relevant part of the first write to t23 is now a dead store.
Preserve the intent of the test by splitting it into two functions.

Reduces code size a bit:

name        old object-bytes  new object-bytes  delta
Template           386k ± 0%         386k ± 0%    ~     (all equal)
Unicode            202k ± 0%         202k ± 0%    ~     (all equal)
GoTypes           1.16M ± 0%        1.16M ± 0%    ~     (all equal)
Compiler          3.92M ± 0%        3.91M ± 0%  -0.19%  (p=0.008 n=5+5)
SSA               7.91M ± 0%        7.91M ± 0%    ~     (all equal)
Flate              228k ± 0%         228k ± 0%  -0.05%  (p=0.008 n=5+5)
GoParser           283k ± 0%         283k ± 0%    ~     (all equal)
Reflect            952k ± 0%         952k ± 0%  -0.06%  (p=0.008 n=5+5)
Tar                188k ± 0%         188k ± 0%  -0.09%  (p=0.008 n=5+5)
XML                406k ± 0%         406k ± 0%  -0.02%  (p=0.008 n=5+5)
[Geo mean]         649k              648k       -0.04%

Fixes #18872

Change-Id: Ifeed0f71f13849732999aa731cc2bf40c0f0e32a
Reviewed-on: https://go-review.googlesource.com/43154
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-05-11 18:28:40 +00:00
Cherry Zhang
fb0ccc5d0a cmd/internal/obj/arm64, cmd/compile: improve offset folding on ARM64
The ARM64 assembler backend only accepts loads and stores with small
or aligned offsets. The compiler therefore can only fold small or
aligned offsets into loads and stores. For locals and args, their
offsets from SP are not known until very late, so the compiler
conservatively declines to fold some of them. However,
in most cases the offset is indeed small or aligned and could
be folded into the load or store, but is not.

This CL adds support for loads and stores with large and unaligned
offsets. When the offset doesn't fit into the instruction, the
assembler uses two instructions and (for very large offsets) the
constant pool. This way, the compiler doesn't need to be conservative,
and can simply fold the offset.

To make it work, the assembler's optab matching rules need to be
changed. Before, MOVD accepted C_UAUTO32K, which matches multiples
of 8 between 0 and 32K, and also C_UAUTO16K, which may not be a
multiple of 8 and does not fit into a MOVD instruction; the
assembler reported an error in the latter case. This change makes it
match only multiples of 8 (or offsets within ±256, which also fit
in the instruction), and uses the large-or-unaligned-offset rule,
without error, for offsets that don't fit. Move rules for other sizes
are changed similarly.
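The matching rule above can be written as a predicate. A sketch, assuming the two ARM64 encodings for an 8-byte load/store: the scaled unsigned 12-bit immediate form (offsets 0 to 32760 in steps of 8) and the unscaled signed 9-bit form (-256 to 255). The function name `fitsMOVD` is hypothetical and not the assembler's optab code:

```go
package main

import "fmt"

// fitsMOVD reports whether an offset can be encoded directly in a single
// 8-byte MOVD load or store; other offsets would need the
// large-or-unaligned-offset sequence (hypothetical helper).
func fitsMOVD(off int64) bool {
	scaled := off >= 0 && off < 32768 && off%8 == 0 // unsigned imm12, scaled by 8
	unscaled := off >= -256 && off <= 255           // signed imm9, unscaled
	return scaled || unscaled
}

func main() {
	for _, off := range []int64{8, 12, 300, 32760, 32768, -256, -264} {
		fmt.Println(off, fitsMOVD(off))
	}
}
```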

Classes C_UAUTO64K and C_UOREG64K are removed, as they are never
used.

In shared library mode, a load/store of a global is rewritten to use
the GOT and a temp register, which conflicts with the use of the temp
register for assembling large offsets. So the folding is disabled
for globals in shared library mode.

Reduce cmd/go binary size by 2%.

name                     old time/op    new time/op    delta
BinaryTree17-8              8.67s ± 0%     8.61s ± 0%   -0.60%  (p=0.000 n=9+10)
Fannkuch11-8                6.24s ± 0%     6.19s ± 0%   -0.83%  (p=0.000 n=10+9)
FmtFprintfEmpty-8           116ns ± 0%     116ns ± 0%     ~     (all equal)
FmtFprintfString-8          196ns ± 0%     192ns ± 0%   -1.89%  (p=0.000 n=10+10)
FmtFprintfInt-8             199ns ± 0%     198ns ± 0%   -0.35%  (p=0.001 n=9+10)
FmtFprintfIntInt-8          294ns ± 0%     293ns ± 0%   -0.34%  (p=0.000 n=8+8)
FmtFprintfPrefixedInt-8     318ns ± 1%     318ns ± 1%     ~     (p=1.000 n=10+10)
FmtFprintfFloat-8           537ns ± 0%     531ns ± 0%   -1.17%  (p=0.000 n=9+10)
FmtManyArgs-8              1.19µs ± 1%    1.18µs ± 1%   -1.41%  (p=0.001 n=10+10)
GobDecode-8                17.2ms ± 1%    17.3ms ± 2%     ~     (p=0.165 n=10+10)
GobEncode-8                14.7ms ± 1%    14.7ms ± 2%     ~     (p=0.631 n=10+10)
Gzip-8                      837ms ± 0%     836ms ± 0%   -0.14%  (p=0.006 n=9+10)
Gunzip-8                    141ms ± 0%     139ms ± 0%   -1.24%  (p=0.000 n=9+10)
HTTPClientServer-8          256µs ± 1%     253µs ± 1%   -1.35%  (p=0.000 n=10+10)
JSONEncode-8               40.1ms ± 1%    41.3ms ± 1%   +3.06%  (p=0.000 n=10+9)
JSONDecode-8                157ms ± 1%     156ms ± 1%   -0.83%  (p=0.001 n=9+8)
Mandelbrot200-8            8.94ms ± 0%    8.94ms ± 0%   +0.02%  (p=0.000 n=9+9)
GoParse-8                  8.69ms ± 0%    8.54ms ± 1%   -1.69%  (p=0.000 n=8+10)
RegexpMatchEasy0_32-8       227ns ± 1%     228ns ± 1%   +0.48%  (p=0.016 n=10+9)
RegexpMatchEasy0_1K-8      1.92µs ± 0%    1.63µs ± 0%  -15.08%  (p=0.000 n=10+9)
RegexpMatchEasy1_32-8       256ns ± 0%     251ns ± 0%   -2.19%  (p=0.000 n=10+9)
RegexpMatchEasy1_1K-8      2.38µs ± 0%    2.09µs ± 0%  -12.49%  (p=0.000 n=10+9)
RegexpMatchMedium_32-8      352ns ± 0%     354ns ± 0%   +0.39%  (p=0.002 n=10+9)
RegexpMatchMedium_1K-8      106µs ± 0%     106µs ± 0%   -0.05%  (p=0.005 n=10+9)
RegexpMatchHard_32-8       5.92µs ± 0%    5.89µs ± 0%   -0.40%  (p=0.000 n=9+8)
RegexpMatchHard_1K-8        180µs ± 0%     179µs ± 0%   -0.14%  (p=0.000 n=10+9)
Revcomp-8                   1.20s ± 0%     1.13s ± 0%   -6.29%  (p=0.000 n=9+8)
Template-8                  159ms ± 1%     154ms ± 1%   -3.14%  (p=0.000 n=9+10)
TimeParse-8                 800ns ± 3%     769ns ± 1%   -3.91%  (p=0.000 n=10+10)
TimeFormat-8                826ns ± 2%     817ns ± 2%   -1.04%  (p=0.050 n=10+10)
[Geo mean]                  145µs          143µs        -1.79%

Change-Id: I5fc42087cee9b54ea414f8ef6d6d020b80eb5985
Reviewed-on: https://go-review.googlesource.com/42172
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
2017-05-09 19:41:00 +00:00
Martin Möhrmann
f9bec9eb42 cmd/compile: use MOVL instead of MOVQ for small constants on amd64
The encoding of MOVL to a register is 2 bytes shorter than that of MOVQ.
The upper 32 bits are automatically zeroed when MOVL to a register is used.

Replaces 1657 MOVQ instructions with MOVL in the go binary.
Reduces the go binary size by 4 kilobytes.
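Because MOVL to a register zero-extends into the upper half, any 64-bit constant in [0, 2^32) can be materialized with MOVL. A sketch of that condition (`canUseMOVL` is a hypothetical helper, not the compiler's rule text):

```go
package main

import "fmt"

// canUseMOVL reports whether a 64-bit constant c can be loaded with a
// 32-bit MOVL: its upper 32 bits must be zero, because MOVL to a
// register zero-extends into the full 64-bit register.
func canUseMOVL(c int64) bool {
	return uint64(c) < 1<<32
}

func main() {
	fmt.Println(canUseMOVL(42), canUseMOVL(0xffffffff), canUseMOVL(-1)) // true true false
}
```

Small negative constants such as -1 do not qualify, since their upper bits are set.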

name                   old time/op    new time/op    delta
BinaryTree17              1.93s ± 0%     1.93s ± 0%  -0.32%  (p=0.000 n=9+9)
Fannkuch11                2.66s ± 0%     2.48s ± 0%  -6.60%  (p=0.000 n=9+9)
FmtFprintfEmpty          31.8ns ± 0%    31.6ns ± 0%  -0.63%  (p=0.000 n=10+10)
FmtFprintfString         52.0ns ± 0%    51.9ns ± 0%  -0.19%  (p=0.000 n=10+10)
FmtFprintfInt            55.6ns ± 0%    54.6ns ± 0%  -1.80%  (p=0.002 n=8+10)
FmtFprintfIntInt         87.7ns ± 0%    84.8ns ± 0%  -3.31%  (p=0.000 n=9+9)
FmtFprintfPrefixedInt    98.9ns ± 0%   102.0ns ± 0%  +3.10%  (p=0.000 n=10+10)
FmtFprintfFloat           165ns ± 0%     164ns ± 0%  -0.61%  (p=0.000 n=10+10)
FmtManyArgs               368ns ± 0%     361ns ± 0%  -1.98%  (p=0.000 n=8+10)
GobDecode                4.53ms ± 0%    4.58ms ± 0%  +1.08%  (p=0.000 n=9+10)
GobEncode                3.74ms ± 0%    3.73ms ± 0%  -0.27%  (p=0.000 n=10+10)
Gzip                      164ms ± 0%     163ms ± 0%  -0.48%  (p=0.000 n=10+10)
Gunzip                   26.7ms ± 0%    26.6ms ± 0%  -0.13%  (p=0.000 n=9+10)
HTTPClientServer         30.4µs ± 1%    30.3µs ± 1%  -0.41%  (p=0.016 n=10+10)
JSONEncode               10.9ms ± 0%    11.0ms ± 0%  +0.70%  (p=0.000 n=10+10)
JSONDecode               36.8ms ± 0%    37.0ms ± 0%  +0.59%  (p=0.000 n=9+10)
Mandelbrot200            3.20ms ± 0%    3.21ms ± 0%  +0.44%  (p=0.000 n=9+10)
GoParse                  2.35ms ± 0%    2.35ms ± 0%  +0.26%  (p=0.000 n=10+9)
RegexpMatchEasy0_32      58.3ns ± 0%    58.4ns ± 0%  +0.17%  (p=0.000 n=10+10)
RegexpMatchEasy0_1K       138ns ± 0%     142ns ± 0%  +2.68%  (p=0.000 n=10+10)
RegexpMatchEasy1_32      55.1ns ± 0%    55.6ns ± 1%    ~     (p=0.104 n=10+10)
RegexpMatchEasy1_1K       242ns ± 0%     243ns ± 0%  +0.41%  (p=0.000 n=10+10)
RegexpMatchMedium_32     87.4ns ± 0%    89.9ns ± 0%  +2.86%  (p=0.000 n=10+10)
RegexpMatchMedium_1K     27.4µs ± 0%    27.4µs ± 0%  +0.15%  (p=0.000 n=10+10)
RegexpMatchHard_32       1.30µs ± 0%    1.32µs ± 1%  +1.91%  (p=0.000 n=10+10)
RegexpMatchHard_1K       39.0µs ± 0%    39.5µs ± 0%  +1.38%  (p=0.000 n=10+10)
Revcomp                   316ms ± 0%     319ms ± 0%  +1.13%  (p=0.000 n=9+8)
Template                 40.6ms ± 0%    40.6ms ± 0%    ~     (p=0.123 n=10+10)
TimeParse                 224ns ± 0%     224ns ± 0%    ~     (all equal)
TimeFormat                230ns ± 0%     225ns ± 0%  -2.17%  (p=0.000 n=10+10)

Change-Id: I32a099b65f9e6d4ad7288ed48546655c534757d8
Reviewed-on: https://go-review.googlesource.com/38630
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-05-01 20:59:58 +00:00
Lynn Boger
9248ff46a8 cmd/compile: add rotates to PPC64.rules
This updates PPC64.rules to include rules to generate rotates
for ADD, OR, XOR operators that combine two opposite shifts
that sum to 32 or 64.

To support this change, opcodes for ROTL and ROTLW were added, to
be used like the rotldi and rotlwi extended mnemonics.

This provides the following improvement in sha3:

BenchmarkPermutationFunction-8     302.83       376.40       1.24x
BenchmarkSha3_512_MTU-8            98.64        121.92       1.24x
BenchmarkSha3_384_MTU-8            136.80       168.30       1.23x
BenchmarkSha3_256_MTU-8            169.21       211.29       1.25x
BenchmarkSha3_224_MTU-8            179.76       221.19       1.23x
BenchmarkShake128_MTU-8            212.87       263.23       1.24x
BenchmarkShake256_MTU-8            196.62       245.60       1.25x
BenchmarkShake256_16x-8            163.57       194.37       1.19x
BenchmarkShake256_1MiB-8           199.02       248.74       1.25x
BenchmarkSha3_512_1MiB-8           106.55       133.13       1.25x

Fixes #20030

Change-Id: I484c56f48395d32f53ff3ecb3ac6cb8191cfee44
Reviewed-on: https://go-review.googlesource.com/40992
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-04-20 18:05:22 +00:00
Keith Randall
7e07e635f3 cmd/compile: implement non-constant rotates
Makes math/bits.Rotate{Left,Right} fast on amd64.

name              old time/op  new time/op  delta
RotateLeft-12     7.42ns ± 6%  5.45ns ± 6%  -26.54%   (p=0.000 n=9+10)
RotateLeft8-12    4.77ns ± 5%  3.42ns ± 7%  -28.25%   (p=0.000 n=8+10)
RotateLeft16-12   4.82ns ± 8%  3.40ns ± 7%  -29.36%  (p=0.000 n=10+10)
RotateLeft32-12   4.87ns ± 7%  3.48ns ± 7%  -28.51%    (p=0.000 n=8+9)
RotateLeft64-12   5.23ns ±10%  3.35ns ± 6%  -35.97%   (p=0.000 n=9+10)
RotateRight-12    7.59ns ± 8%  5.71ns ± 1%  -24.72%   (p=0.000 n=10+8)
RotateRight8-12   4.98ns ± 7%  3.36ns ± 9%  -32.55%  (p=0.000 n=10+10)
RotateRight16-12  5.12ns ± 2%  3.45ns ± 5%  -32.62%  (p=0.000 n=10+10)
RotateRight32-12  4.80ns ± 6%  3.42ns ±16%  -28.68%  (p=0.000 n=10+10)
RotateRight64-12  4.78ns ± 6%  3.42ns ± 6%  -28.50%  (p=0.000 n=10+10)

Update #18940

Change-Id: Ie79fb5581c489ed4d3b859314c5e669a134c119b
Reviewed-on: https://go-review.googlesource.com/39711
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-04-17 23:19:45 +00:00
Josh Bleecher Snyder
3d0a898385 cmd/compile: improve output when TestAssembly build fails
Change-Id: Ibee84399d81463d3e7d5319626bb0d6b60b86bd9
Reviewed-on: https://go-review.googlesource.com/40861
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-04-17 03:12:34 +00:00
Josh Bleecher Snyder
0d36999a0f cmd/compile: make TestAssembly resilient to output ordering
To preserve reproducible builds, the text entries
will be sorted during compilation before being printed.
TestAssembly currently assumes that function init
comes after all user-defined functions.
Remove that assumption.
Instead of looking for "TEXT" to tell where
a function ends (which may now yield lots of
non-function-code junk), look for a line beginning
with non-whitespace.
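A sketch of the "line beginning with non-whitespace" rule: collect a function's lines after its header and stop at the next line that does not start with whitespace. The input format is simplified and the helper `funcBody` is hypothetical; it is not the actual asm_test.go code.

```go
package main

import (
	"fmt"
	"strings"
)

// funcBody returns the indented lines that follow the first line starting
// with header, stopping at the next line that does not begin with a space
// or tab (an empty line also ends the body).
func funcBody(output, header string) []string {
	var body []string
	in := false
	for _, line := range strings.Split(output, "\n") {
		if in {
			if line == "" || (line[0] != ' ' && line[0] != '\t') {
				break
			}
			body = append(body, line)
		} else if strings.HasPrefix(line, header) {
			in = true
		}
	}
	return body
}

func main() {
	out := "\"\".f STEXT size=16\n" +
		"\tMOVQ\t$1, AX\n" +
		"\tRET\n" +
		"\"\".init STEXT size=8\n" +
		"\tRET\n"
	fmt.Println(len(funcBody(out, "\"\".f "))) // 2
}
```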

Updates #15756

Change-Id: Ibc82dba6143d769ef4c391afc360e523b1a51348
Reviewed-on: https://go-review.googlesource.com/39853
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-04-13 02:30:29 +00:00
Ilya Tocar
e4a500ce14 cmd/compile/internal/gc: improve comparison with constant strings
Currently we expand comparisons with small constant strings into a length check
and a sequence of byte comparisons. Generate 16/32/64-bit comparisons
instead of bytewise ones on 386 and amd64. Also increase the limit on what is
considered a small constant string.
Shaves ~30kb (0.5%) from the go executable.

This also updates test/prove.go to keep the test case valid.

Change-Id: I99ae8871a1d00c96363c6d03d0b890782fa7e1d9
Reviewed-on: https://go-review.googlesource.com/38776
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-04-07 15:40:25 +00:00
Cherry Zhang
257b01f8f4 cmd/compile: use ANDconst to mask out leading/trailing bits on ARM64
For an AND that masks out leading or trailing bits, generic rules
rewrite it to a pair of shifts. On ARM64, the mask actually can
fit into an AND instruction. So we rewrite it back to AND.
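Masking out leading or trailing bits with AND is equivalent to a pair of shifts, which is the form the generic rules produce. A check of both identities, with hypothetical function names:

```go
package main

import "fmt"

// clearHigh keeps the low 32 bits with an AND mask; clearHighShifts does
// the same with a shift pair.
func clearHigh(x uint64) uint64 { return x & 0xffffffff }
func clearHighShifts(x uint64) uint64 { return x << 32 >> 32 }

// clearLow zeroes the low 8 bits; clearLowShifts uses the equivalent shifts.
func clearLow(x uint64) uint64 { return x &^ 0xff }
func clearLowShifts(x uint64) uint64 { return x >> 8 << 8 }

func main() {
	x := uint64(0x0123456789abcdef)
	fmt.Printf("%#x %#x\n", clearHigh(x), clearLow(x)) // 0x89abcdef 0x123456789abcd00
}
```

Both masks are contiguous runs of ones, which ARM64 can encode as logical immediates, so a single AND instruction suffices there.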

Fixes #19857.

Change-Id: I479d7320ae4f29bb3f0056d5979bde4478063a8f
Reviewed-on: https://go-review.googlesource.com/39651
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-04-06 17:59:32 +00:00
Keith Randall
5cadc91b3c cmd/compile: intrinsics for math/bits.OnesCount
Popcount instructions on amd64 are not guaranteed to be
present, so we must guard their use. Rewrite rules can't
generate control flow at the moment, so the intrinsifier
needs to generate that code.

name           old time/op  new time/op  delta
OnesCount-8    2.47ns ± 5%  1.04ns ± 2%  -57.70%  (p=0.000 n=10+10)
OnesCount16-8  1.05ns ± 1%  0.78ns ± 0%  -25.56%    (p=0.000 n=9+8)
OnesCount32-8  1.63ns ± 5%  1.04ns ± 2%  -35.96%  (p=0.000 n=10+10)
OnesCount64-8  2.45ns ± 0%  1.04ns ± 1%  -57.55%   (p=0.000 n=6+10)

Update #18616

Change-Id: I4aff2cc9aa93787898d7b22055fe272a7cf95673
Reviewed-on: https://go-review.googlesource.com/38320
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2017-04-04 02:40:11 +00:00
Keith Randall
63a72fd447 cmd/compile: strength-reduce floating point
x*2 -> x+x
x/c, c power of 2 -> x*(1/c)

Fixes #19827

Change-Id: I74c9f0b5b49b2ed26c0990314c7d1d5f9631b6f1
Reviewed-on: https://go-review.googlesource.com/39295
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-04-03 21:27:03 +00:00
Keith Randall
86dc86b4f9 cmd/compile: don't merge load+op if other op arg is still live
We want to merge a load and op into a single instruction

    l = LOAD ptr mem
    y = OP x l

into

    y = OPload x ptr mem

However, all of our OPload instructions require that y uses
the same register as x. If x is needed past this instruction, then
we must copy x somewhere else, losing the whole benefit of merging
the instructions in the first place.

Disable this optimization if x is live past the OP.

Also disable this optimization if the OP is in a deeper loop than the load.

Update #19595

Change-Id: I87f596aad7e91c9127bfb4705cbae47106e1e77a
Reviewed-on: https://go-review.googlesource.com/38337
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2017-03-23 15:53:04 +00:00