Commit graph

47 commits

Author SHA1 Message Date
Michael Munday
744ebfde04 cmd/compile: eliminate stores to unread auto variables
This is a crude compiler pass to eliminate stores to auto variables
that are only ever written to.

Eliminates an unnecessary store to x from the following code:

func f() int {
	var x := 1
	return *(&x)
}

Fixes #19765.

Change-Id: If2c63a8ae67b8c590b6e0cc98a9610939a3eeffa
Reviewed-on: https://go-review.googlesource.com/38746
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-08-24 16:53:56 +00:00
Alberto Donizetti
8bca7ef607 cmd/compile: support placeholder name '$' in code generation tests
This change adds to the code-generation harness in asm_test.go support
for the use of a '$' placeholder name for test functions.

A few of uninformative function names are also changed to use the
placeholder, to confirm that the change works as expected.

Fixes #21500

Change-Id: Iba168bd85efc9822253305d003b06682cf8a6c5c
Reviewed-on: https://go-review.googlesource.com/57292
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-08-22 19:42:32 +00:00
Ilya Tocar
da34ddf24b cmd/compile/internal/ssa: combine more const stores
We already combine const stores up-to MOVQstoreconst.
Combine 2 64-bit stores of const zero into 1 sse store of 128-bit zero.

Shaves significant (>1%) amount of code from go tool:
/localdisk/itocar/golang/bin/go 10334877
go_old 10388125 [53248 bytes]

global text (code) = 51041 bytes (1.343944%)
read-only data = 663 bytes (0.039617%)
Total difference 51704 bytes (0.873981%)

Change-Id: I7bc40968023c3a69f379b10fbb433cdb11364f1b
Reviewed-on: https://go-review.googlesource.com/56250
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
Reviewed-by: Keith Randall <khr@golang.org>
2017-08-17 17:40:40 +00:00
Alberto Donizetti
a0453a180f cmd/compile: combine x*n + y*n into (x+y)*n
There are a few cases where this can be useful. Apart from the obvious
(and silly)

  100*n + 200*n

where we generate one IMUL instead of two, consider:

  15*n + 31*n

Currently, the compiler strength-reduces both imuls, generating:

    0x0000 00000	MOVQ	"".n+8(SP), AX
	0x0005 00005 	MOVQ	AX, CX
	0x0008 00008 	SHLQ	$4, AX
	0x000c 00012 	SUBQ	CX, AX
	0x000f 00015 	MOVQ	CX, DX
	0x0012 00018 	SHLQ	$5, CX
	0x0016 00022 	SUBQ	DX, CX
	0x0019 00025 	ADDQ	CX, AX
	0x001c 00028 	MOVQ	AX, "".~r1+16(SP)
	0x0021 00033 	RET

But combining the imuls is both faster and shorter:

	0x0000 00000	MOVQ	"".n+8(SP), AX
	0x0005 00005 	IMULQ	$46, AX
	0x0009 00009	MOVQ	AX, "".~r1+16(SP)
	0x000e 00014 	RET

even without strength-reduction.

Moreover, consider:

  5*n + 7*(n+1) + 11*(n+2)

We already have a rule that rewrites 7(n+1) into 7n+7, so the
generated code (without imuls merging) looks like this:

	0x0000 00000 	MOVQ	"".n+8(SP), AX
	0x0005 00005 	LEAQ	(AX)(AX*4), CX
	0x0009 00009 	MOVQ	AX, DX
	0x000c 00012 	NEGQ	AX
	0x000f 00015 	LEAQ	(AX)(DX*8), AX
	0x0013 00019 	ADDQ	CX, AX
	0x0016 00022 	LEAQ	(DX)(CX*2), CX
	0x001a 00026 	LEAQ	29(AX)(CX*1), AX
	0x001f 00031 	MOVQ	AX, "".~r1+16(SP)

But with imuls merging, the 5n, 7n and 11n factors get merged, and the
generated code looks like this:

	0x0000 00000 	MOVQ	"".n+8(SP), AX
	0x0005 00005 	IMULQ	$23, AX
	0x0009 00009 	ADDQ	$29, AX
	0x000d 00013 	MOVQ	AX, "".~r1+16(SP)
	0x0012 00018 	RET

Which is both faster and shorter; that's also the exact same code that
clang and the intel c compiler generate for the above expression.

Change-Id: Ib4d5503f05d2f2efe31a1be14e2fe6cac33730a9
Reviewed-on: https://go-review.googlesource.com/55143
Reviewed-by: Keith Randall <khr@golang.org>
2017-08-16 16:51:59 +00:00
Cherry Zhang
f20944de78 cmd/compile: set/unset base register for better assembly print
For address of an auto or arg, on all non-x86 architectures
the assembler backend encodes the actual SP offset in the
instruction but leaves the offset in Prog unchanged. When the
assembly is printed in compile -S, it shows an offset
relative to pseudo FP/SP with an actual hardware SP base
register (e.g. R13 on ARM). This is confusing. Unset the
base register if it is indeed SP, so the assembly output is
consistent. If the base register isn't SP, it should be an
error and the error output contains the actual base register.

For address loading instructions, the base register isn't set
in the compiler on non-x86 architectures. Set it. Normally it
is SP and will be unset in the change mentioned above for
printing. If it is not, it will be an error and the error
output contains the actual base register.

No change in generated binary, only printed assembly. Passes
"go build -a -toolexec 'toolstash -cmp' std cmd" on all
architectures.

Fixes #21064.

Change-Id: Ifafe8d5f9b437efbe824b63b3cbc2f5f6cdc1fd5
Reviewed-on: https://go-review.googlesource.com/49432
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-08-02 12:24:02 +00:00
Ilya Tocar
3bdc2f3abf cmd/compile/internal/gc: speed-up small array comparison
Currently we inline array comparisons for arrays with at most 4 elements.
Compare arrays with small size, but more than 4 elements (e. g. [16]byte)
with larger compares. This provides very slightly smaller binaries,
and results in faster code.

ArrayEqual-6  7.41ns ± 0%  3.17ns ± 0%  -57.15%  (p=0.000 n=10+10)

For go tool:
global text (code) = -559 bytes (-0.014566%)

This also helps mapaccess1_faststr, and maps in general:

MapDelete/Str/1-6               195ns ± 1%     186ns ± 2%   -4.47%  (p=0.000 n=10+10)
MapDelete/Str/2-6               211ns ± 1%     177ns ± 1%  -16.01%  (p=0.000 n=10+10)
MapDelete/Str/4-6               225ns ± 1%     183ns ± 1%  -18.49%  (p=0.000 n=8+10)
MapStringKeysEight_16-6        31.3ns ± 0%    28.6ns ± 0%   -8.63%  (p=0.000 n=6+9)
MapStringKeysEight_32-6        29.2ns ± 0%    27.6ns ± 0%   -5.45%  (p=0.000 n=10+10)
MapStringKeysEight_64-6        29.1ns ± 1%    27.5ns ± 0%   -5.46%  (p=0.000 n=10+10)
MapStringKeysEight_1M-6        29.1ns ± 1%    27.6ns ± 0%   -5.49%  (p=0.000 n=10+10)

Change-Id: I9ec98e41b233031e0e96c4e13d86a324f628ed4a
Reviewed-on: https://go-review.googlesource.com/40771
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-06-01 15:46:16 +00:00
Josh Bleecher Snyder
ee69c21747 cmd/compile: don't use statictmps for SSA-able composite literals
The writebarrier test has to change.
Now that T23 composite literals are passed to the backend,
they get SSA'd, so writes to their fields are treated separately,
so the relevant part of the first write to t23 is now a dead store.
Preserve the intent of the test by splitting it up into two functions.

Reduces code size a bit:

name        old object-bytes  new object-bytes  delta
Template           386k ± 0%         386k ± 0%    ~     (all equal)
Unicode            202k ± 0%         202k ± 0%    ~     (all equal)
GoTypes           1.16M ± 0%        1.16M ± 0%    ~     (all equal)
Compiler          3.92M ± 0%        3.91M ± 0%  -0.19%  (p=0.008 n=5+5)
SSA               7.91M ± 0%        7.91M ± 0%    ~     (all equal)
Flate              228k ± 0%         228k ± 0%  -0.05%  (p=0.008 n=5+5)
GoParser           283k ± 0%         283k ± 0%    ~     (all equal)
Reflect            952k ± 0%         952k ± 0%  -0.06%  (p=0.008 n=5+5)
Tar                188k ± 0%         188k ± 0%  -0.09%  (p=0.008 n=5+5)
XML                406k ± 0%         406k ± 0%  -0.02%  (p=0.008 n=5+5)
[Geo mean]         649k              648k       -0.04%

Fixes #18872

Change-Id: Ifeed0f71f13849732999aa731cc2bf40c0f0e32a
Reviewed-on: https://go-review.googlesource.com/43154
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-05-11 18:28:40 +00:00
Cherry Zhang
fb0ccc5d0a cmd/internal/obj/arm64, cmd/compile: improve offset folding on ARM64
ARM64 assembler backend only accepts loads and stores with small
or aligned offset. The compiler therefore can only fold small or
aligned offsets into loads and stores. For locals and args, their
offsets to SP are not known until very late, and the compiler
makes conservative decision not folding some of them. However,
in most cases, the offset is indeed small or aligned, and can
be folded into load and store (but actually not).

This CL adds support of loads and stores with large and unaligned
offsets. When the offset doesn't fit into the instruction, it
uses two instructions and (for very large offset) the constant
pool. This way, the compiler doesn't need to be conservative,
and can simply fold the offset.

To make it work, the assembler's optab matching rules need to be
changed. Before, MOVD accepts C_UAUTO32K which matches multiple
of 8 between 0 and 32K, and also C_UAUTO16K, which may not be
multiple of 8 and does not fit into MOVD instruction. The
assembler errors in the latter case. This change makes it only
matches multiple of 8 (or offsets within ±256, which also fits
in instruction), and uses the large-or-unaligned-offset rule
for things doesn't fit (without error). Other sized move rules
are changed similarly.

Class C_UAUTO64K and C_UOREG64K are removed, as they are never
used.

In shared library, load/store of global is rewritten to using
GOT and temp register, which conflicts with the use of temp
register for assembling large offset. So the folding is disabled
for globals in shared library mode.

Reduce cmd/go binary size by 2%.

name                     old time/op    new time/op    delta
BinaryTree17-8              8.67s ± 0%     8.61s ± 0%   -0.60%  (p=0.000 n=9+10)
Fannkuch11-8                6.24s ± 0%     6.19s ± 0%   -0.83%  (p=0.000 n=10+9)
FmtFprintfEmpty-8           116ns ± 0%     116ns ± 0%     ~     (all equal)
FmtFprintfString-8          196ns ± 0%     192ns ± 0%   -1.89%  (p=0.000 n=10+10)
FmtFprintfInt-8             199ns ± 0%     198ns ± 0%   -0.35%  (p=0.001 n=9+10)
FmtFprintfIntInt-8          294ns ± 0%     293ns ± 0%   -0.34%  (p=0.000 n=8+8)
FmtFprintfPrefixedInt-8     318ns ± 1%     318ns ± 1%     ~     (p=1.000 n=10+10)
FmtFprintfFloat-8           537ns ± 0%     531ns ± 0%   -1.17%  (p=0.000 n=9+10)
FmtManyArgs-8              1.19µs ± 1%    1.18µs ± 1%   -1.41%  (p=0.001 n=10+10)
GobDecode-8                17.2ms ± 1%    17.3ms ± 2%     ~     (p=0.165 n=10+10)
GobEncode-8                14.7ms ± 1%    14.7ms ± 2%     ~     (p=0.631 n=10+10)
Gzip-8                      837ms ± 0%     836ms ± 0%   -0.14%  (p=0.006 n=9+10)
Gunzip-8                    141ms ± 0%     139ms ± 0%   -1.24%  (p=0.000 n=9+10)
HTTPClientServer-8          256µs ± 1%     253µs ± 1%   -1.35%  (p=0.000 n=10+10)
JSONEncode-8               40.1ms ± 1%    41.3ms ± 1%   +3.06%  (p=0.000 n=10+9)
JSONDecode-8                157ms ± 1%     156ms ± 1%   -0.83%  (p=0.001 n=9+8)
Mandelbrot200-8            8.94ms ± 0%    8.94ms ± 0%   +0.02%  (p=0.000 n=9+9)
GoParse-8                  8.69ms ± 0%    8.54ms ± 1%   -1.69%  (p=0.000 n=8+10)
RegexpMatchEasy0_32-8       227ns ± 1%     228ns ± 1%   +0.48%  (p=0.016 n=10+9)
RegexpMatchEasy0_1K-8      1.92µs ± 0%    1.63µs ± 0%  -15.08%  (p=0.000 n=10+9)
RegexpMatchEasy1_32-8       256ns ± 0%     251ns ± 0%   -2.19%  (p=0.000 n=10+9)
RegexpMatchEasy1_1K-8      2.38µs ± 0%    2.09µs ± 0%  -12.49%  (p=0.000 n=10+9)
RegexpMatchMedium_32-8      352ns ± 0%     354ns ± 0%   +0.39%  (p=0.002 n=10+9)
RegexpMatchMedium_1K-8      106µs ± 0%     106µs ± 0%   -0.05%  (p=0.005 n=10+9)
RegexpMatchHard_32-8       5.92µs ± 0%    5.89µs ± 0%   -0.40%  (p=0.000 n=9+8)
RegexpMatchHard_1K-8        180µs ± 0%     179µs ± 0%   -0.14%  (p=0.000 n=10+9)
Revcomp-8                   1.20s ± 0%     1.13s ± 0%   -6.29%  (p=0.000 n=9+8)
Template-8                  159ms ± 1%     154ms ± 1%   -3.14%  (p=0.000 n=9+10)
TimeParse-8                 800ns ± 3%     769ns ± 1%   -3.91%  (p=0.000 n=10+10)
TimeFormat-8                826ns ± 2%     817ns ± 2%   -1.04%  (p=0.050 n=10+10)
[Geo mean]                  145µs          143µs        -1.79%

Change-Id: I5fc42087cee9b54ea414f8ef6d6d020b80eb5985
Reviewed-on: https://go-review.googlesource.com/42172
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
2017-05-09 19:41:00 +00:00
Martin Möhrmann
f9bec9eb42 cmd/compile: use MOVL instead of MOVQ for small constants on amd64
The encoding of MOVL to a register is 2 bytes shorter than for MOVQ.
The upper 32bit are automatically zeroed when MOVL to a register is used.

Replaces 1657 MOVQ by MOVL in the go binary.
Reduces go binary size by 4 kilobyte.

name                   old time/op    new time/op    delta
BinaryTree17              1.93s ± 0%     1.93s ± 0%  -0.32%  (p=0.000 n=9+9)
Fannkuch11                2.66s ± 0%     2.48s ± 0%  -6.60%  (p=0.000 n=9+9)
FmtFprintfEmpty          31.8ns ± 0%    31.6ns ± 0%  -0.63%  (p=0.000 n=10+10)
FmtFprintfString         52.0ns ± 0%    51.9ns ± 0%  -0.19%  (p=0.000 n=10+10)
FmtFprintfInt            55.6ns ± 0%    54.6ns ± 0%  -1.80%  (p=0.002 n=8+10)
FmtFprintfIntInt         87.7ns ± 0%    84.8ns ± 0%  -3.31%  (p=0.000 n=9+9)
FmtFprintfPrefixedInt    98.9ns ± 0%   102.0ns ± 0%  +3.10%  (p=0.000 n=10+10)
FmtFprintfFloat           165ns ± 0%     164ns ± 0%  -0.61%  (p=0.000 n=10+10)
FmtManyArgs               368ns ± 0%     361ns ± 0%  -1.98%  (p=0.000 n=8+10)
GobDecode                4.53ms ± 0%    4.58ms ± 0%  +1.08%  (p=0.000 n=9+10)
GobEncode                3.74ms ± 0%    3.73ms ± 0%  -0.27%  (p=0.000 n=10+10)
Gzip                      164ms ± 0%     163ms ± 0%  -0.48%  (p=0.000 n=10+10)
Gunzip                   26.7ms ± 0%    26.6ms ± 0%  -0.13%  (p=0.000 n=9+10)
HTTPClientServer         30.4µs ± 1%    30.3µs ± 1%  -0.41%  (p=0.016 n=10+10)
JSONEncode               10.9ms ± 0%    11.0ms ± 0%  +0.70%  (p=0.000 n=10+10)
JSONDecode               36.8ms ± 0%    37.0ms ± 0%  +0.59%  (p=0.000 n=9+10)
Mandelbrot200            3.20ms ± 0%    3.21ms ± 0%  +0.44%  (p=0.000 n=9+10)
GoParse                  2.35ms ± 0%    2.35ms ± 0%  +0.26%  (p=0.000 n=10+9)
RegexpMatchEasy0_32      58.3ns ± 0%    58.4ns ± 0%  +0.17%  (p=0.000 n=10+10)
RegexpMatchEasy0_1K       138ns ± 0%     142ns ± 0%  +2.68%  (p=0.000 n=10+10)
RegexpMatchEasy1_32      55.1ns ± 0%    55.6ns ± 1%    ~     (p=0.104 n=10+10)
RegexpMatchEasy1_1K       242ns ± 0%     243ns ± 0%  +0.41%  (p=0.000 n=10+10)
RegexpMatchMedium_32     87.4ns ± 0%    89.9ns ± 0%  +2.86%  (p=0.000 n=10+10)
RegexpMatchMedium_1K     27.4µs ± 0%    27.4µs ± 0%  +0.15%  (p=0.000 n=10+10)
RegexpMatchHard_32       1.30µs ± 0%    1.32µs ± 1%  +1.91%  (p=0.000 n=10+10)
RegexpMatchHard_1K       39.0µs ± 0%    39.5µs ± 0%  +1.38%  (p=0.000 n=10+10)
Revcomp                   316ms ± 0%     319ms ± 0%  +1.13%  (p=0.000 n=9+8)
Template                 40.6ms ± 0%    40.6ms ± 0%    ~     (p=0.123 n=10+10)
TimeParse                 224ns ± 0%     224ns ± 0%    ~     (all equal)
TimeFormat                230ns ± 0%     225ns ± 0%  -2.17%  (p=0.000 n=10+10)

Change-Id: I32a099b65f9e6d4ad7288ed48546655c534757d8
Reviewed-on: https://go-review.googlesource.com/38630
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-05-01 20:59:58 +00:00
Lynn Boger
9248ff46a8 cmd/compile: add rotates to PPC64.rules
This updates PPC64.rules to include rules to generate rotates
for ADD, OR, XOR operators that combine two opposite shifts
that sum to 32 or 64.

To support this change opcodes for ROTL and ROTLW were added to
be used like the rotldi and rotlwi extended mnemonics.

This provides the following improvement in sha3:

BenchmarkPermutationFunction-8     302.83       376.40       1.24x
BenchmarkSha3_512_MTU-8            98.64        121.92       1.24x
BenchmarkSha3_384_MTU-8            136.80       168.30       1.23x
BenchmarkSha3_256_MTU-8            169.21       211.29       1.25x
BenchmarkSha3_224_MTU-8            179.76       221.19       1.23x
BenchmarkShake128_MTU-8            212.87       263.23       1.24x
BenchmarkShake256_MTU-8            196.62       245.60       1.25x
BenchmarkShake256_16x-8            163.57       194.37       1.19x
BenchmarkShake256_1MiB-8           199.02       248.74       1.25x
BenchmarkSha3_512_1MiB-8           106.55       133.13       1.25x

Fixes #20030

Change-Id: I484c56f48395d32f53ff3ecb3ac6cb8191cfee44
Reviewed-on: https://go-review.googlesource.com/40992
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-04-20 18:05:22 +00:00
Keith Randall
7e07e635f3 cmd/compile: implement non-constant rotates
Makes math/bits.Rotate{Left,Right} fast on amd64.

name              old time/op  new time/op  delta
RotateLeft-12     7.42ns ± 6%  5.45ns ± 6%  -26.54%   (p=0.000 n=9+10)
RotateLeft8-12    4.77ns ± 5%  3.42ns ± 7%  -28.25%   (p=0.000 n=8+10)
RotateLeft16-12   4.82ns ± 8%  3.40ns ± 7%  -29.36%  (p=0.000 n=10+10)
RotateLeft32-12   4.87ns ± 7%  3.48ns ± 7%  -28.51%    (p=0.000 n=8+9)
RotateLeft64-12   5.23ns ±10%  3.35ns ± 6%  -35.97%   (p=0.000 n=9+10)
RotateRight-12    7.59ns ± 8%  5.71ns ± 1%  -24.72%   (p=0.000 n=10+8)
RotateRight8-12   4.98ns ± 7%  3.36ns ± 9%  -32.55%  (p=0.000 n=10+10)
RotateRight16-12  5.12ns ± 2%  3.45ns ± 5%  -32.62%  (p=0.000 n=10+10)
RotateRight32-12  4.80ns ± 6%  3.42ns ±16%  -28.68%  (p=0.000 n=10+10)
RotateRight64-12  4.78ns ± 6%  3.42ns ± 6%  -28.50%  (p=0.000 n=10+10)

Update #18940

Change-Id: Ie79fb5581c489ed4d3b859314c5e669a134c119b
Reviewed-on: https://go-review.googlesource.com/39711
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-04-17 23:19:45 +00:00
Josh Bleecher Snyder
3d0a898385 cmd/compile: improve output when TestAssembly build fails
Change-Id: Ibee84399d81463d3e7d5319626bb0d6b60b86bd9
Reviewed-on: https://go-review.googlesource.com/40861
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-04-17 03:12:34 +00:00
Josh Bleecher Snyder
0d36999a0f cmd/compile: make TestAssembly resilient to output ordering
To preserve reproducible builds, the text entries
during compilation will be sorted before being printed.
TestAssembly currently assumes that function init
comes after all user-defined functions.
Remove that assumption.
Instead of looking for "TEXT" to tell you where
a function ends--which may now yield lots of
non-function-code junk--look for a line beginning
with non-whitespace.

Updates #15756

Change-Id: Ibc82dba6143d769ef4c391afc360e523b1a51348
Reviewed-on: https://go-review.googlesource.com/39853
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-04-13 02:30:29 +00:00
Ilya Tocar
e4a500ce14 cmd/compile/internal/gc: improve comparison with constant strings
Currently we expand comparison with small constant strings into len check
and a sequence of byte comparisons. Generate 16/32/64-bit comparisons,
instead of bytewise on 386 and amd64. Also increase limits on what is
considered small constant string.
Shaves ~30kb (0.5%) from go executable.

This also updates test/prove.go to keep test case valid.

Change-Id: I99ae8871a1d00c96363c6d03d0b890782fa7e1d9
Reviewed-on: https://go-review.googlesource.com/38776
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-04-07 15:40:25 +00:00
Cherry Zhang
257b01f8f4 cmd/compile: use ANDconst to mask out leading/trailing bits on ARM64
For an AND that masks out leading or trailing bits, generic rules
rewrite it to a pair of shifts. On ARM64, the mask actually can
fit into an AND instruction. So we rewrite it back to AND.

Fixes #19857.

Change-Id: I479d7320ae4f29bb3f0056d5979bde4478063a8f
Reviewed-on: https://go-review.googlesource.com/39651
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-04-06 17:59:32 +00:00
Keith Randall
5cadc91b3c cmd/compile: intrinsics for math/bits.OnesCount
Popcount instructions on amd64 are not guaranteed to be
present, so we must guard their call.  Rewrite rules can't
generate control flow at the moment, so the intrinsifier
needs to generate that code.

name           old time/op  new time/op  delta
OnesCount-8    2.47ns ± 5%  1.04ns ± 2%  -57.70%  (p=0.000 n=10+10)
OnesCount16-8  1.05ns ± 1%  0.78ns ± 0%  -25.56%    (p=0.000 n=9+8)
OnesCount32-8  1.63ns ± 5%  1.04ns ± 2%  -35.96%  (p=0.000 n=10+10)
OnesCount64-8  2.45ns ± 0%  1.04ns ± 1%  -57.55%   (p=0.000 n=6+10)

Update #18616

Change-Id: I4aff2cc9aa93787898d7b22055fe272a7cf95673
Reviewed-on: https://go-review.googlesource.com/38320
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2017-04-04 02:40:11 +00:00
Keith Randall
63a72fd447 cmd/compile: strength-reduce floating point
x*2 -> x+x
x/c, c power of 2 -> x*(1/c)

Fixes #19827

Change-Id: I74c9f0b5b49b2ed26c0990314c7d1d5f9631b6f1
Reviewed-on: https://go-review.googlesource.com/39295
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-04-03 21:27:03 +00:00
Keith Randall
86dc86b4f9 cmd/compile: don't merge load+op if other op arg is still live
We want to merge a load and op into a single instruction

    l = LOAD ptr mem
    y = OP x l

into

    y = OPload x ptr mem

However, all of our OPload instructions require that y uses
the same register as x. If x is needed past this instruction, then
we must copy x somewhere else, losing the whole benefit of merging
the instructions in the first place.

Disable this optimization if x is live past the OP.

Also disable this optimization if the OP is in a deeper loop than the load.

Update #19595

Change-Id: I87f596aad7e91c9127bfb4705cbae47106e1e77a
Reviewed-on: https://go-review.googlesource.com/38337
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2017-03-23 15:53:04 +00:00
Michael Munday
17570a9afb cmd/compile: emit fused multiply-{add,subtract} on ppc64x
A follow on to CL 36963 adding support for ppc64x.

Performance changes (as posted on the issue):

poly1305:
benchmark               old ns/op new ns/op delta
Benchmark64-16          172       151       -12.21%
Benchmark1K-16          1828      1523      -16.68%
Benchmark64Unaligned-16 172       151       -12.21%
Benchmark1KUnaligned-16 1827      1523      -16.64%

math:
BenchmarkAcos-16        43.9      39.9      -9.11%
BenchmarkAcosh-16       57.0      45.8      -19.65%
BenchmarkAsin-16        35.8      33.0      -7.82%
BenchmarkAsinh-16       68.6      60.8      -11.37%
BenchmarkAtan-16        19.8      16.2      -18.18%
BenchmarkAtanh-16       65.5      57.5      -12.21%
BenchmarkAtan2-16       45.4      34.2      -24.67%
BenchmarkGamma-16       37.6      26.0      -30.85%
BenchmarkLgamma-16      40.0      28.2      -29.50%
BenchmarkLog1p-16       35.1      29.1      -17.09%
BenchmarkSin-16         22.7      18.4      -18.94%
BenchmarkSincos-16      31.7      23.7      -25.24%
BenchmarkSinh-16        146       131       -10.27%
BenchmarkY0-16          130       107       -17.69%
BenchmarkY1-16          127       107       -15.75%
BenchmarkYn-16          278       235       -15.47%

Updates #17895.

Change-Id: I1c16199715d20c9c4bd97c4a950bcfa69eb688c1
Reviewed-on: https://go-review.googlesource.com/38095
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2017-03-20 20:01:29 +00:00
Keith Randall
495b167919 cmd/compile: intrinsics for math/bits.{Len,LeadingZeros}
name              old time/op  new time/op  delta
LeadingZeros-4    2.00ns ± 0%  1.34ns ± 1%  -33.02%  (p=0.000 n=8+10)
LeadingZeros16-4  1.62ns ± 0%  1.57ns ± 0%   -3.09%  (p=0.001 n=8+9)
LeadingZeros32-4  2.14ns ± 0%  1.48ns ± 0%  -30.84%  (p=0.002 n=8+10)
LeadingZeros64-4  2.06ns ± 1%  1.33ns ± 0%  -35.08%  (p=0.000 n=8+8)

8-bit args is a special case - the Go code is really fast because
it is just a single table lookup.  So I've disabled that for now.
Intrinsics were actually slower:
LeadingZeros8-4   1.22ns ± 3%  1.58ns ± 1%  +29.56%  (p=0.000 n=10+10)

Update #18616

Change-Id: Ia9c289b9ba59c583ea64060470315fd637e814cf
Reviewed-on: https://go-review.googlesource.com/38311
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2017-03-16 22:53:49 +00:00
Keith Randall
dd9892e31b cmd/compile: intrinsify math/bits.ReverseBytes
Update #18616

Change-Id: I0c2d643cbbeb131b4c9b12194697afa4af48e1d2
Reviewed-on: https://go-review.googlesource.com/38166
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2017-03-16 19:41:56 +00:00
Keith Randall
d5dc490519 cmd/compile: intrinsics for math/bits.TrailingZerosX
Implement math/bits.TrailingZerosX using intrinsics.

Generally reorganize the intrinsic spec a bit.
The instrinsics data structure is now built at init time.
This will make doing the other functions in math/bits easier.

Update sys.CtzX to return int instead of uint{64,32} so it
matches math/bits.TrailingZerosX.

Improve the intrinsics a bit for amd64.  We don't need the CMOV
for <64 bit versions.

Update #18616

Change-Id: Ic1c5339c943f961d830ae56f12674d7b29d4ff39
Reviewed-on: https://go-review.googlesource.com/38155
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2017-03-16 02:44:16 +00:00
Josh Bleecher Snyder
3a90bfb253 cmd/dist, cmd/compile: eliminate mergeEnvLists copies
This is now handled by os/exec.

Updates #12868

Change-Id: Ic21a6ff76a9b9517437ff1acf3a9195f9604bb45
Reviewed-on: https://go-review.googlesource.com/37698
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-03-02 22:26:23 +00:00
Josh Bleecher Snyder
2183135554 cmd/compile: recognize bit test patterns on amd64
Updates #18943

Change-Id: If3080d6133bb6d2710b57294da24c90251ab4e08
Reviewed-on: https://go-review.googlesource.com/36329
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-01 00:36:04 +00:00
Michael Munday
bd8a39b67a cmd/compile: emit fused multiply-{add,subtract} instructions on s390x
Explcitly block fused multiply-add pattern matching when a cast is used
after the multiplication, for example:

    - (a * b) + c        // can emit fused multiply-add
    - float64(a * b) + c // cannot emit fused multiply-add

float{32,64} and complex{64,128} casts of matching types are now kept
as OCONV operations rather than being replaced with OCONVNOP operations
because they now imply a rounding operation (and therefore aren't a
no-op anymore).

Operations (for example, multiplication) on complex types may utilize
fused multiply-add and -subtract instructions internally. There is no
way to disable this behavior at the moment.

Improves the performance of the floating point implementation of
poly1305:

name         old speed     new speed     delta
64           246MB/s ± 0%  275MB/s ± 0%  +11.48%   (p=0.000 n=10+8)
1K           312MB/s ± 0%  357MB/s ± 0%  +14.41%  (p=0.000 n=10+10)
64Unaligned  246MB/s ± 0%  274MB/s ± 0%  +11.43%  (p=0.000 n=10+10)
1KUnaligned  312MB/s ± 0%  357MB/s ± 0%  +14.39%   (p=0.000 n=10+8)

Updates #17895.

Change-Id: Ia771d275bb9150d1a598f8cc773444663de5ce16
Reviewed-on: https://go-review.googlesource.com/36963
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-02-28 15:34:20 +00:00
Josh Bleecher Snyder
e458264aca cmd/compile: fix dolinkobj flag in TestAssembly
Follow-up to CL 37270.

This considerably reduces the time to run the test.

Before:

real	0m7.638s
user	0m14.341s
sys	0m2.244s

After:

real	0m4.867s
user	0m7.107s
sys	0m1.842s

Change-Id: I8837a5da0979a1c365e1ce5874d81708249a4129
Reviewed-on: https://go-review.googlesource.com/37461
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Munday <munday@ca.ibm.com>
2017-02-25 14:39:29 +00:00
Lorenzo Masini
fb1f47a77c cmd/compile: speed up TestAssembly
TestAssembly was very slow, leading to it being skipped by default.
This is not surprising, it separately invoked the compiler and
parsed the result many times.

Now the test assembles one source file for arch/os combination,
containing the relevant functions.

Tests for each arch/os run in parallel.

Now the test runs approximately 10x faster on my Intel(R) Core(TM)
i5-6600 CPU @ 3.30GHz.

Fixes #18966

Change-Id: I45ab97630b627a32e17900c109f790eb4c0e90d9
Reviewed-on: https://go-review.googlesource.com/37270
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-02-24 21:23:43 +00:00
Kirill Smelkov
4477fd097f cmd/compile/internal/ssa: combine 2 byte loads + shifts into word load + rolw 8 on AMD64
... and same for stores. This does for binary.BigEndian.Uint16() what
was already done for Uint32 and Uint64 with BSWAP in 10f75748 (CL 32222).

Here is how generated code changes e.g. for the following function
(omitting saying the same prologue/epilogue):

	func get16(b [2]byte) uint16 {
		return binary.BigEndian.Uint16(b[:])
	}

"".get16 t=1 size=21 args=0x10 locals=0x0

	// before
        0x0000 00000 (x.go:15)  MOVBLZX "".b+9(FP), AX
        0x0005 00005 (x.go:15)  MOVBLZX "".b+8(FP), CX
        0x000a 00010 (x.go:15)  SHLL    $8, CX
        0x000d 00013 (x.go:15)  ORL     CX, AX

	// after
	0x0000 00000 (x.go:15)	MOVWLZX	"".b+8(FP), AX
	0x0005 00005 (x.go:15)	ROLW	$8, AX

encoding/binary is speedup overall a bit:

name                    old time/op    new time/op    delta
ReadSlice1000Int32s-4     4.83µs ± 0%    4.83µs ± 0%     ~     (p=0.206 n=4+5)
ReadStruct-4              1.29µs ± 2%    1.28µs ± 1%   -1.27%  (p=0.032 n=4+5)
ReadInts-4                 384ns ± 1%     385ns ± 1%     ~     (p=0.968 n=4+5)
WriteInts-4                534ns ± 3%     526ns ± 0%   -1.54%  (p=0.048 n=4+5)
WriteSlice1000Int32s-4    5.02µs ± 0%    5.11µs ± 3%     ~     (p=0.175 n=4+5)
PutUint16-4               0.59ns ± 0%    0.49ns ± 2%  -16.95%  (p=0.016 n=4+5)
PutUint32-4               0.52ns ± 0%    0.52ns ± 0%     ~     (all equal)
PutUint64-4               0.53ns ± 0%    0.53ns ± 0%     ~     (all equal)
PutUvarint32-4            19.9ns ± 0%    19.9ns ± 1%     ~     (p=0.556 n=4+5)
PutUvarint64-4            54.5ns ± 1%    54.2ns ± 0%     ~     (p=0.333 n=4+5)

name                    old speed      new speed      delta
ReadSlice1000Int32s-4    829MB/s ± 0%   828MB/s ± 0%     ~     (p=0.190 n=4+5)
ReadStruct-4            58.0MB/s ± 2%  58.7MB/s ± 1%   +1.30%  (p=0.032 n=4+5)
ReadInts-4              78.0MB/s ± 1%  77.8MB/s ± 1%     ~     (p=0.968 n=4+5)
WriteInts-4             56.1MB/s ± 3%  57.0MB/s ± 0%     ~     (p=0.063 n=4+5)
WriteSlice1000Int32s-4   797MB/s ± 0%   783MB/s ± 3%     ~     (p=0.190 n=4+5)
PutUint16-4             3.37GB/s ± 0%  4.07GB/s ± 2%  +20.83%  (p=0.016 n=4+5)
PutUint32-4             7.73GB/s ± 0%  7.72GB/s ± 0%     ~     (p=0.556 n=4+5)
PutUint64-4             15.1GB/s ± 0%  15.1GB/s ± 0%     ~     (p=0.905 n=4+5)
PutUvarint32-4           201MB/s ± 0%   201MB/s ± 0%     ~     (p=0.905 n=4+5)
PutUvarint64-4           147MB/s ± 1%   147MB/s ± 0%     ~     (p=0.286 n=4+5)

( "a bit" only because most of the time is spent in reflection-like things
  there, not actual bytes decoding. Even for direct PutUint16 benchmark the
  looping adds overhead and lowers visible benefit. For code-generated encoders /
  decoders actual effect is more than 20% )

Adding Uint32 and Uint64 raw benchmarks too for completeness.

NOTE I had to adjust load-combining rule for bswap case to match first 2 bytes
loads as result of "2-bytes load+shift" -> "loadw + rorw 8" rewrite. Reason is:
for loads+shift, even e.g. into uint16 var

	var b []byte
	var v uin16
	v = uint16(b[1]) | uint16(b[0])<<8

the compiler eventually generates L(ong) shift - SHLLconst [8], probably
because it is more straightforward / other reasons to work on the whole
register. This way 2 bytes rewriting rule is using SHLLconst (not SHLWconst) in
its pattern, and then it always gets matched first, even if 2-byte rule comes
syntactically after 4-byte rule in AMD64.rules because 4-bytes rule seemingly
needs more applyRewrite() cycles to trigger. If 2-bytes rule gets matched for
inner half of

	var b []byte
	var v uin32
	v = uint32(b[3]) | uint32(b[2])<<8 | uint32(b[1])<<16 | uint32(b[0])<<24

and we keep 4-byte load rule unchanged, the result will be MOVW + RORW $8 and
then series of byte loads and shifts - not one MOVL + BSWAPL.

There is no such problem for stores: there compiler, since it probably knows
store destination is 2 bytes wide, uses SHRWconst 8 (not SHRLconst 8) and thus
2-byte store rule is not a subset of rule for 4-byte stores.

Fixes #17151  (int16 was last missing piece there)

Change-Id: Idc03ba965bfce2b94fef456b02ff6742194748f6
Reviewed-on: https://go-review.googlesource.com/34636
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-14 22:17:08 +00:00
Cherry Zhang
78200799a2 cmd/compile: undo special handling of zero-valued STRUCTLIT
CL 35261 introduces special handling of zero-valued STRUCTLIT for
efficient struct zeroing. But it didn't cover all use cases, for
example, CONVNOP STRUCTLIT is not handled.

On the other hand, CL 34566 handles zeroing earlier, so we don't
need the change in CL 35261 for efficient zeroing. Other uses of
zero-valued struct literals are very rare. So undo the change in
walk.go in CL 35261.

Add a test for efficient zeroing.

Fixes #19084.

Change-Id: I0807f7423fb44d47bf325b3c1ce9611a14953853
Reviewed-on: https://go-review.googlesource.com/36955
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2017-02-14 18:57:56 +00:00
Kirill Smelkov
bd91e3569a cmd/compile/internal/ssa: generate bswap/store for indexed bigendian byte stores too on AMD64
Commit 10f75748 (CL 32222) added rewrite rules to combine byte loads/stores +
shifts into larger loads/stores + bswap. For loads both MOVBload and
MOVBloadidx1 were handled but for store only MOVBstore was there without
MOVBstoreidx added to rewrite pattern. Fix it.

Here is how generated code changes for the following 2 functions
(ommitting staying the same prologue/epilogue):

    func put32(b []byte, i int, v uint32) {
            binary.BigEndian.PutUint32(b[i:], v)
    }

    func put64(b []byte, i int, v uint64) {
            binary.BigEndian.PutUint64(b[i:], v)
    }

"".put32 t=1 size=100 args=0x28 locals=0x0

	// before
	0x0032 00050 (x.go:5)	MOVL	CX, DX
	0x0034 00052 (x.go:5)	SHRL	$24, CX
	0x0037 00055 (x.go:5)	MOVQ	"".b+8(FP), BX
	0x003c 00060 (x.go:5)	MOVB	CL, (BX)(AX*1)
	0x003f 00063 (x.go:5)	MOVL	DX, CX
	0x0041 00065 (x.go:5)	SHRL	$16, DX
	0x0044 00068 (x.go:5)	MOVB	DL, 1(BX)(AX*1)
	0x0048 00072 (x.go:5)	MOVL	CX, DX
	0x004a 00074 (x.go:5)	SHRL	$8, CX
	0x004d 00077 (x.go:5)	MOVB	CL, 2(BX)(AX*1)
	0x0051 00081 (x.go:5)	MOVB	DL, 3(BX)(AX*1)

	// after
	0x0032 00050 (x.go:5)	BSWAPL	CX
	0x0034 00052 (x.go:5)	MOVQ	"".b+8(FP), DX
	0x0039 00057 (x.go:5)	MOVL	CX, (DX)(AX*1)

"".put64 t=1 size=155 args=0x28 locals=0x0

	// before
	0x0037 00055 (x.go:9)	MOVQ	CX, DX
	0x003a 00058 (x.go:9)	SHRQ	$56, CX
	0x003e 00062 (x.go:9)	MOVQ	"".b+8(FP), BX
	0x0043 00067 (x.go:9)	MOVB	CL, (BX)(AX*1)
	0x0046 00070 (x.go:9)	MOVQ	DX, CX
	0x0049 00073 (x.go:9)	SHRQ	$48, DX
	0x004d 00077 (x.go:9)	MOVB	DL, 1(BX)(AX*1)
	0x0051 00081 (x.go:9)	MOVQ	CX, DX
	0x0054 00084 (x.go:9)	SHRQ	$40, CX
	0x0058 00088 (x.go:9)	MOVB	CL, 2(BX)(AX*1)
	0x005c 00092 (x.go:9)	MOVQ	DX, CX
	0x005f 00095 (x.go:9)	SHRQ	$32, DX
	0x0063 00099 (x.go:9)	MOVB	DL, 3(BX)(AX*1)
	0x0067 00103 (x.go:9)	MOVQ	CX, DX
	0x006a 00106 (x.go:9)	SHRQ	$24, CX
	0x006e 00110 (x.go:9)	MOVB	CL, 4(BX)(AX*1)
	0x0072 00114 (x.go:9)	MOVQ	DX, CX
	0x0075 00117 (x.go:9)	SHRQ	$16, DX
	0x0079 00121 (x.go:9)	MOVB	DL, 5(BX)(AX*1)
	0x007d 00125 (x.go:9)	MOVQ	CX, DX
	0x0080 00128 (x.go:9)	SHRQ	$8, CX
	0x0084 00132 (x.go:9)	MOVB	CL, 6(BX)(AX*1)
	0x0088 00136 (x.go:9)	MOVB	DL, 7(BX)(AX*1)

	// after
	0x0033 00051 (x.go:9)	BSWAPQ	CX
	0x0036 00054 (x.go:9)	MOVQ	"".b+8(FP), DX
	0x003b 00059 (x.go:9)	MOVQ	CX, (DX)(AX*1)

Updates #17151

Change-Id: I3f4a7f28f210e62e153e60da5abd1d39508cc6c4
Reviewed-on: https://go-review.googlesource.com/34635
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2017-02-14 18:35:43 +00:00
Kirill Smelkov
e2948f7efe cmd/compile: Show arch/os when something in TestAssembly fails
It is not always obvious from the first glance when looking at
TestAssembly failure in which context the code was generated. For
example x86 and x86-64 are similar, and those of us who do not work with
assembly every day can even take s390x version as something similar to x86.

So when something fails lets print the whole test context - this
includes os and arch which were previously missing. An example failure:

before:

--- FAIL: TestAssembly (40.48s)
        asm_test.go:46: expected:       MOVWZ   \(.*\),
                go:
                import "encoding/binary"
                func f(b []byte) uint32 {
                        return binary.LittleEndian.Uint32(b)
                }

                asm:"".f t=1 size=160 args=0x20 locals=0x0
		...

after:

--- FAIL: TestAssembly (40.43s)
        asm_test.go:46: linux/s390x: expected:  MOVWZ   \(.*\),
                go:
                import "encoding/binary"
                func f(b []byte) uint32 {
                        return binary.LittleEndian.Uint32(b)
                }

                asm:"".f t=1 size=160 args=0x20 locals=0x0

Motivated-by: #18946#issuecomment-279491071

Change-Id: I61089ceec05da7a165718a7d69dec4227dd0e993
Reviewed-on: https://go-review.googlesource.com/36881
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-13 20:30:31 +00:00
Michael Munday
074b73b1b2 cmd/compile: fix s390x load-combining rules
MOVD{reg,nop} operations (added in CL 36256) inserted to preserve
type information were blocking the load-combining rules. Fix this
by merging type changes into loads wherever possible.

Fixes #19059.

Change-Id: I8a1df06eb0f231b40ae43107d4a3bd0b9c441b59
Reviewed-on: https://go-review.googlesource.com/36843
Run-TryBot: Michael Munday <munday@ca.ibm.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-13 20:04:14 +00:00
Keith Randall
b548eee3d9 cmd/compile: fix load-combining rules
CL 33632 reorders args of commutative ops in order to make
CSE for commutative ops more robust.  Unfortunately, that
broke the load-combining rules which depend on a certain ordering
of OR ops' arguments.

Introduce some additional rules that order OR ops' arguments
consistently so that the load-combining rules fire.

Note: there's also something else wrong with the s390x rules.
I've filed #19059 for that.

Fixes #18946

Change-Id: I0a5447196bd88a55ccee683c69a57b943a9972e1
Reviewed-on: https://go-review.googlesource.com/36911
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-02-13 18:29:51 +00:00
Josh Bleecher Snyder
5faba3057d cmd/compile: use constants directly for fast map access calls
CL 35554 taught order.go to use static variables
for constants that needed to be addressable for runtime routines.
However, there is one class of runtime routines that
do not actually need an addressable value: fast map access routines.
This CL teaches order.go to avoid using static variables
for addressability in those cases.
Instead, it avoids introducing a temp at all,
which the backend would just have to optimize away.

Fixes #19015.

Change-Id: I5ef780c604fac3fb48dabb23a344435e283cb832
Reviewed-on: https://go-review.googlesource.com/36693
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-02-10 04:57:20 +00:00
Keith Randall
01c8719f8b cmd/compile: move rotate instruction generation to SSA
Remove rotate generation from walk.  Remove OLROT and ssa.Lrot* opcodes.
Generate rotates during SSA lowering for architectures that have them.

This CL will allow rotates to be generated in more situations,
like when the shift values are determined to be constant
only after some analysis.

Fixes #18254

Change-Id: I8d6d684ff5ce2511aceaddfda98b908007851079
Reviewed-on: https://go-review.googlesource.com/34232
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-02-02 17:57:15 +00:00
Russ Cox
47ce87877b all: merge dev.inline into master
Change-Id: I7715581a04e513dcda9918e853fa6b1ddc703770
2017-02-01 09:47:23 -05:00
Kirill Smelkov
c44da14440 cmd/compile/internal/ssa: add tests for BSWAP on stores on AMD64
Commit 10f75748 (CL 32222) taught AMD64 backend to rewrite series of
byte loads or stores with corresponding shifts into a single long or
quad load or store + appropriate BSWAP. However it did not added test
for stores - only loads were tested.

Fix it.

NOTE Tests for indexed stores are not added because 10f75748 did not add
support for indexed stores - only indexed loads were handled then.

Change-Id: I48c867ebe7622ac8e691d43741feed1d40cca0d7
Reviewed-on: https://go-review.googlesource.com/34634
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-12-21 16:36:45 +00:00
Keith Randall
8d21691044 cmd/compile: test for correct zeroing
Make sure we generate the right code for zeroing a structure.

Check in after Matthew's CL (34564).

Update #18370

Change-Id: I987087f979d99227a880b34c44d9d4de6c25ba0c
Reviewed-on: https://go-review.googlesource.com/34565
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
2016-12-19 17:36:35 +00:00
Robert Griesemer
eab3707d6d [dev.inline] cmd/compile: rename various fields from Lineno to Pos
Various minor adjustments.

Change-Id: Iedfb97989f7bedaa3e9e8993b167e05f162434a7
Reviewed-on: https://go-review.googlesource.com/34136
Reviewed-by: David Lazar <lazard@golang.org>
2016-12-08 21:35:18 +00:00
Ilya Tocar
10f757486e cmd/compile/internal/ssa: generate bswap on AMD64
Generate bswap+load/store for reading/writing big endian data.
Helps encoding/binary.

name                    old time/op    new time/op    delta
ReadSlice1000Int32s-8     5.06µs ± 8%    4.58µs ± 8%   -9.50%        (p=0.000 n=10+10)
ReadStruct-8              1.07µs ± 0%    1.05µs ± 0%   -1.51%         (p=0.000 n=9+10)
ReadInts-8                 367ns ± 0%     363ns ± 0%   -1.15%          (p=0.000 n=8+9)
WriteInts-8                475ns ± 1%     469ns ± 0%   -1.45%        (p=0.000 n=10+10)
WriteSlice1000Int32s-8    5.03µs ± 3%    4.50µs ± 3%  -10.45%          (p=0.000 n=9+9)
PutUvarint32-8            17.2ns ± 0%    17.2ns ± 0%     ~     (all samples are equal)
PutUvarint64-8            46.7ns ± 0%    46.7ns ± 0%     ~           (p=0.509 n=10+10)

name                    old speed      new speed      delta
ReadSlice1000Int32s-8    791MB/s ± 8%   875MB/s ± 8%  +10.53%        (p=0.000 n=10+10)
ReadStruct-8            70.0MB/s ± 0%  71.1MB/s ± 0%   +1.54%         (p=0.000 n=9+10)
ReadInts-8              81.6MB/s ± 0%  82.6MB/s ± 0%   +1.21%          (p=0.000 n=9+9)
WriteInts-8             63.0MB/s ± 1%  63.9MB/s ± 0%   +1.45%        (p=0.000 n=10+10)
WriteSlice1000Int32s-8   796MB/s ± 4%   888MB/s ± 3%  +11.65%          (p=0.000 n=9+9)
PutUvarint32-8           233MB/s ± 0%   233MB/s ± 0%     ~           (p=0.089 n=10+10)
PutUvarint64-8           171MB/s ± 0%   171MB/s ± 0%     ~            (p=0.137 n=10+9)

Change-Id: Ia2dbdef92198eaa7e2af5443a8ed586d4b401ffb
Reviewed-on: https://go-review.googlesource.com/32222
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-11-03 12:34:12 +00:00
Josh Bleecher Snyder
66504485eb cmd/compile/internal/gc: make tests run faster
TestAssembly takes 20s on my machine,
which is too slow for normal operation.
Marking as -short has its dangers (#17472),
but hopefully we'll soon have a builder for that.

All the SSA tests are hermetic and not time sensitive
and can thus be run in parallel.
Reduces the cmd/compile/internal/gc test time during
all.bash on my laptop from 42s to 7s.

Updates #17751

Change-Id: Idd876421db23b9fa3475e8a9b3355a5dc92a5a29
Reviewed-on: https://go-review.googlesource.com/32585
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-11-03 01:07:08 +00:00
Keith Randall
26a6131bac cmd/compile: fix 4-byte unaligned load rules
The 2-byte rule was firing before the 4-byte rule, preventing
the 4-byte rule from firing.  Update the 4-byte rule to use
the results of the 2-byte rule instead.

Add some tests to make sure we don't regress again.

Fixes #17147

Change-Id: Icfeccd9f2b96450981086a52edd76afb3191410a
Reviewed-on: https://go-review.googlesource.com/29382
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2016-09-23 19:32:37 +00:00
Keith Randall
842b05832f all: use testing.GoToolPath instead of "go"
This change makes sure that tests are run with the correct
version of the go tool.  The correct version is the one that
we invoked with "go test", not the one that is first in our path.

Fixes #16577

Change-Id: If22c8f8c3ec9e7c35d094362873819f2fbb8559b
Reviewed-on: https://go-review.googlesource.com/28089
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-08-30 22:49:11 +00:00
Cherry Zhang
29f0984a35 cmd/compile: don't set line number to 0 when building SSA
The frontend may emit node with line number missing. In this case,
use the parent line number. Instead of changing every call site of
pushLine, do it in pushLine itself.

Fixes #16214.

Change-Id: I80390550b56e4d690fc770b01ff725b892ffd6dc
Reviewed-on: https://go-review.googlesource.com/24641
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-07-01 01:12:24 +00:00
David du Colombier
e29e0ba19a cmd/compile: fix TestAssembly on Plan 9
Since CL 23620, TestAssembly is failing on Plan 9.

In CL 23620, the process environment is passed to 'go tool compile'
after setting GOARCH. On Plan 9, if GOARCH is already set in the
process environment, it would take precedence. On Unix, it works
as expected because the first GOARCH found takes precedence.

This change uses the mergeEnvLists function from cmd/go/main.go
to merge the two environment lists such that variables with the
same name in "in" replace those in "out".

Change-Id: Idee22058343932ee18666dda331c562c89c33507
Reviewed-on: https://go-review.googlesource.com/23593
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: David du Colombier <0intro@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-06-01 13:33:43 +00:00
Michael Hudson-Doyle
2885e07c25 cmd/compile: pass process env to 'go tool compile' in compileToAsm
In particular, this stops the test failing when GOROOT and GOROOT_FINAL are
different.

Change-Id: Ibf6cc0a173f1d965ee8aa31eee2698b223f1ceec
Reviewed-on: https://go-review.googlesource.com/23620
Run-TryBot: Michael Hudson-Doyle <michael.hudson@canonical.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-06-01 03:55:09 +00:00
Keith Randall
9369f22b84 cmd/compile: testing harness for checking generated assembly
Add a test which compiles a function and checks the
generated assembly to make sure certain patterns are present.
This test allows us to do white box tests of the compiler
to make sure optimizations don't regress.

Added a few simple tests for now.  More to come.

Change-Id: I4ab5ce5d95b9e04e7d0d9328ffae47b8d1f95e74
Reviewed-on: https://go-review.googlesource.com/23403
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-05-26 23:07:01 +00:00