Stowage/go - Remotebranch.eu

mirror of https://github.com/golang/go.git synced 2025-12-08 06:10:04 +00:00

Author	SHA1	Message	Date
Michael Munday	744ebfde04	cmd/compile: eliminate stores to unread auto variables This is a crude compiler pass to eliminate stores to auto variables that are only ever written to. Eliminates an unnecessary store to x from the following code: func f() int { x := 1; return *(&x) } Fixes #19765. Change-Id: If2c63a8ae67b8c590b6e0cc98a9610939a3eeffa Reviewed-on: https://go-review.googlesource.com/38746 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-08-24 16:53:56 +00:00
Alberto Donizetti	8bca7ef607	cmd/compile: support placeholder name '$' in code generation tests This change adds to the code-generation harness in asm_test.go support for the use of a '$' placeholder name for test functions. A few of uninformative function names are also changed to use the placeholder, to confirm that the change works as expected. Fixes #21500 Change-Id: Iba168bd85efc9822253305d003b06682cf8a6c5c Reviewed-on: https://go-review.googlesource.com/57292 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-08-22 19:42:32 +00:00
Ilya Tocar	da34ddf24b	cmd/compile/internal/ssa: combine more const stores We already combine const stores up-to MOVQstoreconst. Combine 2 64-bit stores of const zero into 1 sse store of 128-bit zero. Shaves significant (>1%) amount of code from go tool: /localdisk/itocar/golang/bin/go 10334877 go_old 10388125 [53248 bytes] global text (code) = 51041 bytes (1.343944%) read-only data = 663 bytes (0.039617%) Total difference 51704 bytes (0.873981%) Change-Id: I7bc40968023c3a69f379b10fbb433cdb11364f1b Reviewed-on: https://go-review.googlesource.com/56250 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Giovanni Bajo <rasky@develer.com> Reviewed-by: Keith Randall <khr@golang.org>	2017-08-17 17:40:40 +00:00
Alberto Donizetti	a0453a180f	cmd/compile: combine x*n + y*n into (x+y)*n There are a few cases where this can be useful. Apart from the obvious (and silly) 100*n + 200*n where we generate one IMUL instead of two, consider: 15*n + 31*n Currently, the compiler strength-reduces both imuls, generating: 0x0000 00000 MOVQ "".n+8(SP), AX 0x0005 00005 MOVQ AX, CX 0x0008 00008 SHLQ $4, AX 0x000c 00012 SUBQ CX, AX 0x000f 00015 MOVQ CX, DX 0x0012 00018 SHLQ $5, CX 0x0016 00022 SUBQ DX, CX 0x0019 00025 ADDQ CX, AX 0x001c 00028 MOVQ AX, "".~r1+16(SP) 0x0021 00033 RET But combining the imuls is both faster and shorter: 0x0000 00000 MOVQ "".n+8(SP), AX 0x0005 00005 IMULQ $46, AX 0x0009 00009 MOVQ AX, "".~r1+16(SP) 0x000e 00014 RET even without strength-reduction. Moreover, consider: 5*n + 7*(n+1) + 11*(n+2) We already have a rule that rewrites 7*(n+1) into 7*n+7, so the generated code (without imuls merging) looks like this: 0x0000 00000 MOVQ "".n+8(SP), AX 0x0005 00005 LEAQ (AX)(AX*4), CX 0x0009 00009 MOVQ AX, DX 0x000c 00012 NEGQ AX 0x000f 00015 LEAQ (AX)(DX*8), AX 0x0013 00019 ADDQ CX, AX 0x0016 00022 LEAQ (DX)(CX*2), CX 0x001a 00026 LEAQ 29(AX)(CX*1), AX 0x001f 00031 MOVQ AX, "".~r1+16(SP) But with imuls merging, the 5n, 7n and 11n factors get merged, and the generated code looks like this: 0x0000 00000 MOVQ "".n+8(SP), AX 0x0005 00005 IMULQ $23, AX 0x0009 00009 ADDQ $29, AX 0x000d 00013 MOVQ AX, "".~r1+16(SP) 0x0012 00018 RET Which is both faster and shorter; that's also the exact same code that clang and the intel c compiler generate for the above expression. Change-Id: Ib4d5503f05d2f2efe31a1be14e2fe6cac33730a9 Reviewed-on: https://go-review.googlesource.com/55143 Reviewed-by: Keith Randall <khr@golang.org>	2017-08-16 16:51:59 +00:00
Cherry Zhang	f20944de78	cmd/compile: set/unset base register for better assembly print For address of an auto or arg, on all non-x86 architectures the assembler backend encodes the actual SP offset in the instruction but leaves the offset in Prog unchanged. When the assembly is printed in compile -S, it shows an offset relative to pseudo FP/SP with an actual hardware SP base register (e.g. R13 on ARM). This is confusing. Unset the base register if it is indeed SP, so the assembly output is consistent. If the base register isn't SP, it should be an error and the error output contains the actual base register. For address loading instructions, the base register isn't set in the compiler on non-x86 architectures. Set it. Normally it is SP and will be unset in the change mentioned above for printing. If it is not, it will be an error and the error output contains the actual base register. No change in generated binary, only printed assembly. Passes "go build -a -toolexec 'toolstash -cmp' std cmd" on all architectures. Fixes #21064. Change-Id: Ifafe8d5f9b437efbe824b63b3cbc2f5f6cdc1fd5 Reviewed-on: https://go-review.googlesource.com/49432 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2017-08-02 12:24:02 +00:00
Ilya Tocar	3bdc2f3abf	cmd/compile/internal/gc: speed-up small array comparison Currently we inline array comparisons for arrays with at most 4 elements. Compare arrays with small size, but more than 4 elements (e.g. [16]byte) with larger compares. This provides very slightly smaller binaries, and results in faster code. ArrayEqual-6 7.41ns ± 0% 3.17ns ± 0% -57.15% (p=0.000 n=10+10) For go tool: global text (code) = -559 bytes (-0.014566%) This also helps mapaccess1_faststr, and maps in general: MapDelete/Str/1-6 195ns ± 1% 186ns ± 2% -4.47% (p=0.000 n=10+10) MapDelete/Str/2-6 211ns ± 1% 177ns ± 1% -16.01% (p=0.000 n=10+10) MapDelete/Str/4-6 225ns ± 1% 183ns ± 1% -18.49% (p=0.000 n=8+10) MapStringKeysEight_16-6 31.3ns ± 0% 28.6ns ± 0% -8.63% (p=0.000 n=6+9) MapStringKeysEight_32-6 29.2ns ± 0% 27.6ns ± 0% -5.45% (p=0.000 n=10+10) MapStringKeysEight_64-6 29.1ns ± 1% 27.5ns ± 0% -5.46% (p=0.000 n=10+10) MapStringKeysEight_1M-6 29.1ns ± 1% 27.6ns ± 0% -5.49% (p=0.000 n=10+10) Change-Id: I9ec98e41b233031e0e96c4e13d86a324f628ed4a Reviewed-on: https://go-review.googlesource.com/40771 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-06-01 15:46:16 +00:00
Josh Bleecher Snyder	ee69c21747	cmd/compile: don't use statictmps for SSA-able composite literals The writebarrier test has to change. Now that T23 composite literals are passed to the backend, they get SSA'd, so writes to their fields are treated separately, so the relevant part of the first write to t23 is now a dead store. Preserve the intent of the test by splitting it up into two functions. Reduces code size a bit: name old object-bytes new object-bytes delta Template 386k ± 0% 386k ± 0% ~ (all equal) Unicode 202k ± 0% 202k ± 0% ~ (all equal) GoTypes 1.16M ± 0% 1.16M ± 0% ~ (all equal) Compiler 3.92M ± 0% 3.91M ± 0% -0.19% (p=0.008 n=5+5) SSA 7.91M ± 0% 7.91M ± 0% ~ (all equal) Flate 228k ± 0% 228k ± 0% -0.05% (p=0.008 n=5+5) GoParser 283k ± 0% 283k ± 0% ~ (all equal) Reflect 952k ± 0% 952k ± 0% -0.06% (p=0.008 n=5+5) Tar 188k ± 0% 188k ± 0% -0.09% (p=0.008 n=5+5) XML 406k ± 0% 406k ± 0% -0.02% (p=0.008 n=5+5) [Geo mean] 649k 648k -0.04% Fixes #18872 Change-Id: Ifeed0f71f13849732999aa731cc2bf40c0f0e32a Reviewed-on: https://go-review.googlesource.com/43154 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2017-05-11 18:28:40 +00:00
Cherry Zhang	fb0ccc5d0a	cmd/internal/obj/arm64, cmd/compile: improve offset folding on ARM64 The ARM64 assembler backend only accepts loads and stores with small or aligned offsets. The compiler therefore can only fold small or aligned offsets into loads and stores. For locals and args, their offsets to SP are not known until very late, and the compiler makes the conservative decision not to fold some of them. However, in most cases, the offset is indeed small or aligned, and could be folded into the load or store (but currently is not). This CL adds support for loads and stores with large and unaligned offsets. When the offset doesn't fit into the instruction, it uses two instructions and (for very large offsets) the constant pool. This way, the compiler doesn't need to be conservative, and can simply fold the offset. To make it work, the assembler's optab matching rules need to be changed. Before, MOVD accepted C_UAUTO32K, which matches multiples of 8 between 0 and 32K, and also C_UAUTO16K, which may not be a multiple of 8 and does not fit into a MOVD instruction. The assembler errors in the latter case. This change makes it match only multiples of 8 (or offsets within ±256, which also fit in the instruction), and uses the large-or-unaligned-offset rule for things that don't fit (without error). Other sized move rules are changed similarly. Classes C_UAUTO64K and C_UOREG64K are removed, as they are never used. In shared library mode, a load/store of a global is rewritten to use the GOT and a temp register, which conflicts with the use of the temp register for assembling large offsets. So the folding is disabled for globals in shared library mode. Reduces cmd/go binary size by 2%. name old time/op new time/op delta BinaryTree17-8 8.67s ± 0% 8.61s ± 0% -0.60% (p=0.000 n=9+10) Fannkuch11-8 6.24s ± 0% 6.19s ± 0% -0.83% (p=0.000 n=10+9) FmtFprintfEmpty-8 116ns ± 0% 116ns ± 0% ~ (all equal) FmtFprintfString-8 196ns ± 0% 192ns ± 0% -1.89% (p=0.000 n=10+10) FmtFprintfInt-8 199ns ± 0% 198ns ± 0% -0.35% (p=0.001 n=9+10) FmtFprintfIntInt-8 294ns ± 0% 293ns ± 0% -0.34% (p=0.000 n=8+8) FmtFprintfPrefixedInt-8 318ns ± 1% 318ns ± 1% ~ (p=1.000 n=10+10) FmtFprintfFloat-8 537ns ± 0% 531ns ± 0% -1.17% (p=0.000 n=9+10) FmtManyArgs-8 1.19µs ± 1% 1.18µs ± 1% -1.41% (p=0.001 n=10+10) GobDecode-8 17.2ms ± 1% 17.3ms ± 2% ~ (p=0.165 n=10+10) GobEncode-8 14.7ms ± 1% 14.7ms ± 2% ~ (p=0.631 n=10+10) Gzip-8 837ms ± 0% 836ms ± 0% -0.14% (p=0.006 n=9+10) Gunzip-8 141ms ± 0% 139ms ± 0% -1.24% (p=0.000 n=9+10) HTTPClientServer-8 256µs ± 1% 253µs ± 1% -1.35% (p=0.000 n=10+10) JSONEncode-8 40.1ms ± 1% 41.3ms ± 1% +3.06% (p=0.000 n=10+9) JSONDecode-8 157ms ± 1% 156ms ± 1% -0.83% (p=0.001 n=9+8) Mandelbrot200-8 8.94ms ± 0% 8.94ms ± 0% +0.02% (p=0.000 n=9+9) GoParse-8 8.69ms ± 0% 8.54ms ± 1% -1.69% (p=0.000 n=8+10) RegexpMatchEasy0_32-8 227ns ± 1% 228ns ± 1% +0.48% (p=0.016 n=10+9) RegexpMatchEasy0_1K-8 1.92µs ± 0% 1.63µs ± 0% -15.08% (p=0.000 n=10+9) RegexpMatchEasy1_32-8 256ns ± 0% 251ns ± 0% -2.19% (p=0.000 n=10+9) RegexpMatchEasy1_1K-8 2.38µs ± 0% 2.09µs ± 0% -12.49% (p=0.000 n=10+9) RegexpMatchMedium_32-8 352ns ± 0% 354ns ± 0% +0.39% (p=0.002 n=10+9) RegexpMatchMedium_1K-8 106µs ± 0% 106µs ± 0% -0.05% (p=0.005 n=10+9) RegexpMatchHard_32-8 5.92µs ± 0% 5.89µs ± 0% -0.40% (p=0.000 n=9+8) RegexpMatchHard_1K-8 180µs ± 0% 179µs ± 0% -0.14% (p=0.000 n=10+9) Revcomp-8 1.20s ± 0% 1.13s ± 0% -6.29% (p=0.000 n=9+8) Template-8 159ms ± 1% 154ms ± 1% -3.14% (p=0.000 n=9+10) TimeParse-8 800ns ± 3% 769ns ± 1% -3.91% (p=0.000 n=10+10) TimeFormat-8 826ns ± 2% 817ns ± 2% -1.04% (p=0.050 n=10+10) [Geo mean] 145µs 143µs -1.79% Change-Id: I5fc42087cee9b54ea414f8ef6d6d020b80eb5985 Reviewed-on: https://go-review.googlesource.com/42172 Run-TryBot: Cherry Zhang <cherryyz@google.com> Reviewed-by: David Chase <drchase@google.com>	2017-05-09 19:41:00 +00:00
Martin Möhrmann	f9bec9eb42	cmd/compile: use MOVL instead of MOVQ for small constants on amd64 The encoding of MOVL to a register is 2 bytes shorter than for MOVQ. The upper 32 bits are automatically zeroed when MOVL to a register is used. Replaces 1657 MOVQ by MOVL in the go binary. Reduces go binary size by 4 kilobytes. name old time/op new time/op delta BinaryTree17 1.93s ± 0% 1.93s ± 0% -0.32% (p=0.000 n=9+9) Fannkuch11 2.66s ± 0% 2.48s ± 0% -6.60% (p=0.000 n=9+9) FmtFprintfEmpty 31.8ns ± 0% 31.6ns ± 0% -0.63% (p=0.000 n=10+10) FmtFprintfString 52.0ns ± 0% 51.9ns ± 0% -0.19% (p=0.000 n=10+10) FmtFprintfInt 55.6ns ± 0% 54.6ns ± 0% -1.80% (p=0.002 n=8+10) FmtFprintfIntInt 87.7ns ± 0% 84.8ns ± 0% -3.31% (p=0.000 n=9+9) FmtFprintfPrefixedInt 98.9ns ± 0% 102.0ns ± 0% +3.10% (p=0.000 n=10+10) FmtFprintfFloat 165ns ± 0% 164ns ± 0% -0.61% (p=0.000 n=10+10) FmtManyArgs 368ns ± 0% 361ns ± 0% -1.98% (p=0.000 n=8+10) GobDecode 4.53ms ± 0% 4.58ms ± 0% +1.08% (p=0.000 n=9+10) GobEncode 3.74ms ± 0% 3.73ms ± 0% -0.27% (p=0.000 n=10+10) Gzip 164ms ± 0% 163ms ± 0% -0.48% (p=0.000 n=10+10) Gunzip 26.7ms ± 0% 26.6ms ± 0% -0.13% (p=0.000 n=9+10) HTTPClientServer 30.4µs ± 1% 30.3µs ± 1% -0.41% (p=0.016 n=10+10) JSONEncode 10.9ms ± 0% 11.0ms ± 0% +0.70% (p=0.000 n=10+10) JSONDecode 36.8ms ± 0% 37.0ms ± 0% +0.59% (p=0.000 n=9+10) Mandelbrot200 3.20ms ± 0% 3.21ms ± 0% +0.44% (p=0.000 n=9+10) GoParse 2.35ms ± 0% 2.35ms ± 0% +0.26% (p=0.000 n=10+9) RegexpMatchEasy0_32 58.3ns ± 0% 58.4ns ± 0% +0.17% (p=0.000 n=10+10) RegexpMatchEasy0_1K 138ns ± 0% 142ns ± 0% +2.68% (p=0.000 n=10+10) RegexpMatchEasy1_32 55.1ns ± 0% 55.6ns ± 1% ~ (p=0.104 n=10+10) RegexpMatchEasy1_1K 242ns ± 0% 243ns ± 0% +0.41% (p=0.000 n=10+10) RegexpMatchMedium_32 87.4ns ± 0% 89.9ns ± 0% +2.86% (p=0.000 n=10+10) RegexpMatchMedium_1K 27.4µs ± 0% 27.4µs ± 0% +0.15% (p=0.000 n=10+10) RegexpMatchHard_32 1.30µs ± 0% 1.32µs ± 1% +1.91% (p=0.000 n=10+10) RegexpMatchHard_1K 39.0µs ± 0% 39.5µs ± 0% +1.38% (p=0.000 n=10+10) Revcomp 316ms ± 0% 319ms ± 0% +1.13% (p=0.000 n=9+8) Template 40.6ms ± 0% 40.6ms ± 0% ~ (p=0.123 n=10+10) TimeParse 224ns ± 0% 224ns ± 0% ~ (all equal) TimeFormat 230ns ± 0% 225ns ± 0% -2.17% (p=0.000 n=10+10) Change-Id: I32a099b65f9e6d4ad7288ed48546655c534757d8 Reviewed-on: https://go-review.googlesource.com/38630 Run-TryBot: Martin Möhrmann <moehrmann@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-05-01 20:59:58 +00:00
Lynn Boger	9248ff46a8	cmd/compile: add rotates to PPC64.rules This updates PPC64.rules to include rules to generate rotates for ADD, OR, XOR operators that combine two opposite shifts that sum to 32 or 64. To support this change opcodes for ROTL and ROTLW were added to be used like the rotldi and rotlwi extended mnemonics. This provides the following improvement in sha3: BenchmarkPermutationFunction-8 302.83 376.40 1.24x BenchmarkSha3_512_MTU-8 98.64 121.92 1.24x BenchmarkSha3_384_MTU-8 136.80 168.30 1.23x BenchmarkSha3_256_MTU-8 169.21 211.29 1.25x BenchmarkSha3_224_MTU-8 179.76 221.19 1.23x BenchmarkShake128_MTU-8 212.87 263.23 1.24x BenchmarkShake256_MTU-8 196.62 245.60 1.25x BenchmarkShake256_16x-8 163.57 194.37 1.19x BenchmarkShake256_1MiB-8 199.02 248.74 1.25x BenchmarkSha3_512_1MiB-8 106.55 133.13 1.25x Fixes #20030 Change-Id: I484c56f48395d32f53ff3ecb3ac6cb8191cfee44 Reviewed-on: https://go-review.googlesource.com/40992 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Michael Munday <munday@ca.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-04-20 18:05:22 +00:00
Keith Randall	7e07e635f3	cmd/compile: implement non-constant rotates Makes math/bits.Rotate{Left,Right} fast on amd64. name old time/op new time/op delta RotateLeft-12 7.42ns ± 6% 5.45ns ± 6% -26.54% (p=0.000 n=9+10) RotateLeft8-12 4.77ns ± 5% 3.42ns ± 7% -28.25% (p=0.000 n=8+10) RotateLeft16-12 4.82ns ± 8% 3.40ns ± 7% -29.36% (p=0.000 n=10+10) RotateLeft32-12 4.87ns ± 7% 3.48ns ± 7% -28.51% (p=0.000 n=8+9) RotateLeft64-12 5.23ns ±10% 3.35ns ± 6% -35.97% (p=0.000 n=9+10) RotateRight-12 7.59ns ± 8% 5.71ns ± 1% -24.72% (p=0.000 n=10+8) RotateRight8-12 4.98ns ± 7% 3.36ns ± 9% -32.55% (p=0.000 n=10+10) RotateRight16-12 5.12ns ± 2% 3.45ns ± 5% -32.62% (p=0.000 n=10+10) RotateRight32-12 4.80ns ± 6% 3.42ns ±16% -28.68% (p=0.000 n=10+10) RotateRight64-12 4.78ns ± 6% 3.42ns ± 6% -28.50% (p=0.000 n=10+10) Update #18940 Change-Id: Ie79fb5581c489ed4d3b859314c5e669a134c119b Reviewed-on: https://go-review.googlesource.com/39711 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2017-04-17 23:19:45 +00:00
Josh Bleecher Snyder	3d0a898385	cmd/compile: improve output when TestAssembly build fails Change-Id: Ibee84399d81463d3e7d5319626bb0d6b60b86bd9 Reviewed-on: https://go-review.googlesource.com/40861 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2017-04-17 03:12:34 +00:00
Josh Bleecher Snyder	0d36999a0f	cmd/compile: make TestAssembly resilient to output ordering To preserve reproducible builds, the text entries during compilation will be sorted before being printed. TestAssembly currently assumes that function init comes after all user-defined functions. Remove that assumption. Instead of looking for "TEXT" to tell you where a function ends--which may now yield lots of non-function-code junk--look for a line beginning with non-whitespace. Updates #15756 Change-Id: Ibc82dba6143d769ef4c391afc360e523b1a51348 Reviewed-on: https://go-review.googlesource.com/39853 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>	2017-04-13 02:30:29 +00:00
Ilya Tocar	e4a500ce14	cmd/compile/internal/gc: improve comparison with constant strings Currently we expand comparison with small constant strings into len check and a sequence of byte comparisons. Generate 16/32/64-bit comparisons, instead of bytewise on 386 and amd64. Also increase limits on what is considered small constant string. Shaves ~30kb (0.5%) from go executable. This also updates test/prove.go to keep test case valid. Change-Id: I99ae8871a1d00c96363c6d03d0b890782fa7e1d9 Reviewed-on: https://go-review.googlesource.com/38776 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2017-04-07 15:40:25 +00:00
Cherry Zhang	257b01f8f4	cmd/compile: use ANDconst to mask out leading/trailing bits on ARM64 For an AND that masks out leading or trailing bits, generic rules rewrite it to a pair of shifts. On ARM64, the mask actually can fit into an AND instruction. So we rewrite it back to AND. Fixes #19857. Change-Id: I479d7320ae4f29bb3f0056d5979bde4478063a8f Reviewed-on: https://go-review.googlesource.com/39651 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2017-04-06 17:59:32 +00:00
Keith Randall	5cadc91b3c	cmd/compile: intrinsics for math/bits.OnesCount Popcount instructions on amd64 are not guaranteed to be present, so we must guard their call. Rewrite rules can't generate control flow at the moment, so the intrinsifier needs to generate that code. name old time/op new time/op delta OnesCount-8 2.47ns ± 5% 1.04ns ± 2% -57.70% (p=0.000 n=10+10) OnesCount16-8 1.05ns ± 1% 0.78ns ± 0% -25.56% (p=0.000 n=9+8) OnesCount32-8 1.63ns ± 5% 1.04ns ± 2% -35.96% (p=0.000 n=10+10) OnesCount64-8 2.45ns ± 0% 1.04ns ± 1% -57.55% (p=0.000 n=6+10) Update #18616 Change-Id: I4aff2cc9aa93787898d7b22055fe272a7cf95673 Reviewed-on: https://go-review.googlesource.com/38320 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2017-04-04 02:40:11 +00:00
Keith Randall	63a72fd447	cmd/compile: strength-reduce floating point x*2 -> x+x; x/c, c power of 2 -> x*(1/c) Fixes #19827 Change-Id: I74c9f0b5b49b2ed26c0990314c7d1d5f9631b6f1 Reviewed-on: https://go-review.googlesource.com/39295 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2017-04-03 21:27:03 +00:00
Keith Randall	86dc86b4f9	cmd/compile: don't merge load+op if other op arg is still live We want to merge a load and op into a single instruction l = LOAD ptr mem y = OP x l into y = OPload x ptr mem However, all of our OPload instructions require that y uses the same register as x. If x is needed past this instruction, then we must copy x somewhere else, losing the whole benefit of merging the instructions in the first place. Disable this optimization if x is live past the OP. Also disable this optimization if the OP is in a deeper loop than the load. Update #19595 Change-Id: I87f596aad7e91c9127bfb4705cbae47106e1e77a Reviewed-on: https://go-review.googlesource.com/38337 Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>	2017-03-23 15:53:04 +00:00
Michael Munday	17570a9afb	cmd/compile: emit fused multiply-{add,subtract} on ppc64x A follow on to CL 36963 adding support for ppc64x. Performance changes (as posted on the issue): poly1305: benchmark old ns/op new ns/op delta Benchmark64-16 172 151 -12.21% Benchmark1K-16 1828 1523 -16.68% Benchmark64Unaligned-16 172 151 -12.21% Benchmark1KUnaligned-16 1827 1523 -16.64% math: BenchmarkAcos-16 43.9 39.9 -9.11% BenchmarkAcosh-16 57.0 45.8 -19.65% BenchmarkAsin-16 35.8 33.0 -7.82% BenchmarkAsinh-16 68.6 60.8 -11.37% BenchmarkAtan-16 19.8 16.2 -18.18% BenchmarkAtanh-16 65.5 57.5 -12.21% BenchmarkAtan2-16 45.4 34.2 -24.67% BenchmarkGamma-16 37.6 26.0 -30.85% BenchmarkLgamma-16 40.0 28.2 -29.50% BenchmarkLog1p-16 35.1 29.1 -17.09% BenchmarkSin-16 22.7 18.4 -18.94% BenchmarkSincos-16 31.7 23.7 -25.24% BenchmarkSinh-16 146 131 -10.27% BenchmarkY0-16 130 107 -17.69% BenchmarkY1-16 127 107 -15.75% BenchmarkYn-16 278 235 -15.47% Updates #17895. Change-Id: I1c16199715d20c9c4bd97c4a950bcfa69eb688c1 Reviewed-on: https://go-review.googlesource.com/38095 Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>	2017-03-20 20:01:29 +00:00
Keith Randall	495b167919	cmd/compile: intrinsics for math/bits.{Len,LeadingZeros} name old time/op new time/op delta LeadingZeros-4 2.00ns ± 0% 1.34ns ± 1% -33.02% (p=0.000 n=8+10) LeadingZeros16-4 1.62ns ± 0% 1.57ns ± 0% -3.09% (p=0.001 n=8+9) LeadingZeros32-4 2.14ns ± 0% 1.48ns ± 0% -30.84% (p=0.002 n=8+10) LeadingZeros64-4 2.06ns ± 1% 1.33ns ± 0% -35.08% (p=0.000 n=8+8) 8-bit args is a special case - the Go code is really fast because it is just a single table lookup. So I've disabled that for now. Intrinsics were actually slower: LeadingZeros8-4 1.22ns ± 3% 1.58ns ± 1% +29.56% (p=0.000 n=10+10) Update #18616 Change-Id: Ia9c289b9ba59c583ea64060470315fd637e814cf Reviewed-on: https://go-review.googlesource.com/38311 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2017-03-16 22:53:49 +00:00
Keith Randall	dd9892e31b	cmd/compile: intrinsify math/bits.ReverseBytes Update #18616 Change-Id: I0c2d643cbbeb131b4c9b12194697afa4af48e1d2 Reviewed-on: https://go-review.googlesource.com/38166 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2017-03-16 19:41:56 +00:00
Keith Randall	d5dc490519	cmd/compile: intrinsics for math/bits.TrailingZerosX Implement math/bits.TrailingZerosX using intrinsics. Generally reorganize the intrinsic spec a bit. The intrinsics data structure is now built at init time. This will make doing the other functions in math/bits easier. Update sys.CtzX to return int instead of uint{64,32} so it matches math/bits.TrailingZerosX. Improve the intrinsics a bit for amd64. We don't need the CMOV for <64 bit versions. Update #18616 Change-Id: Ic1c5339c943f961d830ae56f12674d7b29d4ff39 Reviewed-on: https://go-review.googlesource.com/38155 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2017-03-16 02:44:16 +00:00
Josh Bleecher Snyder	3a90bfb253	cmd/dist, cmd/compile: eliminate mergeEnvLists copies This is now handled by os/exec. Updates #12868 Change-Id: Ic21a6ff76a9b9517437ff1acf3a9195f9604bb45 Reviewed-on: https://go-review.googlesource.com/37698 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2017-03-02 22:26:23 +00:00
Josh Bleecher Snyder	2183135554	cmd/compile: recognize bit test patterns on amd64 Updates #18943 Change-Id: If3080d6133bb6d2710b57294da24c90251ab4e08 Reviewed-on: https://go-review.googlesource.com/36329 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-03-01 00:36:04 +00:00
Michael Munday	bd8a39b67a	cmd/compile: emit fused multiply-{add,subtract} instructions on s390x Explicitly block fused multiply-add pattern matching when a cast is used after the multiplication, for example: - (a * b) + c // can emit fused multiply-add - float64(a * b) + c // cannot emit fused multiply-add float{32,64} and complex{64,128} casts of matching types are now kept as OCONV operations rather than being replaced with OCONVNOP operations because they now imply a rounding operation (and therefore aren't a no-op anymore). Operations (for example, multiplication) on complex types may utilize fused multiply-add and -subtract instructions internally. There is no way to disable this behavior at the moment. Improves the performance of the floating point implementation of poly1305: name old speed new speed delta 64 246MB/s ± 0% 275MB/s ± 0% +11.48% (p=0.000 n=10+8) 1K 312MB/s ± 0% 357MB/s ± 0% +14.41% (p=0.000 n=10+10) 64Unaligned 246MB/s ± 0% 274MB/s ± 0% +11.43% (p=0.000 n=10+10) 1KUnaligned 312MB/s ± 0% 357MB/s ± 0% +14.39% (p=0.000 n=10+8) Updates #17895. Change-Id: Ia771d275bb9150d1a598f8cc773444663de5ce16 Reviewed-on: https://go-review.googlesource.com/36963 Run-TryBot: Michael Munday <munday@ca.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-02-28 15:34:20 +00:00
Josh Bleecher Snyder	e458264aca	cmd/compile: fix dolinkobj flag in TestAssembly Follow-up to CL 37270. This considerably reduces the time to run the test. Before: real 0m7.638s user 0m14.341s sys 0m2.244s After: real 0m4.867s user 0m7.107s sys 0m1.842s Change-Id: I8837a5da0979a1c365e1ce5874d81708249a4129 Reviewed-on: https://go-review.googlesource.com/37461 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Michael Munday <munday@ca.ibm.com>	2017-02-25 14:39:29 +00:00
Lorenzo Masini	fb1f47a77c	cmd/compile: speed up TestAssembly TestAssembly was very slow, leading to it being skipped by default. This is not surprising, it separately invoked the compiler and parsed the result many times. Now the test assembles one source file for arch/os combination, containing the relevant functions. Tests for each arch/os run in parallel. Now the test runs approximately 10x faster on my Intel(R) Core(TM) i5-6600 CPU @ 3.30GHz. Fixes #18966 Change-Id: I45ab97630b627a32e17900c109f790eb4c0e90d9 Reviewed-on: https://go-review.googlesource.com/37270 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2017-02-24 21:23:43 +00:00
Kirill Smelkov	4477fd097f	cmd/compile/internal/ssa: combine 2 byte loads + shifts into word load + rolw 8 on AMD64 ... and same for stores. This does for binary.BigEndian.Uint16() what was already done for Uint32 and Uint64 with BSWAP in `10f75748` (CL 32222). Here is how generated code changes e.g. for the following function (omitting the unchanged prologue/epilogue): func get16(b [2]byte) uint16 { return binary.BigEndian.Uint16(b[:]) } "".get16 t=1 size=21 args=0x10 locals=0x0 // before 0x0000 00000 (x.go:15) MOVBLZX "".b+9(FP), AX 0x0005 00005 (x.go:15) MOVBLZX "".b+8(FP), CX 0x000a 00010 (x.go:15) SHLL $8, CX 0x000d 00013 (x.go:15) ORL CX, AX // after 0x0000 00000 (x.go:15) MOVWLZX "".b+8(FP), AX 0x0005 00005 (x.go:15) ROLW $8, AX encoding/binary is sped up a bit overall: name old time/op new time/op delta ReadSlice1000Int32s-4 4.83µs ± 0% 4.83µs ± 0% ~ (p=0.206 n=4+5) ReadStruct-4 1.29µs ± 2% 1.28µs ± 1% -1.27% (p=0.032 n=4+5) ReadInts-4 384ns ± 1% 385ns ± 1% ~ (p=0.968 n=4+5) WriteInts-4 534ns ± 3% 526ns ± 0% -1.54% (p=0.048 n=4+5) WriteSlice1000Int32s-4 5.02µs ± 0% 5.11µs ± 3% ~ (p=0.175 n=4+5) PutUint16-4 0.59ns ± 0% 0.49ns ± 2% -16.95% (p=0.016 n=4+5) PutUint32-4 0.52ns ± 0% 0.52ns ± 0% ~ (all equal) PutUint64-4 0.53ns ± 0% 0.53ns ± 0% ~ (all equal) PutUvarint32-4 19.9ns ± 0% 19.9ns ± 1% ~ (p=0.556 n=4+5) PutUvarint64-4 54.5ns ± 1% 54.2ns ± 0% ~ (p=0.333 n=4+5) name old speed new speed delta ReadSlice1000Int32s-4 829MB/s ± 0% 828MB/s ± 0% ~ (p=0.190 n=4+5) ReadStruct-4 58.0MB/s ± 2% 58.7MB/s ± 1% +1.30% (p=0.032 n=4+5) ReadInts-4 78.0MB/s ± 1% 77.8MB/s ± 1% ~ (p=0.968 n=4+5) WriteInts-4 56.1MB/s ± 3% 57.0MB/s ± 0% ~ (p=0.063 n=4+5) WriteSlice1000Int32s-4 797MB/s ± 0% 783MB/s ± 3% ~ (p=0.190 n=4+5) PutUint16-4 3.37GB/s ± 0% 4.07GB/s ± 2% +20.83% (p=0.016 n=4+5) PutUint32-4 7.73GB/s ± 0% 7.72GB/s ± 0% ~ (p=0.556 n=4+5) PutUint64-4 15.1GB/s ± 0% 15.1GB/s ± 0% ~ (p=0.905 n=4+5) PutUvarint32-4 201MB/s ± 0% 201MB/s ± 0% ~ (p=0.905 n=4+5) PutUvarint64-4 147MB/s ± 1% 147MB/s ± 0% ~ (p=0.286 n=4+5) ( "a bit" only because most of the time is spent in reflection-like things there, not actual bytes decoding. Even for direct PutUint16 benchmark the looping adds overhead and lowers visible benefit. For code-generated encoders / decoders actual effect is more than 20% ) Adding Uint32 and Uint64 raw benchmarks too for completeness. NOTE I had to adjust load-combining rule for bswap case to match first 2 bytes loads as result of "2-bytes load+shift" -> "loadw + rorw 8" rewrite. Reason is: for loads+shift, even e.g. into uint16 var: var b []byte; var v uint16; v = uint16(b[1]) | uint16(b[0])<<8 the compiler eventually generates L(ong) shift - SHLLconst [8], probably because it is more straightforward / other reasons to work on the whole register. This way 2 bytes rewriting rule is using SHLLconst (not SHLWconst) in its pattern, and then it always gets matched first, even if 2-byte rule comes syntactically after 4-byte rule in AMD64.rules because 4-bytes rule seemingly needs more applyRewrite() cycles to trigger. If 2-bytes rule gets matched for inner half of: var b []byte; var v uint32; v = uint32(b[3]) | uint32(b[2])<<8 | uint32(b[1])<<16 | uint32(b[0])<<24 and we keep 4-byte load rule unchanged, the result will be MOVW + RORW $8 and then series of byte loads and shifts - not one MOVL + BSWAPL. There is no such problem for stores: there the compiler, since it probably knows store destination is 2 bytes wide, uses SHRWconst 8 (not SHRLconst 8) and thus 2-byte store rule is not a subset of rule for 4-byte stores. Fixes #17151 (int16 was last missing piece there) Change-Id: Idc03ba965bfce2b94fef456b02ff6742194748f6 Reviewed-on: https://go-review.googlesource.com/34636 Reviewed-by: Ilya Tocar <ilya.tocar@intel.com> Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-02-14 22:17:08 +00:00
Cherry Zhang	78200799a2	cmd/compile: undo special handling of zero-valued STRUCTLIT CL 35261 introduces special handling of zero-valued STRUCTLIT for efficient struct zeroing. But it didn't cover all use cases, for example, CONVNOP STRUCTLIT is not handled. On the other hand, CL 34566 handles zeroing earlier, so we don't need the change in CL 35261 for efficient zeroing. Other uses of zero-valued struct literals are very rare. So undo the change in walk.go in CL 35261. Add a test for efficient zeroing. Fixes #19084. Change-Id: I0807f7423fb44d47bf325b3c1ce9611a14953853 Reviewed-on: https://go-review.googlesource.com/36955 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Keith Randall <khr@golang.org>	2017-02-14 18:57:56 +00:00
Kirill Smelkov	bd91e3569a	cmd/compile/internal/ssa: generate bswap/store for indexed bigendian byte stores too on AMD64 Commit `10f75748` (CL 32222) added rewrite rules to combine byte loads/stores + shifts into larger loads/stores + bswap. For loads both MOVBload and MOVBloadidx1 were handled but for store only MOVBstore was there without MOVBstoreidx added to rewrite pattern. Fix it. Here is how generated code changes for the following 2 functions (omitting the unchanged prologue/epilogue): func put32(b []byte, i int, v uint32) { binary.BigEndian.PutUint32(b[i:], v) } func put64(b []byte, i int, v uint64) { binary.BigEndian.PutUint64(b[i:], v) } "".put32 t=1 size=100 args=0x28 locals=0x0 // before 0x0032 00050 (x.go:5) MOVL CX, DX 0x0034 00052 (x.go:5) SHRL $24, CX 0x0037 00055 (x.go:5) MOVQ "".b+8(FP), BX 0x003c 00060 (x.go:5) MOVB CL, (BX)(AX*1) 0x003f 00063 (x.go:5) MOVL DX, CX 0x0041 00065 (x.go:5) SHRL $16, DX 0x0044 00068 (x.go:5) MOVB DL, 1(BX)(AX*1) 0x0048 00072 (x.go:5) MOVL CX, DX 0x004a 00074 (x.go:5) SHRL $8, CX 0x004d 00077 (x.go:5) MOVB CL, 2(BX)(AX*1) 0x0051 00081 (x.go:5) MOVB DL, 3(BX)(AX*1) // after 0x0032 00050 (x.go:5) BSWAPL CX 0x0034 00052 (x.go:5) MOVQ "".b+8(FP), DX 0x0039 00057 (x.go:5) MOVL CX, (DX)(AX*1) "".put64 t=1 size=155 args=0x28 locals=0x0 // before 0x0037 00055 (x.go:9) MOVQ CX, DX 0x003a 00058 (x.go:9) SHRQ $56, CX 0x003e 00062 (x.go:9) MOVQ "".b+8(FP), BX 0x0043 00067 (x.go:9) MOVB CL, (BX)(AX*1) 0x0046 00070 (x.go:9) MOVQ DX, CX 0x0049 00073 (x.go:9) SHRQ $48, DX 0x004d 00077 (x.go:9) MOVB DL, 1(BX)(AX*1) 0x0051 00081 (x.go:9) MOVQ CX, DX 0x0054 00084 (x.go:9) SHRQ $40, CX 0x0058 00088 (x.go:9) MOVB CL, 2(BX)(AX*1) 0x005c 00092 (x.go:9) MOVQ DX, CX 0x005f 00095 (x.go:9) SHRQ $32, DX 0x0063 00099 (x.go:9) MOVB DL, 3(BX)(AX*1) 0x0067 00103 (x.go:9) MOVQ CX, DX 0x006a 00106 (x.go:9) SHRQ $24, CX 0x006e 00110 (x.go:9) MOVB CL, 4(BX)(AX*1) 0x0072 00114 (x.go:9) MOVQ DX, CX 0x0075 00117 (x.go:9) SHRQ $16, DX 0x0079 00121 (x.go:9) MOVB DL, 5(BX)(AX*1) 0x007d 00125 (x.go:9) MOVQ CX, DX 0x0080 00128 (x.go:9) SHRQ $8, CX 0x0084 00132 (x.go:9) MOVB CL, 6(BX)(AX*1) 0x0088 00136 (x.go:9) MOVB DL, 7(BX)(AX*1) // after 0x0033 00051 (x.go:9) BSWAPQ CX 0x0036 00054 (x.go:9) MOVQ "".b+8(FP), DX 0x003b 00059 (x.go:9) MOVQ CX, (DX)(AX*1) Updates #17151 Change-Id: I3f4a7f28f210e62e153e60da5abd1d39508cc6c4 Reviewed-on: https://go-review.googlesource.com/34635 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>	2017-02-14 18:35:43 +00:00
Kirill Smelkov	e2948f7efe	cmd/compile: Show arch/os when something in TestAssembly fails When looking at a TestAssembly failure, it is not always obvious at first glance in which context the code was generated. For example, x86 and x86-64 are similar, and those of us who do not work with assembly every day can even mistake the s390x version for something similar to x86. So when something fails, let's print the whole test context - this includes os and arch, which were previously missing. An example failure: before: --- FAIL: TestAssembly (40.48s) asm_test.go:46: expected: MOVWZ $.$, go: import "encoding/binary" func f(b []byte) uint32 { return binary.LittleEndian.Uint32(b) } asm:"".f t=1 size=160 args=0x20 locals=0x0 ... after: --- FAIL: TestAssembly (40.43s) asm_test.go:46: linux/s390x: expected: MOVWZ $.$, go: import "encoding/binary" func f(b []byte) uint32 { return binary.LittleEndian.Uint32(b) } asm:"".f t=1 size=160 args=0x20 locals=0x0 Motivated-by: #18946#issuecomment-279491071 Change-Id: I61089ceec05da7a165718a7d69dec4227dd0e993 Reviewed-on: https://go-review.googlesource.com/36881 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-02-13 20:30:31 +00:00
Michael Munday	074b73b1b2	cmd/compile: fix s390x load-combining rules MOVD{reg,nop} operations (added in CL 36256) inserted to preserve type information were blocking the load-combining rules. Fix this by merging type changes into loads wherever possible. Fixes #19059. Change-Id: I8a1df06eb0f231b40ae43107d4a3bd0b9c441b59 Reviewed-on: https://go-review.googlesource.com/36843 Run-TryBot: Michael Munday <munday@ca.ibm.com> Reviewed-by: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-02-13 20:04:14 +00:00
Keith Randall	b548eee3d9	cmd/compile: fix load-combining rules CL 33632 reorders args of commutative ops in order to make CSE for commutative ops more robust. Unfortunately, that broke the load-combining rules which depend on a certain ordering of OR ops' arguments. Introduce some additional rules that order OR ops' arguments consistently so that the load-combining rules fire. Note: there's also something else wrong with the s390x rules. I've filed #19059 for that. Fixes #18946 Change-Id: I0a5447196bd88a55ccee683c69a57b943a9972e1 Reviewed-on: https://go-review.googlesource.com/36911 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2017-02-13 18:29:51 +00:00
Josh Bleecher Snyder	5faba3057d	cmd/compile: use constants directly for fast map access calls CL 35554 taught order.go to use static variables for constants that needed to be addressable for runtime routines. However, there is one class of runtime routines that do not actually need an addressable value: fast map access routines. This CL teaches order.go to avoid using static variables for addressability in those cases. Instead, it avoids introducing a temp at all, which the backend would just have to optimize away. Fixes #19015. Change-Id: I5ef780c604fac3fb48dabb23a344435e283cb832 Reviewed-on: https://go-review.googlesource.com/36693 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2017-02-10 04:57:20 +00:00
Keith Randall	01c8719f8b	cmd/compile: move rotate instruction generation to SSA Remove rotate generation from walk. Remove OLROT and ssa.Lrot* opcodes. Generate rotates during SSA lowering for architectures that have them. This CL will allow rotates to be generated in more situations, like when the shift values are determined to be constant only after some analysis. Fixes #18254 Change-Id: I8d6d684ff5ce2511aceaddfda98b908007851079 Reviewed-on: https://go-review.googlesource.com/34232 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2017-02-02 17:57:15 +00:00
Russ Cox	47ce87877b	all: merge dev.inline into master Change-Id: I7715581a04e513dcda9918e853fa6b1ddc703770	2017-02-01 09:47:23 -05:00
Kirill Smelkov	c44da14440	cmd/compile/internal/ssa: add tests for BSWAP on stores on AMD64 Commit `10f75748` (CL 32222) taught the AMD64 backend to rewrite a series of byte loads or stores with corresponding shifts into a single long or quad load or store + appropriate BSWAP. However, it did not add tests for stores - only loads were tested. Fix it. NOTE Tests for indexed stores are not added because `10f75748` did not add support for indexed stores - only indexed loads were handled then. Change-Id: I48c867ebe7622ac8e691d43741feed1d40cca0d7 Reviewed-on: https://go-review.googlesource.com/34634 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-12-21 16:36:45 +00:00
Keith Randall	8d21691044	cmd/compile: test for correct zeroing Make sure we generate the right code for zeroing a structure. Check in after Matthew's CL (34564). Update #18370 Change-Id: I987087f979d99227a880b34c44d9d4de6c25ba0c Reviewed-on: https://go-review.googlesource.com/34565 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Keith Randall <khr@golang.org>	2016-12-19 17:36:35 +00:00
Robert Griesemer	eab3707d6d	[dev.inline] cmd/compile: rename various fields from Lineno to Pos Various minor adjustments. Change-Id: Iedfb97989f7bedaa3e9e8993b167e05f162434a7 Reviewed-on: https://go-review.googlesource.com/34136 Reviewed-by: David Lazar <lazard@golang.org>	2016-12-08 21:35:18 +00:00
Ilya Tocar	10f757486e	cmd/compile/internal/ssa: generate bswap on AMD64 Generate bswap+load/store for reading/writing big endian data. Helps encoding/binary. name old time/op new time/op delta ReadSlice1000Int32s-8 5.06µs ± 8% 4.58µs ± 8% -9.50% (p=0.000 n=10+10) ReadStruct-8 1.07µs ± 0% 1.05µs ± 0% -1.51% (p=0.000 n=9+10) ReadInts-8 367ns ± 0% 363ns ± 0% -1.15% (p=0.000 n=8+9) WriteInts-8 475ns ± 1% 469ns ± 0% -1.45% (p=0.000 n=10+10) WriteSlice1000Int32s-8 5.03µs ± 3% 4.50µs ± 3% -10.45% (p=0.000 n=9+9) PutUvarint32-8 17.2ns ± 0% 17.2ns ± 0% ~ (all samples are equal) PutUvarint64-8 46.7ns ± 0% 46.7ns ± 0% ~ (p=0.509 n=10+10) name old speed new speed delta ReadSlice1000Int32s-8 791MB/s ± 8% 875MB/s ± 8% +10.53% (p=0.000 n=10+10) ReadStruct-8 70.0MB/s ± 0% 71.1MB/s ± 0% +1.54% (p=0.000 n=9+10) ReadInts-8 81.6MB/s ± 0% 82.6MB/s ± 0% +1.21% (p=0.000 n=9+9) WriteInts-8 63.0MB/s ± 1% 63.9MB/s ± 0% +1.45% (p=0.000 n=10+10) WriteSlice1000Int32s-8 796MB/s ± 4% 888MB/s ± 3% +11.65% (p=0.000 n=9+9) PutUvarint32-8 233MB/s ± 0% 233MB/s ± 0% ~ (p=0.089 n=10+10) PutUvarint64-8 171MB/s ± 0% 171MB/s ± 0% ~ (p=0.137 n=10+9) Change-Id: Ia2dbdef92198eaa7e2af5443a8ed586d4b401ffb Reviewed-on: https://go-review.googlesource.com/32222 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2016-11-03 12:34:12 +00:00
Josh Bleecher Snyder	66504485eb	cmd/compile/internal/gc: make tests run faster TestAssembly takes 20s on my machine, which is too slow for normal operation. Marking as -short has its dangers (#17472), but hopefully we'll soon have a builder for that. All the SSA tests are hermetic and not time sensitive and can thus be run in parallel. Reduces the cmd/compile/internal/gc test time during all.bash on my laptop from 42s to 7s. Updates #17751 Change-Id: Idd876421db23b9fa3475e8a9b3355a5dc92a5a29 Reviewed-on: https://go-review.googlesource.com/32585 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2016-11-03 01:07:08 +00:00
Keith Randall	26a6131bac	cmd/compile: fix 4-byte unaligned load rules The 2-byte rule was firing before the 4-byte rule, preventing the 4-byte rule from firing. Update the 4-byte rule to use the results of the 2-byte rule instead. Add some tests to make sure we don't regress again. Fixes #17147 Change-Id: Icfeccd9f2b96450981086a52edd76afb3191410a Reviewed-on: https://go-review.googlesource.com/29382 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2016-09-23 19:32:37 +00:00
Keith Randall	842b05832f	all: use testing.GoToolPath instead of "go" This change makes sure that tests are run with the correct version of the go tool. The correct version is the one that we invoked with "go test", not the one that is first in our path. Fixes #16577 Change-Id: If22c8f8c3ec9e7c35d094362873819f2fbb8559b Reviewed-on: https://go-review.googlesource.com/28089 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-08-30 22:49:11 +00:00
Cherry Zhang	29f0984a35	cmd/compile: don't set line number to 0 when building SSA The frontend may emit node with line number missing. In this case, use the parent line number. Instead of changing every call site of pushLine, do it in pushLine itself. Fixes #16214. Change-Id: I80390550b56e4d690fc770b01ff725b892ffd6dc Reviewed-on: https://go-review.googlesource.com/24641 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Cherry Zhang <cherryyz@google.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-07-01 01:12:24 +00:00
David du Colombier	e29e0ba19a	cmd/compile: fix TestAssembly on Plan 9 Since CL 23620, TestAssembly is failing on Plan 9. In CL 23620, the process environment is passed to 'go tool compile' after setting GOARCH. On Plan 9, if GOARCH is already set in the process environment, it would take precedence. On Unix, it works as expected because the first GOARCH found takes precedence. This change uses the mergeEnvLists function from cmd/go/main.go to merge the two environment lists such that variables with the same name in "in" replace those in "out". Change-Id: Idee22058343932ee18666dda331c562c89c33507 Reviewed-on: https://go-review.googlesource.com/23593 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: David du Colombier <0intro@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-06-01 13:33:43 +00:00
Michael Hudson-Doyle	2885e07c25	cmd/compile: pass process env to 'go tool compile' in compileToAsm In particular, this stops the test failing when GOROOT and GOROOT_FINAL are different. Change-Id: Ibf6cc0a173f1d965ee8aa31eee2698b223f1ceec Reviewed-on: https://go-review.googlesource.com/23620 Run-TryBot: Michael Hudson-Doyle <michael.hudson@canonical.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>	2016-06-01 03:55:09 +00:00
Keith Randall	9369f22b84	cmd/compile: testing harness for checking generated assembly Add a test which compiles a function and checks the generated assembly to make sure certain patterns are present. This test allows us to do white box tests of the compiler to make sure optimizations don't regress. Added a few simple tests for now. More to come. Change-Id: I4ab5ce5d95b9e04e7d0d9328ffae47b8d1f95e74 Reviewed-on: https://go-review.googlesource.com/23403 Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-05-26 23:07:01 +00:00
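
Commit 9369f22b84 describes a harness that compiles a function and checks that certain patterns are present in the generated assembly. The following is a minimal sketch of only the pattern-checking step, under stated assumptions: `missingPatterns` is a hypothetical helper name, not the harness's actual API, and the assembly text is hard-coded (two lines taken from the put32 "after" listing in commit bd91e3569a) instead of being produced by invoking the compiler.

```go
package main

import (
	"fmt"
	"regexp"
)

// missingPatterns returns every regular expression in patterns that does
// not match anywhere in asm. Hypothetical helper for illustration only.
func missingPatterns(asm string, patterns []string) []string {
	var missing []string
	for _, p := range patterns {
		if !regexp.MustCompile(p).MatchString(asm) {
			missing = append(missing, p)
		}
	}
	return missing
}

// sampleAsm stands in for compiler output; a real harness would obtain
// this text by compiling the function under test.
const sampleAsm = `0x0032 00050 (x.go:5) BSWAPL CX
0x0034 00052 (x.go:5) MOVQ "".b+8(FP), DX`

func main() {
	// Expect a byte swap to be present; the second pattern is included
	// to show how an absent instruction would be reported.
	patterns := []string{`BSWAPL\s+CX`, `MOVB\s`}
	for _, p := range missingPatterns(sampleAsm, patterns) {
		fmt.Printf("expected pattern not found: %s\n", p)
	}
}
```

A check of this shape reports a regression whenever an optimization stops producing the expected instruction, which is the white-box use the commit message describes.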

47 commits