This saves an instruction and a register. The new rules
match ~4900 times during all.bash.
Change-Id: I2f867c5e70262004e31f545f3bb89e939c45b718
Reviewed-on: https://go-review.googlesource.com/94767
Reviewed-by: Keith Randall <khr@golang.org>
Now that the buffered write barrier is implemented for all
architectures, we can remove the old eager write barrier
implementation. This CL removes the implementation from the runtime,
support in the compiler for calling it, and updates some compiler
tests that relied on the old eager barrier support. It also makes sure
that all of the useful comments from the old write barrier
implementation still have a place to live.
Fixes #22460.
Updates #21640 since this fixes the layering concerns of the write
barrier (but not the other things in that issue).
Change-Id: I580f93c152e89607e0a72fe43370237ba97bae74
Reviewed-on: https://go-review.googlesource.com/92705
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
When loading multiple elements of an array into a single register,
make sure we treat them as unsigned. When treated as signed, the
upper bits might all be set, causing the shift-or combo to clobber
the values higher in the register.
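A sketch of the kind of code affected (hypothetical example, not
taken from the CL): each element must be zero-extended before the
shift-or combine, or its sign bits would clobber the bytes already
placed higher in the register.
func read4(b [4]byte) uint32 {
    // four byte loads combined into one 32-bit load
    return uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 | uint32(b[3])<<24
}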
Fixes #23719.
Change-Id: Ic87da03e9bd0fe2c60bb214b99f846e4e9446052
Reviewed-on: https://go-review.googlesource.com/92335
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
Fixes #22703
The fix was already done by Cherry for defer/go of an interface call (CL 23820).
We just need to do it everywhere.
Change-Id: I0115d22e443931fe1bcce44c93c4d0770b5fd268
Reviewed-on: https://go-review.googlesource.com/77450
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
This change adds code generation tests for multiplication by ±2ⁿ for
arm and arm64, in preparation for a future CL which will remove the
relevant architecture-specific SSA rules (the reduction is already
performed by rules in generic.rules added in CL 36323).
Change-Id: Iebdd5c3bb2fc632c85888569ff0c49f78569a862
Reviewed-on: https://go-review.googlesource.com/75752
Reviewed-by: Cherry Zhang <cherryyz@google.com>
This adds new rules to recognize consecutive byte loads and
stores and lower them to larger loads and stores such as lhz,
lwz, ld, sth, stw, and std. This change only covers the little
endian cases on little endian machines, such as the code found
in encoding/binary UintXX or PutUintXX for little endian. Big
endian will be done later.
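The little-endian pattern being matched is the one used by
encoding/binary; a minimal sketch of the store side (assuming the
standard pattern, not copied from the CL):
func put32(b []byte, v uint32) {
    _ = b[3] // bounds check hint
    b[0] = byte(v) // four byte stores,
    b[1] = byte(v >> 8)
    b[2] = byte(v >> 16)
    b[3] = byte(v >> 24) // now lowered to a single stw
}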
Updates were also made to binary_test.go so that the benchmarks
for Uint and PutUint actually use those functions; as previously
written, those functions were being optimized away.
Testcases were also added to cmd/compile/internal/gc/asm_test.go.
Updates #22496
The following improvements can be found in golang.org/x/crypto
poly1305:
Benchmark64-16 142 114 -19.72%
Benchmark1K-16 1717 1424 -17.06%
Benchmark64Unaligned-16 142 113 -20.42%
Benchmark1KUnaligned-16 1721 1428 -17.02%
chacha20poly1305:
BenchmarkChacha20Poly1305Open_64-16 1012 885 -12.55%
BenchmarkChacha20Poly1305Seal_64-16 971 836 -13.90%
BenchmarkChacha20Poly1305Open_1350-16 11113 9539 -14.16%
BenchmarkChacha20Poly1305Seal_1350-16 11013 9392 -14.72%
BenchmarkChacha20Poly1305Open_8K-16 61074 53431 -12.51%
BenchmarkChacha20Poly1305Seal_8K-16 61214 54806 -10.47%
Other improvements of around 10% found in crypto/tls.
Results after updating encoding/binary/binary_test.go:
BenchmarkLittleEndianPutUint64-16 1.87 0.93 -50.27%
BenchmarkLittleEndianPutUint32-16 1.19 0.93 -21.85%
BenchmarkLittleEndianPutUint16-16 1.16 1.03 -11.21%
Change-Id: I7bbe2fbcbd11362d58662fecd907a0c07e6ca2fb
Reviewed-on: https://go-review.googlesource.com/74410
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Munday <mike.munday@ibm.com>
The new RoundToEven function can be implemented as a single FIDBR
instruction on s390x.
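For reference, RoundToEven rounds halfway cases to the nearest
even integer (standard math package semantics):
func ties() (float64, float64) {
    return math.RoundToEven(2.5), math.RoundToEven(3.5) // 2, 4
}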
name old time/op new time/op delta
RoundToEven 5.32ns ± 1% 0.86ns ± 1% -83.86% (p=0.000 n=10+10)
Change-Id: Iaf597e57a0d1085961701e3c75ff4f6f6dcebb5f
Reviewed-on: https://go-review.googlesource.com/74350
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
This change adds three new instructions:
- LPDFR: load positive (math.Abs(x))
- LNDFR: load negative (-math.Abs(x))
- CPSDR: copy sign (math.Copysign(x, y))
By making use of GPR <-> FPR moves we can now compile math.Abs and
math.Copysign to these instructions using SSA rules.
This CL also adds new rules to merge address generation into combined
load operations. This makes GPR <-> FPR move matching more reliable.
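At the bit level these are plain sign-bit manipulations; a sketch
of the Abs equivalence (standard IEEE semantics, not code from the
CL):
func absBits(x float64) float64 {
    // clear the sign bit, as LPDFR does
    return math.Float64frombits(math.Float64bits(x) &^ (1 << 63))
}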
name old time/op new time/op delta
Copysign 1.85ns ± 0% 1.40ns ± 1% -24.65% (p=0.000 n=8+10)
Abs 1.58ns ± 1% 0.73ns ± 1% -53.64% (p=0.000 n=10+10)
The geo mean improvement for all math package benchmarks was 4.6%.
Change-Id: I0cec35c5c1b3fb45243bf666b56b57faca981bc9
Reviewed-on: https://go-review.googlesource.com/73950
Run-TryBot: Michael Munday <mike.munday@ibm.com>
Reviewed-by: Keith Randall <khr@golang.org>
This adds support for math.Abs and math.Copysign as intrinsics on ppc64x.
New instruction FCPSGN is added to generate fcpsgn. Some new
rules are added to improve the int<->float conversions that are
generated mainly due to the Float64bits and Float64frombits in
the math package. PPC64.rules is also modified as suggested
in the review for CL 63290.
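For context, math.Copysign is built on Float64bits and
Float64frombits, which is where those int<->float conversions come
from; roughly:
func copysign(x, y float64) float64 {
    const sign = 1 << 63
    return math.Float64frombits(math.Float64bits(x)&^sign | math.Float64bits(y)&sign)
}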
Improvements:
benchmark old ns/op new ns/op delta
BenchmarkAbs-16 1.12 0.69 -38.39%
BenchmarkCopysign-16 1.30 0.93 -28.46%
BenchmarkNextafter32-16 9.34 8.05 -13.81%
BenchmarkFrexp-16 8.81 7.60 -13.73%
Others that used Copysign also saw smaller improvements.
I attempted to make this work using rules since that
seems to be preferred, but due to the use of Float64bits and
Float64frombits in these functions, several rules had to be added and
even then not all cases were matched. Using rules became too
complicated and seemed too fragile for these.
Updates #21390
Change-Id: Ia265da9a18355e08000818a4fba1a40e9e031995
Reviewed-on: https://go-review.googlesource.com/67130
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Keith Randall <khr@golang.org>
This pragma is not actually honored by the compiler.
The tests implicitly relied on the inliner being unable
to inline closures with captured variables, which will
soon change.
Fixes #22208
Change-Id: I13abc9c930b9156d43ec216f8efb768952a29439
Reviewed-on: https://go-review.googlesource.com/73211
Reviewed-by: Michael Munday <mike.munday@ibm.com>
This CL optimizes the assembly generated for division of len()
or cap() results by a power-of-two constant:
func lenDiv(s []int) int {
    return len(s) / 16
}
amd64 assembly before the CL:
MOVQ "".s+16(SP), AX
MOVQ AX, CX
SARQ $63, AX
SHRQ $60, AX
ADDQ CX, AX
SARQ $4, AX
MOVQ AX, "".~r1+32(SP)
RET
amd64 assembly after the CL:
MOVQ "".s+16(SP), AX
SHRQ $4, AX
MOVQ AX, "".~r1+32(SP)
RET
The CL relies on the fact that the results of len() and cap()
cannot be negative.
Trigger stats for the added SSA rules on linux/amd64 when running
make.bash:
46 Div64
12 Mod64
The added SSA rules may trigger in more cases in the future,
once SSA values are annotated with information about their
lower bounds.
For instance:
func f(i int16) int16 {
    if i < 3 {
        return -1
    }
    // The lower bound of i is 3 here, so i is non-negative
    // and unsigned arithmetic may be used.
    return i % 16
}
Change-Id: I8bc6be5a03e71157ced533c01416451ff6f1a7f0
Reviewed-on: https://go-review.googlesource.com/65530
Reviewed-by: Keith Randall <khr@golang.org>
amd64 and 386 have rules to reduce multiplication by a positive power
of two, but a more general reduction (both for positive and negative
powers of two) is already performed by generic rules that were added
in CL 36323 to replace walkmul (see lines 166:173 in generic.rules).
The x86 and amd64 rules are never triggered during all.bash and can be
removed, reducing rules duplication.
The change also adds a few code generation tests for amd64 and 386.
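A sketch of the kind of test added (function names hypothetical):
func mul8(n int) int {
    return n * 8 // expect a shift, not a MUL
}
func mulNeg4(n int) int {
    return n * -4 // expect a shift plus a negate
}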
Change-Id: I566d48186643bd722a4c0137fe94e513b8b20e36
Reviewed-on: https://go-review.googlesource.com/68450
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Combine setcc and store of result into setcc that writes directly to memory.
Triggers 200+ times in go tool.
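A minimal example of the pattern (hypothetical, not from the CL):
func setLess(x, y int64, p *bool) {
    *p = x < y // SETLT straight to memory, no intermediate register
}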
Fixes #21630
Change-Id: Iafa22607426f4120140c88fae4b9aecb46e0bba8
Reviewed-on: https://go-review.googlesource.com/67950
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Ceil, Floor and Trunc are pre-existing intrinsics. Round is a new
function and has been added as an intrinsic in this CL. All of the
functions can be implemented as a single 'LOAD FP INTEGER'
instruction, FIDBR, on s390x.
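For reference, the four functions on a halfway value (standard
math package semantics; Round ties away from zero):
func halves() (c, f, r, t float64) {
    return math.Ceil(2.5), math.Floor(2.5), math.Round(2.5), math.Trunc(2.5) // 3, 2, 3, 2
}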
name old time/op new time/op delta
Ceil 2.34ns ± 0% 0.85ns ± 0% -63.74% (p=0.000 n=5+4)
Floor 2.33ns ± 0% 0.85ns ± 1% -63.35% (p=0.008 n=5+5)
Round 4.23ns ± 0% 0.85ns ± 0% -79.89% (p=0.000 n=5+4)
Trunc 2.35ns ± 0% 0.85ns ± 0% -63.83% (p=0.029 n=4+4)
Change-Id: Idee7ba24a2899d12bf9afee4eedd6b4aaad3c510
Reviewed-on: https://go-review.googlesource.com/63890
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Add generic rules to propagate floating point constants through
comparisons and integer conversions. These new rules seldom trigger
in the standard library so there is no performance change, however
I think it is worth adding them anyway for completeness.
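The sort of code that benefits (hypothetical example): once a
small helper is inlined with a constant argument, the comparison
folds away.
func isPositive(x float64) bool {
    return x > 0
}
// After inlining, a call like isPositive(2.0) reduces to the
// constant true.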
Change-Id: I9db5222746508a2996f1cafb72f4e0cf2541de07
Reviewed-on: https://go-review.googlesource.com/63795
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
This adds rules to match the code in math/bits RotateLeft,
RotateLeft32, and RotateLeft64 to allow them to be inlined.
The rules are complicated because the code in these functions
uses different types, and the non-constant versions of these
shifts generate Mask and Carry instructions that become
subexpressions during the match process.
Also adds a testcase to asm_test.go.
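The matched functions are the standard math/bits rotates; a
hypothetical example of each form:
func rotConst(x uint64) uint64 {
    return bits.RotateLeft64(x, 17) // constant rotate
}
func rotVar(x uint64, k int) uint64 {
    return bits.RotateLeft64(x, k) // non-constant rotate, with Mask/Carry subexpressions
}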
Improvement in math/bits:
BenchmarkRotateLeft-16 1.57 1.32 -15.92%
BenchmarkRotateLeft32-16 1.60 1.37 -14.37%
BenchmarkRotateLeft64-16 1.57 1.32 -15.92%
Updates #21390
Change-Id: Ib6f17669ecc9cab54f18d690be27e2225ca654a4
Reviewed-on: https://go-review.googlesource.com/59932
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
This CL adds generic SSA rules to propagate constants through raw bits
conversions between floats and integers. This allows constants to
propagate through some math functions. For example, math.Copysign(0, -1)
is now constant folded to a load of -0.0.
Requires a fix to the ARM assembler which loaded -0.0 as +0.0.
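The folded -0.0 is observable through division (standard IEEE
behavior; hypothetical example):
func negZero() float64 {
    return math.Copysign(0, -1) // folded to a load of -0.0; 1/negZero() is -Inf
}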
Change-Id: I52649a4691077c7414f19d17bb599a6743c23ac2
Reviewed-on: https://go-review.googlesource.com/62250
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Do the same thing as CL 55143 to reduce IMULs.
Change-Id: I1bd38f618058e3cd74fac181f003610ea13f2294
Reviewed-on: https://go-review.googlesource.com/56252
Run-TryBot: Emmanuel Odeke <emm.odeke@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Normal shift rules plus constant folding are enough to generate
efficient shift-by-constant instructions.
Add test to make sure we don't generate comparisons for constant
shifts.
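A hypothetical example of such a test: a constant shift needs no
bounds check because the count is known to be in range.
func shl3(x uint64) uint64 {
    return x << 3 // no comparison against the shift width expected
}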
TODO: there are still constant shift rules on PPC64. If they
are removed, the constant folding rules are not enough to remove
all the test and mask stuff for constant shifts. Leave them in
for now.
Fixes #20663.
Change-Id: I724cc324aa8607762d0c8aacf9bfa641bda5c2a1
Reviewed-on: https://go-review.googlesource.com/60330
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Just to make it clearer which regexps are positive and which
regexps are negative.
Change-Id: Ia190e89be28048fcae2491506f552afad90a5f85
Reviewed-on: https://go-review.googlesource.com/59490
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The SSA compiler currently generates MOVOstore instructions
to optimize 16 bytes moves on AMD64 architecture.
However, we can't use the MOVOstore instruction on Plan 9,
because floating point operations are not allowed in the
note handler.
We rely on the useSSE flag to disable the use of the
MOVOstore instruction on Plan 9 and replace it by two
MOVQstore instructions.
Fixes #21625
Change-Id: Idfefcceadccafe1752b059b5fe113ce566c0e71c
Reviewed-on: https://go-review.googlesource.com/59171
Run-TryBot: David du Colombier <0intro@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
Sometimes (often for calls) we generate code like this:
MOVQ (addr),AX
MOVQ 8(addr),BX
MOVQ AX,(otheraddr)
MOVQ BX,8(otheraddr)
Replace it with
MOVUPS (addr),X0
MOVUPS X0,(otheraddr)
For completeness do the same for 8,16,32-bit loads/stores too.
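Go code that produces this pattern is typically a small aggregate
copy; a hypothetical trigger:
type pair struct{ a, b uint64 }

func copyPair(dst, src *pair) {
    *dst = *src // two 8-byte load/store pairs become one MOVUPS pair
}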
Shaves 1% from code sections of go tool.
/localdisk/itocar/golang/bin/go 10293917
go_old 10334877 [40960 bytes]
read-only data = 682 bytes (0.040769%)
global text (code) = 38961 bytes (1.036503%)
Total difference 39643 bytes (0.674628%)
Updates #6853
Change-Id: I1f0d2f60273a63a079b58927cd1c4e3429d2e7ae
Reviewed-on: https://go-review.googlesource.com/57130
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Implement int reg <-> fp reg moves on amd64.
If we see a load to int reg followed by an int->fp move, then we can just
load to the fp reg instead. Same for stores.
math.Abs is now:
MOVQ "".x+8(SP), AX
SHLQ $1, AX
SHRQ $1, AX
MOVQ AX, "".~r1+16(SP)
math.Copysign is now:
MOVQ "".x+8(SP), AX
SHLQ $1, AX
SHRQ $1, AX
MOVQ "".y+16(SP), CX
SHRQ $63, CX
SHLQ $63, CX
ORQ CX, AX
MOVQ AX, "".~r2+24(SP)
math.Float64bits is now:
MOVSD "".x+8(SP), X0
MOVSD X0, "".~r1+16(SP)
(it would be nicer to use a non-SSE reg for this, nothing is perfect)
And due to the fix for #21440, the inlined version of these improve as well.
name old time/op new time/op delta
Abs 1.38ns ± 5% 0.89ns ±10% -35.54% (p=0.000 n=10+10)
Copysign 1.56ns ± 7% 1.35ns ± 6% -13.77% (p=0.000 n=9+10)
Fixes #13095
Change-Id: Ibd7f2792412a6668608780b0688a77062e1f1499
Reviewed-on: https://go-review.googlesource.com/58732
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
This is a crude compiler pass to eliminate stores to auto variables
that are only ever written to.
Eliminates an unnecessary store to x from the following code:
func f() int {
    x := 1
    return *(&x)
}
Fixes #19765.
Change-Id: If2c63a8ae67b8c590b6e0cc98a9610939a3eeffa
Reviewed-on: https://go-review.googlesource.com/38746
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
This change adds support to the code-generation harness in
asm_test.go for the use of a '$' placeholder name for test
functions.
A few uninformative function names are also changed to use the
placeholder, to confirm that the change works as expected.
Fixes #21500
Change-Id: Iba168bd85efc9822253305d003b06682cf8a6c5c
Reviewed-on: https://go-review.googlesource.com/57292
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
We already combine constant stores, up to MOVQstoreconst.
Combine two 64-bit stores of constant zero into one 128-bit SSE
store of zero.
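A hypothetical trigger for the new rule:
func clear16(p *[2]uint64) {
    p[0] = 0 // two 64-bit zero stores,
    p[1] = 0 // now one 128-bit SSE store of zero
}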
Shaves a significant (>1%) amount of code from the go tool:
/localdisk/itocar/golang/bin/go 10334877
go_old 10388125 [53248 bytes]
global text (code) = 51041 bytes (1.343944%)
read-only data = 663 bytes (0.039617%)
Total difference 51704 bytes (0.873981%)
Change-Id: I7bc40968023c3a69f379b10fbb433cdb11364f1b
Reviewed-on: https://go-review.googlesource.com/56250
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
Reviewed-by: Keith Randall <khr@golang.org>
There are a few cases where this can be useful. Apart from the obvious
(and silly)
100*n + 200*n
where we generate one IMUL instead of two, consider:
15*n + 31*n
Currently, the compiler strength-reduces both imuls, generating:
0x0000 00000 MOVQ "".n+8(SP), AX
0x0005 00005 MOVQ AX, CX
0x0008 00008 SHLQ $4, AX
0x000c 00012 SUBQ CX, AX
0x000f 00015 MOVQ CX, DX
0x0012 00018 SHLQ $5, CX
0x0016 00022 SUBQ DX, CX
0x0019 00025 ADDQ CX, AX
0x001c 00028 MOVQ AX, "".~r1+16(SP)
0x0021 00033 RET
But combining the imuls is both faster and shorter:
0x0000 00000 MOVQ "".n+8(SP), AX
0x0005 00005 IMULQ $46, AX
0x0009 00009 MOVQ AX, "".~r1+16(SP)
0x000e 00014 RET
even without strength-reduction.
Moreover, consider:
5*n + 7*(n+1) + 11*(n+2)
We already have a rule that rewrites 7*(n+1) into 7*n+7, so the
generated code (without imul merging) looks like this:
0x0000 00000 MOVQ "".n+8(SP), AX
0x0005 00005 LEAQ (AX)(AX*4), CX
0x0009 00009 MOVQ AX, DX
0x000c 00012 NEGQ AX
0x000f 00015 LEAQ (AX)(DX*8), AX
0x0013 00019 ADDQ CX, AX
0x0016 00022 LEAQ (DX)(CX*2), CX
0x001a 00026 LEAQ 29(AX)(CX*1), AX
0x001f 00031 MOVQ AX, "".~r1+16(SP)
But with imul merging, the 5n, 7n and 11n factors get merged, and
the generated code looks like this:
0x0000 00000 MOVQ "".n+8(SP), AX
0x0005 00005 IMULQ $23, AX
0x0009 00009 ADDQ $29, AX
0x000d 00013 MOVQ AX, "".~r1+16(SP)
0x0012 00018 RET
Which is both faster and shorter; that's also the exact same code that
clang and the intel c compiler generate for the above expression.
Change-Id: Ib4d5503f05d2f2efe31a1be14e2fe6cac33730a9
Reviewed-on: https://go-review.googlesource.com/55143
Reviewed-by: Keith Randall <khr@golang.org>
For address of an auto or arg, on all non-x86 architectures
the assembler backend encodes the actual SP offset in the
instruction but leaves the offset in Prog unchanged. When the
assembly is printed in compile -S, it shows an offset
relative to pseudo FP/SP with an actual hardware SP base
register (e.g. R13 on ARM). This is confusing. Unset the
base register if it is indeed SP, so the assembly output is
consistent. If the base register isn't SP, it should be an
error and the error output contains the actual base register.
For address loading instructions, the base register isn't set
in the compiler on non-x86 architectures. Set it. Normally it
is SP and will be unset in the change mentioned above for
printing. If it is not, it will be an error and the error
output contains the actual base register.
No change in generated binary, only printed assembly. Passes
"go build -a -toolexec 'toolstash -cmp' std cmd" on all
architectures.
Fixes #21064.
Change-Id: Ifafe8d5f9b437efbe824b63b3cbc2f5f6cdc1fd5
Reviewed-on: https://go-review.googlesource.com/49432
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
The writebarrier test has to change.
Now that T23 composite literals are passed to the backend,
they get SSA'd, so writes to their fields are treated separately,
and the relevant part of the first write to t23 is now a dead store.
Preserve the intent of the test by splitting it up into two functions.
Reduces code size a bit:
name old object-bytes new object-bytes delta
Template 386k ± 0% 386k ± 0% ~ (all equal)
Unicode 202k ± 0% 202k ± 0% ~ (all equal)
GoTypes 1.16M ± 0% 1.16M ± 0% ~ (all equal)
Compiler 3.92M ± 0% 3.91M ± 0% -0.19% (p=0.008 n=5+5)
SSA 7.91M ± 0% 7.91M ± 0% ~ (all equal)
Flate 228k ± 0% 228k ± 0% -0.05% (p=0.008 n=5+5)
GoParser 283k ± 0% 283k ± 0% ~ (all equal)
Reflect 952k ± 0% 952k ± 0% -0.06% (p=0.008 n=5+5)
Tar 188k ± 0% 188k ± 0% -0.09% (p=0.008 n=5+5)
XML 406k ± 0% 406k ± 0% -0.02% (p=0.008 n=5+5)
[Geo mean] 649k 648k -0.04%
Fixes #18872
Change-Id: Ifeed0f71f13849732999aa731cc2bf40c0f0e32a
Reviewed-on: https://go-review.googlesource.com/43154
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
The ARM64 assembler backend only accepts loads and stores with
small or aligned offsets, so the compiler can only fold small or
aligned offsets into loads and stores. For locals and args, the
offsets from SP are not known until very late, and the compiler
makes the conservative decision not to fold some of them. However,
in most cases the offset is indeed small or aligned, and could
have been folded into the load or store (but was not).
This CL adds support of loads and stores with large and unaligned
offsets. When the offset doesn't fit into the instruction, it
uses two instructions and (for very large offset) the constant
pool. This way, the compiler doesn't need to be conservative,
and can simply fold the offset.
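A sketch of the situation (hypothetical function): a large stack
frame pushes local offsets beyond the immediate range of a single
load or store.
func bigFrame() byte {
    var a [1 << 16]byte
    a[0] = 1
    return a[len(a)-1] // offset too large for one instruction; now folded anyway
}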
To make this work, the assembler's optab matching rules need to
be changed. Before, MOVD accepted C_UAUTO32K, which matches
multiples of 8 between 0 and 32K, and also C_UAUTO16K, which may
not be a multiple of 8 and may not fit into a MOVD instruction;
the assembler errored in the latter case. This change makes MOVD
match only multiples of 8 (or offsets within ±256, which also fit
in the instruction), and uses the large-or-unaligned-offset rules
for offsets that don't fit (without error). Other sized move
rules are changed similarly.
Class C_UAUTO64K and C_UOREG64K are removed, as they are never
used.
In shared library mode, loads and stores of globals are rewritten
to go through the GOT using the temp register, which conflicts
with the use of the temp register for assembling large offsets.
So the folding is disabled for globals in shared library mode.
Reduces cmd/go binary size by 2%.
name old time/op new time/op delta
BinaryTree17-8 8.67s ± 0% 8.61s ± 0% -0.60% (p=0.000 n=9+10)
Fannkuch11-8 6.24s ± 0% 6.19s ± 0% -0.83% (p=0.000 n=10+9)
FmtFprintfEmpty-8 116ns ± 0% 116ns ± 0% ~ (all equal)
FmtFprintfString-8 196ns ± 0% 192ns ± 0% -1.89% (p=0.000 n=10+10)
FmtFprintfInt-8 199ns ± 0% 198ns ± 0% -0.35% (p=0.001 n=9+10)
FmtFprintfIntInt-8 294ns ± 0% 293ns ± 0% -0.34% (p=0.000 n=8+8)
FmtFprintfPrefixedInt-8 318ns ± 1% 318ns ± 1% ~ (p=1.000 n=10+10)
FmtFprintfFloat-8 537ns ± 0% 531ns ± 0% -1.17% (p=0.000 n=9+10)
FmtManyArgs-8 1.19µs ± 1% 1.18µs ± 1% -1.41% (p=0.001 n=10+10)
GobDecode-8 17.2ms ± 1% 17.3ms ± 2% ~ (p=0.165 n=10+10)
GobEncode-8 14.7ms ± 1% 14.7ms ± 2% ~ (p=0.631 n=10+10)
Gzip-8 837ms ± 0% 836ms ± 0% -0.14% (p=0.006 n=9+10)
Gunzip-8 141ms ± 0% 139ms ± 0% -1.24% (p=0.000 n=9+10)
HTTPClientServer-8 256µs ± 1% 253µs ± 1% -1.35% (p=0.000 n=10+10)
JSONEncode-8 40.1ms ± 1% 41.3ms ± 1% +3.06% (p=0.000 n=10+9)
JSONDecode-8 157ms ± 1% 156ms ± 1% -0.83% (p=0.001 n=9+8)
Mandelbrot200-8 8.94ms ± 0% 8.94ms ± 0% +0.02% (p=0.000 n=9+9)
GoParse-8 8.69ms ± 0% 8.54ms ± 1% -1.69% (p=0.000 n=8+10)
RegexpMatchEasy0_32-8 227ns ± 1% 228ns ± 1% +0.48% (p=0.016 n=10+9)
RegexpMatchEasy0_1K-8 1.92µs ± 0% 1.63µs ± 0% -15.08% (p=0.000 n=10+9)
RegexpMatchEasy1_32-8 256ns ± 0% 251ns ± 0% -2.19% (p=0.000 n=10+9)
RegexpMatchEasy1_1K-8 2.38µs ± 0% 2.09µs ± 0% -12.49% (p=0.000 n=10+9)
RegexpMatchMedium_32-8 352ns ± 0% 354ns ± 0% +0.39% (p=0.002 n=10+9)
RegexpMatchMedium_1K-8 106µs ± 0% 106µs ± 0% -0.05% (p=0.005 n=10+9)
RegexpMatchHard_32-8 5.92µs ± 0% 5.89µs ± 0% -0.40% (p=0.000 n=9+8)
RegexpMatchHard_1K-8 180µs ± 0% 179µs ± 0% -0.14% (p=0.000 n=10+9)
Revcomp-8 1.20s ± 0% 1.13s ± 0% -6.29% (p=0.000 n=9+8)
Template-8 159ms ± 1% 154ms ± 1% -3.14% (p=0.000 n=9+10)
TimeParse-8 800ns ± 3% 769ns ± 1% -3.91% (p=0.000 n=10+10)
TimeFormat-8 826ns ± 2% 817ns ± 2% -1.04% (p=0.050 n=10+10)
[Geo mean] 145µs 143µs -1.79%
Change-Id: I5fc42087cee9b54ea414f8ef6d6d020b80eb5985
Reviewed-on: https://go-review.googlesource.com/42172
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
This updates PPC64.rules to include rules to generate rotates
for ADD, OR, XOR operators that combine two opposite shifts
that sum to 32 or 64.
To support this change opcodes for ROTL and ROTLW were added to
be used like the rotldi and rotlwi extended mnemonics.
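The pattern being matched is the usual rotate idiom (hypothetical
example):
func rotl7(x uint32) uint32 {
    return x<<7 | x>>25 // opposite shifts summing to 32: one rotlwi
}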
This provides the following improvement in sha3:
BenchmarkPermutationFunction-8 302.83 376.40 1.24x
BenchmarkSha3_512_MTU-8 98.64 121.92 1.24x
BenchmarkSha3_384_MTU-8 136.80 168.30 1.23x
BenchmarkSha3_256_MTU-8 169.21 211.29 1.25x
BenchmarkSha3_224_MTU-8 179.76 221.19 1.23x
BenchmarkShake128_MTU-8 212.87 263.23 1.24x
BenchmarkShake256_MTU-8 196.62 245.60 1.25x
BenchmarkShake256_16x-8 163.57 194.37 1.19x
BenchmarkShake256_1MiB-8 199.02 248.74 1.25x
BenchmarkSha3_512_1MiB-8 106.55 133.13 1.25x
Fixes #20030
Change-Id: I484c56f48395d32f53ff3ecb3ac6cb8191cfee44
Reviewed-on: https://go-review.googlesource.com/40992
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
To preserve reproducible builds, the text entries generated
during compilation will be sorted before being printed.
TestAssembly currently assumes that function init
comes after all user-defined functions.
Remove that assumption.
Instead of looking for "TEXT" to tell you where a function ends
(which may now yield lots of non-function-code junk), look for a
line beginning with non-whitespace.
Updates #15756
Change-Id: Ibc82dba6143d769ef4c391afc360e523b1a51348
Reviewed-on: https://go-review.googlesource.com/39853
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Currently we expand comparisons with small constant strings into
a length check and a sequence of byte comparisons. Generate
16/32/64-bit comparisons instead of bytewise comparisons on 386
and amd64, and increase the limit on what is considered a small
constant string.
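A hypothetical example of an affected comparison:
func isGopher(s string) bool {
    return s == "gopher" // length check plus a 4-byte and a 2-byte compare
}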
Shaves ~30kb (0.5%) from go executable.
This also updates test/prove.go to keep test case valid.
Change-Id: I99ae8871a1d00c96363c6d03d0b890782fa7e1d9
Reviewed-on: https://go-review.googlesource.com/38776
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
For an AND that masks out leading or trailing bits, generic rules
rewrite it to a pair of shifts. On ARM64, however, the mask can
fit into an AND instruction, so we rewrite it back to an AND.
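A hypothetical example of an affected mask:
func low12(x uint64) uint64 {
    return x & 0xFFF // generic rules turn this into x<<52>>52; now an AND again on ARM64
}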
Fixes #19857.
Change-Id: I479d7320ae4f29bb3f0056d5979bde4478063a8f
Reviewed-on: https://go-review.googlesource.com/39651
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Popcount instructions on amd64 are not guaranteed to be
present, so we must guard their use. Rewrite rules can't
generate control flow at the moment, so the intrinsifier
needs to generate that code.
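A sketch of the shape of the generated guard (all names are
hypothetical; the real feature flag is initialized from CPUID by
the runtime, and the fast path is a single POPCNTQ):
var hasPOPCNT bool // assumed flag, set at startup

func popcnt64(x uint64) int { return bits.OnesCount64(x) } // placeholder for the intrinsic

func onesCount64(x uint64) int {
    if hasPOPCNT {
        return popcnt64(x) // stand-in for the POPCNTQ fast path
    }
    n := 0
    for x != 0 {
        x &= x - 1 // generic fallback: clear lowest set bit
        n++
    }
    return n
}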
name old time/op new time/op delta
OnesCount-8 2.47ns ± 5% 1.04ns ± 2% -57.70% (p=0.000 n=10+10)
OnesCount16-8 1.05ns ± 1% 0.78ns ± 0% -25.56% (p=0.000 n=9+8)
OnesCount32-8 1.63ns ± 5% 1.04ns ± 2% -35.96% (p=0.000 n=10+10)
OnesCount64-8 2.45ns ± 0% 1.04ns ± 1% -57.55% (p=0.000 n=6+10)
Updates #18616
Change-Id: I4aff2cc9aa93787898d7b22055fe272a7cf95673
Reviewed-on: https://go-review.googlesource.com/38320
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
We want to merge a load and an op into a single instruction:
    l = LOAD ptr mem
    y = OP x l
into:
    y = OPload x ptr mem
However, all of our OPload instructions require that y uses
the same register as x. If x is needed past this instruction, then
we must copy x somewhere else, losing the whole benefit of merging
the instructions in the first place.
Disable this optimization if x is live past the OP.
Also disable this optimization if the OP is in a deeper loop than the load.
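A hypothetical case where merging would backfire because x stays
live:
func f(x int, p *int) (int, int) {
    y := x + *p // folding the load into an ADDQload would clobber x,
    return y, x // which is still needed here, so the merge is skipped
}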
Updates #19595
Change-Id: I87f596aad7e91c9127bfb4705cbae47106e1e77a
Reviewed-on: https://go-review.googlesource.com/38337
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>