Stowage/go - Remotebranch.eu

Stowage/go

mirror of https://github.com/golang/go.git synced 2025-12-08 06:10:04 +00:00

Author	SHA1	Message	Date
Giovanni Bajo	68def82008	cmd/compile: fix bit-test rules for highest bit Bit-test rules failed to match when matching the highest bit of a word because operands in SSA are signed int64. Fix them by treating them as unsigned (and correctly handling 32-bit operands as well). Tests will be added in next CL. Change-Id: I491c4e88e7e2f87e9bb72bd0d9fa5d4025b90736 Reviewed-on: https://go-review.googlesource.com/94765 Reviewed-by: Keith Randall <khr@golang.org>	2018-02-27 00:51:40 +00:00
Chad Rosier	ecd9e8a2fe	cmd/compile/internal/ssa: combine zero stores into larger stores on arm64 This reduces the go tool binary on arm64 by 12k. go1 results on Amberwing: name old time/op new time/op delta RegexpMatchEasy0_32 249ns ± 0% 249ns ± 0% ~ (p=0.087 n=10+10) RegexpMatchEasy0_1K 584ns ± 0% 584ns ± 0% ~ (all equal) RegexpMatchEasy1_32 246ns ± 0% 246ns ± 0% ~ (p=1.000 n=10+10) RegexpMatchEasy1_1K 806ns ± 0% 806ns ± 0% ~ (p=0.706 n=10+9) RegexpMatchMedium_32 314ns ± 0% 314ns ± 0% ~ (all equal) RegexpMatchMedium_1K 52.1µs ± 0% 52.1µs ± 0% ~ (p=0.245 n=10+8) RegexpMatchHard_32 2.75µs ± 1% 2.75µs ± 1% ~ (p=0.690 n=10+10) RegexpMatchHard_1K 78.9µs ± 0% 78.9µs ± 1% ~ (p=0.295 n=9+9) FmtFprintfEmpty 58.5ns ± 0% 58.5ns ± 0% ~ (all equal) FmtFprintfString 112ns ± 0% 112ns ± 0% ~ (all equal) FmtFprintfInt 117ns ± 0% 116ns ± 0% -0.85% (p=0.000 n=10+10) FmtFprintfIntInt 181ns ± 0% 181ns ± 0% ~ (all equal) FmtFprintfPrefixedInt 222ns ± 0% 224ns ± 0% +0.90% (p=0.000 n=9+10) FmtFprintfFloat 318ns ± 1% 322ns ± 0% ~ (p=0.059 n=10+8) FmtManyArgs 736ns ± 1% 735ns ± 0% ~ (p=0.206 n=9+9) Gzip 437ms ± 0% 436ms ± 0% -0.25% (p=0.000 n=10+10) HTTPClientServer 89.8µs ± 1% 90.2µs ± 2% ~ (p=0.393 n=10+10) JSONEncode 20.1ms ± 1% 20.2ms ± 1% ~ (p=0.065 n=9+10) JSONDecode 94.2ms ± 1% 93.9ms ± 1% -0.42% (p=0.043 n=10+10) GobDecode 12.7ms ± 1% 12.8ms ± 2% +0.94% (p=0.019 n=10+10) GobEncode 12.1ms ± 0% 12.1ms ± 0% ~ (p=0.052 n=10+10) Mandelbrot200 5.06ms ± 0% 5.05ms ± 0% -0.04% (p=0.000 n=9+10) TimeParse 450ns ± 3% 446ns ± 0% ~ (p=0.238 n=10+9) TimeFormat 485ns ± 1% 483ns ± 1% ~ (p=0.073 n=10+10) Template 90.4ms ± 0% 90.7ms ± 0% +0.29% (p=0.000 n=8+10) GoParse 6.01ms ± 0% 6.03ms ± 0% +0.35% (p=0.000 n=10+10) BinaryTree17 11.7s ± 0% 11.7s ± 0% ~ (p=0.481 n=10+10) Revcomp 669ms ± 0% 669ms ± 0% ~ (p=0.315 n=10+10) Fannkuch11 3.40s ± 0% 3.37s ± 0% -0.92% (p=0.000 n=10+10) [Geo mean] 67.9µs 67.9µs +0.02% name old speed new speed delta RegexpMatchEasy0_32 128MB/s ± 0% 128MB/s ± 0% -0.08% (p=0.003 n=8+10) RegexpMatchEasy0_1K 1.75GB/s ± 0% 1.75GB/s ± 0% ~ (p=0.642 n=8+10) RegexpMatchEasy1_32 130MB/s ± 0% 130MB/s ± 0% ~ (p=0.690 n=10+9) RegexpMatchEasy1_1K 1.27GB/s ± 0% 1.27GB/s ± 0% ~ (p=0.661 n=10+9) RegexpMatchMedium_32 3.18MB/s ± 0% 3.18MB/s ± 0% ~ (all equal) RegexpMatchMedium_1K 19.7MB/s ± 0% 19.6MB/s ± 0% ~ (p=0.190 n=10+9) RegexpMatchHard_32 11.6MB/s ± 0% 11.6MB/s ± 1% ~ (p=0.669 n=10+10) RegexpMatchHard_1K 13.0MB/s ± 0% 13.0MB/s ± 0% ~ (p=0.718 n=9+9) Gzip 44.4MB/s ± 0% 44.5MB/s ± 0% +0.24% (p=0.000 n=10+10) JSONEncode 96.5MB/s ± 1% 96.1MB/s ± 1% ~ (p=0.065 n=9+10) JSONDecode 20.6MB/s ± 1% 20.7MB/s ± 1% +0.42% (p=0.041 n=10+10) GobDecode 60.6MB/s ± 1% 60.0MB/s ± 2% -0.92% (p=0.016 n=10+10) GobEncode 63.4MB/s ± 0% 63.6MB/s ± 0% ~ (p=0.055 n=10+10) Template 21.5MB/s ± 0% 21.4MB/s ± 0% -0.30% (p=0.000 n=9+10) GoParse 9.64MB/s ± 0% 9.61MB/s ± 0% -0.36% (p=0.000 n=10+10) Revcomp 380MB/s ± 0% 380MB/s ± 0% ~ (p=0.323 n=10+10) [Geo mean] 56.0MB/s 55.9MB/s -0.07% Change-Id: Ia732fa57fbcf4767d72382516d9f16705d177736 Reviewed-on: https://go-review.googlesource.com/96435 Run-TryBot: Cherry Zhang <cherryyz@google.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2018-02-27 00:07:25 +00:00
Ilya Tocar	f4d9c30901	cmd/compile/internal/amd64: use appropriate NEG for div Currently we generate NEGQ for DIV{Q,L,W}. By generating NEGL and NEGW, we will reduce code size, because NEGL doesn't require rex prefix. This also guarantees that upper 32 bits are zeroed, so we can revert CL 85736, and remove zero-extensions of DIVL results. Also adds test for redundant zero extend elimination. Fixes #23310 Change-Id: Ic58c3104c255a71371a06e09d10a975bbe5df587 Reviewed-on: https://go-review.googlesource.com/96815 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>	2018-02-26 20:09:21 +00:00
philhofer	2d0172c3a7	cmd/compile/internal/ssa: emit csel on arm64 Introduce a new SSA pass to generate CondSelect intstrutions, and add CondSelect lowering rules for arm64. In order to make the CSEL instruction easier to optimize, and to simplify the introduction of CSNEG, CSINC, and CSINV in the future, modify the CSEL instruction to accept a condition code in the aux field. Notably, this change makes the go1 Gzip benchmark more than 10% faster. Benchmarks on a Cavium ThunderX: name old time/op new time/op delta BinaryTree17-96 15.9s ± 6% 16.0s ± 4% ~ (p=0.968 n=10+9) Fannkuch11-96 7.17s ± 0% 7.00s ± 0% -2.43% (p=0.000 n=8+9) FmtFprintfEmpty-96 208ns ± 1% 207ns ± 0% ~ (p=0.152 n=10+8) FmtFprintfString-96 379ns ± 0% 375ns ± 0% -0.95% (p=0.000 n=10+9) FmtFprintfInt-96 385ns ± 0% 383ns ± 0% -0.52% (p=0.000 n=9+10) FmtFprintfIntInt-96 591ns ± 0% 586ns ± 0% -0.85% (p=0.006 n=7+9) FmtFprintfPrefixedInt-96 656ns ± 0% 667ns ± 0% +1.71% (p=0.000 n=10+10) FmtFprintfFloat-96 967ns ± 0% 984ns ± 0% +1.78% (p=0.000 n=10+10) FmtManyArgs-96 2.35µs ± 0% 2.25µs ± 0% -4.63% (p=0.000 n=9+8) GobDecode-96 31.0ms ± 0% 30.8ms ± 0% -0.36% (p=0.006 n=9+9) GobEncode-96 24.4ms ± 0% 24.5ms ± 0% +0.30% (p=0.000 n=9+9) Gzip-96 1.60s ± 0% 1.43s ± 0% -10.58% (p=0.000 n=9+10) Gunzip-96 167ms ± 0% 169ms ± 0% +0.83% (p=0.000 n=8+9) HTTPClientServer-96 311µs ± 1% 308µs ± 0% -0.75% (p=0.000 n=10+10) JSONEncode-96 65.0ms ± 0% 64.8ms ± 0% -0.25% (p=0.000 n=9+8) JSONDecode-96 262ms ± 1% 261ms ± 1% ~ (p=0.579 n=10+10) Mandelbrot200-96 18.0ms ± 0% 18.1ms ± 0% +0.17% (p=0.000 n=8+10) GoParse-96 14.0ms ± 0% 14.1ms ± 1% +0.42% (p=0.003 n=9+10) RegexpMatchEasy0_32-96 644ns ± 2% 645ns ± 2% ~ (p=0.836 n=10+10) RegexpMatchEasy0_1K-96 3.70µs ± 0% 3.49µs ± 0% -5.58% (p=0.000 n=10+10) RegexpMatchEasy1_32-96 662ns ± 2% 657ns ± 2% ~ (p=0.137 n=10+10) RegexpMatchEasy1_1K-96 4.47µs ± 0% 4.31µs ± 0% -3.48% (p=0.000 n=10+10) RegexpMatchMedium_32-96 844ns ± 2% 849ns ± 1% ~ (p=0.208 n=10+10) RegexpMatchMedium_1K-96 179µs ± 0% 182µs ± 0% +1.20% (p=0.000 n=10+10) RegexpMatchHard_32-96 10.0µs ± 0% 10.1µs ± 0% +0.48% (p=0.000 n=10+9) RegexpMatchHard_1K-96 297µs ± 0% 297µs ± 0% -0.14% (p=0.000 n=10+10) Revcomp-96 3.08s ± 0% 3.13s ± 0% +1.56% (p=0.000 n=9+9) Template-96 276ms ± 2% 275ms ± 1% ~ (p=0.393 n=10+10) TimeParse-96 1.37µs ± 0% 1.36µs ± 0% -0.53% (p=0.000 n=10+7) TimeFormat-96 1.40µs ± 0% 1.42µs ± 0% +0.97% (p=0.000 n=10+10) [Geo mean] 264µs 262µs -0.77% Change-Id: Ie54eee4b3092af53e6da3baa6d1755098f57f3a2 Reviewed-on: https://go-review.googlesource.com/55670 Run-TryBot: Philip Hofer <phofer@umich.edu> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: Keith Randall <khr@golang.org>	2018-02-20 06:00:54 +00:00
Chad Rosier	51932c326f	cmd/compile: improve absorb shifts optimization for arm64 Current absorb shifts optimization can generate dead Value nodes which increase use count of other live nodes. It will impact other optimizations (such as combined loads) which are enabled based on specific use count. This patch fixes the issue by decreasing the use count of nodes referenced by dead Value nodes generated by absorb shifts optimization. Performance impacts on go1 benchmarks (data collected on A57@2GHzx8): name old time/op new time/op delta BinaryTree17-8 6.28s ± 2% 6.24s ± 1% ~ (p=0.065 n=10+9) Fannkuch11-8 6.32s ± 0% 6.33s ± 0% +0.17% (p=0.000 n=10+10) FmtFprintfEmpty-8 98.9ns ± 0% 99.2ns ± 0% +0.34% (p=0.000 n=9+7) FmtFprintfString-8 183ns ± 1% 182ns ± 1% -1.01% (p=0.005 n=9+10) FmtFprintfInt-8 199ns ± 1% 202ns ± 1% +1.41% (p=0.000 n=10+9) FmtFprintfIntInt-8 272ns ± 1% 276ns ± 3% +1.36% (p=0.015 n=10+10) FmtFprintfPrefixedInt-8 367ns ± 1% 369ns ± 1% +0.68% (p=0.042 n=10+10) FmtFprintfFloat-8 491ns ± 1% 493ns ± 1% ~ (p=0.064 n=10+10) FmtManyArgs-8 1.31µs ± 1% 1.32µs ± 1% +0.39% (p=0.042 n=8+9) GobDecode-8 17.0ms ± 2% 16.2ms ± 2% -4.74% (p=0.000 n=10+10) GobEncode-8 13.7ms ± 2% 13.4ms ± 1% -2.40% (p=0.000 n=10+9) Gzip-8 844ms ± 0% 737ms ± 0% -12.70% (p=0.000 n=10+10) Gunzip-8 84.4ms ± 1% 83.9ms ± 0% -0.55% (p=0.000 n=10+8) HTTPClientServer-8 122µs ± 1% 124µs ± 1% +1.75% (p=0.000 n=10+9) JSONEncode-8 34.9ms ± 1% 32.4ms ± 0% -7.11% (p=0.000 n=10+9) JSONDecode-8 150ms ± 0% 146ms ± 1% -2.84% (p=0.000 n=7+10) Mandelbrot200-8 10.0ms ± 0% 10.0ms ± 0% ~ (p=0.529 n=10+10) GoParse-8 8.18ms ± 1% 8.03ms ± 0% -1.93% (p=0.000 n=10+10) RegexpMatchEasy0_32-8 209ns ± 0% 209ns ± 0% ~ (p=0.248 n=10+9) RegexpMatchEasy0_1K-8 789ns ± 1% 790ns ± 0% ~ (p=0.361 n=10+10) RegexpMatchEasy1_32-8 202ns ± 0% 202ns ± 1% ~ (p=0.137 n=8+10) RegexpMatchEasy1_1K-8 1.12µs ± 2% 1.12µs ± 1% ~ (p=0.810 n=10+10) RegexpMatchMedium_32-8 298ns ± 0% 298ns ± 0% ~ (p=0.443 n=10+9) RegexpMatchMedium_1K-8 83.0µs ± 5% 78.6µs ± 0% -5.37% (p=0.000 n=10+10) RegexpMatchHard_32-8 4.32µs ± 0% 4.26µs ± 0% -1.47% (p=0.000 n=10+10) RegexpMatchHard_1K-8 132µs ± 4% 126µs ± 0% -4.41% (p=0.000 n=10+9) Revcomp-8 1.11s ± 0% 1.11s ± 0% +0.14% (p=0.017 n=10+9) Template-8 155ms ± 1% 155ms ± 1% ~ (p=0.796 n=10+10) TimeParse-8 774ns ± 1% 785ns ± 1% +1.41% (p=0.001 n=10+10) TimeFormat-8 788ns ± 1% 806ns ± 1% +2.24% (p=0.000 n=10+9) name old speed new speed delta GobDecode-8 45.2MB/s ± 2% 47.5MB/s ± 2% +4.96% (p=0.000 n=10+10) GobEncode-8 56.0MB/s ± 2% 57.4MB/s ± 1% +2.44% (p=0.000 n=10+9) Gzip-8 23.0MB/s ± 0% 26.3MB/s ± 0% +14.55% (p=0.000 n=10+10) Gunzip-8 230MB/s ± 1% 231MB/s ± 0% +0.55% (p=0.000 n=10+8) JSONEncode-8 55.6MB/s ± 1% 59.9MB/s ± 0% +7.65% (p=0.000 n=10+9) JSONDecode-8 12.9MB/s ± 0% 13.3MB/s ± 1% +2.94% (p=0.000 n=7+10) GoParse-8 7.08MB/s ± 1% 7.22MB/s ± 0% +1.95% (p=0.000 n=10+10) RegexpMatchEasy0_32-8 153MB/s ± 0% 153MB/s ± 0% -0.16% (p=0.023 n=10+10) RegexpMatchEasy0_1K-8 1.30GB/s ± 1% 1.30GB/s ± 0% ~ (p=0.393 n=10+10) RegexpMatchEasy1_32-8 158MB/s ± 0% 158MB/s ± 0% ~ (p=0.684 n=10+10) RegexpMatchEasy1_1K-8 915MB/s ± 2% 918MB/s ± 1% ~ (p=0.796 n=10+10) RegexpMatchMedium_32-8 3.35MB/s ± 0% 3.35MB/s ± 0% ~ (p=1.000 n=10+9) RegexpMatchMedium_1K-8 12.3MB/s ± 5% 13.0MB/s ± 0% +5.56% (p=0.000 n=10+10) RegexpMatchHard_32-8 7.40MB/s ± 0% 7.51MB/s ± 0% +1.50% (p=0.000 n=10+10) RegexpMatchHard_1K-8 7.75MB/s ± 4% 8.10MB/s ± 0% +4.52% (p=0.000 n=10+8) Revcomp-8 229MB/s ± 0% 228MB/s ± 0% -0.14% (p=0.017 n=10+9) Template-8 12.5MB/s ± 1% 12.5MB/s ± 1% ~ (p=0.780 n=10+10) Change-Id: I103389f168eac79f6af44e8fef93acc2a7a4ac96 Reviewed-on: https://go-review.googlesource.com/88415 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2018-02-15 20:54:50 +00:00
Cherry Zhang	7f1c4b3afb	cmd/compile: disable "redundant zeroextensions" optimization for Select on AMD64 A Select Op could produce a value with upper 32 bits NOT zeroed, for example, Div32 is lowered to (Select0 (DIVL x y)). In theory, we could look into the argument of a Select to decide whether the upper bits are zeroed. As it is late in release cycle, just disable this optimization for Select for now. Fixes #23305. Change-Id: Icf665a2af9ccb0a7ba0ae00c683c9e349638bf85 Reviewed-on: https://go-review.googlesource.com/85736 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>	2018-01-02 21:08:35 +00:00
Keith Randall	5419ed3a66	cmd/compile: remove unused code Found a few functions in cmd/compile that aren't used. Change-Id: I53957dae6f1a645feb8b95383f0f050964b4f7d4 Reviewed-on: https://go-review.googlesource.com/79975 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2017-11-27 16:48:28 +00:00
Ilya Tocar	f3884680fc	cmd/compile/internal/ssa: inline memmove with known size Replace calls to memmove with known (constant) size, with OpMove. Do it only if it is safe from aliasing point of view. Helps with code like this: append(buf,"const str"...) In strconv this provides nice benefit: Quote-6 731ns ± 2% 647ns ± 3% -11.41% (p=0.000 n=10+10) QuoteRune-6 117ns ± 5% 111ns ± 1% -4.54% (p=0.000 n=10+10) AppendQuote-6 475ns ± 0% 396ns ± 0% -16.59% (p=0.000 n=9+10) AppendQuoteRune-6 32.0ns ± 0% 27.4ns ± 0% -14.41% (p=0.000 n=8+9) Change-Id: I7704f5c51b46aed2d8f033de74c75140fc35036c Reviewed-on: https://go-review.googlesource.com/54394 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> Reviewed-by: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-11-02 20:30:25 +00:00
Keith Randall	1787ced894	cmd/compile: remove Symbol wrappers from Aux fields We used to have {Arg,Auto,Extern}Symbol structs with which we wrapped a gc.Node or obj.LSym before storing them in the Aux field of an ssa.Value. This let the SSA part of the compiler distinguish between autos and args, for example. We no longer need the wrappers as we can query the underlying objects directly. There was also some sloppy usage, where VarDef had a gc.Node directly in its Aux field, whereas the use of that variable had that gc.Node wrapped in an AutoSymbol. Thus the Aux fields didn't match (using ==) when they probably should. This sloppy usage cleanup is the only thing in the CL that changes the generated code - we can get rid of some more unused auto variables if the matching happens reliably. Removing this wrapper also lets us get rid of the varsyms cache (which was used to prevent wrapping the same *gc.Node twice). Change-Id: I0dedf8f82f84bfee413d310342b777316bd1d478 Reviewed-on: https://go-review.googlesource.com/64452 TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>	2017-09-19 22:03:10 +00:00
Ilya Tocar	782ee23884	cmd/compile/internal/ssa: remove redundant zeroextensions on amd64 Some instructions operating on <= 32 bits also zero out upper 32bits. Remove zeroextensions of such values. Triggers a few times during all.bash. Also removes ugly code like: MOVL CX,CX Change-Id: I66a46c190dd6929b7e3c52f3fe6b967768d00638 Reviewed-on: https://go-review.googlesource.com/58090 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-08-30 16:22:18 +00:00
philhofer	c59b495963	cmd/compile: add support for arm64 bit-test instructions Add support for generating TBZ/TBNZ instructions. The bit-test-and-branch pattern shows up in a number of important places, including the runtime (gc bitmaps). Before this change, there were 3 TB[N]?Z instructions in the Go tool, all of which were in hand-written assembly. After this change, there are 285. Also, the go1 benchmark binary gets about 4.5kB smaller. Fixes #21361 Change-Id: I170c138b852754b9b8df149966ca5e62e6dfa771 Reviewed-on: https://go-review.googlesource.com/54470 Run-TryBot: Philip Hofer <phofer@umich.edu> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2017-08-15 13:39:11 +00:00
Keith Randall	a027466e4b	cmd/compile: check that phis are always first after scheduling Update #20178 Change-Id: I603f77268ed38afdd84228c775efe006f08f14a7 Reviewed-on: https://go-review.googlesource.com/45018 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2017-06-07 00:13:20 +00:00
Michael Munday	4fc498d89a	cmd/compile: add generic rules to eliminate some unnecessary stores Eliminates stores of values that have just been loaded from the same location. Handles the common case where there are up to 3 intermediate stores to non-overlapping struct fields. For example the loads and stores of x.a, x.b and x.d in the following function are now removed: type T struct { a, b, c, d int } func f(x T) { y := x y.c += 8 *x = y } Before this CL (s390x): TEXT "".f(SB) MOVD "".x(R15), R5 MOVD (R5), R1 MOVD 8(R5), R2 MOVD 16(R5), R0 MOVD 24(R5), R4 ADD $8, R0, R3 STMG R1, R4, (R5) RET After this CL (s390x): TEXT "".f(SB) MOVD "".x(R15), R1 MOVD 16(R1), R0 ADD $8, R0, R0 MOVD R0, 16(R1) RET In total these rules are triggered ~5091 times during all.bash, which is broken down as: Intermediate stores \| Triggered --------------------+---------- 0 \| 1434 1 \| 2508 2 \| 888 3 \| 261 --------------------+---------- Change-Id: Ia4721ae40146aceec1fdd3e65b0e9283770bfba5 Reviewed-on: https://go-review.googlesource.com/38793 Run-TryBot: Michael Munday <munday@ca.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-05-10 15:58:43 +00:00
Josh Bleecher Snyder	46b88c9fbc	cmd/compile: change ssa.Type into types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of types.Type. The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-05-09 23:01:51 +00:00
Cherry Zhang	fb0ccc5d0a	cmd/internal/obj/arm64, cmd/compile: improve offset folding on ARM64 ARM64 assembler backend only accepts loads and stores with small or aligned offset. The compiler therefore can only fold small or aligned offsets into loads and stores. For locals and args, their offsets to SP are not known until very late, and the compiler makes conservative decision not folding some of them. However, in most cases, the offset is indeed small or aligned, and can be folded into load and store (but actually not). This CL adds support of loads and stores with large and unaligned offsets. When the offset doesn't fit into the instruction, it uses two instructions and (for very large offset) the constant pool. This way, the compiler doesn't need to be conservative, and can simply fold the offset. To make it work, the assembler's optab matching rules need to be changed. Before, MOVD accepts C_UAUTO32K which matches multiple of 8 between 0 and 32K, and also C_UAUTO16K, which may not be multiple of 8 and does not fit into MOVD instruction. The assembler errors in the latter case. This change makes it only matches multiple of 8 (or offsets within ±256, which also fits in instruction), and uses the large-or-unaligned-offset rule for things doesn't fit (without error). Other sized move rules are changed similarly. Class C_UAUTO64K and C_UOREG64K are removed, as they are never used. In shared library, load/store of global is rewritten to using GOT and temp register, which conflicts with the use of temp register for assembling large offset. So the folding is disabled for globals in shared library mode. Reduce cmd/go binary size by 2%. name old time/op new time/op delta BinaryTree17-8 8.67s ± 0% 8.61s ± 0% -0.60% (p=0.000 n=9+10) Fannkuch11-8 6.24s ± 0% 6.19s ± 0% -0.83% (p=0.000 n=10+9) FmtFprintfEmpty-8 116ns ± 0% 116ns ± 0% ~ (all equal) FmtFprintfString-8 196ns ± 0% 192ns ± 0% -1.89% (p=0.000 n=10+10) FmtFprintfInt-8 199ns ± 0% 198ns ± 0% -0.35% (p=0.001 n=9+10) FmtFprintfIntInt-8 294ns ± 0% 293ns ± 0% -0.34% (p=0.000 n=8+8) FmtFprintfPrefixedInt-8 318ns ± 1% 318ns ± 1% ~ (p=1.000 n=10+10) FmtFprintfFloat-8 537ns ± 0% 531ns ± 0% -1.17% (p=0.000 n=9+10) FmtManyArgs-8 1.19µs ± 1% 1.18µs ± 1% -1.41% (p=0.001 n=10+10) GobDecode-8 17.2ms ± 1% 17.3ms ± 2% ~ (p=0.165 n=10+10) GobEncode-8 14.7ms ± 1% 14.7ms ± 2% ~ (p=0.631 n=10+10) Gzip-8 837ms ± 0% 836ms ± 0% -0.14% (p=0.006 n=9+10) Gunzip-8 141ms ± 0% 139ms ± 0% -1.24% (p=0.000 n=9+10) HTTPClientServer-8 256µs ± 1% 253µs ± 1% -1.35% (p=0.000 n=10+10) JSONEncode-8 40.1ms ± 1% 41.3ms ± 1% +3.06% (p=0.000 n=10+9) JSONDecode-8 157ms ± 1% 156ms ± 1% -0.83% (p=0.001 n=9+8) Mandelbrot200-8 8.94ms ± 0% 8.94ms ± 0% +0.02% (p=0.000 n=9+9) GoParse-8 8.69ms ± 0% 8.54ms ± 1% -1.69% (p=0.000 n=8+10) RegexpMatchEasy0_32-8 227ns ± 1% 228ns ± 1% +0.48% (p=0.016 n=10+9) RegexpMatchEasy0_1K-8 1.92µs ± 0% 1.63µs ± 0% -15.08% (p=0.000 n=10+9) RegexpMatchEasy1_32-8 256ns ± 0% 251ns ± 0% -2.19% (p=0.000 n=10+9) RegexpMatchEasy1_1K-8 2.38µs ± 0% 2.09µs ± 0% -12.49% (p=0.000 n=10+9) RegexpMatchMedium_32-8 352ns ± 0% 354ns ± 0% +0.39% (p=0.002 n=10+9) RegexpMatchMedium_1K-8 106µs ± 0% 106µs ± 0% -0.05% (p=0.005 n=10+9) RegexpMatchHard_32-8 5.92µs ± 0% 5.89µs ± 0% -0.40% (p=0.000 n=9+8) RegexpMatchHard_1K-8 180µs ± 0% 179µs ± 0% -0.14% (p=0.000 n=10+9) Revcomp-8 1.20s ± 0% 1.13s ± 0% -6.29% (p=0.000 n=9+8) Template-8 159ms ± 1% 154ms ± 1% -3.14% (p=0.000 n=9+10) TimeParse-8 800ns ± 3% 769ns ± 1% -3.91% (p=0.000 n=10+10) TimeFormat-8 826ns ± 2% 817ns ± 2% -1.04% (p=0.050 n=10+10) [Geo mean] 145µs 143µs -1.79% Change-Id: I5fc42087cee9b54ea414f8ef6d6d020b80eb5985 Reviewed-on: https://go-review.googlesource.com/42172 Run-TryBot: Cherry Zhang <cherryyz@google.com> Reviewed-by: David Chase <drchase@google.com>	2017-05-09 19:41:00 +00:00
Michael Munday	35cf3843a4	cmd/{asm,compile}: avoid zeroAuto clobbering flags on s390x This CL modifies how MOV[DWHB] instructions that store a constant to memory are assembled to avoid them clobbering the condition code (flags). It also modifies zeroAuto to use MOVD instructions instead of CLEAR (which is assembled as XC). MOV[DWHB]storeconst ops also no longer clobbers flags. Note: this CL modifies the assembler so that it can no longer handle immediates outside the range of an int16 or offsets from SB, which reflects what the machine instructions support. The compiler doesn't need this capability any more and I don't think this affects any existing assembly, but it is easy to workaround if it does. Fixes #20187. Change-Id: Ie54947ff38367bd6a19962bf1a6d0296a4accffb Reviewed-on: https://go-review.googlesource.com/42179 Reviewed-by: David Chase <drchase@google.com> Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-05-02 17:43:31 +00:00
Ben Shi	38fbada557	cmd/compile/internal/ssa: more constant folding rules for ARM (ADDconst [c] x) && !isARMImmRot(uint32(c)) && isARMImmRot(uint32(-c)) -> (SUBconst [int64(int32(-c))] x) (SUBconst [c] x) && !isARMImmRot(uint32(c)) && isARMImmRot(uint32(-c)) -> (ADDconst [int64(int32(-c))] x) Currently a = a + 0xfffffff1 is compiled to （variable a is in R0) MVN $14, R11 ADD R11, R0, R0 After applying the above 2 rules, it becomes SUB $15, R0, R0 (BICconst [c] (BICconst [d] x)) -> (BICconst [int64(int32(c\|d))] x) This rule also optimizes the generated ARM code. The other rules are added to avoid to generate less optimized ARM code when substitutions ADD->SUB happen. Change-Id: I3ead9aae2b446b674e2ab42d37259d38ceb93a4d Reviewed-on: https://go-review.googlesource.com/41679 Reviewed-by: Keith Randall <khr@golang.org>	2017-04-29 02:53:46 +00:00
Josh Bleecher Snyder	dae5389d3d	Revert "cmd/compile: add Type.MustSize and Type.MustAlignment" This reverts commit `94d540a4b6`. Reason for revert: prefer something along the lines of CL 42018. Change-Id: I876fe32e98f37d8d725fe55e0fd0ea429c0198e0 Reviewed-on: https://go-review.googlesource.com/42022 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>	2017-04-28 01:24:13 +00:00
Josh Bleecher Snyder	94d540a4b6	cmd/compile: add Type.MustSize and Type.MustAlignment Type.Size and Type.Alignment are for the front end: They calculate size and alignment if needed. Type.MustSize and Type.MustAlignment are for the back end: They call Fatal if size and alignment are not already calculated. Most uses are of MustSize and MustAlignment, but that's because the back end is newer, and this API was added to support it. This CL was mostly generated with sed and selective reversion. The only mildly interesting bit is the change of the ssa.Type interface and the supporting ssa dummy types. Follow-up to review feedback on CL 41970. Passes toolstash-check. Change-Id: I0d9b9505e57453dae8fb6a236a07a7a02abd459e Reviewed-on: https://go-review.googlesource.com/42016 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-04-27 22:57:57 +00:00
Josh Bleecher Snyder	4ee934ad27	cmd/compile: remove references to *os.File from ssa package This reduces the size of the ssa export data by 10%, from 76154 to 67886. It doesn't appear that #20084, which would do this automatically, is going to be fixed soon. Do it manually for now. This speeds up compiling cmd/compile/internal/amd64 and presumably its comrades as well: name old time/op new time/op delta CompileAMD64 89.6ms ± 6% 86.7ms ± 5% -3.29% (p=0.000 n=49+47) name old user-time/op new user-time/op delta CompileAMD64 116ms ± 5% 112ms ± 5% -3.51% (p=0.000 n=45+42) name old alloc/op new alloc/op delta CompileAMD64 26.7MB ± 0% 25.8MB ± 0% -3.26% (p=0.008 n=5+5) name old allocs/op new allocs/op delta CompileAMD64 223k ± 0% 213k ± 0% -4.46% (p=0.008 n=5+5) Updates #20084 Change-Id: I49e8951c5bfce63ad2b7f4fc3bfa0868c53114f9 Reviewed-on: https://go-review.googlesource.com/41493 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2017-04-24 23:58:14 +00:00
Matthew Dempsky	c87520c598	cmd: remove IntSize and Widthint Use PtrSize and Widthptr instead. CL prepared mostly with sed and uniq. Passes toolstash-check -all. Fixes #19954. Change-Id: I09371bd7128672885cb8bc4e7f534ad56a88d755 Reviewed-on: https://go-review.googlesource.com/40506 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Martin Möhrmann <moehrmann@google.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2017-04-22 17:43:43 +00:00
Todd Neal	716219ffd9	cmd/compile: remove dead code Change-Id: I2d287981d5fcef3aace948c405d618f46200948e Reviewed-on: https://go-review.googlesource.com/41450 Run-TryBot: Todd Neal <todd@tneal.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-04-22 01:18:40 +00:00
Matthew Dempsky	4eb48a336e	cmd/compile/internal/ssa: refactor ARM64 address folding These patterns are the only uses of isArg and isAuto, and they all follow a common pattern too. Extract out so that we can more easily tweak the interface for isArg/isAuto. Passes toolstash -cmp for linux/arm64. Change-Id: I9c509dabdc123c93cb1ad2f34fe8c12a9f313f6d Reviewed-on: https://go-review.googlesource.com/40490 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2017-04-12 21:21:38 +00:00
Josh Bleecher Snyder	4c1622082e	cmd/compile: don't catch panics during rewrite This is a holdover from the days when we did not have full SSA coverage and compiled things optimistically, and catching the panic obscures useful information. Change-Id: I196790cb6b97419d92b318a2dfa7f1e1097cefb7 Reviewed-on: https://go-review.googlesource.com/39534 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Keith Randall <khr@golang.org>	2017-04-04 23:37:17 +00:00
Keith Randall	63a72fd447	cmd/compile: strength-reduce floating point x2 -> x+x x/c, c power of 2 -> x(1/c) Fixes #19827 Change-Id: I74c9f0b5b49b2ed26c0990314c7d1d5f9631b6f1 Reviewed-on: https://go-review.googlesource.com/39295 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2017-04-03 21:27:03 +00:00
Keith Randall	86dc86b4f9	cmd/compile: don't merge load+op if other op arg is still live We want to merge a load and op into a single instruction l = LOAD ptr mem y = OP x l into y = OPload x ptr mem However, all of our OPload instructions require that y uses the same register as x. If x is needed past this instruction, then we must copy x somewhere else, losing the whole benefit of merging the instructions in the first place. Disable this optimization if x is live past the OP. Also disable this optimization if the OP is in a deeper loop than the load. Update #19595 Change-Id: I87f596aad7e91c9127bfb4705cbae47106e1e77a Reviewed-on: https://go-review.googlesource.com/38337 Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>	2017-03-23 15:53:04 +00:00
Josh Bleecher Snyder	2cdb7f118a	cmd/compile: move Frontend field from ssa.Config to ssa.Func Suggested by mdempsky in CL 38232. This allows us to use the Frontend field to associate frontend state and information with a function. See the following CL in the series for examples. This is a giant CL, but it is almost entirely routine refactoring. The ssa test API is starting to feel a bit unwieldy. I will clean it up separately, once the dust has settled. Passes toolstash -cmp. Updates #15756 Change-Id: I71c573bd96ff7251935fce1391b06b1f133c3caf Reviewed-on: https://go-review.googlesource.com/38327 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>	2017-03-17 23:18:57 +00:00
Josh Bleecher Snyder	193510f2f6	cmd/compile: evaluate config as needed in rewrite rules Prior to this CL, config was an explicit argument to the SSA rewrite rules, and rules that needed a Frontend got at it via config. An upcoming CL moves Frontend from Config to Func, so rules can no longer reach Frontend via Config. Passing a Frontend as an argument to the rewrite rules causes a 2-3% regression in compile times. This CL takes a different approach: It treats the variable names "config" and "fe" as special and calculates them as needed. The "as needed part" is also important to performance: If they are calculated eagerly, the nilchecks themselves cause a regression. This introduces a little bit of magic into the rewrite generator. However, from the perspective of the rules, the config variable was already more or less magic. And it makes the upcoming changes much clearer. Passes toolstash -cmp. Change-Id: I173f2bcc124cba43d53138bfa3775e21316a9107 Reviewed-on: https://go-review.googlesource.com/38326 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>	2017-03-17 22:35:29 +00:00
philhofer	295307ae78	cmd/compile: de-virtualize interface calls With this change, code like h := sha1.New() h.Write(buf) sum := h.Sum() gets compiled into static calls rather than interface calls, because the compiler is able to prove that 'h' is really a *sha1.digest. The InterCall re-write rule hits a few dozen times during make.bash, and hundreds of times during all.bash. The most common pattern identified by the compiler is a constructor like func New() Interface { return &impl{...} } where the constructor gets inlined into the caller, and the result is used immediately. Examples include {sha1,md5,crc32,crc64,...}.New, base64.NewEncoder, base64.NewDecoder, errors.New, net.Pipe, and so on. Some existing benchmarks that change on darwin/amd64: Crc64/ISO4KB-8 2.67µs ± 1% 2.66µs ± 0% -0.36% (p=0.015 n=10+10) Crc64/ISO1KB-8 694ns ± 0% 690ns ± 1% -0.59% (p=0.001 n=10+10) Adler32KB-8 473ns ± 1% 471ns ± 0% -0.39% (p=0.010 n=10+9) On architectures like amd64, the reduction in code size appears to contribute more to benchmark improvements than just removing the indirect call, since that branch gets predicted accurately when called in a loop. Updates #19361 Change-Id: I57d4dc21ef40a05ec0fbd55a9bb0eb74cdc67a3d Reviewed-on: https://go-review.googlesource.com/38139 Run-TryBot: Philip Hofer <phofer@umich.edu> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2017-03-14 18:49:23 +00:00
David Chase	b59a405656	Revert "cmd/compile: de-virtualize interface calls" This reverts commit `4e0c7c3f61`. Reason for revert: The presence-of-optimization test program is fragile, breaks under noopt, and might break if the Go libraries are tweaked. It needs to be (re)written without reference to other packages. Change-Id: I3aaf1ab006a1a255f961a978e9c984341740e3c7 Reviewed-on: https://go-review.googlesource.com/38097 Reviewed-by: Keith Randall <khr@golang.org>	2017-03-13 21:15:32 +00:00
Philip Hofer	4e0c7c3f61	cmd/compile: de-virtualize interface calls With this change, code like h := sha1.New() h.Write(buf) sum := h.Sum() gets compiled into static calls rather than interface calls, because the compiler is able to prove that 'h' is really a *sha1.digest. The InterCall re-write rule hits a few dozen times during make.bash, and hundreds of times during all.bash. The most common pattern identified by the compiler is a constructor like func New() Interface { return &impl{...} } where the constructor gets inlined into the caller, and the result is used immediately. Examples include {sha1,md5,crc32,crc64,...}.New, base64.NewEncoder, base64.NewDecoder, errors.New, net.Pipe, and so on. Some existing benchmarks that change on darwin/amd64: Crc64/ISO4KB-8 2.67µs ± 1% 2.66µs ± 0% -0.36% (p=0.015 n=10+10) Crc64/ISO1KB-8 694ns ± 0% 690ns ± 1% -0.59% (p=0.001 n=10+10) Adler32KB-8 473ns ± 1% 471ns ± 0% -0.39% (p=0.010 n=10+9) On architectures like amd64, the reduction in code size appears to contribute more to benchmark improvements than just removing the indirect call, since that branch gets predicted accurately when called in a loop. Updates #19361 Change-Id: Ia9d30afdd5f6b4d38d38b14b88f308acae8ce7ed Reviewed-on: https://go-review.googlesource.com/37751 Run-TryBot: Philip Hofer <phofer@umich.edu> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-03-13 18:24:57 +00:00
Philip Hofer	b0e91d836a	cmd/compile: clean up ssa.Value memory arg usage This change adds a method to replace expressions of the form v.Args[len(v.Args)-1] so that the code's intention to walk memory arguments is explicit. Passes toolstash-check. Change-Id: I0c80d73bc00989dd3cdf72b4f2c8e1075a2515e0 Reviewed-on: https://go-review.googlesource.com/37757 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-03-09 21:40:47 +00:00
Cherry Zhang	c8eaeb8cba	cmd/compile: remove zeroing after newobject The Zero op right after newobject has been removed. But this rule does not cover Store of constant zero (for SSA-able types). Add rules to cover Store op as well. Updates #19027. Change-Id: I5d2b62eeca0aa9ce8dc7205b264b779de01c660b Reviewed-on: https://go-review.googlesource.com/36836 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-03-03 20:36:54 +00:00
Keith Randall	708ba22a0c	cmd/compile: move constant divide strength reduction to SSA rules Currently the conversion from constant divides to multiplies is mostly done during the walk pass. This is suboptimal because SSA can determine that the value being divided by is constant more often (e.g. after inlining). Change-Id: If1a9b993edd71be37396b9167f77da271966f85f Reviewed-on: https://go-review.googlesource.com/37015 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2017-02-17 06:16:44 +00:00
Keith Randall	01c8719f8b	cmd/compile: move rotate instruction generation to SSA Remove rotate generation from walk. Remove OLROT and ssa.Lrot* opcodes. Generate rotates during SSA lowering for architectures that have them. This CL will allow rotates to be generated in more situations, like when the shift values are determined to be constant only after some analysis. Fixes #18254 Change-Id: I8d6d684ff5ce2511aceaddfda98b908007851079 Reviewed-on: https://go-review.googlesource.com/34232 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2017-02-02 17:57:15 +00:00
Robert Griesemer	cfd17f51c8	[dev.inline] cmd/compile/internal/ssa: rename various fields from Line to Pos This is a mostly mechanical rename followed by manual fixes where necessary. Change-Id: Ie5c670b133db978f15dc03e50dc2da0c80fc8842 Reviewed-on: https://go-review.googlesource.com/34137 Reviewed-by: David Lazar <lazard@golang.org>	2016-12-08 21:36:52 +00:00
Josh Bleecher Snyder	1abefc1ff0	cmd/compile: clean up rule logging helpers Introduced in CLs 29380 and 30011. Change-Id: I3d3641e8748ce0adb57b087a1fcd62f295ade665 Reviewed-on: https://go-review.googlesource.com/31933 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2016-10-25 13:33:57 +00:00
Michael Munday	1cfb5c3fd5	cmd/compile: merge loads into operations on s390x Adds the new canMergeLoad function which can be used by rules to decide whether a load can be merged into an operation. The function ensures that the merge will not reorder the load relative to memory operations (for example, stores) in such a way that the block can no longer be scheduled. This new function enables transformations such as: MOVD 0(R1), R2 ADD R2, R3 to: ADD 0(R1), R3 The two-operand form of the following instructions can now read a single memory operand: - ADD - ADDC - ADDW - MULLD - MULLW - SUB - SUBC - SUBE - SUBW - AND - ANDW - OR - ORW - XOR - XORW Improves SHA3 performance by 6-8%. Updates #15054. Change-Id: Ibcb9122126cd1a26f2c01c0dfdbb42fe5e7b5b94 Reviewed-on: https://go-review.googlesource.com/29272 Run-TryBot: Michael Munday <munday@ca.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2016-10-17 19:45:20 +00:00
David Chase	2f0b8f88df	cmd/compile: PPC64, elide unnecessary sign extension Inputs to store[BHW] and cmpW(U) need not be correct in more bits than are used by the instruction. Added a pattern tailored to what appears to be cgo boilerplate. Added a pattern (also seen in cgo boilerplate and hashing) to replace {EQ,NE}-CMP-ANDconst with {EQ-NE}-ANDCCconst. Added a pattern to clean up ANDconst shift distance inputs (this was seen in hashing). Simplify repeated and,or,xor. Fixes #17109. Change-Id: I68eac83e3e614d69ffe473a08953048c8b066d88 Reviewed-on: https://go-review.googlesource.com/30455 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2016-10-10 12:22:40 +00:00
Cherry Zhang	4d07d3e29c	cmd/compile: re-enable nilcheck removal for newobject Also add compiler debug ouput and add a test. Fixes #15390. Change-Id: Iceba1414c29bcc213b87837387bf8ded1f3157f1 Reviewed-on: https://go-review.googlesource.com/30011 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2016-09-28 19:41:49 +00:00
David Chase	3390294308	cmd/compile: PPC64, find compare-with-immediate Added rules for compare double and word immediate, including those that use invertflags to cope with flipped operands. Change-Id: I594430a210e076e52299a2cc6ab074dbb04a02bd Reviewed-on: https://go-review.googlesource.com/29763 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>	2016-09-26 18:07:33 +00:00
David Chase	cddddbc623	cmd/compile: use ISEL, cleanup use of zero & extensions Abandoned earlier efforts to expose zero register, but left it in numbering to decrease squirrelyness of register allocator. ISELrelOp used in code generation of bool := x relOp y. Some patterns added to better elide zero case and some sign extension. Updates: #17109 Change-Id: Ida7839f0023ca8f0ffddc0545f0ac269e65b05d9 Reviewed-on: https://go-review.googlesource.com/29380 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2016-09-22 17:36:39 +00:00
Keith Randall	dd24b1098a	cmd/compile: improve tighten pass Move a value to the block which is the lowest common ancestor in the dominator tree of all of its uses. Make sure not to move a value into a loop. Makes the tighten pass on average (across go1 benchmarks) 40% slower. Still not a big contributor to overall compile time. Binary size is just a tad smaller. name old time/op new time/op delta BinaryTree17-12 2.77s ± 9% 2.76s ± 9% ~ (p=0.878 n=8+8) Fannkuch11-12 2.75s ± 1% 2.74s ± 1% ~ (p=0.232 n=8+7) FmtFprintfEmpty-12 48.9ns ± 9% 47.7ns ± 0% ~ (p=0.431 n=8+8) FmtFprintfString-12 143ns ± 8% 142ns ± 1% ~ (p=0.257 n=8+7) FmtFprintfInt-12 123ns ± 1% 122ns ± 1% -1.04% (p=0.026 n=7+8) FmtFprintfIntInt-12 195ns ± 7% 185ns ± 0% -5.32% (p=0.000 n=8+8) FmtFprintfPrefixedInt-12 194ns ± 4% 195ns ± 0% +0.81% (p=0.015 n=7+7) FmtFprintfFloat-12 267ns ± 0% 268ns ± 0% +0.37% (p=0.001 n=7+6) FmtManyArgs-12 800ns ± 0% 762ns ± 1% -4.78% (p=0.000 n=8+8) GobDecode-12 7.67ms ± 2% 7.60ms ± 2% ~ (p=0.234 n=8+8) GobEncode-12 6.55ms ± 0% 6.57ms ± 1% ~ (p=0.336 n=7+8) Gzip-12 237ms ± 0% 238ms ± 0% +0.40% (p=0.017 n=7+7) Gunzip-12 40.8ms ± 0% 40.2ms ± 0% -1.52% (p=0.000 n=7+8) HTTPClientServer-12 208µs ± 3% 209µs ± 3% ~ (p=0.955 n=8+7) JSONEncode-12 16.2ms ± 1% 17.2ms ±11% +5.80% (p=0.001 n=7+8) JSONDecode-12 57.3ms ±12% 55.5ms ± 3% ~ (p=0.867 n=8+7) Mandelbrot200-12 4.68ms ± 6% 4.46ms ± 1% ~ (p=0.442 n=8+8) GoParse-12 4.27ms ±44% 3.42ms ± 1% -19.95% (p=0.005 n=8+8) RegexpMatchEasy0_32-12 75.1ns ± 0% 75.8ns ± 1% +0.99% (p=0.002 n=7+7) RegexpMatchEasy0_1K-12 963ns ± 0% 1021ns ± 6% +5.98% (p=0.001 n=7+7) RegexpMatchEasy1_32-12 72.4ns ±11% 70.8ns ± 1% ~ (p=0.368 n=8+8) RegexpMatchEasy1_1K-12 394ns ± 1% 399ns ± 0% +1.23% (p=0.000 n=8+7) RegexpMatchMedium_32-12 114ns ± 0% 115ns ± 1% +0.63% (p=0.021 n=7+7) RegexpMatchMedium_1K-12 35.9µs ± 0% 37.6µs ± 1% +4.72% (p=0.000 n=7+8) RegexpMatchHard_32-12 1.93µs ± 2% 1.91µs ± 0% -0.91% (p=0.001 n=7+7) RegexpMatchHard_1K-12 60.2µs ± 3% 61.2µs ±10% ~ (p=0.442 n=8+8) Revcomp-12 404ms ± 1% 406ms ± 1% ~ (p=0.054 n=8+7) Template-12 64.6ms ± 1% 63.5ms ± 1% -1.66% (p=0.000 n=8+8) TimeParse-12 347ns ± 8% 309ns ± 0% -11.13% (p=0.000 n=8+7) TimeFormat-12 343ns ± 4% 331ns ± 0% -3.34% (p=0.000 n=8+7) Change-Id: Id6da1239ddd4d0cb074ff29cffb06302d1c6d08f Reviewed-on: https://go-review.googlesource.com/28712 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2016-09-20 22:49:48 +00:00
Michael Munday	6ec993adc3	cmd/compile: add SSA backend for s390x and enable by default The new SSA backend modifies the ABI slightly: R0 is now a usable general purpose register. Fixes #16677. Change-Id: I367435ce921e0c7e79e021c80cf8ef5d1d1466cf Reviewed-on: https://go-review.googlesource.com/28978 Run-TryBot: Michael Munday <munday@ca.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2016-09-13 19:39:38 +00:00
Cherry Zhang	b2e0e9688a	cmd/compile: remove Zero and NilCheck for newobject Recognize runtime.newobject and don't Zero or NilCheck it. Fixes #15914 (?) Updates #15390. TBD: add test Change-Id: Ia3bfa5c2ddbe2c27c92d9f68534a713b5ce95934 Reviewed-on: https://go-review.googlesource.com/27930 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2016-08-30 23:10:43 +00:00
Cherry Zhang	659dd4f1d7	cmd/compile: add more ARM64 optimizations - Use machine instructions for uint64<->float conversions - Do not enforce alignment on Zero/Move ARM64 supports unaligned load/stores, but only aligned offset or small offset can be encoded into instructions. - Do combined loads Change-Id: Iffca7dd0f13070b17b784861ce5a30af584680eb Reviewed-on: https://go-review.googlesource.com/27086 Reviewed-by: David Chase <drchase@google.com>	2016-08-17 18:44:39 +00:00
Keith Randall	d2286ea284	[dev.ssa] Merge remote-tracking branch 'origin/master' into mergebranch Semi-regular merge from tip into dev.ssa. Change-Id: Iadb60e594ef65a99c0e1404b14205fa67c32a9e9	2016-08-04 10:08:20 -07:00
Cherry Zhang	111d590f86	cmd/compile: fix possible spill of invalid pointer with DUFFZERO on AMD64 SSA compiler on AMD64 may spill Duff-adjusted address as scalar. If the object is on stack and the stack moves, the spilled address become invalid. Making the spill pointer-typed does not work. The Duff-adjusted address points to the memory before the area to be zeroed and may be invalid. This may cause stack scanning code panic. Fix it by doing Duff-adjustment in genValue, so the intermediate value is not seen by the reg allocator, and will not be spilled. Add a test to cover both cases. As it depends on allocation, it may be not always triggered. Fixes #16515. Change-Id: Ia81d60204782de7405b7046165ad063384ede0db Reviewed-on: https://go-review.googlesource.com/25309 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2016-07-29 01:09:55 +00:00
Cherry Zhang	83208504fe	[dev.ssa] cmd/compile: add more on ARM64 SSA Support the following: - Shifts. ARM64 machine instructions only use lowest 6 bits of the shift (i.e. mod 64). Use conditional selection instruction to ensure Go semantics. - Zero/Move. Alignment is ensured. - Hmul, Avg64u, Sqrt. - reserve R18 (platform register in ARM64 ABI) and R29 (frame pointer in ARM64 ABI). Everything compiles, all.bash passed (with non-SSA test disabled). Change-Id: Ia8ed58dae5cbc001946f0b889357b258655078b1 Reviewed-on: https://go-review.googlesource.com/25290 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2016-07-27 16:37:23 +00:00
David Chase	7bca2c599d	[dev.ssa] cmd/compile: some improvements to PPC codegen Runs fibonacci for all integer types. Fold addressing arithmetic into stores. Updates #16010. Change-Id: I257982c82c00c80b00679757c3da345045968022 Reviewed-on: https://go-review.googlesource.com/25103 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: David Chase <drchase@google.com>	2016-07-22 15:52:06 +00:00

1 2

97 commits