Stowage/go - Remotebranch.eu

Stowage/go

mirror of https://github.com/golang/go.git synced 2025-12-08 06:10:04 +00:00

Author	SHA1	Message	Date
Lynn Boger	a8b2e4a630	cmd/compile: improve LoweredMove performance on ppc64x This change improves the performance for LoweredMove on ppc64le and ppc64. benchmark old ns/op new ns/op delta BenchmarkCopyFat8-16 0.93 0.69 -25.81% BenchmarkCopyFat12-16 2.61 1.85 -29.12% BenchmarkCopyFat16-16 9.68 1.89 -80.48% BenchmarkCopyFat24-16 4.48 1.85 -58.71% BenchmarkCopyFat32-16 6.12 1.82 -70.26% BenchmarkCopyFat64-16 21.2 2.70 -87.26% BenchmarkCopyFat128-16 29.6 3.97 -86.59% BenchmarkCopyFat256-16 52.6 13.4 -74.52% BenchmarkCopyFat512-16 97.1 18.7 -80.74% BenchmarkCopyFat1024-16 186 35.3 -81.02% BenchmarkAssertE2TLarge-16 14.2 5.06 -64.37% Fixes #19785 Change-Id: I7d5e0052712b75811c02c7d86c5112e5649ad782 Reviewed-on: https://go-review.googlesource.com/38950 Reviewed-by: Keith Randall <khr@golang.org>	2017-03-31 21:24:09 +00:00
Ben Shi	8577f81a10	cmd/compile/internal: Optimization with RBIT and REV By checking GOARM in ssa/gen/ARM.rules, each intermediate operator can be implemented via different instruction serials. It is up to the user to choose between compitability and efficiency. The Bswap32(x) is optimized to REV(x) when GOARM >= 6. The CTZ(x) is optimized to CLZ(RBIT x) when GOARM == 7. Change-Id: Ie9ee645fa39333fa79ad84ed4d1cefac30422814 Reviewed-on: https://go-review.googlesource.com/35610 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2017-03-31 15:10:24 +00:00
Keith Randall	68da265c8e	Revert "cmd/compile: automatically handle commuting ops in rewrite rules" This reverts commit `041ecb697f`. Reason for revert: Not working on S390x and some 386 archs. I have a guess why the S390x is failing. No clue on the 386 yet. Revert until I can figure it out. Change-Id: I64f1ce78fa6d1037ebe7ee2a8a8107cb4c1db70c Reviewed-on: https://go-review.googlesource.com/38790 Reviewed-by: Keith Randall <khr@golang.org>	2017-03-29 18:06:44 +00:00
Keith Randall	041ecb697f	cmd/compile: automatically handle commuting ops in rewrite rules We have lots of rewrite rules that vary only in the fact that we have 2 versions for the 2 different orderings of various commuting ops. For example: (ADDL x (MOVLconst [c])) -> (ADDLconst [c] x) (ADDL (MOVLconst [c]) x) -> (ADDLconst [c] x) It can get unwieldly quickly, especially when there is more than one commuting op in a rule. Our existing "fix" for this problem is to have rules that canonicalize the operations first. For example: (Eq64 x (Const64 <t> [c])) && x.Op != OpConst64 -> (Eq64 (Const64 <t> [c]) x) Subsequent rules can then assume if there is a constant arg to Eq64, it will be the first one. This fix kinda works, but it is fragile and only works when we remember to include the required extra rules. The fundamental problem is that the rule matcher doesn't know anything about commuting ops. This CL fixes that fact. We already have information about which ops commute. (The register allocator takes advantage of commutivity.) The rule generator now automatically generates multiple rules for a single source rule when there are commutative ops in the rule. We can now drop all of our almost-duplicate source-level rules and the canonicalization rules. I have some CLs in progress that will be a lot less verbose when the rule generator handles commutivity for me. I had to reorganize the load-combining rules a bit. The 8-way OR rules generated 128 different reorderings, which was causing the generator to put too much code in the rewrite*.go files (the big ones were going from 25K lines to 132K lines). Instead I reorganized the rules to combine pairs of loads at a time. The generated rule files are now actually a bit (5%) smaller. [Note to reviewers: check these carefully. Most of the other rule changes are trivial.] Make.bash times are ~unchanged. Compiler benchmarks are not observably different. Probably because we don't spend much compiler time in rule matching anyway. I've also done a pass over all of our ops adding commutative markings for ops which hadn't had them previously. Fixes #18292 Change-Id: I999b1307272e91965b66754576019dedcbe7527a Reviewed-on: https://go-review.googlesource.com/38666 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2017-03-29 16:22:09 +00:00
Lynn Boger	23bd919136	cmd/compile: improve LoweredZero performance for ppc64x This change improves the performance of the LoweredZero rule on ppc64x. The improvement can be seen in the runtime ClearFat benchmarks: BenchmarkClearFat12-16 2.40 0.69 -71.25% BenchmarkClearFat16-16 9.98 0.93 -90.68% BenchmarkClearFat24-16 4.75 0.93 -80.42% BenchmarkClearFat32-16 6.02 0.93 -84.55% BenchmarkClearFat40-16 7.19 1.16 -83.87% BenchmarkClearFat48-16 15.0 1.39 -90.73% BenchmarkClearFat56-16 9.95 1.62 -83.72% BenchmarkClearFat64-16 18.0 1.86 -89.67% BenchmarkClearFat128-16 30.0 8.08 -73.07% BenchmarkClearFat256-16 52.5 11.3 -78.48% BenchmarkClearFat512-16 97.0 19.0 -80.41% BenchmarkClearFat1024-16 244 34.2 -85.98% Fixes: #19532 Change-Id: If493e28bc1d8e61bc79978498be9f5336a36cd3f Reviewed-on: https://go-review.googlesource.com/38096 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Michael Munday <munday@ca.ibm.com>	2017-03-21 15:08:02 +00:00
Michael Munday	17570a9afb	cmd/compile: emit fused multiply-{add,subtract} on ppc64x A follow on to CL 36963 adding support for ppc64x. Performance changes (as posted on the issue): poly1305: benchmark old ns/op new ns/op delta Benchmark64-16 172 151 -12.21% Benchmark1K-16 1828 1523 -16.68% Benchmark64Unaligned-16 172 151 -12.21% Benchmark1KUnaligned-16 1827 1523 -16.64% math: BenchmarkAcos-16 43.9 39.9 -9.11% BenchmarkAcosh-16 57.0 45.8 -19.65% BenchmarkAsin-16 35.8 33.0 -7.82% BenchmarkAsinh-16 68.6 60.8 -11.37% BenchmarkAtan-16 19.8 16.2 -18.18% BenchmarkAtanh-16 65.5 57.5 -12.21% BenchmarkAtan2-16 45.4 34.2 -24.67% BenchmarkGamma-16 37.6 26.0 -30.85% BenchmarkLgamma-16 40.0 28.2 -29.50% BenchmarkLog1p-16 35.1 29.1 -17.09% BenchmarkSin-16 22.7 18.4 -18.94% BenchmarkSincos-16 31.7 23.7 -25.24% BenchmarkSinh-16 146 131 -10.27% BenchmarkY0-16 130 107 -17.69% BenchmarkY1-16 127 107 -15.75% BenchmarkYn-16 278 235 -15.47% Updates #17895. Change-Id: I1c16199715d20c9c4bd97c4a950bcfa69eb688c1 Reviewed-on: https://go-review.googlesource.com/38095 Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>	2017-03-20 20:01:29 +00:00
Keith Randall	42e97468a1	cmd/compile: intrinsic for math/bits.Reverse on ARM64 I don't know that it exists for any other architectures. Update #18616 Change-Id: Idfe5dee251764d32787915889ec0be4bebc5be24 Reviewed-on: https://go-review.googlesource.com/38323 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2017-03-17 18:07:18 +00:00
Keith Randall	495b167919	cmd/compile: intrinsics for math/bits.{Len,LeadingZeros} name old time/op new time/op delta LeadingZeros-4 2.00ns ± 0% 1.34ns ± 1% -33.02% (p=0.000 n=8+10) LeadingZeros16-4 1.62ns ± 0% 1.57ns ± 0% -3.09% (p=0.001 n=8+9) LeadingZeros32-4 2.14ns ± 0% 1.48ns ± 0% -30.84% (p=0.002 n=8+10) LeadingZeros64-4 2.06ns ± 1% 1.33ns ± 0% -35.08% (p=0.000 n=8+8) 8-bit args is a special case - the Go code is really fast because it is just a single table lookup. So I've disabled that for now. Intrinsics were actually slower: LeadingZeros8-4 1.22ns ± 3% 1.58ns ± 1% +29.56% (p=0.000 n=10+10) Update #18616 Change-Id: Ia9c289b9ba59c583ea64060470315fd637e814cf Reviewed-on: https://go-review.googlesource.com/38311 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2017-03-16 22:53:49 +00:00
Cherry Zhang	c8f38b3398	cmd/compile: use type information in Aux for Store size Remove size AuxInt in Store, and alignment in Move/Zero. We still pass size AuxInt to Move/Zero, as it is used for partial Move/Zero lowering (e.g. cmd/compile/internal/ssa/gen/386.rules:288). SizeAndAlign is gone. Passes "toolstash -cmp" on std. Change-Id: I1ca34652b65dd30de886940e789fcf41d521475d Reviewed-on: https://go-review.googlesource.com/38150 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-03-16 14:25:04 +00:00
Cherry Zhang	211c8c9f1a	cmd/compile: pass types on SSA Store/Move/Zero ops For SSA Store/Move/Zero ops, attach the type of the value being stored to the op as the Aux field. This type will be used for write barrier insertion (in a followup CL). Since SSA passes do not accurately propagate types of values (because of type casting), we can't simply use type of the store's arguments for write barrier insertion. Passes "toolstash -cmp" on std. Updates #17583. Change-Id: I051d5e5c482931640d1d7d879b2a6bb91f2e0056 Reviewed-on: https://go-review.googlesource.com/36838 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-03-16 14:22:53 +00:00
Matthew Dempsky	91d08e3bca	cmd/compile/internal/ssa: remove unused OpFunc Change-Id: I0f7eec2e0c15a355422d5ae7289508a5bd33b971 Reviewed-on: https://go-review.googlesource.com/38171 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2017-03-14 19:28:25 +00:00
Matthew Dempsky	691755304c	cmd/compile/internal/ssa: populate SymEffects for SSA Ops Changes to ${GOARCH}Ops.go files were mechanically produced using github.com/mdempsky/ssa-symops, a one-off tool that inserts "SymEffect: X" elements by pattern matching against the Op names. Change-Id: Ibf3e481ffd588647f2a31662d72114b740ccbfcf Reviewed-on: https://go-review.googlesource.com/38084 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-03-14 18:34:45 +00:00
Matthew Dempsky	1cdf4bf33f	cmd/compile/internal/ssa: add SymEffect attribute to SSA Ops To replace the progeffects tables for liveness analysis. Change-Id: Idc4b990665cb0a9aa300d62cdf8ad12e51c5b991 Reviewed-on: https://go-review.googlesource.com/38083 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-03-14 18:34:38 +00:00
Matthew Dempsky	cc71aa9ac4	cmd/compile/internal/ssa: make ARM's udiv like other calls Passes toolstash-check -all. Change-Id: Id389f8158cf33a3c0fcef373615b5351e7c74b5b Reviewed-on: https://go-review.googlesource.com/38082 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-03-13 21:29:02 +00:00
Matthew Dempsky	08d8d5c986	cmd/compile/internal/ssa: replace {Defer,Go}Call with StaticCall Passes toolstash-check -all. Change-Id: Icf8b75364e4761a5e56567f503b2c1cb17382ed2 Reviewed-on: https://go-review.googlesource.com/38080 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-03-13 19:44:36 +00:00
Cherry Zhang	6fd5e2549a	cmd/compile: mark MOVWF/MOVFW clobbering F15 on ARM The assembler back end uses F15 as a temporary register in these instructions. Checked the assembler back end and made sure that this is the only case clobbering F15. Fixes #19403. Change-Id: I02b9e00fdd9229db899f501c8e9b306e02912d83 Reviewed-on: https://go-review.googlesource.com/37792 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-03-05 18:31:27 +00:00
Matthew Dempsky	02e36f8c87	cmd/compile/internal/ssa: remove Hmul{8,16}{,u} ops Change-Id: I90865921584ae4bdfb6c220d439b14593d72b6f9 Reviewed-on: https://go-review.googlesource.com/37752 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2017-03-03 20:47:36 +00:00
Cherry Zhang	5bfd1ef036	cmd/compile: get rid of "volatile" in SSA A value is "volatile" if it is a pointer to the argument region on stack which will be clobbered by function call. This is used to make sure the value is safe when inserting write barrier calls. The writebarrier pass can tell whether a value is such a pointer. Therefore no need to mark it when building SSA and thread this information through. Passes "toolstash -cmp" on std. Updates #17583. Change-Id: Idc5fc0d710152b94b3c504ce8db55ea9ff5b5195 Reviewed-on: https://go-review.googlesource.com/36835 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-03-03 13:26:15 +00:00
Keith Randall	13c35a1b20	cmd/compile: ppc64x no longer needs a scratch stack location After https://go-review.googlesource.com/c/36725/, ppc64x no longer needs a temp stack location for int reg <-> fp reg moves. Update #18922 Change-Id: Ib4319784f7a855f593dfa5231604ca2c24e4c882 Reviewed-on: https://go-review.googlesource.com/37651 Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>	2017-03-01 22:14:21 +00:00
Lynn Boger	95c9583a18	cmd/compile: intrinsify atomics on ppc64x This adds the necessary changes so that atomics are treated as intrinsics on ppc64x. The implementations of And8 and Or8 require power8 for both ppc64 and ppc64le. This is a new requirement for ppc64. Fixes #8739 Change-Id: Icb85e2755a49166ee3652668279f6ed5ebbca901 Reviewed-on: https://go-review.googlesource.com/36832 Reviewed-by: Keith Randall <khr@golang.org>	2017-03-01 19:56:01 +00:00
Josh Bleecher Snyder	2183135554	cmd/compile: recognize bit test patterns on amd64 Updates #18943 Change-Id: If3080d6133bb6d2710b57294da24c90251ab4e08 Reviewed-on: https://go-review.googlesource.com/36329 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-03-01 00:36:04 +00:00
Michael Munday	bd8a39b67a	cmd/compile: emit fused multiply-{add,subtract} instructions on s390x Explcitly block fused multiply-add pattern matching when a cast is used after the multiplication, for example: - (a * b) + c // can emit fused multiply-add - float64(a * b) + c // cannot emit fused multiply-add float{32,64} and complex{64,128} casts of matching types are now kept as OCONV operations rather than being replaced with OCONVNOP operations because they now imply a rounding operation (and therefore aren't a no-op anymore). Operations (for example, multiplication) on complex types may utilize fused multiply-add and -subtract instructions internally. There is no way to disable this behavior at the moment. Improves the performance of the floating point implementation of poly1305: name old speed new speed delta 64 246MB/s ± 0% 275MB/s ± 0% +11.48% (p=0.000 n=10+8) 1K 312MB/s ± 0% 357MB/s ± 0% +14.41% (p=0.000 n=10+10) 64Unaligned 246MB/s ± 0% 274MB/s ± 0% +11.43% (p=0.000 n=10+10) 1KUnaligned 312MB/s ± 0% 357MB/s ± 0% +14.39% (p=0.000 n=10+8) Updates #17895. Change-Id: Ia771d275bb9150d1a598f8cc773444663de5ce16 Reviewed-on: https://go-review.googlesource.com/36963 Run-TryBot: Michael Munday <munday@ca.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-02-28 15:34:20 +00:00
David Chase	11b283092a	cmd/compile: add opcode flag hasSideEffects for do-not-remove Added a flag to generic and various architectures' atomic operations that are judged to have observable side effects and thus cannot be dead-code-eliminated. Test requires GOMAXPROCS > 1 without preemption in loop. Fixes #19182. Change-Id: Id2230031abd2cca0bbb32fd68fc8a58fb912070f Reviewed-on: https://go-review.googlesource.com/37333 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2017-02-22 15:15:47 +00:00
Keith Randall	cfb0d34992	cmd/compile: amd64, allow XCHG on stack pointers XCHG needs to allow the stack pointer as an argument because we have a rewrite that incorporates the address of a local variable into the instruction. Fixes #19184 Change-Id: Ic438e6e1946332cdce3864d15abecd41b911b2a9 Reviewed-on: https://go-review.googlesource.com/37253 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2017-02-19 17:16:01 +00:00
Ilya Tocar	21c71d7788	cmd/compile/internal/ssa: combine load + op on AMD64 On AMD64 Most operation can have one operand in memory. Combine load and dependand operation into one new operation, where possible. I've seen no significant performance changes on go1, but this allows to remove ~1.8kb code from go tool. And in math package I see e. g.: Remainder-6 70.0ns ± 0% 64.6ns ± 0% -7.76% (p=0.000 n=9+1 Change-Id: I88b8602b1d55da8ba548a34eb7da4b25d59a297e Reviewed-on: https://go-review.googlesource.com/36793 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-02-17 22:21:49 +00:00
Keith Randall	708ba22a0c	cmd/compile: move constant divide strength reduction to SSA rules Currently the conversion from constant divides to multiplies is mostly done during the walk pass. This is suboptimal because SSA can determine that the value being divided by is constant more often (e.g. after inlining). Change-Id: If1a9b993edd71be37396b9167f77da271966f85f Reviewed-on: https://go-review.googlesource.com/37015 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2017-02-17 06:16:44 +00:00
Lynn Boger	695f12c21a	cmd/compile: rules change to use ANDN more effectively on ppc64x Currently there are cases where an XOR with -1 followed by an AND is generanted when it could be done with just an ANDN instruction. Changes to PPC64.rules and required files allows this change in generated code. Examples of this occur in sha3 among others. Fixes: #18918 Change-Id: I647cb9b4a4aaeebb27db85f8bf75487d78f720c9 Reviewed-on: https://go-review.googlesource.com/36218 TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>	2017-02-09 18:57:19 +00:00
Michael Munday	ddf807fce8	cmd/compile: fix type propagation through s390x SSA rules This CL fixes two issues: 1. Load ops were initially always lowered to unsigned loads, even for signed types. This was fine by itself however LoadReg ops (used to re-load spilled values) were lowered to signed loads for signed types. This meant that spills could invalidate optimizations that assumed the original unsigned load. 2. Types were not always being maintained correctly through rules designed to eliminate unnecessary zero and sign extensions. Fixes #18906. Change-Id: I95785dcadba03f7e3e94524677e7d8d3d3b9b737 Reviewed-on: https://go-review.googlesource.com/36256 Run-TryBot: Michael Munday <munday@ca.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2017-02-03 21:27:21 +00:00
Cherry Zhang	fddc004537	cmd/compile: remove nil check for Zero/Move on 386, AMD64, S390X Fixes #18003. Change-Id: Iadcc5c424c64badecfb5fdbd4dbd9197df56182c Reviewed-on: https://go-review.googlesource.com/33421 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-02-02 21:28:38 +00:00
Keith Randall	01c8719f8b	cmd/compile: move rotate instruction generation to SSA Remove rotate generation from walk. Remove OLROT and ssa.Lrot* opcodes. Generate rotates during SSA lowering for architectures that have them. This CL will allow rotates to be generated in more situations, like when the shift values are determined to be constant only after some analysis. Fixes #18254 Change-Id: I8d6d684ff5ce2511aceaddfda98b908007851079 Reviewed-on: https://go-review.googlesource.com/34232 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2017-02-02 17:57:15 +00:00
Vladimir Stefanovic	247fc4a98e	cmd/compile/internal/ssa: add support for GOARCH=mips{,le} Change-Id: I632d4aef7295778ba5018d98bcb06a68bcf07ce1 Reviewed-on: https://go-review.googlesource.com/31478 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2016-11-08 19:40:43 +00:00
Keith Randall	741445068f	cmd/compile: make [0]T and [1]T SSAable types We used to have to keep on-stack copies of these types. Now they can be registerized. [0]T is kind of trivial but might as well handle it. This change enables another change I'm working on to improve how x.(T) expressions are handled (#17405). This CL helps because now all types that are direct interface types are registerizeable (e.g. [1]*byte). No higher-degree arrays for now because non-constant indexes are hard. Update #17405 Change-Id: I2399940965d17b3969ae66f6fe447a8cefdd6edd Reviewed-on: https://go-review.googlesource.com/32416 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2016-10-31 19:44:19 +00:00
Austin Clements	8a7f0ad0b5	cmd/compile: use typedmemclr for zeroing if there are pointers Currently, zeroing generates an ssa.OpZero, which never has write barriers, even if the assignment is an OASWB. The hybrid barrier requires write barriers on zeroing, so change OASWB to generate an ssa.OpZeroWB when assigning the zero value, which turns into a typedmemclr. Updates #17503. Change-Id: Ib37ac5e39f578447dbd6b36a6a54117d5624784d Reviewed-on: https://go-review.googlesource.com/31451 Reviewed-by: Cherry Zhang <cherryyz@google.com>	2016-10-28 19:13:23 +00:00
Cherry Zhang	f9238a76ff	cmd/compile: make LR allocatable in non-leaf functions on ARM The mechanism is initially introduced (and reviewed) in CL 30597 on S390X. Reduce number of "spilled value remains" by 0.4% in cmd/go. Disabled on ARMv5 because LR is clobbered almost everywhere with inserted softfloat calls. Change-Id: I2934737ce2455909647ed2118fe2bd6f0aa5ac52 Reviewed-on: https://go-review.googlesource.com/32178 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2016-10-28 14:25:33 +00:00
Keith Randall	deb4177cf0	cmd/compile: use masks instead of branches for slicing When we do var x []byte = ... y := x[i:] We can't just use y.ptr = x.ptr + i, as the new pointer may point to the next object in memory after the backing array. We used to fix this by doing: y.cap = x.cap - i delta := i if y.cap == 0 { delta = 0 } y.ptr = x.ptr + delta That generates a branch in what is otherwise straight-line code. Better to do: y.cap = x.cap - i mask := (y.cap - 1) >> 63 // -1 if y.cap==0, 0 otherwise y.ptr = x.ptr + i &^ mask It's about the same number of instructions (~4, depending on what parts are constant, and the target architecture), but it is all inline. It plays nicely with CSE, and the mask can be computed in parallel with the index (in cases where a multiply is required). It is a minor win in both speed and space. Change-Id: Ied60465a0b8abb683c02208402e5bb7ac0e8370f Reviewed-on: https://go-review.googlesource.com/32022 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2016-10-27 20:22:49 +00:00
Cherry Zhang	4f6d479186	cmd/compile: make LR allocatable in non-leaf functions on MIPS64 The mechanism is initially introduced (and reviewed) in CL 30597 on S390X. Change-Id: I83024d2fc84c8efc23fbda52b3ad83073f42cb93 Reviewed-on: https://go-review.googlesource.com/32179 Reviewed-by: David Chase <drchase@google.com>	2016-10-27 15:35:20 +00:00
Cherry Zhang	5c59cb4aa3	cmd/compile: make LR allocatable in non-leaf functions on ARM64 The mechanism is initially introduced (and reviewed) in CL 30597 on S390X. Change-Id: I12fbe6e9269b2936690e0ec896cb6b5aa40ad7da Reviewed-on: https://go-review.googlesource.com/32180 Reviewed-by: David Chase <drchase@google.com>	2016-10-27 15:35:06 +00:00
Cherry Zhang	f6aec889e1	cmd/compile: add a writebarrier phase in SSA When the compiler insert write barriers, the frontend makes conservative decisions at an early stage. This may have false positives which result in write barriers for stack writes. A new phase, writebarrier, is added to the SSA backend, to delay the decision and eliminate false positives. The frontend still makes conservative decisions. When building SSA, instead of emitting runtime calls directly, it emits WB ops (StoreWB, MoveWB, etc.), which will be expanded to branches and runtime calls in writebarrier phase. Writes to static locations on stack are detected and write barriers are removed. All write barriers of stack writes found by the script from issue #17330 are eliminated (except two false positives). Fixes #17330. Change-Id: I9bd66333da9d0ceb64dcaa3c6f33502798d1a0f8 Reviewed-on: https://go-review.googlesource.com/31131 Reviewed-by: Austin Clements <austin@google.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org>	2016-10-25 21:53:40 +00:00
Michael Munday	517a44d57e	cmd/compile: intrinsify atomic operations on s390x Implements the following intrinsics on s390x: - AtomicAdd{32,64} - AtomicCompareAndSwap{32,64} - AtomicExchange{32,64} - AtomicLoad{32,64,Ptr} - AtomicStore{32,64,PtrNoWB} I haven't added rules for And8 or Or8 yet. Change-Id: I647af023a8e513718e90e98a60191e7af6167314 Reviewed-on: https://go-review.googlesource.com/31614 Run-TryBot: Michael Munday <munday@ca.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2016-10-25 12:23:49 +00:00
Michael Munday	1cfb5c3fd5	cmd/compile: merge loads into operations on s390x Adds the new canMergeLoad function which can be used by rules to decide whether a load can be merged into an operation. The function ensures that the merge will not reorder the load relative to memory operations (for example, stores) in such a way that the block can no longer be scheduled. This new function enables transformations such as: MOVD 0(R1), R2 ADD R2, R3 to: ADD 0(R1), R3 The two-operand form of the following instructions can now read a single memory operand: - ADD - ADDC - ADDW - MULLD - MULLW - SUB - SUBC - SUBE - SUBW - AND - ANDW - OR - ORW - XOR - XORW Improves SHA3 performance by 6-8%. Updates #15054. Change-Id: Ibcb9122126cd1a26f2c01c0dfdbb42fe5e7b5b94 Reviewed-on: https://go-review.googlesource.com/29272 Run-TryBot: Michael Munday <munday@ca.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2016-10-17 19:45:20 +00:00
Michael Munday	15817e409b	cmd/compile: make link register allocatable in non-leaf functions We save and restore the link register in non-leaf functions because it is clobbered by CALLs. It is therefore available for general purpose use. Only enabled on s390x currently. The RC4 benchmarks in particular benefit from the extra register: name old speed new speed delta RC4_128 243MB/s ± 2% 341MB/s ± 2% +40.46% (p=0.008 n=5+5) RC4_1K 267MB/s ± 0% 359MB/s ± 1% +34.32% (p=0.008 n=5+5) RC4_8K 271MB/s ± 0% 362MB/s ± 0% +33.61% (p=0.008 n=5+5) Change-Id: Id23bff95e771da9425353da2f32668b8e34ba09f Reviewed-on: https://go-review.googlesource.com/30597 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Michael Munday <munday@ca.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-10-11 18:52:35 +00:00
Cherry Zhang	2756d56c89	cmd/compile: intrinsify math/big.mulWW, divWW on AMD64 Change-Id: I59f7afa7a5803d19f8b21fe70fc85ef997bb3a85 Reviewed-on: https://go-review.googlesource.com/30542 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2016-10-11 16:07:46 +00:00
David Chase	2f0b8f88df	cmd/compile: PPC64, elide unnecessary sign extension Inputs to store[BHW] and cmpW(U) need not be correct in more bits than are used by the instruction. Added a pattern tailored to what appears to be cgo boilerplate. Added a pattern (also seen in cgo boilerplate and hashing) to replace {EQ,NE}-CMP-ANDconst with {EQ-NE}-ANDCCconst. Added a pattern to clean up ANDconst shift distance inputs (this was seen in hashing). Simplify repeated and,or,xor. Fixes #17109. Change-Id: I68eac83e3e614d69ffe473a08953048c8b066d88 Reviewed-on: https://go-review.googlesource.com/30455 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2016-10-10 12:22:40 +00:00
Michael Munday	45b26a93f3	cmd/{asm,compile}: replace TESTB op with CMPWconst on s390x TESTB was implemented as AND $0xff, Rx, REGTMP. Unfortunately there is no 3-operand AND-with-immediate instruction and so it was emulated by the assembler using two instructions. This CL uses CMPW instead of AND and also optimizes CMPW to use the chi instruction where possible. Overall this CL reduces the size of the .text section of the bin/go binary by ~2%. Change-Id: Ic335c29fc1129378fcbb1265bfb10f5b744a0f3f Reviewed-on: https://go-review.googlesource.com/30690 Run-TryBot: Michael Munday <munday@ca.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2016-10-07 20:02:59 +00:00
Michael Munday	dd1dcf9496	cmd/{asm,compile}: add ANDW, ORW and XORW instructions to s390x Adds the following instructions and uses them in the SSA backend: - ANDW - ORW - XORW The instruction encodings for 32-bit operations are typically shorter, particularly when an immediate is used. For example, XORW $-1, R1 only requires one instruction, whereas XOR requires two. Also removes some unused instructions (that were emulated): - ANDN - NAND - ORN - NOR Change-Id: Iff2a16f52004ba498720034e354be9771b10cac4 Reviewed-on: https://go-review.googlesource.com/30291 Reviewed-by: Cherry Zhang <cherryyz@google.com>	2016-10-06 02:59:04 +00:00
Cherry Zhang	b662e524e4	cmd/compile: use CBZ/CBNZ instrunctions on ARM64 These are conditional branches that takes a register instead of flags as control value. Reduce binary size by 0.7%, text size by 2.4% (cmd/go as an exmaple). Change-Id: I0020cfde745f9eab680b8b949ad28c87fe183afd Reviewed-on: https://go-review.googlesource.com/30030 Reviewed-by: David Chase <drchase@google.com>	2016-10-05 18:22:56 +00:00
Matthew Dempsky	c28f55c502	cmd/compile/internal/ssa: add Op.UsesScratch method Passes toolstash/buildall. Change-Id: I928a2ef39fb10091957f35bb3f1564498f6f1b83 Reviewed-on: https://go-review.googlesource.com/30312 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2016-10-04 20:56:56 +00:00
Michael Munday	962dc4b44d	cmd/compile: improve load/store merging on s390x This commit makes the process of load/store merging more incremental for both big and little endian operations. It also adds support for 32-bit shifts (needed to merge 16- and 32-bit loads/stores). In addition, the merging of little endian stores is now supported. Little endian stores are now up to 30 times faster. Change-Id: Iefdd81eda4a65b335f23c3ff222146540083ad9c Reviewed-on: https://go-review.googlesource.com/29956 Run-TryBot: Michael Munday <munday@ca.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2016-09-30 14:41:43 +00:00
Keith Randall	98938189a1	cmd/compile: remove duplicate nilchecks Mark nil check operations as faulting if their arg is zero. This lets the late nilcheck pass remove duplicates. Fixes #17242. Change-Id: I4c9938d8a5a1e43edd85b4a66f0b34004860bcd9 Reviewed-on: https://go-review.googlesource.com/29952 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2016-09-27 23:54:01 +00:00
Cherry Zhang	9d4b40f55d	runtime, cmd/compile: implement and use DUFFCOPY on ARM64 Change-Id: I8984eac30e5df78d4b94f19412135d3cc36969f8 Reviewed-on: https://go-review.googlesource.com/29910 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2016-09-27 15:07:31 +00:00

1 2 3 4 5

225 commits