We can add a constant to a location in memory with 1 instruction,
as opposed to load+add+store, so add a new op and the relevant SSA rules.
Triggers in e.g. encoding/json's isValidNumber:
NumberIsValid-6 36.4ns ± 0% 35.2ns ± 1% -3.32% (p=0.000 n=6+10)
Shaves ~2.5KB from the go tool.
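As a hypothetical illustration (not code from the CL), the win is at
read-modify-write sites like the one below, where the constant increment
can now lower to a single add-to-memory instruction instead of a load,
an add, and a store:

package main

type counter struct{ n int64 }

//go:noinline
func bump(c *counter) {
	c.n += 8 // with the new op this can lower to a single add-to-memory, e.g. ADDQ $8, (AX)
}

func main() {
	var c counter
	bump(&c)
	_ = c.n
}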
Change-Id: I7ba576676c2522432360f77b290cecb9574a93c3
Reviewed-on: https://go-review.googlesource.com/54431
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
ADDSDmem comment said f32 (likely a copy-paste mistake).
Also swap the positions of ADDSSmem and ADDSDmem in the list to make
the list ordering uniform.
Fixes #21225
Change-Id: I26bb116900c1cf4c4e6faeef613d7318c9c85b98
Reviewed-on: https://go-review.googlesource.com/52071
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
Reviewed-by: Keith Randall <khr@golang.org>
Enhance the one-live-memory-at-a-time check to run during many
more phases of the SSA backend. Also make it work in an interblock
fashion.
Change types.IsMemory to return true for tuples containing a memory type.
Fix the trim pass to build the merged phi correctly. This doesn't affect
generated code, but it allows the check to pass after trim runs.
Switch the AddTuple* ops to take the memory-containing tuple argument second.
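A minimal, self-contained sketch of the invariant the check enforces
(this is not the ssa package's checker; it ignores phis, tuples, and the
interblock cases mentioned above): within a scheduled block, every memory
argument must be the single currently-live memory value, and producing a
new memory value retires the previous one.

package main

// value is a toy stand-in for *ssa.Value.
type value struct {
	isMem bool
	args  []*value
}

// oneLiveMemOK walks a block's values in scheduled order and reports
// whether every memory argument is the single currently-live memory value.
func oneLiveMemOK(entryMem *value, sched []*value) bool {
	live := entryMem
	for _, v := range sched {
		for _, a := range v.args {
			if a.isMem && a != live {
				return false // refers to a memory value that is not the live one
			}
		}
		if v.isMem {
			live = v // v's memory result becomes the only live memory
		}
	}
	return true
}

func main() {
	mem0 := &value{isMem: true}
	store := &value{isMem: true, args: []*value{mem0}}
	load := &value{args: []*value{store}}
	_ = oneLiveMemOK(mem0, []*value{store, load})
}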
Update #20335
Change-Id: I5b03ef3606b75a9e4f765276bb8b183cdc172b43
Reviewed-on: https://go-review.googlesource.com/43495
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
The experiment "clobberdead" clobbers all pointer fields that the
compiler thinks are dead, just before and after every safepoint.
Useful for debugging the generation of live pointer bitmaps.
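As a rough, hypothetical sketch of the idea (the sentinel value and the
helper below are assumptions, not the compiler's actual code): clobbering
a dead slot means overwriting it with a recognizable poison value, so that
any stale use of a supposedly-dead pointer fails loudly.

package main

import "unsafe"

// poisonPtr is an assumed sentinel; a real clobber experiment would pick
// a value that can never be a valid heap address.
const poisonPtr uintptr = 0xdeaddead

// clobberSlot overwrites one word that liveness analysis considers dead.
func clobberSlot(p unsafe.Pointer) {
	*(*uintptr)(p) = poisonPtr
}

func main() {
	var slot uintptr
	clobberSlot(unsafe.Pointer(&slot))
	_ = slot
}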
Helped find the following issues:
Update #15936
Update #16026
Update #16095
Update #18860
Change-Id: Id1d12f86845e3d93bae903d968b1eac61fc461f9
Reviewed-on: https://go-review.googlesource.com/23924
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Popcount instructions on amd64 are not guaranteed to be
present, so we must guard their use. Rewrite rules can't
generate control flow at the moment, so the intrinsifier
needs to generate that code.
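A sketch of the shape of the guarded code the intrinsifier has to produce,
written at the source level with assumed names (the feature-flag variable
and helper functions below are illustrative, not the compiler's output):

package main

// hasPOPCNT stands in for the CPU-feature flag checked at run time.
var hasPOPCNT = true

func onesCount64(x uint64) int {
	if hasPOPCNT {
		return popcntHW(x) // this branch would be a single POPCNTQ on amd64
	}
	return popcntGeneric(x)
}

//go:noinline
func popcntHW(x uint64) int { return popcntGeneric(x) } // placeholder for the instruction

func popcntGeneric(x uint64) int {
	n := 0
	for x != 0 {
		x &= x - 1 // clear the lowest set bit
		n++
	}
	return n
}

func main() { _ = onesCount64(0xF0F0) }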
name old time/op new time/op delta
OnesCount-8 2.47ns ± 5% 1.04ns ± 2% -57.70% (p=0.000 n=10+10)
OnesCount16-8 1.05ns ± 1% 0.78ns ± 0% -25.56% (p=0.000 n=9+8)
OnesCount32-8 1.63ns ± 5% 1.04ns ± 2% -35.96% (p=0.000 n=10+10)
OnesCount64-8 2.45ns ± 0% 1.04ns ± 1% -57.55% (p=0.000 n=6+10)
Update #18616
Change-Id: I4aff2cc9aa93787898d7b22055fe272a7cf95673
Reviewed-on: https://go-review.googlesource.com/38320
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
Note that this is a redo of an undo of the original buggy CL 38666.
We have lots of rewrite rules that vary only in the fact that
we have 2 versions for the 2 different orderings of various
commuting ops. For example:
(ADDL x (MOVLconst [c])) -> (ADDLconst [c] x)
(ADDL (MOVLconst [c]) x) -> (ADDLconst [c] x)
It can get unwieldy quickly, especially when there is more than
one commuting op in a rule.
Our existing "fix" for this problem is to have rules that
canonicalize the operations first. For example:
(Eq64 x (Const64 <t> [c])) && x.Op != OpConst64 -> (Eq64 (Const64 <t> [c]) x)
Subsequent rules can then assume if there is a constant arg to Eq64,
it will be the first one. This fix mostly works, but it is fragile and
only works when we remember to include the required extra rules.
The fundamental problem is that the rule matcher doesn't
know anything about commuting ops. This CL fixes that.
We already have information about which ops commute. (The register
allocator takes advantage of commutativity.) The rule generator now
automatically generates multiple rules for a single source rule when
there are commutative ops in the rule. We can now drop all of our
almost-duplicate source-level rules and the canonicalization rules.
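A toy, self-contained sketch of the idea (the real rule generator's
implementation differs; this just shows one source pattern expanding into
both argument orderings for a commutative op):

package main

import "fmt"

// expandCommutative emits one pattern per ordering of the two arguments
// of a commutative op, so the rule author only writes the pattern once.
func expandCommutative(op, arg0, arg1 string) []string {
	return []string{
		fmt.Sprintf("(%s %s %s)", op, arg0, arg1),
		fmt.Sprintf("(%s %s %s)", op, arg1, arg0),
	}
}

func main() {
	for _, pat := range expandCommutative("ADDL", "x", "(MOVLconst [c])") {
		fmt.Println(pat, "-> (ADDLconst [c] x)")
	}
}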
I have some CLs in progress that will be a lot less verbose when
the rule generator handles commutativity for me.
I had to reorganize the load-combining rules a bit. The 8-way OR rules
generated 128 different reorderings, which was causing the generator
to put too much code in the rewrite*.go files (the big ones were going
from 25K lines to 132K lines). Instead I reorganized the rules to
combine pairs of loads at a time. The generated rule files are now
actually a bit (5%) smaller.
Make.bash times are ~unchanged.
Compiler benchmarks are not observably different. Probably because
we don't spend much compiler time in rule matching anyway.
I've also done a pass over all of our ops, adding commutative markings
to ops which didn't have them previously.
Fixes #18292
Change-Id: Ic1c0e43fbf579539f459971625f69690c9ab8805
Reviewed-on: https://go-review.googlesource.com/38801
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
This reverts commit 041ecb697f.
Reason for revert: not working on s390x and on some 386 builders.
I have a guess why s390x is failing. No clue about 386 yet.
Revert until I can figure it out.
Change-Id: I64f1ce78fa6d1037ebe7ee2a8a8107cb4c1db70c
Reviewed-on: https://go-review.googlesource.com/38790
Reviewed-by: Keith Randall <khr@golang.org>
We have lots of rewrite rules that vary only in the fact that
we have 2 versions for the 2 different orderings of various
commuting ops. For example:
(ADDL x (MOVLconst [c])) -> (ADDLconst [c] x)
(ADDL (MOVLconst [c]) x) -> (ADDLconst [c] x)
It can get unwieldy quickly, especially when there is more than
one commuting op in a rule.
Our existing "fix" for this problem is to have rules that
canonicalize the operations first. For example:
(Eq64 x (Const64 <t> [c])) && x.Op != OpConst64 -> (Eq64 (Const64 <t> [c]) x)
Subsequent rules can then assume if there is a constant arg to Eq64,
it will be the first one. This fix mostly works, but it is fragile and
only works when we remember to include the required extra rules.
The fundamental problem is that the rule matcher doesn't
know anything about commuting ops. This CL fixes that.
We already have information about which ops commute. (The register
allocator takes advantage of commutativity.) The rule generator now
automatically generates multiple rules for a single source rule when
there are commutative ops in the rule. We can now drop all of our
almost-duplicate source-level rules and the canonicalization rules.
I have some CLs in progress that will be a lot less verbose when
the rule generator handles commutativity for me.
I had to reorganize the load-combining rules a bit. The 8-way OR rules
generated 128 different reorderings, which was causing the generator
to put too much code in the rewrite*.go files (the big ones were going
from 25K lines to 132K lines). Instead I reorganized the rules to
combine pairs of loads at a time. The generated rule files are now
actually a bit (5%) smaller.
[Note to reviewers: check these carefully. Most of the other rule
changes are trivial.]
Make.bash times are ~unchanged.
Compiler benchmarks are not observably different. Probably because
we don't spend much compiler time in rule matching anyway.
I've also done a pass over all of our ops, adding commutative markings
to ops which didn't have them previously.
Fixes #18292
Change-Id: I999b1307272e91965b66754576019dedcbe7527a
Reviewed-on: https://go-review.googlesource.com/38666
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
name old time/op new time/op delta
LeadingZeros-4 2.00ns ± 0% 1.34ns ± 1% -33.02% (p=0.000 n=8+10)
LeadingZeros16-4 1.62ns ± 0% 1.57ns ± 0% -3.09% (p=0.001 n=8+9)
LeadingZeros32-4 2.14ns ± 0% 1.48ns ± 0% -30.84% (p=0.002 n=8+10)
LeadingZeros64-4 2.06ns ± 1% 1.33ns ± 0% -35.08% (p=0.000 n=8+8)
The 8-bit case is special - the Go code is really fast because
it is just a single table lookup. So I've disabled that intrinsic for now.
Intrinsics were actually slower:
LeadingZeros8-4 1.22ns ± 3% 1.58ns ± 1% +29.56% (p=0.000 n=10+10)
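For context, the pure-Go 8-bit path is essentially one table lookup; a
self-contained sketch of that shape (the table name and its construction
here are illustrative, not the math/bits source):

package main

// len8tab[i] holds the number of bits needed to represent i.
var len8tab [256]uint8

func init() {
	for i := 1; i < 256; i++ {
		n := uint8(0)
		for v := i; v != 0; v >>= 1 {
			n++
		}
		len8tab[i] = n
	}
}

// leadingZeros8 is just 8 minus the bit length, i.e. a single table lookup.
func leadingZeros8(x uint8) int { return 8 - int(len8tab[x]) }

func main() { _ = leadingZeros8(0x0f) } // == 4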
Update #18616
Change-Id: Ia9c289b9ba59c583ea64060470315fd637e814cf
Reviewed-on: https://go-review.googlesource.com/38311
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
Changes to ${GOARCH}Ops.go files were mechanically produced using
github.com/mdempsky/ssa-symops, a one-off tool that inserts
"SymEffect: X" elements by pattern matching against the Op names.
Change-Id: Ibf3e481ffd588647f2a31662d72114b740ccbfcf
Reviewed-on: https://go-review.googlesource.com/38084
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
There's no need to use @block rules, as canMergeLoad makes sure that
the load and op are already in the same block.
With no @block needed, we also don't need to set the type explicitly.
It can just be inherited from the op being rewritten.
Noticed while working on #19284.
Change-Id: Ied8bcc8058260118ff7e166093112e29107bcb7e
Reviewed-on: https://go-review.googlesource.com/37585
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Added a flag to the generic and architecture-specific atomic
operations that are judged to have observable side effects
and thus cannot be dead-code-eliminated.
The test requires GOMAXPROCS > 1, with no preemption in the loop.
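A sketch of the failure mode the flag prevents, loosely modeled on the
test described above (this program is illustrative, not the CL's actual
test): an atomic add whose result is unused must still execute, because
another goroutine observes the store.

package main

import (
	"runtime"
	"sync/atomic"
)

func main() {
	runtime.GOMAXPROCS(2) // the spin loop below has no preemption points
	var flag uint32
	done := make(chan struct{})
	go func() {
		for atomic.LoadUint32(&flag) == 0 {
			// busy-wait until the writer's add becomes visible
		}
		close(done)
	}()
	atomic.AddUint32(&flag, 1) // result unused; must not be dead-code-eliminated
	<-done
}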
Fixes #19182.
Change-Id: Id2230031abd2cca0bbb32fd68fc8a58fb912070f
Reviewed-on: https://go-review.googlesource.com/37333
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
XCHG needs to allow the stack pointer as an argument because we have a
rewrite that incorporates the address of a local variable into the
instruction.
Fixes #19184
Change-Id: Ic438e6e1946332cdce3864d15abecd41b911b2a9
Reviewed-on: https://go-review.googlesource.com/37253
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
On AMD64 most operations can have one operand in memory.
Combine a load and the dependent operation into one new operation,
where possible. I've seen no significant performance changes on go1,
but this allows us to remove ~1.8KB of code from the go tool. And in
the math package I see e.g.:
Remainder-6 70.0ns ± 0% 64.6ns ± 0% -7.76% (p=0.000 n=9+1
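As a hypothetical illustration of the kind of code that benefits (the
instruction spellings in the comments are illustrative only):

package main

//go:noinline
func addFromMem(p *float64, x float64) float64 {
	// before: a separate load then an add, e.g. MOVSD (AX), X1; ADDSD X1, X0
	// after:  the load is folded into the add, e.g. ADDSD (AX), X0
	return x + *p
}

func main() {
	v := 2.0
	_ = addFromMem(&v, 1.0)
}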
Change-Id: I88b8602b1d55da8ba548a34eb7da4b25d59a297e
Reviewed-on: https://go-review.googlesource.com/36793
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Remove rotate generation from walk. Remove OLROT and ssa.Lrot* opcodes.
Generate rotates during SSA lowering for architectures that have them.
This CL will allow rotates to be generated in more situations,
like when the shift values are determined to be constant
only after some analysis.
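For instance (an illustrative example, not taken from the CL), the
familiar constant-rotate spelling below can be lowered to a single rotate
instruction during SSA lowering:

package main

//go:noinline
func rotl7(x uint32) uint32 {
	// this pattern can be lowered to one rotate instruction on
	// architectures that have one (e.g. ROLL $7 on amd64)
	return x<<7 | x>>25
}

func main() { _ = rotl7(0x80000001) }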
Fixes #18254
Change-Id: I8d6d684ff5ce2511aceaddfda98b908007851079
Reviewed-on: https://go-review.googlesource.com/34232
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
We save and restore the link register in non-leaf functions because
it is clobbered by CALLs. It is therefore available for general
purpose use.
Only enabled on s390x currently. The RC4 benchmarks in particular
benefit from the extra register:
name old speed new speed delta
RC4_128 243MB/s ± 2% 341MB/s ± 2% +40.46% (p=0.008 n=5+5)
RC4_1K 267MB/s ± 0% 359MB/s ± 1% +34.32% (p=0.008 n=5+5)
RC4_8K 271MB/s ± 0% 362MB/s ± 0% +33.61% (p=0.008 n=5+5)
Change-Id: Id23bff95e771da9425353da2f32668b8e34ba09f
Reviewed-on: https://go-review.googlesource.com/30597
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Mark nil check operations as faulting if their arg is zero.
This lets the late nilcheck pass remove duplicates.
Fixes #17242.
Change-Id: I4c9938d8a5a1e43edd85b4a66f0b34004860bcd9
Reviewed-on: https://go-review.googlesource.com/29952
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Get rid of BlockCheck. Josh goaded me into it, and I went
down a rabbit hole making it happen.
NilCheck now panics if the pointer is nil and returns void, as before.
BlockCheck is gone, and NilCheck is no longer a Control value for
any block. It just exists (and deadcode knows not to throw it away).
I rewrote the nilcheckelim pass to handle this case. In particular,
there can now be multiple NilCheck ops per block.
I moved all of the arch-dependent nil check elimination done as
part of ssaGenValue into its own proper pass, so we don't have to
duplicate that code for every architecture.
Making the arch-dependent nil check its own pass means I needed
to add a bunch of flags to the opcode table so I could write
the code without arch-dependent ops everywhere.
Change-Id: I419f891ac9b0de313033ff09115c374163416a9f
Reviewed-on: https://go-review.googlesource.com/29120
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
No need for it; we can treat calls as (mostly) normal values
that take a memory and return a memory.
Lowers the number of basic blocks needed to represent a function.
"go test -c net/http" uses 27% fewer basic blocks.
Probably doesn't affect generated code much, but should help
various passes whose running time and/or space depends on
the number of basic blocks.
Fixes #15631
Change-Id: I0bf21e123f835e2cfa382753955a4f8bce03dfa6
Reviewed-on: https://go-review.googlesource.com/28950
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
We used to reserve X15 to implement the 3-operand floating-point
sub/div ops with the 2-operand sub/div that 386/amd64 gives us.
Now that resultInArg0 is implemented, we no longer need to
reserve X15 (X7 on 386).
Fixes #15584
Change-Id: I978e6c0a35236e89641bfc027538cede66004e82
Reviewed-on: https://go-review.googlesource.com/28272
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Atomic add rules were depending on CSE to combine duplicate atomic ops.
With -N, CSE doesn't run.
Redo the rules for atomic add so there's only one atomic op.
Introduce add-to-first-part-of-tuple pseudo-ops to make the atomic add result correct.
Change-Id: Ib132247051abe5f80fefad6c197db8df8ee06427
Reviewed-on: https://go-review.googlesource.com/27991
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Atomic swap, add/and/or, compare and swap.
Also works on amd64p32.
Change-Id: Idf2d8f3e1255f71deba759e6e75e293afe4ab2ba
Reviewed-on: https://go-review.googlesource.com/27813
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Inline atomic reads and writes on amd64. There's no reason
to pay the overhead of a call for these.
To keep atomic loads from being reordered, we make them
return a <value,memory> tuple.
Change the meaning of resultInArg0 for tuple-generating ops
to mean the first part of the result tuple, not the second.
This means we can always put the store part of the tuple last,
matching how arguments are laid out. This requires reordering
the outputs of add32carry and sub32carry and their descendants
in various architectures.
benchmark old ns/op new ns/op delta
BenchmarkAtomicLoad64-8 2.09 0.26 -87.56%
BenchmarkAtomicStore64-8 7.54 5.72 -24.14%
TBD (in a different CL): Cas, Or8, ...
Change-Id: I713ea88e7da3026c44ea5bdb56ed094b20bc5207
Reviewed-on: https://go-review.googlesource.com/27641
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Now that we have ops that can return 2 results, have BSF return a result
and flags. We can then get rid of the redundant comparison and use CMOV
instead of CMOVconst ops.
Get rid of a bunch of the ops we don't use. Ctz{8,16}, plus all the Clzs,
and CMOVNEs. I don't think we'll ever use them, and they would be easy
to add back if needed.
Change-Id: I8858a1d017903474ea7e4002fc76a6a86e7bd487
Reviewed-on: https://go-review.googlesource.com/27630
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
We only need to zero-extend to 32 bits; we get the top
32 bits zeroed for free.
Only the WQ change actually generates different code.
The assembler did this optimization for us in the other two cases.
But we might as well do it during SSA so -S output more closely
matches the actual generated instructions.
Change-Id: I3e4ac50dc4da124014d4e31c86e9fc539d94f7fd
Reviewed-on: https://go-review.googlesource.com/23711
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
It's not a new backend, just a PtrSize==4 modification
of the existing AMD64 backend.
Change-Id: Icc63521a5cf4ebb379f7430ef3f070894c09afda
Reviewed-on: https://go-review.googlesource.com/25586
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reg allocator skips flag-typed values. Flag allocator uses the type
and whether the op has "clobberFlags" set.
Tested on AMD64, ARM, ARM64, 386. Passed 'toolstash -cmp' on AMD64.
PPC64 is coded blindly.
Change-Id: Ib1cc27efecef6a1bb27f7d7ed035a582660d244f
Reviewed-on: https://go-review.googlesource.com/25480
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
The SSA compiler on AMD64 may spill a Duff-adjusted address as a scalar.
If the object is on the stack and the stack moves, the spilled address
becomes invalid.
Making the spill pointer-typed does not work. The Duff-adjusted address
points to the memory before the area to be zeroed and may be invalid.
This may cause the stack scanning code to panic.
Fix it by doing the Duff adjustment in genValue, so the intermediate value
is not seen by the register allocator and will not be spilled.
Add a test to cover both cases. As it depends on allocation, it may
not always be triggered.
Fixes #16515.
Change-Id: Ia81d60204782de7405b7046165ad063384ede0db
Reviewed-on: https://go-review.googlesource.com/25309
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
We now allow Values to have 2 outputs. Use that ability for amd64.
This allows x,y := a/b,a%b to use just a single divide instruction.
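A small self-contained example of that pattern (illustrative; the register
details in the comment assume amd64):

package main

//go:noinline
func divmod(a, b int64) (int64, int64) {
	// with multi-output values, both results come from a single divide
	// instruction (on amd64, the divide leaves the quotient in AX and
	// the remainder in DX)
	return a / b, a % b
}

func main() {
	q, r := divmod(7, 3)
	_, _ = q, r
}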
Update #6815
Change-Id: Id70bcd20188a2dd8445e631a11d11f60991921e4
Reviewed-on: https://go-review.googlesource.com/25004
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
Auto-generate register masks and load them through Config.
Passed toolstash -cmp on AMD64.
Tests phi_ssa.go and regalloc_ssa.go in cmd/compile/internal/gc/testdata
passed on ARM.
Updates #15365.
Change-Id: I393924d68067f2dbb13dab82e569fb452c986593
Reviewed-on: https://go-review.googlesource.com/23292
Reviewed-by: David Chase <drchase@google.com>
Fix hardcoded flag register mask in ssa/flagalloc.go by auto-generating
the mask.
Also fix a mistake (in a previous CL) about conditional branches.
Progress on the SSA backend for ARM. Still not complete. Now the
"container/ring" package compiles and its tests pass.
Updates #15365.
Change-Id: Id7c8805c30dbb8107baedb485ed0f71f59ed6ea8
Reviewed-on: https://go-review.googlesource.com/23093
Reviewed-by: Keith Randall <khr@golang.org>
Make sure ops have the right number of args, that aux and auxint
are set only when allowed, etc.
Normalize the error reporting format.
Change-Id: Ie545fcc5990c8c7d62d40d9a0a55885f941eb645
Reviewed-on: https://go-review.googlesource.com/22320
Reviewed-by: David Chase <drchase@google.com>
Now that we're using 32-bit ops for 8/16-bit logical operations
(to avoid partial register stalls), there's really no need to
keep track of the 8/16-bit ops at all. Convert everything we
can to 32-bit ops.
This CL is the obvious stuff. I might think a bit more about
whether we can get rid of weirder stuff like HMULWU.
The only downside to this CL is that we lose some information
about constants. If we had source like:
var a byte = ...
a += 128
a += 128
We will convert that to a += 256, when we could get rid of the
add altogether. This seems like a fairly unusual scenario, and
I'm happy to forgo that optimization.
Change-Id: Ia7c1e5203d0d110807da69ed646535194a3efba1
Reviewed-on: https://go-review.googlesource.com/22382
Reviewed-by: Todd Neal <todd@tneal.org>
Instead of being a hint, resultInArg0 is now enforced by regalloc.
This allows us to delete all the code from amd64/ssa.go which
deals with converting from a semantically three-address instruction
into some copies plus a two-address instruction.
Change-Id: Id4f39a80be4b678718bfd42a229f9094ab6ecd7c
Reviewed-on: https://go-review.googlesource.com/21816
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Helpful for indexed loads and stores when the stride is not equal to
the size being loaded/stored.
Update #7927
Change-Id: I8714dd4c7b18a96a611bf5647ee21f753d723945
Reviewed-on: https://go-review.googlesource.com/21346
Run-TryBot: Todd Neal <todd@tneal.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
Previously if we were only using the low bits of AuxInt,
the high bits were ignored and could be junk. This CL
changes that behavior to define the high bits to be the
sign-extended version of the low bits for all cases.
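A self-contained sketch of what that canonical form means for an 8-bit
constant (the helper name here is made up for illustration):

package main

import "fmt"

// auxInt8Canonical reports whether an int64 AuxInt is a valid canonical
// encoding for an 8-bit constant: the high 56 bits must be the sign
// extension of the low 8 bits.
func auxInt8Canonical(x int64) bool { return x == int64(int8(x)) }

func main() {
	fmt.Println(auxInt8Canonical(0x1))   // true: canonical
	fmt.Println(auxInt8Canonical(0x101)) // false: junk in the high bits
	fmt.Println(auxInt8Canonical(-1))    // true: 0xFF sign-extended
}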
There are 2 main benefits:
- Deterministic representation. This helps with CSE.
(Const8 [0x1]) and (Const8 [0x101]) used to be the same "value"
but CSE couldn't see them as such.
- Testability. We can check that all ops leave AuxInt in a state
consistent with the new rule. In the old scheme, it was hard
to check whether a rule correctly used only the low-order bits.
Side benefits:
- ==0 and !=0 tests are easier.
Drawbacks:
- This differs from the runtime representation in registers,
where it is important that we allow upper bits to be undefined
(so we're not sign/zero-extending all the time).
- Ops that treat AuxInt as unsigned (shifts, mostly) need to be
a bit more careful.
Change-Id: I9a685ff27e36dc03287c9ab1cecd6c0b4045c819
Reviewed-on: https://go-review.googlesource.com/21256
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
The previous rules to combine indexed loads produced addresses like:
From: obj.Addr{
	Type:   TYPE_MEM,
	Reg:    REG_CX,
	Name:   NAME_AUTO,
	Offset: 121,
	...
}
which are erroneous because NAME_AUTO implies a base register of
REG_SP, and cmd/internal/obj/x86 makes many assumptions to this
effect. Note that previously we were also producing an extra "ADDQ
SP, CX" instruction, so indexing off of SP was already handled.
The approach taken by this CL to address the problem is to instead
produce addresses like:
From: obj.Addr{
	Type:   TYPE_MEM,
	Reg:    REG_SP,
	Name:   NAME_AUTO,
	Offset: 121,
	Index:  REG_CX,
	Scale:  1,
}
and to omit the "ADDQ SP, CX" instruction.
The downside to this approach is that it requires adding a lot of new
MOV[WLQ]loadidx1 instructions that nearly duplicate the functionality of
the existing MOV[WLQ]loadidx[248] instructions, but with a different
Scale.
Fixes #15001.
Change-Id: Iad9a1a41e5e2552f8d22e3ba975e4ea0862dffd2
Reviewed-on: https://go-review.googlesource.com/21245
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>