Stowage/go - Remotebranch.eu

Stowage/go

mirror of https://github.com/golang/go.git synced 2025-12-08 06:10:04 +00:00

Author	SHA1	Message	Date
Michael Munday	4745604bcb	cmd/compile: intrinsify math.RoundToEven on s390x The new RoundToEven function can be implemented as a single FIDBR instruction on s390x. name old time/op new time/op delta RoundToEven 5.32ns ± 1% 0.86ns ± 1% -83.86% (p=0.000 n=10+10) Change-Id: Iaf597e57a0d1085961701e3c75ff4f6f6dcebb5f Reviewed-on: https://go-review.googlesource.com/74350 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-10-31 18:04:27 +00:00
Austin Clements	7e343134d3	cmd/compile: compiler support for buffered write barrier This CL implements the compiler support for calling the buffered write barrier added by the previous CL. Since the buffered write barrier is only implemented on amd64 right now, this still supports the old, eager write barrier as well. There's little overhead to supporting both and this way a few tests in test/fixedbugs that expect to have liveness maps at write barrier calls can easily opt-in to the old, eager barrier. This significantly improves the performance of the write barrier: name old time/op new time/op delta WriteBarrier-12 73.5ns ±20% 19.2ns ±27% -73.90% (p=0.000 n=19+18) It also reduces the size of binaries because the write barrier call is more compact: name old object-bytes new object-bytes delta Template 398k ± 0% 393k ± 0% -1.14% (p=0.008 n=5+5) Unicode 208k ± 0% 206k ± 0% -1.00% (p=0.008 n=5+5) GoTypes 1.18M ± 0% 1.15M ± 0% -2.00% (p=0.008 n=5+5) Compiler 4.05M ± 0% 3.88M ± 0% -4.26% (p=0.008 n=5+5) SSA 8.25M ± 0% 8.11M ± 0% -1.59% (p=0.008 n=5+5) Flate 228k ± 0% 224k ± 0% -1.83% (p=0.008 n=5+5) GoParser 295k ± 0% 284k ± 0% -3.62% (p=0.008 n=5+5) Reflect 1.00M ± 0% 0.99M ± 0% -0.70% (p=0.008 n=5+5) Tar 339k ± 0% 333k ± 0% -1.67% (p=0.008 n=5+5) XML 404k ± 0% 395k ± 0% -2.10% (p=0.008 n=5+5) [Geo mean] 704k 690k -2.00% name old exe-bytes new exe-bytes delta HelloSize 1.05M ± 0% 1.04M ± 0% -1.55% (p=0.008 n=5+5) https://perf.golang.org/search?q=upload:20171027.1 (Amusingly, this also reduces compiler allocations by 0.75%, which, combined with the better write barrier, speeds up the compiler overall by 2.10%. See the perf link.) It slightly improves the performance of most of the go1 benchmarks and improves the performance of the x/benchmarks: name old time/op new time/op delta BinaryTree17-12 2.40s ± 1% 2.47s ± 1% +2.69% (p=0.000 n=19+19) Fannkuch11-12 2.95s ± 0% 2.95s ± 0% +0.21% (p=0.000 n=20+19) FmtFprintfEmpty-12 41.8ns ± 4% 41.4ns ± 2% -1.03% (p=0.014 n=20+20) FmtFprintfString-12 68.7ns ± 2% 67.5ns ± 1% -1.75% (p=0.000 n=20+17) FmtFprintfInt-12 79.0ns ± 3% 77.1ns ± 1% -2.40% (p=0.000 n=19+17) FmtFprintfIntInt-12 127ns ± 1% 123ns ± 3% -3.42% (p=0.000 n=20+20) FmtFprintfPrefixedInt-12 152ns ± 1% 150ns ± 1% -1.02% (p=0.000 n=18+17) FmtFprintfFloat-12 211ns ± 1% 209ns ± 0% -0.99% (p=0.000 n=20+16) FmtManyArgs-12 500ns ± 0% 496ns ± 0% -0.73% (p=0.000 n=17+20) GobDecode-12 6.44ms ± 1% 6.53ms ± 0% +1.28% (p=0.000 n=20+19) GobEncode-12 5.46ms ± 0% 5.46ms ± 1% ~ (p=0.550 n=19+20) Gzip-12 220ms ± 1% 216ms ± 0% -1.75% (p=0.000 n=19+19) Gunzip-12 38.8ms ± 0% 38.6ms ± 0% -0.30% (p=0.000 n=18+19) HTTPClientServer-12 79.0µs ± 1% 78.2µs ± 1% -1.01% (p=0.000 n=20+20) JSONEncode-12 11.9ms ± 0% 11.9ms ± 0% -0.29% (p=0.000 n=20+19) JSONDecode-12 52.6ms ± 0% 52.2ms ± 0% -0.68% (p=0.000 n=19+20) Mandelbrot200-12 3.69ms ± 0% 3.68ms ± 0% -0.36% (p=0.000 n=20+20) GoParse-12 3.13ms ± 1% 3.18ms ± 1% +1.67% (p=0.000 n=19+20) RegexpMatchEasy0_32-12 73.2ns ± 1% 72.3ns ± 1% -1.19% (p=0.000 n=19+18) RegexpMatchEasy0_1K-12 241ns ± 0% 239ns ± 0% -0.83% (p=0.000 n=17+16) RegexpMatchEasy1_32-12 68.6ns ± 1% 69.0ns ± 1% +0.47% (p=0.015 n=18+16) RegexpMatchEasy1_1K-12 364ns ± 0% 361ns ± 0% -0.67% (p=0.000 n=16+17) RegexpMatchMedium_32-12 104ns ± 1% 103ns ± 1% -0.79% (p=0.001 n=20+15) RegexpMatchMedium_1K-12 33.8µs ± 3% 34.0µs ± 2% ~ (p=0.267 n=20+19) RegexpMatchHard_32-12 1.64µs ± 1% 1.62µs ± 2% -1.25% (p=0.000 n=19+18) RegexpMatchHard_1K-12 49.2µs ± 0% 48.7µs ± 1% -0.93% (p=0.000 n=19+18) Revcomp-12 391ms ± 5% 396ms ± 7% ~ (p=0.154 n=19+19) Template-12 63.1ms ± 0% 59.5ms ± 0% -5.76% (p=0.000 n=18+19) TimeParse-12 307ns ± 0% 306ns ± 0% -0.39% (p=0.000 n=19+17) TimeFormat-12 325ns ± 0% 323ns ± 0% -0.50% (p=0.000 n=19+19) [Geo mean] 47.3µs 46.9µs -0.67% https://perf.golang.org/search?q=upload:20171026.1 name old time/op new time/op delta Garbage/benchmem-MB=64-12 2.25ms ± 1% 2.20ms ± 1% -2.31% (p=0.000 n=18+18) HTTP-12 12.6µs ± 0% 12.6µs ± 0% -0.72% (p=0.000 n=18+17) JSON-12 11.0ms ± 0% 11.0ms ± 1% -0.68% (p=0.000 n=17+19) https://perf.golang.org/search?q=upload:20171026.2 Updates #14951. Updates #22460. Change-Id: Id4c0932890a1d41020071bec73b8522b1367d3e7 Reviewed-on: https://go-review.googlesource.com/73712 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2017-10-30 18:12:46 +00:00
Lynn Boger	4d0151ede5	cmd/compile,cmd/internal/obj/ppc64: make math.Abs,math.Copysign instrinsics on ppc64x This adds support for math Abs, Copysign to be instrinsics on ppc64x. New instruction FCPSGN is added to generate fcpsgn. Some new rules are added to improve the int<->float conversions that are generated mainly due to the Float64bits and Float64frombits in the math package. PPC64.rules is also modified as suggested in the review for CL 63290. Improvements: benchmark old ns/op new ns/op delta BenchmarkAbs-16 1.12 0.69 -38.39% BenchmarkCopysign-16 1.30 0.93 -28.46% BenchmarkNextafter32-16 9.34 8.05 -13.81% BenchmarkFrexp-16 8.81 7.60 -13.73% Others that used Copysign also saw smaller improvements. I attempted to make this work using rules since that seems to be preferred, but due to the use of Float64bits and Float64frombits in these functions, several rules had to be added and even then not all cases were matched. Using rules became too complicated and seemed too fragile for these. Updates #21390 Change-Id: Ia265da9a18355e08000818a4fba1a40e9e031995 Reviewed-on: https://go-review.googlesource.com/67130 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Keith Randall <khr@golang.org>	2017-10-30 13:56:39 +00:00
Cherry Zhang	6f3e5e637c	cmd/compile: intrinsify runtime.getcallersp Add a compiler intrinsic for getcallersp. So we are able to get rid of the argument (not done in this CL). Change-Id: Ic38fda1c694f918328659ab44654198fb116668d Reviewed-on: https://go-review.googlesource.com/69350 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com> Reviewed-by: David Chase <drchase@google.com>	2017-10-10 15:15:21 +00:00
David Chase	6cac100eef	cmd/compile: add intrinsic for reading caller's pc First step towards removing the mandatory argument for getcallerpc, which solves certain problems for the runtime. This might also slightly improve performance. Intrinsic enabled on 386, amd64, amd64p32, runtime asm implementation removed on those architectures. Now-superfluous argument remains in getcallerpc signature (for a future CL; non-386/amd64 asm funcs ignore it). Added getcallerpc to the "not a real function" test in dcl.go, that story is a little odd with respect to unexported functions but that is not this CL. Fixes #17327. Change-Id: I5df1ad91f27ee9ac1f0dd88fa48f1329d6306c3e Reviewed-on: https://go-review.googlesource.com/31851 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>	2017-09-22 18:37:03 +00:00
Michael Munday	7582494e06	cmd/compile: add s390x intrinsics for Ceil, Floor, Round and Trunc Ceil, Floor and Trunc are pre-existing intrinsics. Round is a new function and has been added as an intrinsic in this CL. All of the functions can be implemented as a single 'LOAD FP INTEGER' instruction, FIDBR, on s390x. name old time/op new time/op delta Ceil 2.34ns ± 0% 0.85ns ± 0% -63.74% (p=0.000 n=5+4) Floor 2.33ns ± 0% 0.85ns ± 1% -63.35% (p=0.008 n=5+5) Round 4.23ns ± 0% 0.85ns ± 0% -79.89% (p=0.000 n=5+4) Trunc 2.35ns ± 0% 0.85ns ± 0% -63.83% (p=0.029 n=4+4) Change-Id: Idee7ba24a2899d12bf9afee4eedd6b4aaad3c510 Reviewed-on: https://go-review.googlesource.com/63890 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-09-20 10:01:35 +00:00
Keith Randall	1787ced894	cmd/compile: remove Symbol wrappers from Aux fields We used to have {Arg,Auto,Extern}Symbol structs with which we wrapped a gc.Node or obj.LSym before storing them in the Aux field of an ssa.Value. This let the SSA part of the compiler distinguish between autos and args, for example. We no longer need the wrappers as we can query the underlying objects directly. There was also some sloppy usage, where VarDef had a gc.Node directly in its Aux field, whereas the use of that variable had that gc.Node wrapped in an AutoSymbol. Thus the Aux fields didn't match (using ==) when they probably should. This sloppy usage cleanup is the only thing in the CL that changes the generated code - we can get rid of some more unused auto variables if the matching happens reliably. Removing this wrapper also lets us get rid of the varsyms cache (which was used to prevent wrapping the same *gc.Node twice). Change-Id: I0dedf8f82f84bfee413d310342b777316bd1d478 Reviewed-on: https://go-review.googlesource.com/64452 TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>	2017-09-19 22:03:10 +00:00
Gerrit Code Review	385cd6681b	Merge "Merge remote-tracking branch 'origin/dev.debug' into master"	2017-08-11 17:47:15 +00:00
Lynn Boger	0f19e24da7	cmd/compile: intrinsics for trunc, floor, ceil on ppc64x This implements trunc, floor, and ceil in the math package as intrinsics on ppc64x. Significant improvement mainly due to avoiding call overhead of args and return value. BenchmarkCeil-16 5.95 0.69 -88.40% BenchmarkFloor-16 5.95 0.69 -88.40% BenchmarkTrunc-16 5.82 0.69 -88.14% Updates #21390 Change-Id: I951e182694f6e0c431da79c577272b81fb0ebad0 Reviewed-on: https://go-review.googlesource.com/54654 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com> Reviewed-by: David Chase <drchase@google.com>	2017-08-11 16:35:49 +00:00
Heschi Kreinick	4c54a047c6	[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2017-07-27 20:19:44 +00:00
Marvin Stenger	9aeced650f	cmd/compile/internal/ssa: mark boolean instructions commutative Mark AndB, OrB, EqB, and NeqB as commutative. Change-Id: Ife7cfcb9780cc5dd669617cb52339ab336667da4 Reviewed-on: https://go-review.googlesource.com/42515 Reviewed-by: Giovanni Bajo <rasky@develer.com> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-05-09 21:31:38 +00:00
Keith Randall	1e72bf6218	cmd/compile: experiment which clobbers all dead pointer fields The experiment "clobberdead" clobbers all pointer fields that the compiler thinks are dead, just before and after every safepoint. Useful for debugging the generation of live pointer bitmaps. Helped find the following issues: Update #15936 Update #16026 Update #16095 Update #18860 Change-Id: Id1d12f86845e3d93bae903d968b1eac61fc461f9 Reviewed-on: https://go-review.googlesource.com/23924 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2017-04-21 20:19:50 +00:00
Keith Randall	5cadc91b3c	cmd/compile: intrinsics for math/bits.OnesCount Popcount instructions on amd64 are not guaranteed to be present, so we must guard their call. Rewrite rules can't generate control flow at the moment, so the intrinsifier needs to generate that code. name old time/op new time/op delta OnesCount-8 2.47ns ± 5% 1.04ns ± 2% -57.70% (p=0.000 n=10+10) OnesCount16-8 1.05ns ± 1% 0.78ns ± 0% -25.56% (p=0.000 n=9+8) OnesCount32-8 1.63ns ± 5% 1.04ns ± 2% -35.96% (p=0.000 n=10+10) OnesCount64-8 2.45ns ± 0% 1.04ns ± 1% -57.55% (p=0.000 n=6+10) Update #18616 Change-Id: I4aff2cc9aa93787898d7b22055fe272a7cf95673 Reviewed-on: https://go-review.googlesource.com/38320 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2017-04-04 02:40:11 +00:00
Keith Randall	53f8a6aeb0	cmd/compile: automatically handle commuting ops in rewrite rules Note that this is a redo of an undo of the original buggy CL 38666. We have lots of rewrite rules that vary only in the fact that we have 2 versions for the 2 different orderings of various commuting ops. For example: (ADDL x (MOVLconst [c])) -> (ADDLconst [c] x) (ADDL (MOVLconst [c]) x) -> (ADDLconst [c] x) It can get unwieldly quickly, especially when there is more than one commuting op in a rule. Our existing "fix" for this problem is to have rules that canonicalize the operations first. For example: (Eq64 x (Const64 <t> [c])) && x.Op != OpConst64 -> (Eq64 (Const64 <t> [c]) x) Subsequent rules can then assume if there is a constant arg to Eq64, it will be the first one. This fix kinda works, but it is fragile and only works when we remember to include the required extra rules. The fundamental problem is that the rule matcher doesn't know anything about commuting ops. This CL fixes that fact. We already have information about which ops commute. (The register allocator takes advantage of commutivity.) The rule generator now automatically generates multiple rules for a single source rule when there are commutative ops in the rule. We can now drop all of our almost-duplicate source-level rules and the canonicalization rules. I have some CLs in progress that will be a lot less verbose when the rule generator handles commutivity for me. I had to reorganize the load-combining rules a bit. The 8-way OR rules generated 128 different reorderings, which was causing the generator to put too much code in the rewrite*.go files (the big ones were going from 25K lines to 132K lines). Instead I reorganized the rules to combine pairs of loads at a time. The generated rule files are now actually a bit (5%) smaller. Make.bash times are ~unchanged. Compiler benchmarks are not observably different. Probably because we don't spend much compiler time in rule matching anyway. I've also done a pass over all of our ops adding commutative markings for ops which hadn't had them previously. Fixes #18292 Change-Id: Ic1c0e43fbf579539f459971625f69690c9ab8805 Reviewed-on: https://go-review.googlesource.com/38801 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2017-04-03 22:03:43 +00:00
Keith Randall	68da265c8e	Revert "cmd/compile: automatically handle commuting ops in rewrite rules" This reverts commit `041ecb697f`. Reason for revert: Not working on S390x and some 386 archs. I have a guess why the S390x is failing. No clue on the 386 yet. Revert until I can figure it out. Change-Id: I64f1ce78fa6d1037ebe7ee2a8a8107cb4c1db70c Reviewed-on: https://go-review.googlesource.com/38790 Reviewed-by: Keith Randall <khr@golang.org>	2017-03-29 18:06:44 +00:00
Keith Randall	041ecb697f	cmd/compile: automatically handle commuting ops in rewrite rules We have lots of rewrite rules that vary only in the fact that we have 2 versions for the 2 different orderings of various commuting ops. For example: (ADDL x (MOVLconst [c])) -> (ADDLconst [c] x) (ADDL (MOVLconst [c]) x) -> (ADDLconst [c] x) It can get unwieldly quickly, especially when there is more than one commuting op in a rule. Our existing "fix" for this problem is to have rules that canonicalize the operations first. For example: (Eq64 x (Const64 <t> [c])) && x.Op != OpConst64 -> (Eq64 (Const64 <t> [c]) x) Subsequent rules can then assume if there is a constant arg to Eq64, it will be the first one. This fix kinda works, but it is fragile and only works when we remember to include the required extra rules. The fundamental problem is that the rule matcher doesn't know anything about commuting ops. This CL fixes that fact. We already have information about which ops commute. (The register allocator takes advantage of commutivity.) The rule generator now automatically generates multiple rules for a single source rule when there are commutative ops in the rule. We can now drop all of our almost-duplicate source-level rules and the canonicalization rules. I have some CLs in progress that will be a lot less verbose when the rule generator handles commutivity for me. I had to reorganize the load-combining rules a bit. The 8-way OR rules generated 128 different reorderings, which was causing the generator to put too much code in the rewrite*.go files (the big ones were going from 25K lines to 132K lines). Instead I reorganized the rules to combine pairs of loads at a time. The generated rule files are now actually a bit (5%) smaller. [Note to reviewers: check these carefully. Most of the other rule changes are trivial.] Make.bash times are ~unchanged. Compiler benchmarks are not observably different. Probably because we don't spend much compiler time in rule matching anyway. I've also done a pass over all of our ops adding commutative markings for ops which hadn't had them previously. Fixes #18292 Change-Id: I999b1307272e91965b66754576019dedcbe7527a Reviewed-on: https://go-review.googlesource.com/38666 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2017-03-29 16:22:09 +00:00
Keith Randall	42e97468a1	cmd/compile: intrinsic for math/bits.Reverse on ARM64 I don't know that it exists for any other architectures. Update #18616 Change-Id: Idfe5dee251764d32787915889ec0be4bebc5be24 Reviewed-on: https://go-review.googlesource.com/38323 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2017-03-17 18:07:18 +00:00
Keith Randall	495b167919	cmd/compile: intrinsics for math/bits.{Len,LeadingZeros} name old time/op new time/op delta LeadingZeros-4 2.00ns ± 0% 1.34ns ± 1% -33.02% (p=0.000 n=8+10) LeadingZeros16-4 1.62ns ± 0% 1.57ns ± 0% -3.09% (p=0.001 n=8+9) LeadingZeros32-4 2.14ns ± 0% 1.48ns ± 0% -30.84% (p=0.002 n=8+10) LeadingZeros64-4 2.06ns ± 1% 1.33ns ± 0% -35.08% (p=0.000 n=8+8) 8-bit args is a special case - the Go code is really fast because it is just a single table lookup. So I've disabled that for now. Intrinsics were actually slower: LeadingZeros8-4 1.22ns ± 3% 1.58ns ± 1% +29.56% (p=0.000 n=10+10) Update #18616 Change-Id: Ia9c289b9ba59c583ea64060470315fd637e814cf Reviewed-on: https://go-review.googlesource.com/38311 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2017-03-16 22:53:49 +00:00
Cherry Zhang	c8f38b3398	cmd/compile: use type information in Aux for Store size Remove size AuxInt in Store, and alignment in Move/Zero. We still pass size AuxInt to Move/Zero, as it is used for partial Move/Zero lowering (e.g. cmd/compile/internal/ssa/gen/386.rules:288). SizeAndAlign is gone. Passes "toolstash -cmp" on std. Change-Id: I1ca34652b65dd30de886940e789fcf41d521475d Reviewed-on: https://go-review.googlesource.com/38150 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-03-16 14:25:04 +00:00
Cherry Zhang	211c8c9f1a	cmd/compile: pass types on SSA Store/Move/Zero ops For SSA Store/Move/Zero ops, attach the type of the value being stored to the op as the Aux field. This type will be used for write barrier insertion (in a followup CL). Since SSA passes do not accurately propagate types of values (because of type casting), we can't simply use type of the store's arguments for write barrier insertion. Passes "toolstash -cmp" on std. Updates #17583. Change-Id: I051d5e5c482931640d1d7d879b2a6bb91f2e0056 Reviewed-on: https://go-review.googlesource.com/36838 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-03-16 14:22:53 +00:00
Keith Randall	d5dc490519	cmd/compile: intrinsics for math/bits.TrailingZerosX Implement math/bits.TrailingZerosX using intrinsics. Generally reorganize the intrinsic spec a bit. The instrinsics data structure is now built at init time. This will make doing the other functions in math/bits easier. Update sys.CtzX to return int instead of uint{64,32} so it matches math/bits.TrailingZerosX. Improve the intrinsics a bit for amd64. We don't need the CMOV for <64 bit versions. Update #18616 Change-Id: Ic1c5339c943f961d830ae56f12674d7b29d4ff39 Reviewed-on: https://go-review.googlesource.com/38155 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2017-03-16 02:44:16 +00:00
Matthew Dempsky	91d08e3bca	cmd/compile/internal/ssa: remove unused OpFunc Change-Id: I0f7eec2e0c15a355422d5ae7289508a5bd33b971 Reviewed-on: https://go-review.googlesource.com/38171 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2017-03-14 19:28:25 +00:00
Matthew Dempsky	691755304c	cmd/compile/internal/ssa: populate SymEffects for SSA Ops Changes to ${GOARCH}Ops.go files were mechanically produced using github.com/mdempsky/ssa-symops, a one-off tool that inserts "SymEffect: X" elements by pattern matching against the Op names. Change-Id: Ibf3e481ffd588647f2a31662d72114b740ccbfcf Reviewed-on: https://go-review.googlesource.com/38084 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-03-14 18:34:45 +00:00
Matthew Dempsky	08d8d5c986	cmd/compile/internal/ssa: replace {Defer,Go}Call with StaticCall Passes toolstash-check -all. Change-Id: Icf8b75364e4761a5e56567f503b2c1cb17382ed2 Reviewed-on: https://go-review.googlesource.com/38080 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-03-13 19:44:36 +00:00
Matthew Dempsky	02e36f8c87	cmd/compile/internal/ssa: remove Hmul{8,16}{,u} ops Change-Id: I90865921584ae4bdfb6c220d439b14593d72b6f9 Reviewed-on: https://go-review.googlesource.com/37752 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2017-03-03 20:47:36 +00:00
Cherry Zhang	5bfd1ef036	cmd/compile: get rid of "volatile" in SSA A value is "volatile" if it is a pointer to the argument region on stack which will be clobbered by function call. This is used to make sure the value is safe when inserting write barrier calls. The writebarrier pass can tell whether a value is such a pointer. Therefore no need to mark it when building SSA and thread this information through. Passes "toolstash -cmp" on std. Updates #17583. Change-Id: Idc5fc0d710152b94b3c504ce8db55ea9ff5b5195 Reviewed-on: https://go-review.googlesource.com/36835 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-03-03 13:26:15 +00:00
Michael Munday	bd8a39b67a	cmd/compile: emit fused multiply-{add,subtract} instructions on s390x Explcitly block fused multiply-add pattern matching when a cast is used after the multiplication, for example: - (a * b) + c // can emit fused multiply-add - float64(a * b) + c // cannot emit fused multiply-add float{32,64} and complex{64,128} casts of matching types are now kept as OCONV operations rather than being replaced with OCONVNOP operations because they now imply a rounding operation (and therefore aren't a no-op anymore). Operations (for example, multiplication) on complex types may utilize fused multiply-add and -subtract instructions internally. There is no way to disable this behavior at the moment. Improves the performance of the floating point implementation of poly1305: name old speed new speed delta 64 246MB/s ± 0% 275MB/s ± 0% +11.48% (p=0.000 n=10+8) 1K 312MB/s ± 0% 357MB/s ± 0% +14.41% (p=0.000 n=10+10) 64Unaligned 246MB/s ± 0% 274MB/s ± 0% +11.43% (p=0.000 n=10+10) 1KUnaligned 312MB/s ± 0% 357MB/s ± 0% +14.39% (p=0.000 n=10+8) Updates #17895. Change-Id: Ia771d275bb9150d1a598f8cc773444663de5ce16 Reviewed-on: https://go-review.googlesource.com/36963 Run-TryBot: Michael Munday <munday@ca.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-02-28 15:34:20 +00:00
David Chase	11b283092a	cmd/compile: add opcode flag hasSideEffects for do-not-remove Added a flag to generic and various architectures' atomic operations that are judged to have observable side effects and thus cannot be dead-code-eliminated. Test requires GOMAXPROCS > 1 without preemption in loop. Fixes #19182. Change-Id: Id2230031abd2cca0bbb32fd68fc8a58fb912070f Reviewed-on: https://go-review.googlesource.com/37333 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2017-02-22 15:15:47 +00:00
Keith Randall	708ba22a0c	cmd/compile: move constant divide strength reduction to SSA rules Currently the conversion from constant divides to multiplies is mostly done during the walk pass. This is suboptimal because SSA can determine that the value being divided by is constant more often (e.g. after inlining). Change-Id: If1a9b993edd71be37396b9167f77da271966f85f Reviewed-on: https://go-review.googlesource.com/37015 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2017-02-17 06:16:44 +00:00
Keith Randall	01c8719f8b	cmd/compile: move rotate instruction generation to SSA Remove rotate generation from walk. Remove OLROT and ssa.Lrot* opcodes. Generate rotates during SSA lowering for architectures that have them. This CL will allow rotates to be generated in more situations, like when the shift values are determined to be constant only after some analysis. Fixes #18254 Change-Id: I8d6d684ff5ce2511aceaddfda98b908007851079 Reviewed-on: https://go-review.googlesource.com/34232 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2017-02-02 17:57:15 +00:00
Keith Randall	741445068f	cmd/compile: make [0]T and [1]T SSAable types We used to have to keep on-stack copies of these types. Now they can be registerized. [0]T is kind of trivial but might as well handle it. This change enables another change I'm working on to improve how x.(T) expressions are handled (#17405). This CL helps because now all types that are direct interface types are registerizeable (e.g. [1]*byte). No higher-degree arrays for now because non-constant indexes are hard. Update #17405 Change-Id: I2399940965d17b3969ae66f6fe447a8cefdd6edd Reviewed-on: https://go-review.googlesource.com/32416 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2016-10-31 19:44:19 +00:00
Austin Clements	8a7f0ad0b5	cmd/compile: use typedmemclr for zeroing if there are pointers Currently, zeroing generates an ssa.OpZero, which never has write barriers, even if the assignment is an OASWB. The hybrid barrier requires write barriers on zeroing, so change OASWB to generate an ssa.OpZeroWB when assigning the zero value, which turns into a typedmemclr. Updates #17503. Change-Id: Ib37ac5e39f578447dbd6b36a6a54117d5624784d Reviewed-on: https://go-review.googlesource.com/31451 Reviewed-by: Cherry Zhang <cherryyz@google.com>	2016-10-28 19:13:23 +00:00
Keith Randall	deb4177cf0	cmd/compile: use masks instead of branches for slicing When we do var x []byte = ... y := x[i:] We can't just use y.ptr = x.ptr + i, as the new pointer may point to the next object in memory after the backing array. We used to fix this by doing: y.cap = x.cap - i delta := i if y.cap == 0 { delta = 0 } y.ptr = x.ptr + delta That generates a branch in what is otherwise straight-line code. Better to do: y.cap = x.cap - i mask := (y.cap - 1) >> 63 // -1 if y.cap==0, 0 otherwise y.ptr = x.ptr + i &^ mask It's about the same number of instructions (~4, depending on what parts are constant, and the target architecture), but it is all inline. It plays nicely with CSE, and the mask can be computed in parallel with the index (in cases where a multiply is required). It is a minor win in both speed and space. Change-Id: Ied60465a0b8abb683c02208402e5bb7ac0e8370f Reviewed-on: https://go-review.googlesource.com/32022 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2016-10-27 20:22:49 +00:00
Cherry Zhang	f6aec889e1	cmd/compile: add a writebarrier phase in SSA When the compiler insert write barriers, the frontend makes conservative decisions at an early stage. This may have false positives which result in write barriers for stack writes. A new phase, writebarrier, is added to the SSA backend, to delay the decision and eliminate false positives. The frontend still makes conservative decisions. When building SSA, instead of emitting runtime calls directly, it emits WB ops (StoreWB, MoveWB, etc.), which will be expanded to branches and runtime calls in writebarrier phase. Writes to static locations on stack are detected and write barriers are removed. All write barriers of stack writes found by the script from issue #17330 are eliminated (except two false positives). Fixes #17330. Change-Id: I9bd66333da9d0ceb64dcaa3c6f33502798d1a0f8 Reviewed-on: https://go-review.googlesource.com/31131 Reviewed-by: Austin Clements <austin@google.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org>	2016-10-25 21:53:40 +00:00
Cherry Zhang	2756d56c89	cmd/compile: intrinsify math/big.mulWW, divWW on AMD64 Change-Id: I59f7afa7a5803d19f8b21fe70fc85ef997bb3a85 Reviewed-on: https://go-review.googlesource.com/30542 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2016-10-11 16:07:46 +00:00
Keith Randall	3134ab3c2d	cmd/compile: redo nil checks Get rid of BlockCheck. Josh goaded me into it, and I went down a rabbithole making it happen. NilCheck now panics if the pointer is nil and returns void, as before. BlockCheck is gone, and NilCheck is no longer a Control value for any block. It just exists (and deadcode knows not to throw it away). I rewrote the nilcheckelim pass to handle this case. In particular, there can now be multiple NilCheck ops per block. I moved all of the arch-dependent nil check elimination done as part of ssaGenValue into its own proper pass, so we don't have to duplicate that code for every architecture. Making the arch-dependent nil check its own pass means I needed to add a bunch of flags to the opcode table so I could write the code without arch-dependent ops everywhere. Change-Id: I419f891ac9b0de313033ff09115c374163416a9f Reviewed-on: https://go-review.googlesource.com/29120 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2016-09-15 02:42:13 +00:00
Keith Randall	c345a3913f	cmd/compile: get rid of BlockCall No need for it, we can treat calls as (mostly) normal values that take a memory and return a memory. Lowers the number of basic blocks needed to represent a function. "go test -c net/http" uses 27% fewer basic blocks. Probably doesn't affect generated code much, but should help various passes whose running time and/or space depends on the number of basic blocks. Fixes #15631 Change-Id: I0bf21e123f835e2cfa382753955a4f8bce03dfa6 Reviewed-on: https://go-review.googlesource.com/28950 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2016-09-12 23:27:02 +00:00
Keith Randall	33bb597d85	cmd/compile: print SizeAndAlign AuxInt values correctly Makes the AuxInt arg to Move/Zero print in a readable format. Change-Id: I12295959b00ff7c1638d35836cc6d64d112c11ca Reviewed-on: https://go-review.googlesource.com/28271 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-08-31 20:34:39 +00:00
Keith Randall	84aac622a4	cmd/compile: intrinsify the rest of runtime/internal/atomic for amd64 Atomic swap, add/and/or, compare and swap. Also works on amd64p32. Change-Id: Idf2d8f3e1255f71deba759e6e75e293afe4ab2ba Reviewed-on: https://go-review.googlesource.com/27813 Reviewed-by: Cherry Zhang <cherryyz@google.com>	2016-08-28 16:31:08 +00:00
Keith Randall	320ddcf834	cmd/compile: inline atomics from runtime/internal/atomic on amd64 Inline atomic reads and writes on amd64. There's no reason to pay the overhead of a call for these. To keep atomic loads from being reordered, we make them return a <value,memory> tuple. Change the meaning of resultInArg0 for tuple-generating ops to mean the first part of the result tuple, not the second. This means we can always put the store part of the tuple last, matching how arguments are laid out. This requires reordering the outputs of add32carry and sub32carry and their descendents in various architectures. benchmark old ns/op new ns/op delta BenchmarkAtomicLoad64-8 2.09 0.26 -87.56% BenchmarkAtomicStore64-8 7.54 5.72 -24.14% TBD (in a different CL): Cas, Or8, ... Change-Id: I713ea88e7da3026c44ea5bdb56ed094b20bc5207 Reviewed-on: https://go-review.googlesource.com/27641 Reviewed-by: Cherry Zhang <cherryyz@google.com>	2016-08-25 20:09:04 +00:00
Keith Randall	3e270ab80b	cmd/compile: clean up ctz ops Now that we have ops that can return 2 results, have BSF return a result and flags. We can then get rid of the redundant comparison and use CMOV instead of CMOVconst ops. Get rid of a bunch of the ops we don't use. Ctz{8,16}, plus all the Clzs, and CMOVNEs. I don't think we'll ever use them, and they would be easy to add back if needed. Change-Id: I8858a1d017903474ea7e4002fc76a6a86e7bd487 Reviewed-on: https://go-review.googlesource.com/27630 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2016-08-23 23:45:12 +00:00
Cherry Zhang	659dd4f1d7	cmd/compile: add more ARM64 optimizations - Use machine instructions for uint64<->float conversions - Do not enforce alignment on Zero/Move ARM64 supports unaligned load/stores, but only aligned offset or small offset can be encoded into instructions. - Do combined loads Change-Id: Iffca7dd0f13070b17b784861ce5a30af584680eb Reviewed-on: https://go-review.googlesource.com/27086 Reviewed-by: David Chase <drchase@google.com>	2016-08-17 18:44:39 +00:00
Cherry Zhang	83208504fe	[dev.ssa] cmd/compile: add more on ARM64 SSA Support the following: - Shifts. ARM64 machine instructions only use lowest 6 bits of the shift (i.e. mod 64). Use conditional selection instruction to ensure Go semantics. - Zero/Move. Alignment is ensured. - Hmul, Avg64u, Sqrt. - reserve R18 (platform register in ARM64 ABI) and R29 (frame pointer in ARM64 ABI). Everything compiles, all.bash passed (with non-SSA test disabled). Change-Id: Ia8ed58dae5cbc001946f0b889357b258655078b1 Reviewed-on: https://go-review.googlesource.com/25290 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2016-07-27 16:37:23 +00:00
Josh Bleecher Snyder	68dc102ed1	[dev.ssa] cmd/compile: provide default types for all extension ops Change-Id: I655327818297cc6792c81912f2cebdc321381561 Reviewed-on: https://go-review.googlesource.com/24465 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2016-06-26 13:35:44 +00:00
Cherry Zhang	59e11d7827	[dev.ssa] cmd/compile: handle floating point on ARM Machine supports (or the runtime simulates in soft float mode) (u)int32<->float conversions. The frontend rewrites int64<->float conversions to call to runtime function. For int64->float32 conversion, the frontend generates . . AS u(100) l(10) tc(1) . . . NAME-main.~r1 u(1) a(true) g(1) l(9) x(8+0) class(PPARAMOUT) f(1) float32 . . . CALLFUNC u(100) l(10) tc(1) float32 . . . . NAME-runtime.int64tofloat64 u(1) a(true) x(0+0) class(PFUNC) tc(1) used(true) FUNC-func(int64) float64 The CALLFUNC node has type float32, whereas runtime.int64tofloat64 returns float64. The legacy backend implicitly makes a float64->float32 conversion. The SSA backend does not do implicit conversion, so we insert an explicit CONV here. All cmd/compile/internal/gc/testdata/*_ssa.go tests passed. Progress on SSA for ARM. Still not complete. Update #15365. Change-Id: I30937c8ff977271246b068f48224693776804339 Reviewed-on: https://go-review.googlesource.com/23652 Reviewed-by: Keith Randall <khr@golang.org>	2016-06-06 14:06:38 +00:00
Cherry Zhang	4636d02244	[dev.ssa] cmd/compile: handle 64-bit shifts on ARM Also fix a mistake in previous CL about x8 and x16 shifts: the shift needs ZeroExt. Progress on SSA for ARM. Still not complete. Updates #15365. Change-Id: Ibc352760023d38bc6b9c5251e929fe26e016637a Reviewed-on: https://go-review.googlesource.com/23486 Reviewed-by: David Chase <drchase@google.com>	2016-06-02 13:03:59 +00:00
Cherry Zhang	8756d9253f	[dev.ssa] cmd/compile: decompose 64-bit integer on ARM Introduce dec64 rules to (generically) decompose 64-bit integer on 32-bit architectures. 64-bit integer is composed/decomposed with Int64Make/Hi/Lo ops, as for complex types. The idea of dealing with Add64 is the following: (Add64 (Int64Make xh xl) (Int64Make yh yl)) -> (Int64Make (Add32withcarry xh yh (Select0 (Add32carry xl yl))) (Select1 (Add32carry xl yl))) where Add32carry returns a tuple (flags,uint32). Select0 and Select1 read the first and the second component of the tuple, respectively. The two Add32carry will be CSE'd. Similarly for multiplication, Mul32uhilo returns a tuple (hi, lo). Also add support of KeepAlive, to fix build after merge. Tests addressed_ssa.go, array_ssa.go, break_ssa.go, chan_ssa.go, cmp_ssa.go, ctl_ssa.go, map_ssa.go, and string_ssa.go in cmd/compile/internal/gc/testdata passed. Progress on SSA for ARM. Still not complete. Updates #15365. Change-Id: I7867c76785a456312de5d8398a6b3f7ca5a4f7ec Reviewed-on: https://go-review.googlesource.com/23213 Reviewed-by: Keith Randall <khr@golang.org>	2016-06-02 13:01:09 +00:00
David Chase	31e13c83c2	[dev.ssa] Merge branch 'master' into dev.ssa Change-Id: Iabc80b6e0734efbd234d998271e110d2eaad41dd	2016-05-27 15:19:33 -04:00
Keith Randall	3572c6418b	cmd/compile: keep pointer input arguments live throughout function Introduce a KeepAlive op which makes sure that its argument is kept live until the KeepAlive. Use KeepAlive to mark pointer input arguments as live after each function call and at each return. We do this change only for pointer arguments. Those are the critical ones to handle because they might have finalizers. Doing compound arguments (slices, structs, ...) is more complicated because we would need to track field liveness individually (we do that for auto variables now, but inputs requires extra trickery). Turn off the automatic marking of args as live. That way, when args are explicitly nulled, plive will know that the original argument is dead. The KeepAlive op will be the eventual implementation of runtime.KeepAlive. Fixes #15277 Change-Id: I5f223e65d99c9f8342c03fbb1512c4d363e903e5 Reviewed-on: https://go-review.googlesource.com/22365 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Russ Cox <rsc@golang.org>	2016-05-18 19:25:27 +00:00
Cherry Zhang	e2848de9ef	[dev.ssa] cmd/compile: implement the following for SSA on ARM - generic Ops: Phi, CALL variants, NilCheck - generic Blocks: Plain, Check - 32-bit arithmetics - CMP and conditional branches - load/store - zero/sign-extensions (8 to 16, 8 to 32, 16 to 32) Progress on SSA backend for ARM. Still not complete. Now "errors" package compiles and tests passed. Updates #15365. Change-Id: If126fd17f8695cbf55d64085bb3f1a4a53205701 Reviewed-on: https://go-review.googlesource.com/22856 Reviewed-by: Keith Randall <khr@golang.org>	2016-05-10 19:38:11 +00:00

1 2 3

132 commits