Stowage/go - Remotebranch.eu

Stowage/go

mirror of https://github.com/golang/go.git synced 2025-12-08 06:10:04 +00:00

Author	SHA1	Message	Date
Ilya Tocar	94484d8ed5	cmd/compile: intrinsify math.{Trunc/Ceil/Floor} on amd64 This significantly speed-ups Trunc. Ceil/Floor are using the same instruction, so do them too. name old time/op new time/op delta Floor-6 3.33ns ± 1% 3.22ns ± 0% -3.39% (p=0.000 n=10+10) Ceil-6 3.33ns ± 1% 3.22ns ± 0% -3.16% (p=0.000 n=10+7) Trunc-6 4.83ns ± 0% 3.22ns ± 0% -33.36% (p=0.000 n=6+8) Change-Id: If848790e458eedfe38a6a0407bb4f589c68ac254 Reviewed-on: https://go-review.googlesource.com/68630 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-10-31 19:30:54 +00:00
Austin Clements	7e343134d3	cmd/compile: compiler support for buffered write barrier This CL implements the compiler support for calling the buffered write barrier added by the previous CL. Since the buffered write barrier is only implemented on amd64 right now, this still supports the old, eager write barrier as well. There's little overhead to supporting both and this way a few tests in test/fixedbugs that expect to have liveness maps at write barrier calls can easily opt-in to the old, eager barrier. This significantly improves the performance of the write barrier: name old time/op new time/op delta WriteBarrier-12 73.5ns ±20% 19.2ns ±27% -73.90% (p=0.000 n=19+18) It also reduces the size of binaries because the write barrier call is more compact: name old object-bytes new object-bytes delta Template 398k ± 0% 393k ± 0% -1.14% (p=0.008 n=5+5) Unicode 208k ± 0% 206k ± 0% -1.00% (p=0.008 n=5+5) GoTypes 1.18M ± 0% 1.15M ± 0% -2.00% (p=0.008 n=5+5) Compiler 4.05M ± 0% 3.88M ± 0% -4.26% (p=0.008 n=5+5) SSA 8.25M ± 0% 8.11M ± 0% -1.59% (p=0.008 n=5+5) Flate 228k ± 0% 224k ± 0% -1.83% (p=0.008 n=5+5) GoParser 295k ± 0% 284k ± 0% -3.62% (p=0.008 n=5+5) Reflect 1.00M ± 0% 0.99M ± 0% -0.70% (p=0.008 n=5+5) Tar 339k ± 0% 333k ± 0% -1.67% (p=0.008 n=5+5) XML 404k ± 0% 395k ± 0% -2.10% (p=0.008 n=5+5) [Geo mean] 704k 690k -2.00% name old exe-bytes new exe-bytes delta HelloSize 1.05M ± 0% 1.04M ± 0% -1.55% (p=0.008 n=5+5) https://perf.golang.org/search?q=upload:20171027.1 (Amusingly, this also reduces compiler allocations by 0.75%, which, combined with the better write barrier, speeds up the compiler overall by 2.10%. See the perf link.) It slightly improves the performance of most of the go1 benchmarks and improves the performance of the x/benchmarks: name old time/op new time/op delta BinaryTree17-12 2.40s ± 1% 2.47s ± 1% +2.69% (p=0.000 n=19+19) Fannkuch11-12 2.95s ± 0% 2.95s ± 0% +0.21% (p=0.000 n=20+19) FmtFprintfEmpty-12 41.8ns ± 4% 41.4ns ± 2% -1.03% (p=0.014 n=20+20) FmtFprintfString-12 68.7ns ± 2% 67.5ns ± 1% -1.75% (p=0.000 n=20+17) FmtFprintfInt-12 79.0ns ± 3% 77.1ns ± 1% -2.40% (p=0.000 n=19+17) FmtFprintfIntInt-12 127ns ± 1% 123ns ± 3% -3.42% (p=0.000 n=20+20) FmtFprintfPrefixedInt-12 152ns ± 1% 150ns ± 1% -1.02% (p=0.000 n=18+17) FmtFprintfFloat-12 211ns ± 1% 209ns ± 0% -0.99% (p=0.000 n=20+16) FmtManyArgs-12 500ns ± 0% 496ns ± 0% -0.73% (p=0.000 n=17+20) GobDecode-12 6.44ms ± 1% 6.53ms ± 0% +1.28% (p=0.000 n=20+19) GobEncode-12 5.46ms ± 0% 5.46ms ± 1% ~ (p=0.550 n=19+20) Gzip-12 220ms ± 1% 216ms ± 0% -1.75% (p=0.000 n=19+19) Gunzip-12 38.8ms ± 0% 38.6ms ± 0% -0.30% (p=0.000 n=18+19) HTTPClientServer-12 79.0µs ± 1% 78.2µs ± 1% -1.01% (p=0.000 n=20+20) JSONEncode-12 11.9ms ± 0% 11.9ms ± 0% -0.29% (p=0.000 n=20+19) JSONDecode-12 52.6ms ± 0% 52.2ms ± 0% -0.68% (p=0.000 n=19+20) Mandelbrot200-12 3.69ms ± 0% 3.68ms ± 0% -0.36% (p=0.000 n=20+20) GoParse-12 3.13ms ± 1% 3.18ms ± 1% +1.67% (p=0.000 n=19+20) RegexpMatchEasy0_32-12 73.2ns ± 1% 72.3ns ± 1% -1.19% (p=0.000 n=19+18) RegexpMatchEasy0_1K-12 241ns ± 0% 239ns ± 0% -0.83% (p=0.000 n=17+16) RegexpMatchEasy1_32-12 68.6ns ± 1% 69.0ns ± 1% +0.47% (p=0.015 n=18+16) RegexpMatchEasy1_1K-12 364ns ± 0% 361ns ± 0% -0.67% (p=0.000 n=16+17) RegexpMatchMedium_32-12 104ns ± 1% 103ns ± 1% -0.79% (p=0.001 n=20+15) RegexpMatchMedium_1K-12 33.8µs ± 3% 34.0µs ± 2% ~ (p=0.267 n=20+19) RegexpMatchHard_32-12 1.64µs ± 1% 1.62µs ± 2% -1.25% (p=0.000 n=19+18) RegexpMatchHard_1K-12 49.2µs ± 0% 48.7µs ± 1% -0.93% (p=0.000 n=19+18) Revcomp-12 391ms ± 5% 396ms ± 7% ~ (p=0.154 n=19+19) Template-12 63.1ms ± 0% 59.5ms ± 0% -5.76% (p=0.000 n=18+19) TimeParse-12 307ns ± 0% 306ns ± 0% -0.39% (p=0.000 n=19+17) TimeFormat-12 325ns ± 0% 323ns ± 0% -0.50% (p=0.000 n=19+19) [Geo mean] 47.3µs 46.9µs -0.67% https://perf.golang.org/search?q=upload:20171026.1 name old time/op new time/op delta Garbage/benchmem-MB=64-12 2.25ms ± 1% 2.20ms ± 1% -2.31% (p=0.000 n=18+18) HTTP-12 12.6µs ± 0% 12.6µs ± 0% -0.72% (p=0.000 n=18+17) JSON-12 11.0ms ± 0% 11.0ms ± 1% -0.68% (p=0.000 n=17+19) https://perf.golang.org/search?q=upload:20171026.2 Updates #14951. Updates #22460. Change-Id: Id4c0932890a1d41020071bec73b8522b1367d3e7 Reviewed-on: https://go-review.googlesource.com/73712 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2017-10-30 18:12:46 +00:00
Cherry Zhang	e01eac371a	cmd/compile: mark LoweredGetCallerPC rematerializeable The caller's PC is always available in the frame. We can just load it when needed, no need to spill. Change-Id: I9c0a525903e574bb4eec9fe53cbeb8c64321166a Reviewed-on: https://go-review.googlesource.com/70710 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2017-10-14 00:53:20 +00:00
Cherry Zhang	6f3e5e637c	cmd/compile: intrinsify runtime.getcallersp Add a compiler intrinsic for getcallersp. So we are able to get rid of the argument (not done in this CL). Change-Id: Ic38fda1c694f918328659ab44654198fb116668d Reviewed-on: https://go-review.googlesource.com/69350 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com> Reviewed-by: David Chase <drchase@google.com>	2017-10-10 15:15:21 +00:00
Ilya Tocar	6b8a3c8889	cmd/compile/internal/amd64: add SETccmem Combine setcc and store of result into setcc that writes directly to memory. Triggers 200+ times in go tool. Fixes #21630 Change-Id: Iafa22607426f4120140c88fae4b9aecb46e0bba8 Reviewed-on: https://go-review.googlesource.com/67950 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-10-05 20:53:28 +00:00
David Chase	6cac100eef	cmd/compile: add intrinsic for reading caller's pc First step towards removing the mandatory argument for getcallerpc, which solves certain problems for the runtime. This might also slightly improve performance. Intrinsic enabled on 386, amd64, amd64p32, runtime asm implementation removed on those architectures. Now-superfluous argument remains in getcallerpc signature (for a future CL; non-386/amd64 asm funcs ignore it). Added getcallerpc to the "not a real function" test in dcl.go, that story is a little odd with respect to unexported functions but that is not this CL. Fixes #17327. Change-Id: I5df1ad91f27ee9ac1f0dd88fa48f1329d6306c3e Reviewed-on: https://go-review.googlesource.com/31851 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>	2017-09-22 18:37:03 +00:00
Michael Munday	2cb61aa3f7	cmd/compile: stop rematerializable ops from clobbering flags Rematerializable ops can be inserted after the flagalloc phase, they must therefore not clobber flags. This CL adds a check to ensure this doesn't happen and fixes the instances where it does currently. amd64: ADDQconst and ADDLconst were recently changed to be rematerializable in CL 54393 (only in tip, not 1.9). That change has been reverted. s390x: MOVDaddr could clobber flags when using dynamic linking due to a ADD with immediate instruction. Change the code generation to use LA/LAY instead. Fixes #21080. Change-Id: Ia85c882afa2a820a309e93775354b3169ec6d034 Reviewed-on: https://go-review.googlesource.com/63030 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>	2017-09-20 17:10:58 +00:00
Ilya Tocar	a4e1a72f0a	cmd/compile/internal/amd64: add MOVLloadidx8 and MOVLstoreidx8 Currently we only use 1 and 4 as a scale for indexed 4-byte load. In code generated in #20711 we can use indexed load with scale=8, to improve performance: name old time/op new time/op delta GM-6 108µs ± 0% 95µs ± 0% -12.06% (p=0.000 n=10+10) So add new ops and combine loadidx1(shift 3..).. into loadidx8, same for stores. Change-Id: I5ed1c250ac40960e20606580cf9de221e75b72f1 Reviewed-on: https://go-review.googlesource.com/46134 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-09-01 21:25:25 +00:00
Keith Randall	053840dc00	cmd/compile: avoid generating large offsets The assembler barfs on large offsets. Make sure that all the instructions that need to have their offsets in an int32 1) check on any rule that computes offsets for such instructions 2) change their aux fields so the check builder checks it. The assembler also silently misassembled offsets between 1<<31 and 1<<32. Add a check in the assembler to barf on those as well. Fixes #21655 Change-Id: Iebf24bf10f9f37b3ea819ceb7d588251c0f46d7d Reviewed-on: https://go-review.googlesource.com/59630 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2017-08-28 22:02:59 +00:00
Ilya Tocar	7dd279013b	cmd/compile/internal/ssa: Mark ADD[Q\|L]const as rematerializeable We can rematerialize only ops that have SP or SB as their only argument. There are some ADDQconst(SP) that can be rematerialized, but are spilled/filled instead, so mark addconst as rematerializeable. This shaves ~1kb from go tool. Change-Id: Ib4cf4fe5f2ec9d3d7e5f0f77f1193eba66ca2f08 Reviewed-on: https://go-review.googlesource.com/54393 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-08-28 17:30:40 +00:00
Keith Randall	fb05948d9e	cmd/compile,math: improve code generation for math.Abs Implement int reg <-> fp reg moves on amd64. If we see a load to int reg followed by an int->fp move, then we can just load to the fp reg instead. Same for stores. math.Abs is now: MOVQ "".x+8(SP), AX SHLQ $1, AX SHRQ $1, AX MOVQ AX, "".~r1+16(SP) math.Copysign is now: MOVQ "".x+8(SP), AX SHLQ $1, AX SHRQ $1, AX MOVQ "".y+16(SP), CX SHRQ $63, CX SHLQ $63, CX ORQ CX, AX MOVQ AX, "".~r2+24(SP) math.Float64bits is now: MOVSD "".x+8(SP), X0 MOVSD X0, "".~r1+16(SP) (it would be nicer to use a non-SSE reg for this, nothing is perfect) And due to the fix for #21440, the inlined version of these improve as well. name old time/op new time/op delta Abs 1.38ns ± 5% 0.89ns ±10% -35.54% (p=0.000 n=10+10) Copysign 1.56ns ± 7% 1.35ns ± 6% -13.77% (p=0.000 n=9+10) Fixes #13095 Change-Id: Ibd7f2792412a6668608780b0688a77062e1f1499 Reviewed-on: https://go-review.googlesource.com/58732 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>	2017-08-25 19:15:01 +00:00
Josh Bleecher Snyder	18b48afec9	cmd/compile: mark MOVQconvert as resultInArg0 on x86 architectures This prevents unnecessary reg-reg moves during pointer arithmetic. This change reduces the size of the full hello world binary by 0.4%. Updates #21572 Change-Id: Ia0427021e5c94545a0dbd83a6801815806e5b12d Reviewed-on: https://go-review.googlesource.com/58371 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Run-TryBot: Martin Möhrmann <moehrmann@google.com> Reviewed-by: Keith Randall <khr@golang.org>	2017-08-24 16:40:38 +00:00
Ilya Tocar	ac29f4d01c	cmd/compile/internal/amd64: add ADD[Q\|L]constmem We can add a constant to loaction in memory with 1 instruction, as opposed to load+add+store, so add a new op and relevent ssa rules. Triggers in e. g. encoding/json isValidNumber: NumberIsValid-6 36.4ns ± 0% 35.2ns ± 1% -3.32% (p=0.000 n=6+10) Shaves ~2.5 kb from go tool. Change-Id: I7ba576676c2522432360f77b290cecb9574a93c3 Reviewed-on: https://go-review.googlesource.com/54431 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-08-18 18:55:44 +00:00
Ilya Tocar	df70982825	cmd/compile/internal/ssa: use sse to zero on amd64 Use 16-byte stores instead of 8-byte stores to zero small blocks. Also switch to duffzero for 65+ bytes only, because for each duffzero call we also save/restore BP, so call requires 4 instructions and replacing it with 4 sse stores doesn't cause code-bloat. Also switch duffzero to use leaq, instead of addq to avoid clobbering flags. ClearFat8-6 0.54ns ± 0% 0.54ns ± 0% ~ (all equal) ClearFat12-6 1.07ns ± 0% 1.07ns ± 0% ~ (all equal) ClearFat16-6 1.07ns ± 0% 0.69ns ± 0% -35.51% (p=0.001 n=8+9) ClearFat24-6 1.61ns ± 1% 1.07ns ± 0% -33.33% (p=0.000 n=10+10) ClearFat32-6 2.14ns ± 0% 1.07ns ± 0% -50.00% (p=0.001 n=8+9) ClearFat40-6 2.67ns ± 1% 1.61ns ± 0% -39.72% (p=0.000 n=10+8) ClearFat48-6 3.75ns ± 0% 2.68ns ± 0% -28.59% (p=0.000 n=9+9) ClearFat56-6 4.29ns ± 0% 3.22ns ± 0% -25.10% (p=0.000 n=9+9) ClearFat64-6 4.30ns ± 0% 3.22ns ± 0% -25.15% (p=0.000 n=8+8) ClearFat128-6 7.50ns ± 1% 7.51ns ± 0% ~ (p=0.767 n=10+9) ClearFat256-6 13.9ns ± 1% 13.9ns ± 1% ~ (p=0.257 n=10+10) ClearFat512-6 26.8ns ± 0% 26.8ns ± 0% ~ (p=0.467 n=8+8) ClearFat1024-6 52.5ns ± 0% 52.5ns ± 0% ~ (p=1.000 n=8+8) Also shaves ~20kb from go tool: go_old 10384994 go_new 10364514 [-20480 bytes] section differences global text (code) = -20585 bytes (-0.532047%) read-only data = -302 bytes (-0.018101%) Total difference -20887 bytes (-0.348731%) Change-Id: I15854e87544545c1af24775df895e38e16e12694 Reviewed-on: https://go-review.googlesource.com/54410 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-08-16 15:52:27 +00:00
Alberto Donizetti	d3c0af7a0a	cmd/compile: fix ADDSDmem comment and order in list ADDSDmem comment said f32 (likely a copy-paste mistake). Also swap ADDSSmem and ADDSDmem positions in the list to uniform the list order. Fixes #21225 Change-Id: I26bb116900c1cf4c4e6faeef613d7318c9c85b98 Reviewed-on: https://go-review.googlesource.com/52071 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ilya Tocar <ilya.tocar@intel.com> Reviewed-by: Keith Randall <khr@golang.org>	2017-08-08 09:21:25 +00:00
Jamie Kerr	0593ad1e23	cmd/compile: fix comment typo Change-Id: If581bd4e9d9b4421e2ae20582c596fccb73d9aed Reviewed-on: https://go-review.googlesource.com/48866 Reviewed-by: Ian Lance Taylor <iant@golang.org>	2017-07-15 18:36:07 +00:00
Keith Randall	256210c719	cmd/compile: better check for single live memory Enhance the one-live-memory-at-a-time check to run during many more phases of the SSA backend. Also make it work in an interblock fashion. Change types.IsMemory to return true for tuples containing a memory type. Fix trim pass to build the merged phi correctly. Doesn't affect code but allows the check to pass after trim runs. Switch the AddTuple* ops to take the memory-containing tuple argument second. Update #20335 Change-Id: I5b03ef3606b75a9e4f765276bb8b183cdc172b43 Reviewed-on: https://go-review.googlesource.com/43495 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2017-05-15 19:17:35 +00:00
Keith Randall	1e72bf6218	cmd/compile: experiment which clobbers all dead pointer fields The experiment "clobberdead" clobbers all pointer fields that the compiler thinks are dead, just before and after every safepoint. Useful for debugging the generation of live pointer bitmaps. Helped find the following issues: Update #15936 Update #16026 Update #16095 Update #18860 Change-Id: Id1d12f86845e3d93bae903d968b1eac61fc461f9 Reviewed-on: https://go-review.googlesource.com/23924 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2017-04-21 20:19:50 +00:00
Keith Randall	7e07e635f3	cmd/compile: implement non-constant rotates Makes math/bits.Rotate{Left,Right} fast on amd64. name old time/op new time/op delta RotateLeft-12 7.42ns ± 6% 5.45ns ± 6% -26.54% (p=0.000 n=9+10) RotateLeft8-12 4.77ns ± 5% 3.42ns ± 7% -28.25% (p=0.000 n=8+10) RotateLeft16-12 4.82ns ± 8% 3.40ns ± 7% -29.36% (p=0.000 n=10+10) RotateLeft32-12 4.87ns ± 7% 3.48ns ± 7% -28.51% (p=0.000 n=8+9) RotateLeft64-12 5.23ns ±10% 3.35ns ± 6% -35.97% (p=0.000 n=9+10) RotateRight-12 7.59ns ± 8% 5.71ns ± 1% -24.72% (p=0.000 n=10+8) RotateRight8-12 4.98ns ± 7% 3.36ns ± 9% -32.55% (p=0.000 n=10+10) RotateRight16-12 5.12ns ± 2% 3.45ns ± 5% -32.62% (p=0.000 n=10+10) RotateRight32-12 4.80ns ± 6% 3.42ns ±16% -28.68% (p=0.000 n=10+10) RotateRight64-12 4.78ns ± 6% 3.42ns ± 6% -28.50% (p=0.000 n=10+10) Update #18940 Change-Id: Ie79fb5581c489ed4d3b859314c5e669a134c119b Reviewed-on: https://go-review.googlesource.com/39711 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2017-04-17 23:19:45 +00:00
Keith Randall	5cadc91b3c	cmd/compile: intrinsics for math/bits.OnesCount Popcount instructions on amd64 are not guaranteed to be present, so we must guard their call. Rewrite rules can't generate control flow at the moment, so the intrinsifier needs to generate that code. name old time/op new time/op delta OnesCount-8 2.47ns ± 5% 1.04ns ± 2% -57.70% (p=0.000 n=10+10) OnesCount16-8 1.05ns ± 1% 0.78ns ± 0% -25.56% (p=0.000 n=9+8) OnesCount32-8 1.63ns ± 5% 1.04ns ± 2% -35.96% (p=0.000 n=10+10) OnesCount64-8 2.45ns ± 0% 1.04ns ± 1% -57.55% (p=0.000 n=6+10) Update #18616 Change-Id: I4aff2cc9aa93787898d7b22055fe272a7cf95673 Reviewed-on: https://go-review.googlesource.com/38320 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2017-04-04 02:40:11 +00:00
Keith Randall	53f8a6aeb0	cmd/compile: automatically handle commuting ops in rewrite rules Note that this is a redo of an undo of the original buggy CL 38666. We have lots of rewrite rules that vary only in the fact that we have 2 versions for the 2 different orderings of various commuting ops. For example: (ADDL x (MOVLconst [c])) -> (ADDLconst [c] x) (ADDL (MOVLconst [c]) x) -> (ADDLconst [c] x) It can get unwieldly quickly, especially when there is more than one commuting op in a rule. Our existing "fix" for this problem is to have rules that canonicalize the operations first. For example: (Eq64 x (Const64 <t> [c])) && x.Op != OpConst64 -> (Eq64 (Const64 <t> [c]) x) Subsequent rules can then assume if there is a constant arg to Eq64, it will be the first one. This fix kinda works, but it is fragile and only works when we remember to include the required extra rules. The fundamental problem is that the rule matcher doesn't know anything about commuting ops. This CL fixes that fact. We already have information about which ops commute. (The register allocator takes advantage of commutivity.) The rule generator now automatically generates multiple rules for a single source rule when there are commutative ops in the rule. We can now drop all of our almost-duplicate source-level rules and the canonicalization rules. I have some CLs in progress that will be a lot less verbose when the rule generator handles commutivity for me. I had to reorganize the load-combining rules a bit. The 8-way OR rules generated 128 different reorderings, which was causing the generator to put too much code in the rewrite*.go files (the big ones were going from 25K lines to 132K lines). Instead I reorganized the rules to combine pairs of loads at a time. The generated rule files are now actually a bit (5%) smaller. Make.bash times are ~unchanged. Compiler benchmarks are not observably different. Probably because we don't spend much compiler time in rule matching anyway. I've also done a pass over all of our ops adding commutative markings for ops which hadn't had them previously. Fixes #18292 Change-Id: Ic1c0e43fbf579539f459971625f69690c9ab8805 Reviewed-on: https://go-review.googlesource.com/38801 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2017-04-03 22:03:43 +00:00
Keith Randall	68da265c8e	Revert "cmd/compile: automatically handle commuting ops in rewrite rules" This reverts commit `041ecb697f`. Reason for revert: Not working on S390x and some 386 archs. I have a guess why the S390x is failing. No clue on the 386 yet. Revert until I can figure it out. Change-Id: I64f1ce78fa6d1037ebe7ee2a8a8107cb4c1db70c Reviewed-on: https://go-review.googlesource.com/38790 Reviewed-by: Keith Randall <khr@golang.org>	2017-03-29 18:06:44 +00:00
Keith Randall	041ecb697f	cmd/compile: automatically handle commuting ops in rewrite rules We have lots of rewrite rules that vary only in the fact that we have 2 versions for the 2 different orderings of various commuting ops. For example: (ADDL x (MOVLconst [c])) -> (ADDLconst [c] x) (ADDL (MOVLconst [c]) x) -> (ADDLconst [c] x) It can get unwieldly quickly, especially when there is more than one commuting op in a rule. Our existing "fix" for this problem is to have rules that canonicalize the operations first. For example: (Eq64 x (Const64 <t> [c])) && x.Op != OpConst64 -> (Eq64 (Const64 <t> [c]) x) Subsequent rules can then assume if there is a constant arg to Eq64, it will be the first one. This fix kinda works, but it is fragile and only works when we remember to include the required extra rules. The fundamental problem is that the rule matcher doesn't know anything about commuting ops. This CL fixes that fact. We already have information about which ops commute. (The register allocator takes advantage of commutivity.) The rule generator now automatically generates multiple rules for a single source rule when there are commutative ops in the rule. We can now drop all of our almost-duplicate source-level rules and the canonicalization rules. I have some CLs in progress that will be a lot less verbose when the rule generator handles commutivity for me. I had to reorganize the load-combining rules a bit. The 8-way OR rules generated 128 different reorderings, which was causing the generator to put too much code in the rewrite*.go files (the big ones were going from 25K lines to 132K lines). Instead I reorganized the rules to combine pairs of loads at a time. The generated rule files are now actually a bit (5%) smaller. [Note to reviewers: check these carefully. Most of the other rule changes are trivial.] Make.bash times are ~unchanged. Compiler benchmarks are not observably different. Probably because we don't spend much compiler time in rule matching anyway. I've also done a pass over all of our ops adding commutative markings for ops which hadn't had them previously. Fixes #18292 Change-Id: I999b1307272e91965b66754576019dedcbe7527a Reviewed-on: https://go-review.googlesource.com/38666 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2017-03-29 16:22:09 +00:00
Keith Randall	495b167919	cmd/compile: intrinsics for math/bits.{Len,LeadingZeros} name old time/op new time/op delta LeadingZeros-4 2.00ns ± 0% 1.34ns ± 1% -33.02% (p=0.000 n=8+10) LeadingZeros16-4 1.62ns ± 0% 1.57ns ± 0% -3.09% (p=0.001 n=8+9) LeadingZeros32-4 2.14ns ± 0% 1.48ns ± 0% -30.84% (p=0.002 n=8+10) LeadingZeros64-4 2.06ns ± 1% 1.33ns ± 0% -35.08% (p=0.000 n=8+8) 8-bit args is a special case - the Go code is really fast because it is just a single table lookup. So I've disabled that for now. Intrinsics were actually slower: LeadingZeros8-4 1.22ns ± 3% 1.58ns ± 1% +29.56% (p=0.000 n=10+10) Update #18616 Change-Id: Ia9c289b9ba59c583ea64060470315fd637e814cf Reviewed-on: https://go-review.googlesource.com/38311 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2017-03-16 22:53:49 +00:00
Matthew Dempsky	691755304c	cmd/compile/internal/ssa: populate SymEffects for SSA Ops Changes to ${GOARCH}Ops.go files were mechanically produced using github.com/mdempsky/ssa-symops, a one-off tool that inserts "SymEffect: X" elements by pattern matching against the Op names. Change-Id: Ibf3e481ffd588647f2a31662d72114b740ccbfcf Reviewed-on: https://go-review.googlesource.com/38084 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-03-14 18:34:45 +00:00
Matthew Dempsky	08d8d5c986	cmd/compile/internal/ssa: replace {Defer,Go}Call with StaticCall Passes toolstash-check -all. Change-Id: Icf8b75364e4761a5e56567f503b2c1cb17382ed2 Reviewed-on: https://go-review.googlesource.com/38080 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-03-13 19:44:36 +00:00
Matthew Dempsky	02e36f8c87	cmd/compile/internal/ssa: remove Hmul{8,16}{,u} ops Change-Id: I90865921584ae4bdfb6c220d439b14593d72b6f9 Reviewed-on: https://go-review.googlesource.com/37752 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2017-03-03 20:47:36 +00:00
Josh Bleecher Snyder	2183135554	cmd/compile: recognize bit test patterns on amd64 Updates #18943 Change-Id: If3080d6133bb6d2710b57294da24c90251ab4e08 Reviewed-on: https://go-review.googlesource.com/36329 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-03-01 00:36:04 +00:00
Keith Randall	0fe58bf650	cmd/compile: simplify load+op rules There's no need to use @block rules, as canMergeLoad makes sure that the load and op are already in the same block. With no @block needed, we also don't need to set the type explicitly. It can just be inherited from the op being rewritten. Noticed while working on #19284. Change-Id: Ied8bcc8058260118ff7e166093112e29107bcb7e Reviewed-on: https://go-review.googlesource.com/37585 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-02-28 22:16:23 +00:00
David Chase	11b283092a	cmd/compile: add opcode flag hasSideEffects for do-not-remove Added a flag to generic and various architectures' atomic operations that are judged to have observable side effects and thus cannot be dead-code-eliminated. Test requires GOMAXPROCS > 1 without preemption in loop. Fixes #19182. Change-Id: Id2230031abd2cca0bbb32fd68fc8a58fb912070f Reviewed-on: https://go-review.googlesource.com/37333 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2017-02-22 15:15:47 +00:00
Keith Randall	cfb0d34992	cmd/compile: amd64, allow XCHG on stack pointers XCHG needs to allow the stack pointer as an argument because we have a rewrite that incorporates the address of a local variable into the instruction. Fixes #19184 Change-Id: Ic438e6e1946332cdce3864d15abecd41b911b2a9 Reviewed-on: https://go-review.googlesource.com/37253 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2017-02-19 17:16:01 +00:00
Ilya Tocar	21c71d7788	cmd/compile/internal/ssa: combine load + op on AMD64 On AMD64 Most operation can have one operand in memory. Combine load and dependand operation into one new operation, where possible. I've seen no significant performance changes on go1, but this allows to remove ~1.8kb code from go tool. And in math package I see e. g.: Remainder-6 70.0ns ± 0% 64.6ns ± 0% -7.76% (p=0.000 n=9+1 Change-Id: I88b8602b1d55da8ba548a34eb7da4b25d59a297e Reviewed-on: https://go-review.googlesource.com/36793 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-02-17 22:21:49 +00:00
Cherry Zhang	fddc004537	cmd/compile: remove nil check for Zero/Move on 386, AMD64, S390X Fixes #18003. Change-Id: Iadcc5c424c64badecfb5fdbd4dbd9197df56182c Reviewed-on: https://go-review.googlesource.com/33421 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-02-02 21:28:38 +00:00
Keith Randall	01c8719f8b	cmd/compile: move rotate instruction generation to SSA Remove rotate generation from walk. Remove OLROT and ssa.Lrot* opcodes. Generate rotates during SSA lowering for architectures that have them. This CL will allow rotates to be generated in more situations, like when the shift values are determined to be constant only after some analysis. Fixes #18254 Change-Id: I8d6d684ff5ce2511aceaddfda98b908007851079 Reviewed-on: https://go-review.googlesource.com/34232 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2017-02-02 17:57:15 +00:00
Michael Munday	15817e409b	cmd/compile: make link register allocatable in non-leaf functions We save and restore the link register in non-leaf functions because it is clobbered by CALLs. It is therefore available for general purpose use. Only enabled on s390x currently. The RC4 benchmarks in particular benefit from the extra register: name old speed new speed delta RC4_128 243MB/s ± 2% 341MB/s ± 2% +40.46% (p=0.008 n=5+5) RC4_1K 267MB/s ± 0% 359MB/s ± 1% +34.32% (p=0.008 n=5+5) RC4_8K 271MB/s ± 0% 362MB/s ± 0% +33.61% (p=0.008 n=5+5) Change-Id: Id23bff95e771da9425353da2f32668b8e34ba09f Reviewed-on: https://go-review.googlesource.com/30597 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Michael Munday <munday@ca.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-10-11 18:52:35 +00:00
Cherry Zhang	2756d56c89	cmd/compile: intrinsify math/big.mulWW, divWW on AMD64 Change-Id: I59f7afa7a5803d19f8b21fe70fc85ef997bb3a85 Reviewed-on: https://go-review.googlesource.com/30542 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2016-10-11 16:07:46 +00:00
Keith Randall	98938189a1	cmd/compile: remove duplicate nilchecks Mark nil check operations as faulting if their arg is zero. This lets the late nilcheck pass remove duplicates. Fixes #17242. Change-Id: I4c9938d8a5a1e43edd85b4a66f0b34004860bcd9 Reviewed-on: https://go-review.googlesource.com/29952 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2016-09-27 23:54:01 +00:00
Keith Randall	3134ab3c2d	cmd/compile: redo nil checks Get rid of BlockCheck. Josh goaded me into it, and I went down a rabbithole making it happen. NilCheck now panics if the pointer is nil and returns void, as before. BlockCheck is gone, and NilCheck is no longer a Control value for any block. It just exists (and deadcode knows not to throw it away). I rewrote the nilcheckelim pass to handle this case. In particular, there can now be multiple NilCheck ops per block. I moved all of the arch-dependent nil check elimination done as part of ssaGenValue into its own proper pass, so we don't have to duplicate that code for every architecture. Making the arch-dependent nil check its own pass means I needed to add a bunch of flags to the opcode table so I could write the code without arch-dependent ops everywhere. Change-Id: I419f891ac9b0de313033ff09115c374163416a9f Reviewed-on: https://go-review.googlesource.com/29120 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2016-09-15 02:42:13 +00:00
Keith Randall	c345a3913f	cmd/compile: get rid of BlockCall No need for it, we can treat calls as (mostly) normal values that take a memory and return a memory. Lowers the number of basic blocks needed to represent a function. "go test -c net/http" uses 27% fewer basic blocks. Probably doesn't affect generated code much, but should help various passes whose running time and/or space depends on the number of basic blocks. Fixes #15631 Change-Id: I0bf21e123f835e2cfa382753955a4f8bce03dfa6 Reviewed-on: https://go-review.googlesource.com/28950 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2016-09-12 23:27:02 +00:00
Cherry Zhang	f1ef5a06d2	cmd/compile: mark some AMD64 atomic ops as clobberFlags Fixes #16985. Change-Id: I5954db28f7b70dd3ac7768e471d5df871a5b20f9 Reviewed-on: https://go-review.googlesource.com/28510 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2016-09-06 14:26:18 +00:00
Keith Randall	cc0248aea5	cmd/compile: don't reserve X15 for float sub/div any more We used to reserve X15 to implement the 3-operand floating-point sub/div ops with the 2-operand sub/div that 386/amd64 gives us. Now that resultInArg0 is implemented, we no longer need to reserve X15 (X7 on 386). Fixes #15584 Change-Id: I978e6c0a35236e89641bfc027538cede66004e82 Reviewed-on: https://go-review.googlesource.com/28272 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2016-08-31 20:35:49 +00:00
Keith Randall	0c6c3d1de7	cmd/compile: fix noopt build Atomic add rules were depending on CSE to combine duplicate atomic ops. With -N, CSE doesn't run. Redo the rules for atomic add so there's only one atomic op. Introduce an add-to-first-part-of-tuple pseudo-ops to make the atomic add result correct. Change-Id: Ib132247051abe5f80fefad6c197db8df8ee06427 Reviewed-on: https://go-review.googlesource.com/27991 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2016-08-28 18:54:52 +00:00
Keith Randall	84aac622a4	cmd/compile: intrinsify the rest of runtime/internal/atomic for amd64 Atomic swap, add/and/or, compare and swap. Also works on amd64p32. Change-Id: Idf2d8f3e1255f71deba759e6e75e293afe4ab2ba Reviewed-on: https://go-review.googlesource.com/27813 Reviewed-by: Cherry Zhang <cherryyz@google.com>	2016-08-28 16:31:08 +00:00
Keith Randall	320ddcf834	cmd/compile: inline atomics from runtime/internal/atomic on amd64 Inline atomic reads and writes on amd64. There's no reason to pay the overhead of a call for these. To keep atomic loads from being reordered, we make them return a <value,memory> tuple. Change the meaning of resultInArg0 for tuple-generating ops to mean the first part of the result tuple, not the second. This means we can always put the store part of the tuple last, matching how arguments are laid out. This requires reordering the outputs of add32carry and sub32carry and their descendents in various architectures. benchmark old ns/op new ns/op delta BenchmarkAtomicLoad64-8 2.09 0.26 -87.56% BenchmarkAtomicStore64-8 7.54 5.72 -24.14% TBD (in a different CL): Cas, Or8, ... Change-Id: I713ea88e7da3026c44ea5bdb56ed094b20bc5207 Reviewed-on: https://go-review.googlesource.com/27641 Reviewed-by: Cherry Zhang <cherryyz@google.com>	2016-08-25 20:09:04 +00:00
Keith Randall	3e270ab80b	cmd/compile: clean up ctz ops Now that we have ops that can return 2 results, have BSF return a result and flags. We can then get rid of the redundant comparison and use CMOV instead of CMOVconst ops. Get rid of a bunch of the ops we don't use. Ctz{8,16}, plus all the Clzs, and CMOVNEs. I don't think we'll ever use them, and they would be easy to add back if needed. Change-Id: I8858a1d017903474ea7e4002fc76a6a86e7bd487 Reviewed-on: https://go-review.googlesource.com/27630 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2016-08-23 23:45:12 +00:00
Keith Randall	5ae8230769	cmd/compile: use shorter versions of zero-extend ops Only need to zero-extend to 32 bits and we get the top 32 bits zeroed for free. Only the WQ change actually generates different code. The assembler did this optimization for us in the other two cases. But we might as well do it during SSA so -S output more closely matches the actual generated instructions. Change-Id: I3e4ac50dc4da124014d4e31c86e9fc539d94f7fd Reviewed-on: https://go-review.googlesource.com/23711 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2016-08-16 21:32:21 +00:00
Keith Randall	69a755b602	[dev.ssa] cmd/compile: port SSA backend to amd64p32 It's not a new backend, just a PtrSize==4 modification of the existing AMD64 backend. Change-Id: Icc63521a5cf4ebb379f7430ef3f070894c09afda Reviewed-on: https://go-review.googlesource.com/25586 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2016-08-09 15:48:26 +00:00
Cherry Zhang	0484052358	[dev.ssa] cmd/compile: remove flags from regMask Reg allocator skips flag-typed values. Flag allocator uses the type and whether the op has "clobberFlags" set. Tested on AMD64, ARM, ARM64, 386. Passed 'toolstash -cmp' on AMD64. PPC64 is coded blindly. Change-Id: Ib1cc27efecef6a1bb27f7d7ed035a582660d244f Reviewed-on: https://go-review.googlesource.com/25480 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2016-08-07 03:08:03 +00:00
Keith Randall	d2286ea284	[dev.ssa] Merge remote-tracking branch 'origin/master' into mergebranch Semi-regular merge from tip into dev.ssa. Change-Id: Iadb60e594ef65a99c0e1404b14205fa67c32a9e9	2016-08-04 10:08:20 -07:00
Cherry Zhang	111d590f86	cmd/compile: fix possible spill of invalid pointer with DUFFZERO on AMD64 SSA compiler on AMD64 may spill Duff-adjusted address as scalar. If the object is on stack and the stack moves, the spilled address become invalid. Making the spill pointer-typed does not work. The Duff-adjusted address points to the memory before the area to be zeroed and may be invalid. This may cause stack scanning code panic. Fix it by doing Duff-adjustment in genValue, so the intermediate value is not seen by the reg allocator, and will not be spilled. Add a test to cover both cases. As it depends on allocation, it may be not always triggered. Fixes #16515. Change-Id: Ia81d60204782de7405b7046165ad063384ede0db Reviewed-on: https://go-review.googlesource.com/25309 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2016-07-29 01:09:55 +00:00

1 2 3

144 commits