Stowage/go - Remotebranch.eu

Stowage/go

mirror of https://github.com/golang/go.git synced 2025-12-08 06:10:04 +00:00

Author	SHA1	Message	Date
Alberto Donizetti	c1806906d8	test: port bits.Len intrinsics tests to the new codegen harness This change move bits.Len* intrinsification tests to the new codegen test harness, removing them from the old ssa_test file. Five different test functions (one for each bit.Len function tested) was used, to avoid possible unwanted interactions between multiple calls inside one function. Change-Id: Iffd5be55b58e88597fa30a562a28dacb01236d8b Reviewed-on: https://go-review.googlesource.com/98156 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Giovanni Bajo <rasky@develer.com>	2018-03-05 18:01:19 +00:00
Giovanni Bajo	89ae7045f3	test: convert all math-related tests from asm_test Change-Id: If542f0b5c5754e6eb2f9b302fe5a148ba9a57338 Reviewed-on: https://go-review.googlesource.com/98443 Run-TryBot: Giovanni Bajo <rasky@develer.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2018-03-04 16:52:33 +00:00
Giovanni Bajo	fad31e513d	test: move load/store combines into asmcheck This CL moves the load/store combining tests into asmcheck. In addition at being more compact, it's also now easier to spot what it is missing in each architecture. While doing so, I think I uncovered a bug in ppc64le and arm64 rules, because they fail to load/store combine in non-trivial functions. Not sure why, I'll open an issue. Change-Id: Ia1572d53c0553d9104f3e52b95e4d1768a8440a3 Reviewed-on: https://go-review.googlesource.com/98441 Run-TryBot: Giovanni Bajo <rasky@develer.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2018-03-04 16:52:03 +00:00
Giovanni Bajo	8ce74b7d11	test: port a nil-check interface test from asm_test Change-Id: I69c1688506d1aeca655047acf35d1bff966fc01e Reviewed-on: https://go-review.googlesource.com/98442 Run-TryBot: Giovanni Bajo <rasky@develer.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2018-03-03 20:20:54 +00:00
Chad Rosier	39fefa0709	cmd/compile/internal/ssa: combine consecutive BigEndian stores on arm64 This optimization mirrors that which is already implemented for AMD64. The optimization specifically targets the binary.BigEndian.PutUint* functions. encoding-binary results on Amberwing: name old time/op new time/op delta ReadSlice1000Int32s 9.83µs ± 2% 9.78µs ± 1% ~ (p=0.362 n=9+10) ReadStruct 5.24µs ± 3% 5.19µs ± 2% ~ (p=0.285 n=10+10) ReadInts 8.35µs ± 8% 8.44µs ± 3% ~ (p=0.323 n=10+10) WriteInts 3.38µs ± 3% 3.44µs ±15% ~ (p=0.921 n=9+10) WriteSlice1000Int32s 11.4µs ± 6% 10.2µs ± 4% -9.94% (p=0.000 n=10+10) PutUint16 510ns ±12% 500ns ± 0% ~ (p=0.586 n=10+7) PutUint32 530ns ±15% 490ns ±12% ~ (p=0.086 n=10+10) PutUint64 550ns ± 0% 470ns ± 6% -14.52% (p=0.000 n=7+10) LittleEndianPutUint16 500ns ± 0% 475ns ±16% ~ (p=0.120 n=7+10) LittleEndianPutUint32 450ns ± 0% 517ns ±16% +14.81% (p=0.004 n=8+9) LittleEndianPutUint64 550ns ± 0% 485ns ±13% -11.82% (p=0.000 n=8+10) PutUvarint32 685ns ±12% 622ns ± 4% -9.17% (p=0.005 n=10+9) PutUvarint64 735ns ± 9% 711ns ± 9% ~ (p=0.272 n=10+9) [Geo mean] 1.47µs 1.42µs -3.87% name old speed new speed delta ReadSlice1000Int32s 407MB/s ± 2% 409MB/s ± 1% ~ (p=0.362 n=9+10) ReadStruct 14.3MB/s ± 3% 14.4MB/s ± 2% ~ (p=0.250 n=10+10) ReadInts 3.59MB/s ± 7% 3.56MB/s ± 4% ~ (p=0.340 n=10+10) WriteInts 8.87MB/s ± 3% 8.74MB/s ±13% ~ (p=0.890 n=9+10) WriteSlice1000Int32s 352MB/s ± 6% 391MB/s ± 4% +11.03% (p=0.000 n=10+10) PutUint16 3.95MB/s ±13% 4.00MB/s ± 0% ~ (p=0.312 n=10+7) PutUint32 7.62MB/s ±17% 8.21MB/s ±11% ~ (p=0.086 n=10+10) PutUint64 14.6MB/s ± 0% 17.1MB/s ± 6% +17.28% (p=0.000 n=7+10) LittleEndianPutUint16 4.00MB/s ± 0% 4.23MB/s ±18% ~ (p=0.176 n=7+10) LittleEndianPutUint32 8.89MB/s ± 0% 7.64MB/s ±20% -14.05% (p=0.001 n=8+10) LittleEndianPutUint64 14.6MB/s ± 0% 16.6MB/s ±12% +13.86% (p=0.000 n=8+10) PutUvarint32 5.86MB/s ±14% 6.44MB/s ± 5% +9.84% (p=0.006 n=10+9) PutUvarint64 10.9MB/s ± 8% 11.3MB/s ± 9% ~ (p=0.373 n=10+9) [Geo mean] 14.2MB/s 14.8MB/s +3.93% go1 results on Amberwing: RegexpMatchEasy0_32 254ns ± 0% 254ns ± 0% ~ (all equal) RegexpMatchEasy0_1K 547ns ± 0% 547ns ± 0% ~ (all equal) RegexpMatchEasy1_32 252ns ± 0% 253ns ± 1% ~ (p=0.294 n=8+10) RegexpMatchEasy1_1K 782ns ± 0% 783ns ± 1% ~ (p=0.529 n=8+9) RegexpMatchMedium_32 316ns ± 0% 316ns ± 0% ~ (all equal) RegexpMatchMedium_1K 51.5µs ± 0% 51.5µs ± 0% ~ (p=0.645 n=10+9) RegexpMatchHard_32 2.75µs ± 0% 2.75µs ± 0% ~ (all equal) RegexpMatchHard_1K 78.7µs ± 0% 78.7µs ± 0% ~ (p=0.754 n=10+10) FmtFprintfEmpty 57.0ns ± 0% 57.0ns ± 0% ~ (all equal) FmtFprintfString 111ns ± 0% 111ns ± 0% ~ (all equal) FmtFprintfInt 114ns ± 0% 114ns ± 1% ~ (p=0.065 n=9+10) FmtFprintfIntInt 182ns ± 0% 178ns ± 0% -2.20% (p=0.000 n=10+10) FmtFprintfPrefixedInt 225ns ± 0% 227ns ± 0% +0.89% (p=0.000 n=10+10) FmtFprintfFloat 307ns ± 0% 307ns ± 0% ~ (p=1.000 n=9+9) FmtManyArgs 697ns ± 0% 701ns ± 2% ~ (p=0.108 n=9+10) Gzip 436ms ± 0% 437ms ± 0% +0.23% (p=0.000 n=10+8) HTTPClientServer 88.8µs ± 2% 89.6µs ± 1% +0.98% (p=0.019 n=10+10) JSONEncode 20.1ms ± 1% 20.2ms ± 1% +0.48% (p=0.007 n=10+10) JSONDecode 94.7ms ± 1% 94.1ms ± 0% -0.62% (p=0.000 n=10+9) GobDecode 12.6ms ± 2% 12.6ms ± 1% ~ (p=0.360 n=10+8) GobEncode 12.0ms ± 1% 11.9ms ± 1% -1.34% (p=0.000 n=10+10) Mandelbrot200 5.05ms ± 0% 5.05ms ± 0% +0.12% (p=0.000 n=10+10) TimeParse 448ns ± 0% 448ns ± 0% ~ (p=0.529 n=8+9) TimeFormat 501ns ± 1% 501ns ± 1% ~ (p=1.000 n=10+9) Template 90.6ms ± 0% 89.1ms ± 0% -1.67% (p=0.000 n=9+9) GoParse 6.01ms ± 0% 5.96ms ± 0% -0.83% (p=0.000 n=10+9) BinaryTree17 11.7s ± 0% 11.7s ± 0% ~ (p=0.481 n=10+10) Revcomp 675ms ± 0% 675ms ± 0% ~ (p=0.436 n=9+9) Fannkuch11 3.26s ± 0% 3.27s ± 1% +0.57% (p=0.000 n=10+10) [Geo mean] 67.4µs 67.3µs -0.10% name old speed new speed delta RegexpMatchEasy0_32 126MB/s ± 0% 126MB/s ± 0% ~ (p=0.353 n=10+7) RegexpMatchEasy0_1K 1.87GB/s ± 0% 1.87GB/s ± 0% ~ (p=0.275 n=8+10) RegexpMatchEasy1_32 127MB/s ± 0% 126MB/s ± 1% ~ (p=0.110 n=8+10) RegexpMatchEasy1_1K 1.31GB/s ± 0% 1.31GB/s ± 1% ~ (p=0.079 n=8+10) RegexpMatchMedium_32 3.16MB/s ± 0% 3.16MB/s ± 0% ~ (all equal) RegexpMatchMedium_1K 19.9MB/s ± 0% 19.9MB/s ± 0% ~ (p=0.889 n=10+9) RegexpMatchHard_32 11.7MB/s ± 0% 11.7MB/s ± 0% ~ (all equal) RegexpMatchHard_1K 13.0MB/s ± 0% 13.0MB/s ± 0% ~ (p=1.000 n=10+10) Gzip 44.5MB/s ± 0% 44.4MB/s ± 0% -0.22% (p=0.000 n=10+8) JSONEncode 96.6MB/s ± 1% 96.1MB/s ± 1% -0.48% (p=0.007 n=10+10) JSONDecode 20.5MB/s ± 1% 20.6MB/s ± 0% +0.63% (p=0.000 n=10+9) GobDecode 61.0MB/s ± 2% 61.1MB/s ± 1% ~ (p=0.372 n=10+8) GobEncode 63.8MB/s ± 1% 64.7MB/s ± 1% +1.36% (p=0.000 n=10+10) Template 21.4MB/s ± 0% 21.8MB/s ± 0% +1.69% (p=0.000 n=9+9) GoParse 9.63MB/s ± 0% 9.71MB/s ± 0% +0.84% (p=0.000 n=9+8) Revcomp 377MB/s ± 0% 376MB/s ± 0% ~ (p=0.399 n=9+9) [Geo mean] 56.2MB/s 56.3MB/s +0.20% Change-Id: Ic915373f5ef512f9fbc45745860e5db7f6de6286 Reviewed-on: https://go-review.googlesource.com/97755 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2018-03-01 20:29:22 +00:00
Chad Rosier	77ba071ec6	cmd/compile/internal/ssa: combine consecutive LittleEndian stores on arm64 This optimization mirrors that which is already implemented for AMD64. The optimization specifically targets the binary.LittleEndian.PutUint* functions. encoding/binary results on Amberwing: name old time/op new time/op delta ReadSlice1000Int32s 9.67µs ± 1% 9.64µs ± 1% ~ (p=0.185 n=9+9) ReadStruct 5.24µs ± 2% 5.36µs ± 2% +2.24% (p=0.002 n=10+8) ReadInts 8.69µs ± 5% 8.88µs ± 5% ~ (p=0.083 n=10+10) WriteInts 3.90µs ±10% 3.71µs ± 9% ~ (p=0.077 n=10+10) WriteSlice1000Int32s 10.9µs ± 1% 10.9µs ± 1% ~ (p=0.701 n=9+9) PutUint16 572ns ±14% 505ns ±11% -11.75% (p=0.006 n=9+10) PutUint32 550ns ±18% 540ns ±11% ~ (p=0.692 n=10+10) PutUint64 565ns ±15% 540ns ±17% ~ (p=0.248 n=10+10) LittleEndianPutUint16 540ns ±11% 500ns ±10% ~ (p=0.094 n=10+10) LittleEndianPutUint32 520ns ±15% 480ns ±15% ~ (p=0.087 n=10+10) LittleEndianPutUint64 505ns ±29% 470ns ±17% ~ (p=0.208 n=10+10) PutUvarint32 700ns ±21% 635ns ±10% -9.29% (p=0.028 n=10+10) PutUvarint64 740ns ± 8% 740ns ± 8% ~ (p=0.713 n=10+10) [Geo mean] 1.53µs 1.47µs -3.93% name old speed new speed delta ReadSlice1000Int32s 414MB/s ± 1% 415MB/s ± 1% ~ (p=0.185 n=9+9) ReadStruct 14.3MB/s ± 2% 14.0MB/s ± 2% -2.21% (p=0.000 n=10+8) ReadInts 3.45MB/s ± 4% 3.38MB/s ± 6% ~ (p=0.085 n=10+10) WriteInts 7.71MB/s ± 9% 8.09MB/s ± 8% +4.93% (p=0.048 n=10+10) WriteSlice1000Int32s 367MB/s ± 1% 366MB/s ± 1% ~ (p=0.701 n=9+9) PutUint16 3.51MB/s ±14% 3.99MB/s ±11% +13.47% (p=0.009 n=9+10) PutUint32 7.35MB/s ±21% 7.44MB/s ±10% ~ (p=0.692 n=10+10) PutUint64 14.3MB/s ±14% 15.0MB/s ±19% ~ (p=0.248 n=10+10) LittleEndianPutUint16 3.72MB/s ±11% 4.03MB/s ±10% ~ (p=0.094 n=10+10) LittleEndianPutUint32 7.75MB/s ±15% 8.39MB/s ±13% ~ (p=0.087 n=10+10) LittleEndianPutUint64 16.1MB/s ±23% 17.2MB/s ±16% ~ (p=0.208 n=10+10) PutUvarint32 5.76MB/s ±18% 6.32MB/s ±10% +9.72% (p=0.028 n=10+10) PutUvarint64 10.8MB/s ± 8% 10.8MB/s ± 8% ~ (p=0.713 n=10+10) [Geo mean] 13.7MB/s 14.3MB/s +4.02% go1 results on Amberwing: name old time/op new time/op delta RegexpMatchEasy0_32 249ns ± 0% 249ns ± 0% ~ (p=0.087 n=10+10) RegexpMatchEasy0_1K 584ns ± 0% 584ns ± 0% ~ (all equal) RegexpMatchEasy1_32 246ns ± 0% 246ns ± 0% ~ (p=1.000 n=10+10) RegexpMatchEasy1_1K 806ns ± 0% 806ns ± 0% ~ (p=0.706 n=10+9) RegexpMatchMedium_32 314ns ± 0% 314ns ± 0% ~ (all equal) RegexpMatchMedium_1K 52.1µs ± 0% 52.1µs ± 0% ~ (p=0.245 n=10+8) RegexpMatchHard_32 2.75µs ± 1% 2.75µs ± 1% ~ (p=0.690 n=10+10) RegexpMatchHard_1K 78.9µs ± 0% 78.9µs ± 1% ~ (p=0.295 n=9+9) FmtFprintfEmpty 58.5ns ± 0% 58.5ns ± 0% ~ (all equal) FmtFprintfString 112ns ± 0% 112ns ± 0% ~ (all equal) FmtFprintfInt 117ns ± 0% 116ns ± 0% -0.85% (p=0.000 n=10+10) FmtFprintfIntInt 181ns ± 0% 181ns ± 0% ~ (all equal) FmtFprintfPrefixedInt 222ns ± 0% 224ns ± 0% +0.90% (p=0.000 n=9+10) FmtFprintfFloat 318ns ± 1% 322ns ± 0% ~ (p=0.059 n=10+8) FmtManyArgs 736ns ± 1% 735ns ± 0% ~ (p=0.206 n=9+9) Gzip 437ms ± 0% 436ms ± 0% -0.25% (p=0.000 n=10+10) HTTPClientServer 89.8µs ± 1% 90.2µs ± 2% ~ (p=0.393 n=10+10) JSONEncode 20.1ms ± 1% 20.2ms ± 1% ~ (p=0.065 n=9+10) JSONDecode 94.2ms ± 1% 93.9ms ± 1% -0.42% (p=0.043 n=10+10) GobDecode 12.7ms ± 1% 12.8ms ± 2% +0.94% (p=0.019 n=10+10) GobEncode 12.1ms ± 0% 12.1ms ± 0% ~ (p=0.052 n=10+10) Mandelbrot200 5.06ms ± 0% 5.05ms ± 0% -0.04% (p=0.000 n=9+10) TimeParse 450ns ± 3% 446ns ± 0% ~ (p=0.238 n=10+9) TimeFormat 485ns ± 1% 483ns ± 1% ~ (p=0.073 n=10+10) Template 90.4ms ± 0% 90.7ms ± 0% +0.29% (p=0.000 n=8+10) GoParse 6.01ms ± 0% 6.03ms ± 0% +0.35% (p=0.000 n=10+10) BinaryTree17 11.7s ± 0% 11.7s ± 0% ~ (p=0.481 n=10+10) Revcomp 669ms ± 0% 669ms ± 0% ~ (p=0.315 n=10+10) Fannkuch11 3.40s ± 0% 3.37s ± 0% -0.92% (p=0.000 n=10+10) [Geo mean] 67.9µs 67.9µs +0.02% name old speed new speed delta RegexpMatchEasy0_32 128MB/s ± 0% 128MB/s ± 0% -0.08% (p=0.003 n=8+10) RegexpMatchEasy0_1K 1.75GB/s ± 0% 1.75GB/s ± 0% ~ (p=0.642 n=8+10) RegexpMatchEasy1_32 130MB/s ± 0% 130MB/s ± 0% ~ (p=0.690 n=10+9) RegexpMatchEasy1_1K 1.27GB/s ± 0% 1.27GB/s ± 0% ~ (p=0.661 n=10+9) RegexpMatchMedium_32 3.18MB/s ± 0% 3.18MB/s ± 0% ~ (all equal) RegexpMatchMedium_1K 19.7MB/s ± 0% 19.6MB/s ± 0% ~ (p=0.190 n=10+9) RegexpMatchHard_32 11.6MB/s ± 0% 11.6MB/s ± 1% ~ (p=0.669 n=10+10) RegexpMatchHard_1K 13.0MB/s ± 0% 13.0MB/s ± 0% ~ (p=0.718 n=9+9) Gzip 44.4MB/s ± 0% 44.5MB/s ± 0% +0.24% (p=0.000 n=10+10) JSONEncode 96.5MB/s ± 1% 96.1MB/s ± 1% ~ (p=0.065 n=9+10) JSONDecode 20.6MB/s ± 1% 20.7MB/s ± 1% +0.42% (p=0.041 n=10+10) GobDecode 60.6MB/s ± 1% 60.0MB/s ± 2% -0.92% (p=0.016 n=10+10) GobEncode 63.4MB/s ± 0% 63.6MB/s ± 0% ~ (p=0.055 n=10+10) Template 21.5MB/s ± 0% 21.4MB/s ± 0% -0.30% (p=0.000 n=9+10) GoParse 9.64MB/s ± 0% 9.61MB/s ± 0% -0.36% (p=0.000 n=10+10) Revcomp 380MB/s ± 0% 380MB/s ± 0% ~ (p=0.323 n=10+10) [Geo mean] 56.0MB/s 55.9MB/s -0.07% Change-Id: I79a4978d42d01a5f72ed5ceec07f5e78ac6b3859 Reviewed-on: https://go-review.googlesource.com/97175 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2018-03-01 16:40:19 +00:00
Ben Shi	1057624985	cmd/compile: optimize ARM64 code with EON/ORN EON and ORN are efficient ARM64 instructions. EON combines (x ^ ^y) into a single operation, and so ORN does for (x \| ^y). This CL implements that optimization. And here are benchmark results with RaspberryPi3/ArchLinux. 1. A specific test gets about 13% improvement. EONORN 181µs ± 0% 157µs ± 0% -13.26% (p=0.000 n=26+23) (https://github.com/benshi001/ugo1/blob/master/eonorn_test.go) 2. There is little change in the go1 benchmark, excluding noise. name old time/op new time/op delta BinaryTree17-4 44.1s ± 2% 44.0s ± 2% ~ (p=0.513 n=30+30) Fannkuch11-4 32.9s ± 3% 32.8s ± 3% -0.12% (p=0.024 n=30+30) FmtFprintfEmpty-4 561ns ± 9% 558ns ± 9% ~ (p=0.654 n=30+30) FmtFprintfString-4 1.09µs ± 4% 1.09µs ± 3% ~ (p=0.158 n=30+30) FmtFprintfInt-4 1.12µs ± 0% 1.12µs ± 0% ~ (p=0.917 n=23+28) FmtFprintfIntInt-4 1.73µs ± 0% 1.76µs ± 4% ~ (p=0.665 n=23+30) FmtFprintfPrefixedInt-4 2.15µs ± 1% 2.15µs ± 0% ~ (p=0.389 n=27+26) FmtFprintfFloat-4 3.18µs ± 4% 3.13µs ± 0% -1.50% (p=0.003 n=30+23) FmtManyArgs-4 7.32µs ± 4% 7.21µs ± 0% ~ (p=0.220 n=30+25) GobDecode-4 99.1ms ± 9% 97.0ms ± 0% -2.07% (p=0.000 n=30+23) GobEncode-4 83.3ms ± 3% 82.4ms ± 4% ~ (p=0.321 n=30+30) Gzip-4 4.39s ± 4% 4.32s ± 2% -1.42% (p=0.017 n=30+23) Gunzip-4 440ms ± 0% 447ms ± 4% +1.54% (p=0.006 n=24+30) HTTPClientServer-4 547µs ± 1% 537µs ± 1% -1.91% (p=0.000 n=30+30) JSONEncode-4 211ms ± 0% 211ms ± 0% +0.04% (p=0.000 n=23+24) JSONDecode-4 847ms ± 0% 847ms ± 0% ~ (p=0.158 n=25+25) Mandelbrot200-4 46.5ms ± 0% 46.5ms ± 0% -0.04% (p=0.000 n=25+24) GoParse-4 43.4ms ± 0% 43.4ms ± 0% ~ (p=0.494 n=24+25) RegexpMatchEasy0_32-4 1.03µs ± 0% 1.03µs ± 0% ~ (all equal) RegexpMatchEasy0_1K-4 4.02µs ± 3% 3.98µs ± 0% -0.95% (p=0.003 n=30+24) RegexpMatchEasy1_32-4 1.01µs ± 3% 1.01µs ± 2% ~ (p=0.629 n=30+30) RegexpMatchEasy1_1K-4 6.39µs ± 0% 6.39µs ± 0% ~ (p=0.564 n=24+23) RegexpMatchMedium_32-4 1.80µs ± 3% 1.78µs ± 0% ~ (p=0.155 n=30+24) RegexpMatchMedium_1K-4 555µs ± 0% 563µs ± 3% +1.55% (p=0.004 n=27+30) RegexpMatchHard_32-4 31.0µs ± 4% 30.5µs ± 1% -1.58% (p=0.000 n=30+23) RegexpMatchHard_1K-4 947µs ± 4% 931µs ± 0% -1.66% (p=0.009 n=30+24) Revcomp-4 7.71s ± 4% 7.71s ± 4% ~ (p=0.196 n=29+30) Template-4 877ms ± 0% 878ms ± 0% +0.16% (p=0.018 n=23+27) TimeParse-4 4.75µs ± 1% 4.74µs ± 0% ~ (p=0.895 n=24+23) TimeFormat-4 4.83µs ± 4% 4.83µs ± 4% ~ (p=0.767 n=30+30) [Geo mean] 709µs 707µs -0.35% name old speed new speed delta GobDecode-4 7.75MB/s ± 8% 7.91MB/s ± 0% +2.03% (p=0.001 n=30+23) GobEncode-4 9.22MB/s ± 3% 9.32MB/s ± 4% ~ (p=0.389 n=30+30) Gzip-4 4.43MB/s ± 4% 4.43MB/s ± 4% ~ (p=0.888 n=30+30) Gunzip-4 44.1MB/s ± 0% 43.4MB/s ± 4% -1.46% (p=0.009 n=24+30) JSONEncode-4 9.18MB/s ± 0% 9.18MB/s ± 0% ~ (p=0.308 n=16+24) JSONDecode-4 2.29MB/s ± 0% 2.29MB/s ± 0% ~ (all equal) GoParse-4 1.33MB/s ± 0% 1.33MB/s ± 0% ~ (all equal) RegexpMatchEasy0_32-4 30.9MB/s ± 0% 30.9MB/s ± 0% ~ (p=1.000 n=23+24) RegexpMatchEasy0_1K-4 255MB/s ± 3% 257MB/s ± 0% +0.92% (p=0.004 n=30+24) RegexpMatchEasy1_32-4 31.7MB/s ± 3% 31.6MB/s ± 2% ~ (p=0.603 n=30+30) RegexpMatchEasy1_1K-4 160MB/s ± 0% 160MB/s ± 0% ~ (p=0.435 n=24+23) RegexpMatchMedium_32-4 554kB/s ± 3% 560kB/s ± 0% +1.08% (p=0.004 n=30+24) RegexpMatchMedium_1K-4 1.85MB/s ± 0% 1.82MB/s ± 3% -1.48% (p=0.001 n=27+30) RegexpMatchHard_32-4 1.03MB/s ± 4% 1.05MB/s ± 1% +1.51% (p=0.027 n=30+23) RegexpMatchHard_1K-4 1.08MB/s ± 4% 1.10MB/s ± 0% +1.69% (p=0.002 n=30+25) Revcomp-4 33.0MB/s ± 4% 33.0MB/s ± 4% ~ (p=0.272 n=29+30) Template-4 2.21MB/s ± 0% 2.21MB/s ± 0% ~ (all equal) [Geo mean] 7.75MB/s 7.77MB/s +0.29% 3. There is little regression in the compilecmp benchmark. name old time/op new time/op delta Template 2.28s ± 3% 2.28s ± 4% ~ (p=0.739 n=10+10) Unicode 1.34s ± 4% 1.32s ± 3% ~ (p=0.113 n=10+9) GoTypes 8.10s ± 3% 8.18s ± 3% ~ (p=0.393 n=10+10) Compiler 39.0s ± 3% 39.2s ± 3% ~ (p=0.393 n=10+10) SSA 114s ± 3% 115s ± 2% ~ (p=0.631 n=10+10) Flate 1.41s ± 2% 1.42s ± 3% ~ (p=0.353 n=10+10) GoParser 1.81s ± 1% 1.83s ± 2% ~ (p=0.211 n=10+9) Reflect 5.06s ± 2% 5.06s ± 2% ~ (p=0.912 n=10+10) Tar 2.19s ± 3% 2.20s ± 3% ~ (p=0.247 n=10+10) XML 2.65s ± 2% 2.67s ± 5% ~ (p=0.796 n=10+10) [Geo mean] 4.92s 4.93s +0.27% name old user-time/op new user-time/op delta Template 2.81s ± 2% 2.81s ± 3% ~ (p=0.971 n=10+10) Unicode 1.70s ± 3% 1.67s ± 5% ~ (p=0.315 n=10+10) GoTypes 9.71s ± 1% 9.78s ± 1% +0.71% (p=0.023 n=10+10) Compiler 47.3s ± 1% 47.1s ± 3% ~ (p=0.579 n=10+10) SSA 143s ± 2% 143s ± 2% ~ (p=0.280 n=10+10) Flate 1.70s ± 3% 1.71s ± 3% ~ (p=0.481 n=10+10) GoParser 2.21s ± 3% 2.21s ± 1% ~ (p=0.549 n=10+9) Reflect 5.89s ± 1% 5.87s ± 2% ~ (p=0.739 n=10+10) Tar 2.66s ± 2% 2.63s ± 2% ~ (p=0.105 n=10+10) XML 3.16s ± 3% 3.18s ± 2% ~ (p=0.143 n=10+10) [Geo mean] 5.97s 5.97s -0.06% name old text-bytes new text-bytes delta HelloSize 637kB ± 0% 637kB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 9.46kB ± 0% 9.46kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 125kB ± 0% 125kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.24MB ± 0% 1.24MB ± 0% ~ (all equal) Change-Id: Ie27357d65c5ce9d07afdffebe1e2daadcaa3369f Reviewed-on: https://go-review.googlesource.com/97036 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-02-28 23:42:40 +00:00
Balaram Makam	094258408d	cmd/compile: improve fractional word zeroing This change improves fractional word zeroing by using overlapping MOVDs for the fractions. Performance of go1 benchmarks on Amberwing was all noise: name old time/op new time/op delta RegexpMatchEasy0_32 247ns ± 0% 246ns ± 0% -0.40% (p=0.008 n=5+5) RegexpMatchEasy0_1K 581ns ± 0% 579ns ± 0% -0.34% (p=0.000 n=5+4) RegexpMatchEasy1_32 244ns ± 0% 242ns ± 0% ~ (p=0.079 n=4+5) RegexpMatchEasy1_1K 804ns ± 0% 805ns ± 0% ~ (p=0.238 n=5+4) RegexpMatchMedium_32 313ns ± 0% 311ns ± 0% -0.64% (p=0.008 n=5+5) RegexpMatchMedium_1K 52.2µs ± 0% 51.9µs ± 0% -0.52% (p=0.016 n=5+4) RegexpMatchHard_32 2.75µs ± 0% 2.74µs ± 0% ~ (p=0.603 n=5+5) RegexpMatchHard_1K 78.8µs ± 0% 78.9µs ± 0% +0.05% (p=0.008 n=5+5) FmtFprintfEmpty 58.6ns ± 0% 58.6ns ± 0% ~ (p=0.159 n=5+5) FmtFprintfString 118ns ± 0% 119ns ± 0% +0.85% (p=0.008 n=5+5) FmtFprintfInt 119ns ± 0% 123ns ± 0% +3.36% (p=0.016 n=5+4) FmtFprintfIntInt 192ns ± 0% 200ns ± 0% +4.17% (p=0.008 n=5+5) FmtFprintfPrefixedInt 224ns ± 0% 209ns ± 0% -6.70% (p=0.008 n=5+5) FmtFprintfFloat 335ns ± 0% 335ns ± 0% ~ (all equal) FmtManyArgs 775ns ± 0% 811ns ± 1% +4.67% (p=0.016 n=4+5) Gzip 437ms ± 0% 438ms ± 0% +0.19% (p=0.008 n=5+5) HTTPClientServer 88.7µs ± 1% 90.3µs ± 1% +1.75% (p=0.016 n=5+5) JSONEncode 20.1ms ± 1% 20.1ms ± 0% ~ (p=1.000 n=5+5) JSONDecode 94.7ms ± 1% 94.8ms ± 1% ~ (p=0.548 n=5+5) GobDecode 12.8ms ± 1% 12.8ms ± 1% ~ (p=0.548 n=5+5) GobEncode 12.1ms ± 0% 12.1ms ± 0% ~ (p=0.151 n=5+5) Mandelbrot200 5.37ms ± 0% 5.37ms ± 0% -0.03% (p=0.008 n=5+5) TimeParse 450ns ± 0% 451ns ± 1% ~ (p=0.635 n=4+5) TimeFormat 485ns ± 0% 484ns ± 0% ~ (p=0.508 n=5+5) Template 90.4ms ± 0% 90.2ms ± 0% -0.24% (p=0.016 n=5+5) GoParse 5.98ms ± 0% 5.98ms ± 0% ~ (p=1.000 n=5+5) BinaryTree17 11.8s ± 0% 11.8s ± 0% ~ (p=0.841 n=5+5) Revcomp 669ms ± 0% 669ms ± 0% ~ (p=0.310 n=5+5) Fannkuch11 3.28s ± 0% 3.34s ± 0% +1.64% (p=0.008 n=5+5) name old speed new speed delta RegexpMatchEasy0_32 129MB/s ± 0% 130MB/s ± 0% +0.30% (p=0.016 n=4+5) RegexpMatchEasy0_1K 1.76GB/s ± 0% 1.77GB/s ± 0% +0.27% (p=0.016 n=5+4) RegexpMatchEasy1_32 131MB/s ± 0% 132MB/s ± 0% +0.71% (p=0.016 n=4+5) RegexpMatchEasy1_1K 1.27GB/s ± 0% 1.27GB/s ± 0% -0.17% (p=0.016 n=5+4) RegexpMatchMedium_32 3.19MB/s ± 0% 3.21MB/s ± 0% +0.63% (p=0.008 n=5+5) RegexpMatchMedium_1K 19.6MB/s ± 0% 19.7MB/s ± 0% +0.52% (p=0.016 n=5+4) RegexpMatchHard_32 11.7MB/s ± 0% 11.7MB/s ± 0% ~ (p=0.643 n=5+5) RegexpMatchHard_1K 13.0MB/s ± 0% 13.0MB/s ± 0% ~ (p=0.079 n=4+5) Gzip 44.4MB/s ± 0% 44.3MB/s ± 0% -0.19% (p=0.008 n=5+5) JSONEncode 96.3MB/s ± 1% 96.4MB/s ± 0% ~ (p=1.000 n=5+5) JSONDecode 20.5MB/s ± 1% 20.5MB/s ± 1% ~ (p=0.460 n=5+5) GobDecode 60.1MB/s ± 1% 59.9MB/s ± 1% ~ (p=0.548 n=5+5) GobEncode 63.5MB/s ± 0% 63.7MB/s ± 0% ~ (p=0.135 n=5+5) Template 21.5MB/s ± 0% 21.5MB/s ± 0% +0.24% (p=0.016 n=5+5) GoParse 9.68MB/s ± 0% 9.69MB/s ± 0% ~ (p=0.786 n=5+5) Revcomp 380MB/s ± 0% 380MB/s ± 0% ~ (p=0.310 n=5+5) Change-Id: I596eee6421cdbad1a0189cdb9fe0628bba534eaf Reviewed-on: https://go-review.googlesource.com/96775 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-02-28 23:28:39 +00:00
Ilya Tocar	0f2ef0ad44	cmd/compile/internal/ssa: combine byte stores on amd64 On amd64 we optimize encoding/binary.BigEndian.PutUint{16,32,64} into bswap + single store, but strangely enough not LittleEndian.PutUint{16,32}. We have similar rules, but they use 64-bit shifts everywhere, and fail for 16/32-bit case. Add rules that matchLittleEndian.PutUint, and relevant tests. Performance results: LittleEndianPutUint16-6 1.43ns ± 0% 1.07ns ± 0% -25.17% (p=0.000 n=9+9) LittleEndianPutUint32-6 2.14ns ± 0% 0.94ns ± 0% -56.07% (p=0.019 n=6+8) LittleEndianPutUint16-6 1.40GB/s ± 0% 1.87GB/s ± 0% +33.24% (p=0.000 n=9+9) LittleEndianPutUint32-6 1.87GB/s ± 0% 4.26GB/s ± 0% +128.54% (p=0.000 n=8+8) Discovered, while looking at ethereum_ethash from community benchmarks Change-Id: Id86d5443687ecddd2803edf3203dbdd1246f61fe Reviewed-on: https://go-review.googlesource.com/95475 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2018-02-27 19:38:50 +00:00
Chad Rosier	ecd9e8a2fe	cmd/compile/internal/ssa: combine zero stores into larger stores on arm64 This reduces the go tool binary on arm64 by 12k. go1 results on Amberwing: name old time/op new time/op delta RegexpMatchEasy0_32 249ns ± 0% 249ns ± 0% ~ (p=0.087 n=10+10) RegexpMatchEasy0_1K 584ns ± 0% 584ns ± 0% ~ (all equal) RegexpMatchEasy1_32 246ns ± 0% 246ns ± 0% ~ (p=1.000 n=10+10) RegexpMatchEasy1_1K 806ns ± 0% 806ns ± 0% ~ (p=0.706 n=10+9) RegexpMatchMedium_32 314ns ± 0% 314ns ± 0% ~ (all equal) RegexpMatchMedium_1K 52.1µs ± 0% 52.1µs ± 0% ~ (p=0.245 n=10+8) RegexpMatchHard_32 2.75µs ± 1% 2.75µs ± 1% ~ (p=0.690 n=10+10) RegexpMatchHard_1K 78.9µs ± 0% 78.9µs ± 1% ~ (p=0.295 n=9+9) FmtFprintfEmpty 58.5ns ± 0% 58.5ns ± 0% ~ (all equal) FmtFprintfString 112ns ± 0% 112ns ± 0% ~ (all equal) FmtFprintfInt 117ns ± 0% 116ns ± 0% -0.85% (p=0.000 n=10+10) FmtFprintfIntInt 181ns ± 0% 181ns ± 0% ~ (all equal) FmtFprintfPrefixedInt 222ns ± 0% 224ns ± 0% +0.90% (p=0.000 n=9+10) FmtFprintfFloat 318ns ± 1% 322ns ± 0% ~ (p=0.059 n=10+8) FmtManyArgs 736ns ± 1% 735ns ± 0% ~ (p=0.206 n=9+9) Gzip 437ms ± 0% 436ms ± 0% -0.25% (p=0.000 n=10+10) HTTPClientServer 89.8µs ± 1% 90.2µs ± 2% ~ (p=0.393 n=10+10) JSONEncode 20.1ms ± 1% 20.2ms ± 1% ~ (p=0.065 n=9+10) JSONDecode 94.2ms ± 1% 93.9ms ± 1% -0.42% (p=0.043 n=10+10) GobDecode 12.7ms ± 1% 12.8ms ± 2% +0.94% (p=0.019 n=10+10) GobEncode 12.1ms ± 0% 12.1ms ± 0% ~ (p=0.052 n=10+10) Mandelbrot200 5.06ms ± 0% 5.05ms ± 0% -0.04% (p=0.000 n=9+10) TimeParse 450ns ± 3% 446ns ± 0% ~ (p=0.238 n=10+9) TimeFormat 485ns ± 1% 483ns ± 1% ~ (p=0.073 n=10+10) Template 90.4ms ± 0% 90.7ms ± 0% +0.29% (p=0.000 n=8+10) GoParse 6.01ms ± 0% 6.03ms ± 0% +0.35% (p=0.000 n=10+10) BinaryTree17 11.7s ± 0% 11.7s ± 0% ~ (p=0.481 n=10+10) Revcomp 669ms ± 0% 669ms ± 0% ~ (p=0.315 n=10+10) Fannkuch11 3.40s ± 0% 3.37s ± 0% -0.92% (p=0.000 n=10+10) [Geo mean] 67.9µs 67.9µs +0.02% name old speed new speed delta RegexpMatchEasy0_32 128MB/s ± 0% 128MB/s ± 0% -0.08% (p=0.003 n=8+10) RegexpMatchEasy0_1K 1.75GB/s ± 0% 1.75GB/s ± 0% ~ (p=0.642 n=8+10) RegexpMatchEasy1_32 130MB/s ± 0% 130MB/s ± 0% ~ (p=0.690 n=10+9) RegexpMatchEasy1_1K 1.27GB/s ± 0% 1.27GB/s ± 0% ~ (p=0.661 n=10+9) RegexpMatchMedium_32 3.18MB/s ± 0% 3.18MB/s ± 0% ~ (all equal) RegexpMatchMedium_1K 19.7MB/s ± 0% 19.6MB/s ± 0% ~ (p=0.190 n=10+9) RegexpMatchHard_32 11.6MB/s ± 0% 11.6MB/s ± 1% ~ (p=0.669 n=10+10) RegexpMatchHard_1K 13.0MB/s ± 0% 13.0MB/s ± 0% ~ (p=0.718 n=9+9) Gzip 44.4MB/s ± 0% 44.5MB/s ± 0% +0.24% (p=0.000 n=10+10) JSONEncode 96.5MB/s ± 1% 96.1MB/s ± 1% ~ (p=0.065 n=9+10) JSONDecode 20.6MB/s ± 1% 20.7MB/s ± 1% +0.42% (p=0.041 n=10+10) GobDecode 60.6MB/s ± 1% 60.0MB/s ± 2% -0.92% (p=0.016 n=10+10) GobEncode 63.4MB/s ± 0% 63.6MB/s ± 0% ~ (p=0.055 n=10+10) Template 21.5MB/s ± 0% 21.4MB/s ± 0% -0.30% (p=0.000 n=9+10) GoParse 9.64MB/s ± 0% 9.61MB/s ± 0% -0.36% (p=0.000 n=10+10) Revcomp 380MB/s ± 0% 380MB/s ± 0% ~ (p=0.323 n=10+10) [Geo mean] 56.0MB/s 55.9MB/s -0.07% Change-Id: Ia732fa57fbcf4767d72382516d9f16705d177736 Reviewed-on: https://go-review.googlesource.com/96435 Run-TryBot: Cherry Zhang <cherryyz@google.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2018-02-27 00:07:25 +00:00
Keith Randall	4b00d3f4a2	cmd/compile: implement comparisons directly with memory Allow the compiler to generate code like CMPQ 16(AX), $7 It's tricky because it's difficult to spill such a comparison during flagalloc, because the same memory state might not be available at the restore locations. Solve this problem by decomposing the compare+load back into its parts if it needs to be spilled. The big win is that the write barrier test goes from: MOVL runtime.writeBarrier(SB), CX TESTL CX, CX JNE 60 to CMPL runtime.writeBarrier(SB), $0 JNE 59 It's one instruction and one byte smaller. Fixes #19485 Fixes #15245 Update #22460 Binaries are about 0.15% smaller. Change-Id: I4fd8d1111b6b9924d52f9a0901ca1b2e5cce0836 Reviewed-on: https://go-review.googlesource.com/86035 Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>	2018-02-26 23:49:44 +00:00
Alberto Donizetti	37a038a3dc	cmd/compile: add code generation tests for sqrt intrinsics Add "sqrt-intrisified" code generation tests for mips64 and 386, where we weren't intrisifying math.Sqrt (see CL 96615 and CL 95916), and for mips and amd64, which lacked sqrt intrinsics tests. Change-Id: I0cfc08aec6eefd47f3cd7a5995a89393e8b7ed9e Reviewed-on: https://go-review.googlesource.com/96716 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-02-23 16:48:53 +00:00
Giovanni Bajo	0cacc4d0e2	cmd/compile: fold LEAQ/ADDQconst into SETx ops This saves an instruction and a register. The new rules match ~4900 times during all.bash. Change-Id: I2f867c5e70262004e31f545f3bb89e939c45b718 Reviewed-on: https://go-review.googlesource.com/94767 Reviewed-by: Keith Randall <khr@golang.org>	2018-02-20 22:32:35 +00:00
philhofer	2d0172c3a7	cmd/compile/internal/ssa: emit csel on arm64 Introduce a new SSA pass to generate CondSelect intstrutions, and add CondSelect lowering rules for arm64. In order to make the CSEL instruction easier to optimize, and to simplify the introduction of CSNEG, CSINC, and CSINV in the future, modify the CSEL instruction to accept a condition code in the aux field. Notably, this change makes the go1 Gzip benchmark more than 10% faster. Benchmarks on a Cavium ThunderX: name old time/op new time/op delta BinaryTree17-96 15.9s ± 6% 16.0s ± 4% ~ (p=0.968 n=10+9) Fannkuch11-96 7.17s ± 0% 7.00s ± 0% -2.43% (p=0.000 n=8+9) FmtFprintfEmpty-96 208ns ± 1% 207ns ± 0% ~ (p=0.152 n=10+8) FmtFprintfString-96 379ns ± 0% 375ns ± 0% -0.95% (p=0.000 n=10+9) FmtFprintfInt-96 385ns ± 0% 383ns ± 0% -0.52% (p=0.000 n=9+10) FmtFprintfIntInt-96 591ns ± 0% 586ns ± 0% -0.85% (p=0.006 n=7+9) FmtFprintfPrefixedInt-96 656ns ± 0% 667ns ± 0% +1.71% (p=0.000 n=10+10) FmtFprintfFloat-96 967ns ± 0% 984ns ± 0% +1.78% (p=0.000 n=10+10) FmtManyArgs-96 2.35µs ± 0% 2.25µs ± 0% -4.63% (p=0.000 n=9+8) GobDecode-96 31.0ms ± 0% 30.8ms ± 0% -0.36% (p=0.006 n=9+9) GobEncode-96 24.4ms ± 0% 24.5ms ± 0% +0.30% (p=0.000 n=9+9) Gzip-96 1.60s ± 0% 1.43s ± 0% -10.58% (p=0.000 n=9+10) Gunzip-96 167ms ± 0% 169ms ± 0% +0.83% (p=0.000 n=8+9) HTTPClientServer-96 311µs ± 1% 308µs ± 0% -0.75% (p=0.000 n=10+10) JSONEncode-96 65.0ms ± 0% 64.8ms ± 0% -0.25% (p=0.000 n=9+8) JSONDecode-96 262ms ± 1% 261ms ± 1% ~ (p=0.579 n=10+10) Mandelbrot200-96 18.0ms ± 0% 18.1ms ± 0% +0.17% (p=0.000 n=8+10) GoParse-96 14.0ms ± 0% 14.1ms ± 1% +0.42% (p=0.003 n=9+10) RegexpMatchEasy0_32-96 644ns ± 2% 645ns ± 2% ~ (p=0.836 n=10+10) RegexpMatchEasy0_1K-96 3.70µs ± 0% 3.49µs ± 0% -5.58% (p=0.000 n=10+10) RegexpMatchEasy1_32-96 662ns ± 2% 657ns ± 2% ~ (p=0.137 n=10+10) RegexpMatchEasy1_1K-96 4.47µs ± 0% 4.31µs ± 0% -3.48% (p=0.000 n=10+10) RegexpMatchMedium_32-96 844ns ± 2% 849ns ± 1% ~ (p=0.208 n=10+10) RegexpMatchMedium_1K-96 179µs ± 0% 182µs ± 0% +1.20% (p=0.000 n=10+10) RegexpMatchHard_32-96 10.0µs ± 0% 10.1µs ± 0% +0.48% (p=0.000 n=10+9) RegexpMatchHard_1K-96 297µs ± 0% 297µs ± 0% -0.14% (p=0.000 n=10+10) Revcomp-96 3.08s ± 0% 3.13s ± 0% +1.56% (p=0.000 n=9+9) Template-96 276ms ± 2% 275ms ± 1% ~ (p=0.393 n=10+10) TimeParse-96 1.37µs ± 0% 1.36µs ± 0% -0.53% (p=0.000 n=10+7) TimeFormat-96 1.40µs ± 0% 1.42µs ± 0% +0.97% (p=0.000 n=10+10) [Geo mean] 264µs 262µs -0.77% Change-Id: Ie54eee4b3092af53e6da3baa6d1755098f57f3a2 Reviewed-on: https://go-review.googlesource.com/55670 Run-TryBot: Philip Hofer <phofer@umich.edu> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: Keith Randall <khr@golang.org>	2018-02-20 06:00:54 +00:00
Chad Rosier	07f0f09563	cmd/compile: make math.Ceil/Floor/Round/Trunc intrinsics on arm64 name old time/op new time/op delta Ceil 550ns ± 0% 486ns ± 7% -11.64% (p=0.000 n=13+18) Floor 495ns ±19% 512ns ±12% ~ (p=0.164 n=20+20) Round 550ns ± 0% 487ns ± 8% -11.49% (p=0.000 n=12+19) Trunc 563ns ± 7% 488ns ±13% -13.44% (p=0.000 n=15+2) Change-Id: I53f234b160b3c026a277506e2cf977d150379464 Reviewed-on: https://go-review.googlesource.com/88295 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2018-02-16 15:37:57 +00:00
Balaram Makam	fcba05148f	cmd/compile: arm64 intrinsics for math/bits.OnesCount This adds math/bits intrinsics for OnesCount on arm64. name old time/op new time/op delta OnesCount 3.81ns ± 0% 1.60ns ± 0% -57.96% (p=0.000 n=7+8) OnesCount8 1.60ns ± 0% 1.60ns ± 0% ~ (all equal) OnesCount16 2.41ns ± 0% 1.60ns ± 0% -33.61% (p=0.000 n=8+8) OnesCount32 4.17ns ± 0% 1.60ns ± 0% -61.58% (p=0.000 n=8+8) OnesCount64 3.80ns ± 0% 1.60ns ± 0% -57.84% (p=0.000 n=8+8) Update #18616 Conflicts: src/cmd/compile/internal/gc/asm_test.go Change-Id: I63ac2f63acafdb1f60656ab8a56be0b326eec5cb Reviewed-on: https://go-review.googlesource.com/90835 Run-TryBot: Cherry Zhang <cherryyz@google.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2018-02-15 23:00:20 +00:00
Chad Rosier	51932c326f	cmd/compile: improve absorb shifts optimization for arm64 Current absorb shifts optimization can generate dead Value nodes which increase use count of other live nodes. It will impact other optimizations (such as combined loads) which are enabled based on specific use count. This patch fixes the issue by decreasing the use count of nodes referenced by dead Value nodes generated by absorb shifts optimization. Performance impacts on go1 benchmarks (data collected on A57@2GHzx8): name old time/op new time/op delta BinaryTree17-8 6.28s ± 2% 6.24s ± 1% ~ (p=0.065 n=10+9) Fannkuch11-8 6.32s ± 0% 6.33s ± 0% +0.17% (p=0.000 n=10+10) FmtFprintfEmpty-8 98.9ns ± 0% 99.2ns ± 0% +0.34% (p=0.000 n=9+7) FmtFprintfString-8 183ns ± 1% 182ns ± 1% -1.01% (p=0.005 n=9+10) FmtFprintfInt-8 199ns ± 1% 202ns ± 1% +1.41% (p=0.000 n=10+9) FmtFprintfIntInt-8 272ns ± 1% 276ns ± 3% +1.36% (p=0.015 n=10+10) FmtFprintfPrefixedInt-8 367ns ± 1% 369ns ± 1% +0.68% (p=0.042 n=10+10) FmtFprintfFloat-8 491ns ± 1% 493ns ± 1% ~ (p=0.064 n=10+10) FmtManyArgs-8 1.31µs ± 1% 1.32µs ± 1% +0.39% (p=0.042 n=8+9) GobDecode-8 17.0ms ± 2% 16.2ms ± 2% -4.74% (p=0.000 n=10+10) GobEncode-8 13.7ms ± 2% 13.4ms ± 1% -2.40% (p=0.000 n=10+9) Gzip-8 844ms ± 0% 737ms ± 0% -12.70% (p=0.000 n=10+10) Gunzip-8 84.4ms ± 1% 83.9ms ± 0% -0.55% (p=0.000 n=10+8) HTTPClientServer-8 122µs ± 1% 124µs ± 1% +1.75% (p=0.000 n=10+9) JSONEncode-8 34.9ms ± 1% 32.4ms ± 0% -7.11% (p=0.000 n=10+9) JSONDecode-8 150ms ± 0% 146ms ± 1% -2.84% (p=0.000 n=7+10) Mandelbrot200-8 10.0ms ± 0% 10.0ms ± 0% ~ (p=0.529 n=10+10) GoParse-8 8.18ms ± 1% 8.03ms ± 0% -1.93% (p=0.000 n=10+10) RegexpMatchEasy0_32-8 209ns ± 0% 209ns ± 0% ~ (p=0.248 n=10+9) RegexpMatchEasy0_1K-8 789ns ± 1% 790ns ± 0% ~ (p=0.361 n=10+10) RegexpMatchEasy1_32-8 202ns ± 0% 202ns ± 1% ~ (p=0.137 n=8+10) RegexpMatchEasy1_1K-8 1.12µs ± 2% 1.12µs ± 1% ~ (p=0.810 n=10+10) RegexpMatchMedium_32-8 298ns ± 0% 298ns ± 0% ~ (p=0.443 n=10+9) RegexpMatchMedium_1K-8 83.0µs ± 5% 78.6µs ± 0% -5.37% (p=0.000 n=10+10) RegexpMatchHard_32-8 4.32µs ± 0% 4.26µs ± 0% -1.47% (p=0.000 n=10+10) RegexpMatchHard_1K-8 132µs ± 4% 126µs ± 0% -4.41% (p=0.000 n=10+9) Revcomp-8 1.11s ± 0% 1.11s ± 0% +0.14% (p=0.017 n=10+9) Template-8 155ms ± 1% 155ms ± 1% ~ (p=0.796 n=10+10) TimeParse-8 774ns ± 1% 785ns ± 1% +1.41% (p=0.001 n=10+10) TimeFormat-8 788ns ± 1% 806ns ± 1% +2.24% (p=0.000 n=10+9) name old speed new speed delta GobDecode-8 45.2MB/s ± 2% 47.5MB/s ± 2% +4.96% (p=0.000 n=10+10) GobEncode-8 56.0MB/s ± 2% 57.4MB/s ± 1% +2.44% (p=0.000 n=10+9) Gzip-8 23.0MB/s ± 0% 26.3MB/s ± 0% +14.55% (p=0.000 n=10+10) Gunzip-8 230MB/s ± 1% 231MB/s ± 0% +0.55% (p=0.000 n=10+8) JSONEncode-8 55.6MB/s ± 1% 59.9MB/s ± 0% +7.65% (p=0.000 n=10+9) JSONDecode-8 12.9MB/s ± 0% 13.3MB/s ± 1% +2.94% (p=0.000 n=7+10) GoParse-8 7.08MB/s ± 1% 7.22MB/s ± 0% +1.95% (p=0.000 n=10+10) RegexpMatchEasy0_32-8 153MB/s ± 0% 153MB/s ± 0% -0.16% (p=0.023 n=10+10) RegexpMatchEasy0_1K-8 1.30GB/s ± 1% 1.30GB/s ± 0% ~ (p=0.393 n=10+10) RegexpMatchEasy1_32-8 158MB/s ± 0% 158MB/s ± 0% ~ (p=0.684 n=10+10) RegexpMatchEasy1_1K-8 915MB/s ± 2% 918MB/s ± 1% ~ (p=0.796 n=10+10) RegexpMatchMedium_32-8 3.35MB/s ± 0% 3.35MB/s ± 0% ~ (p=1.000 n=10+9) RegexpMatchMedium_1K-8 12.3MB/s ± 5% 13.0MB/s ± 0% +5.56% (p=0.000 n=10+10) RegexpMatchHard_32-8 7.40MB/s ± 0% 7.51MB/s ± 0% +1.50% (p=0.000 n=10+10) RegexpMatchHard_1K-8 7.75MB/s ± 4% 8.10MB/s ± 0% +4.52% (p=0.000 n=10+8) Revcomp-8 229MB/s ± 0% 228MB/s ± 0% -0.14% (p=0.017 n=10+9) Template-8 12.5MB/s ± 1% 12.5MB/s ± 1% ~ (p=0.780 n=10+10) Change-Id: I103389f168eac79f6af44e8fef93acc2a7a4ac96 Reviewed-on: https://go-review.googlesource.com/88415 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2018-02-15 20:54:50 +00:00
Chad Rosier	cdd961630c	cmd/compile: generate tbz/tbnz when comparing against zero on arm64 The tbz/tbnz checks the sign bit to determine if the value is >= 0 or < 0. go1 benchmark results: name old speed new speed delta JSONEncode 94.4MB/s ± 1% 95.7MB/s ± 0% +1.36% (p=0.000 n=10+9) JSONDecode 19.7MB/s ± 1% 19.9MB/s ± 1% +1.08% (p=0.000 n=9+10) Gzip 45.5MB/s ± 0% 46.0MB/s ± 0% +1.06% (p=0.000 n=10+10) Revcomp 376MB/s ± 0% 379MB/s ± 0% +0.69% (p=0.000 n=10+10) RegexpMatchHard_1K 12.6MB/s ± 0% 12.7MB/s ± 0% +0.57% (p=0.000 n=10+8) RegexpMatchMedium_32 3.21MB/s ± 0% 3.22MB/s ± 0% +0.31% (p=0.000 n=9+10) RegexpMatchEasy1_1K 1.27GB/s ± 0% 1.27GB/s ± 0% +0.23% (p=0.000 n=9+9) RegexpMatchHard_32 11.4MB/s ± 0% 11.4MB/s ± 1% +0.19% (p=0.036 n=10+8) RegexpMatchEasy0_1K 1.77GB/s ± 0% 1.77GB/s ± 0% +0.13% (p=0.000 n=9+10) RegexpMatchMedium_1K 19.3MB/s ± 0% 19.3MB/s ± 0% +0.04% (p=0.008 n=10+8) RegexpMatchEasy0_32 131MB/s ± 0% 131MB/s ± 0% ~ (p=0.211 n=10+10) GobDecode 57.5MB/s ± 1% 57.6MB/s ± 2% ~ (p=0.469 n=10+10) GobEncode 58.6MB/s ± 1% 58.5MB/s ± 2% ~ (p=0.781 n=10+10) GoParse 9.40MB/s ± 0% 9.39MB/s ± 0% -0.19% (p=0.005 n=10+9) RegexpMatchEasy1_32 133MB/s ± 0% 133MB/s ± 0% -0.48% (p=0.000 n=10+10) Template 20.9MB/s ± 0% 20.6MB/s ± 0% -1.54% (p=0.000 n=8+10) Change-Id: I411efe44db35c3962445618d5a47c12e31b3925b Reviewed-on: https://go-review.googlesource.com/92715 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2018-02-14 15:52:41 +00:00
Austin Clements	2010189407	runtime: remove legacy eager write barrier Now that the buffered write barrier is implemented for all architectures, we can remove the old eager write barrier implementation. This CL removes the implementation from the runtime, support in the compiler for calling it, and updates some compiler tests that relied on the old eager barrier support. It also makes sure that all of the useful comments from the old write barrier implementation still have a place to live. Fixes #22460. Updates #21640 since this fixes the layering concerns of the write barrier (but not the other things in that issue). Change-Id: I580f93c152e89607e0a72fe43370237ba97bae74 Reviewed-on: https://go-review.googlesource.com/92705 Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Rick Hudson <rlh@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-02-13 16:34:46 +00:00
Keith Randall	23e8e197b0	cmd/compile: use unsigned loads for multi-element comparisons When loading multiple elements of an array into a single register, make sure we treat them as unsigned. When treated as signed, the upper bits might all be set, causing the shift-or combo to clobber the values higher in the register. Fixes #23719. Change-Id: Ic87da03e9bd0fe2c60bb214b99f846e4e9446052 Reviewed-on: https://go-review.googlesource.com/92335 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>	2018-02-06 18:24:33 +00:00
Caleb Spare	67fdf587dc	cmd/compile: don't combine 64-bit loads/stores on amd64 This causes a performance regression for some calls. Fixes #23424. Updates #6853. Change-Id: Id1db652d5aca0ce631a3417c0c056d6637fefa9e Reviewed-on: https://go-review.googlesource.com/88135 Run-TryBot: Caleb Spare <cespare@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2018-01-17 22:05:33 +00:00
Keith Randall	fa1f52c5f6	cmd/compile: always nil check before interface call Fixes #22703 The fix was already done by Cherry for defer/go of an interface call (CL 23820). We just need to do it everywhere. Change-Id: I0115d22e443931fe1bcce44c93c4d0770b5fd268 Reviewed-on: https://go-review.googlesource.com/77450 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2017-11-14 05:39:45 +00:00
Alberto Donizetti	33a9f01729	cmd/compile: add mul by ±2ⁿ code-generation tests for arm/arm64 This change adds code generation tests for multiplication by ±2ⁿ for arm and arm64, in preparation for a future CL which will remove the relevant architecture-specific SSA rules (the reduction is already performed by rules in generic.rules added in CL 36323). Change-Id: Iebdd5c3bb2fc632c85888569ff0c49f78569a862 Reviewed-on: https://go-review.googlesource.com/75752 Reviewed-by: Cherry Zhang <cherryyz@google.com>	2017-11-04 10:28:27 +00:00
Lynn Boger	bb1fd3b5ff	cmd/compile: add rules to improve consecutive byte loads and stores on ppc64le This adds new rules to recognize consecutive byte loads and stores and lowers them to loads and stores such as lhz, lwz, ld, sth, stw, std. This change only covers the little endian cases on little endian machines, such as is found in encoding/binary UintXX or PutUintXX for little endian. Big endian will be done later. Updates were also made to binary_test.go to allow the benchmark for Uint and PutUint to actually use those functions because the way they were written, those functions were being optimized out. Testcases were also added to cmd/compile/internal/gc/asm_test.go. Updates #22496 The following improvement can be found in golang.org/x/crypto poly1305: Benchmark64-16 142 114 -19.72% Benchmark1K-16 1717 1424 -17.06% Benchmark64Unaligned-16 142 113 -20.42% Benchmark1KUnaligned-16 1721 1428 -17.02% chacha20poly1305: BenchmarkChacha20Poly1305Open_64-16 1012 885 -12.55% BenchmarkChacha20Poly1305Seal_64-16 971 836 -13.90% BenchmarkChacha20Poly1305Open_1350-16 11113 9539 -14.16% BenchmarkChacha20Poly1305Seal_1350-16 11013 9392 -14.72% BenchmarkChacha20Poly1305Open_8K-16 61074 53431 -12.51% BenchmarkChacha20Poly1305Seal_8K-16 61214 54806 -10.47% Other improvements of around 10% found in crypto/tls. Results after updating encoding/binary/binary_test.go: BenchmarkLittleEndianPutUint64-16 1.87 0.93 -50.27% BenchmarkLittleEndianPutUint32-16 1.19 0.93 -21.85% BenchmarkLittleEndianPutUint16-16 1.16 1.03 -11.21% Change-Id: I7bbe2fbcbd11362d58662fecd907a0c07e6ca2fb Reviewed-on: https://go-review.googlesource.com/74410 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Michael Munday <mike.munday@ibm.com>	2017-11-03 18:46:59 +00:00
Ilya Tocar	f3884680fc	cmd/compile/internal/ssa: inline memmove with known size Replace calls to memmove with known (constant) size, with OpMove. Do it only if it is safe from aliasing point of view. Helps with code like this: append(buf,"const str"...) In strconv this provides nice benefit: Quote-6 731ns ± 2% 647ns ± 3% -11.41% (p=0.000 n=10+10) QuoteRune-6 117ns ± 5% 111ns ± 1% -4.54% (p=0.000 n=10+10) AppendQuote-6 475ns ± 0% 396ns ± 0% -16.59% (p=0.000 n=9+10) AppendQuoteRune-6 32.0ns ± 0% 27.4ns ± 0% -14.41% (p=0.000 n=8+9) Change-Id: I7704f5c51b46aed2d8f033de74c75140fc35036c Reviewed-on: https://go-review.googlesource.com/54394 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> Reviewed-by: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-11-02 20:30:25 +00:00
Michael Munday	4745604bcb	cmd/compile: intrinsify math.RoundToEven on s390x The new RoundToEven function can be implemented as a single FIDBR instruction on s390x. name old time/op new time/op delta RoundToEven 5.32ns ± 1% 0.86ns ± 1% -83.86% (p=0.000 n=10+10) Change-Id: Iaf597e57a0d1085961701e3c75ff4f6f6dcebb5f Reviewed-on: https://go-review.googlesource.com/74350 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-10-31 18:04:27 +00:00
Michael Munday	96cdacb971	cmd/asm, cmd/compile: optimize math.Abs and math.Copysign on s390x This change adds three new instructions: - LPDFR: load positive (math.Abs(x)) - LNDFR: load negative (-math.Abs(x)) - CPSDR: copy sign (math.Copysign(x, y)) By making use of GPR <-> FPR moves we can now compile math.Abs and math.Copysign to these instructions using SSA rules. This CL also adds new rules to merge address generation into combined load operations. This makes GPR <-> FPR move matching more reliable. name old time/op new time/op delta Copysign 1.85ns ± 0% 1.40ns ± 1% -24.65% (p=0.000 n=8+10) Abs 1.58ns ± 1% 0.73ns ± 1% -53.64% (p=0.000 n=10+10) The geo mean improvement for all math package benchmarks was 4.6%. Change-Id: I0cec35c5c1b3fb45243bf666b56b57faca981bc9 Reviewed-on: https://go-review.googlesource.com/73950 Run-TryBot: Michael Munday <mike.munday@ibm.com> Reviewed-by: Keith Randall <khr@golang.org>	2017-10-30 23:42:51 +00:00
Austin Clements	7e343134d3	cmd/compile: compiler support for buffered write barrier This CL implements the compiler support for calling the buffered write barrier added by the previous CL. Since the buffered write barrier is only implemented on amd64 right now, this still supports the old, eager write barrier as well. There's little overhead to supporting both and this way a few tests in test/fixedbugs that expect to have liveness maps at write barrier calls can easily opt-in to the old, eager barrier. This significantly improves the performance of the write barrier: name old time/op new time/op delta WriteBarrier-12 73.5ns ±20% 19.2ns ±27% -73.90% (p=0.000 n=19+18) It also reduces the size of binaries because the write barrier call is more compact: name old object-bytes new object-bytes delta Template 398k ± 0% 393k ± 0% -1.14% (p=0.008 n=5+5) Unicode 208k ± 0% 206k ± 0% -1.00% (p=0.008 n=5+5) GoTypes 1.18M ± 0% 1.15M ± 0% -2.00% (p=0.008 n=5+5) Compiler 4.05M ± 0% 3.88M ± 0% -4.26% (p=0.008 n=5+5) SSA 8.25M ± 0% 8.11M ± 0% -1.59% (p=0.008 n=5+5) Flate 228k ± 0% 224k ± 0% -1.83% (p=0.008 n=5+5) GoParser 295k ± 0% 284k ± 0% -3.62% (p=0.008 n=5+5) Reflect 1.00M ± 0% 0.99M ± 0% -0.70% (p=0.008 n=5+5) Tar 339k ± 0% 333k ± 0% -1.67% (p=0.008 n=5+5) XML 404k ± 0% 395k ± 0% -2.10% (p=0.008 n=5+5) [Geo mean] 704k 690k -2.00% name old exe-bytes new exe-bytes delta HelloSize 1.05M ± 0% 1.04M ± 0% -1.55% (p=0.008 n=5+5) https://perf.golang.org/search?q=upload:20171027.1 (Amusingly, this also reduces compiler allocations by 0.75%, which, combined with the better write barrier, speeds up the compiler overall by 2.10%. See the perf link.) It slightly improves the performance of most of the go1 benchmarks and improves the performance of the x/benchmarks: name old time/op new time/op delta BinaryTree17-12 2.40s ± 1% 2.47s ± 1% +2.69% (p=0.000 n=19+19) Fannkuch11-12 2.95s ± 0% 2.95s ± 0% +0.21% (p=0.000 n=20+19) FmtFprintfEmpty-12 41.8ns ± 4% 41.4ns ± 2% -1.03% (p=0.014 n=20+20) FmtFprintfString-12 68.7ns ± 2% 67.5ns ± 1% -1.75% (p=0.000 n=20+17) FmtFprintfInt-12 79.0ns ± 3% 77.1ns ± 1% -2.40% (p=0.000 n=19+17) FmtFprintfIntInt-12 127ns ± 1% 123ns ± 3% -3.42% (p=0.000 n=20+20) FmtFprintfPrefixedInt-12 152ns ± 1% 150ns ± 1% -1.02% (p=0.000 n=18+17) FmtFprintfFloat-12 211ns ± 1% 209ns ± 0% -0.99% (p=0.000 n=20+16) FmtManyArgs-12 500ns ± 0% 496ns ± 0% -0.73% (p=0.000 n=17+20) GobDecode-12 6.44ms ± 1% 6.53ms ± 0% +1.28% (p=0.000 n=20+19) GobEncode-12 5.46ms ± 0% 5.46ms ± 1% ~ (p=0.550 n=19+20) Gzip-12 220ms ± 1% 216ms ± 0% -1.75% (p=0.000 n=19+19) Gunzip-12 38.8ms ± 0% 38.6ms ± 0% -0.30% (p=0.000 n=18+19) HTTPClientServer-12 79.0µs ± 1% 78.2µs ± 1% -1.01% (p=0.000 n=20+20) JSONEncode-12 11.9ms ± 0% 11.9ms ± 0% -0.29% (p=0.000 n=20+19) JSONDecode-12 52.6ms ± 0% 52.2ms ± 0% -0.68% (p=0.000 n=19+20) Mandelbrot200-12 3.69ms ± 0% 3.68ms ± 0% -0.36% (p=0.000 n=20+20) GoParse-12 3.13ms ± 1% 3.18ms ± 1% +1.67% (p=0.000 n=19+20) RegexpMatchEasy0_32-12 73.2ns ± 1% 72.3ns ± 1% -1.19% (p=0.000 n=19+18) RegexpMatchEasy0_1K-12 241ns ± 0% 239ns ± 0% -0.83% (p=0.000 n=17+16) RegexpMatchEasy1_32-12 68.6ns ± 1% 69.0ns ± 1% +0.47% (p=0.015 n=18+16) RegexpMatchEasy1_1K-12 364ns ± 0% 361ns ± 0% -0.67% (p=0.000 n=16+17) RegexpMatchMedium_32-12 104ns ± 1% 103ns ± 1% -0.79% (p=0.001 n=20+15) RegexpMatchMedium_1K-12 33.8µs ± 3% 34.0µs ± 2% ~ (p=0.267 n=20+19) RegexpMatchHard_32-12 1.64µs ± 1% 1.62µs ± 2% -1.25% (p=0.000 n=19+18) RegexpMatchHard_1K-12 49.2µs ± 0% 48.7µs ± 1% -0.93% (p=0.000 n=19+18) Revcomp-12 391ms ± 5% 396ms ± 7% ~ (p=0.154 n=19+19) Template-12 63.1ms ± 0% 59.5ms ± 0% -5.76% (p=0.000 n=18+19) TimeParse-12 307ns ± 0% 306ns ± 0% -0.39% (p=0.000 n=19+17) TimeFormat-12 325ns ± 0% 323ns ± 0% -0.50% (p=0.000 n=19+19) [Geo mean] 47.3µs 46.9µs -0.67% https://perf.golang.org/search?q=upload:20171026.1 name old time/op new time/op delta Garbage/benchmem-MB=64-12 2.25ms ± 1% 2.20ms ± 1% -2.31% (p=0.000 n=18+18) HTTP-12 12.6µs ± 0% 12.6µs ± 0% -0.72% (p=0.000 n=18+17) JSON-12 11.0ms ± 0% 11.0ms ± 1% -0.68% (p=0.000 n=17+19) https://perf.golang.org/search?q=upload:20171026.2 Updates #14951. Updates #22460. Change-Id: Id4c0932890a1d41020071bec73b8522b1367d3e7 Reviewed-on: https://go-review.googlesource.com/73712 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2017-10-30 18:12:46 +00:00
Lynn Boger	4d0151ede5	cmd/compile,cmd/internal/obj/ppc64: make math.Abs,math.Copysign instrinsics on ppc64x This adds support for math Abs, Copysign to be instrinsics on ppc64x. New instruction FCPSGN is added to generate fcpsgn. Some new rules are added to improve the int<->float conversions that are generated mainly due to the Float64bits and Float64frombits in the math package. PPC64.rules is also modified as suggested in the review for CL 63290. Improvements: benchmark old ns/op new ns/op delta BenchmarkAbs-16 1.12 0.69 -38.39% BenchmarkCopysign-16 1.30 0.93 -28.46% BenchmarkNextafter32-16 9.34 8.05 -13.81% BenchmarkFrexp-16 8.81 7.60 -13.73% Others that used Copysign also saw smaller improvements. I attempted to make this work using rules since that seems to be preferred, but due to the use of Float64bits and Float64frombits in these functions, several rules had to be added and even then not all cases were matched. Using rules became too complicated and seemed too fragile for these. Updates #21390 Change-Id: Ia265da9a18355e08000818a4fba1a40e9e031995 Reviewed-on: https://go-review.googlesource.com/67130 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Keith Randall <khr@golang.org>	2017-10-30 13:56:39 +00:00
Hugues Bruant	3c46f49f94	cmd/compile: fix incorrect go:noinline usage This pragma is not actually honored by the compiler. The tests implicitly relied on the inliner being unable to inline closures with captured variables, which will soon change. Fixes #22208 Change-Id: I13abc9c930b9156d43ec216f8efb768952a29439 Reviewed-on: https://go-review.googlesource.com/73211 Reviewed-by: Michael Munday <mike.munday@ibm.com>	2017-10-30 07:48:21 +00:00
Aliaksandr Valialkin	0011cfbe2b	cmd/compile: optimize signed non-negative div/mod by a power of 2 This CL optimizes assembly for len() or cap() division by a power of 2 constants: func lenDiv(s []int) int { return len(s) / 16 } amd64 assembly before the CL: MOVQ "".s+16(SP), AX MOVQ AX, CX SARQ $63, AX SHRQ $60, AX ADDQ CX, AX SARQ $4, AX MOVQ AX, "".~r1+32(SP) RET amd64 assembly after the CL: MOVQ "".s+16(SP), AX SHRQ $4, AX MOVQ AX, "".~r1+32(SP) RET The CL relies on the fact that len() and cap() result cannot be negative. Trigger stats for the added SSA rules on linux/amd64 when running make.bash: 46 Div64 12 Mod64 The added SSA rules may trigger on more cases in the future when SSA values will be populated with the info on their lower bounds. For instance: func f(i int16) int16 { if i < 3 { return -1 } // Lower bound of i is 3 here -> i is non-negative, // so unsigned arithmetics may be used here. return i % 16 } Change-Id: I8bc6be5a03e71157ced533c01416451ff6f1a7f0 Reviewed-on: https://go-review.googlesource.com/65530 Reviewed-by: Keith Randall <khr@golang.org>	2017-10-06 15:15:39 +00:00
Alberto Donizetti	03614562ca	cmd/compile: remove x86 arch-specific rules for +2ⁿ multiplication amd64 and 386 have rules to reduce multiplication by a positive power of two, but a more general reduction (both for positive and negative powers of two) is already performed by generic rules that were added in CL 36323 to replace walkmul (see lines 166:173 in generic.rules). The x86 and amd64 rules are never triggered during all.bash and can be removed, reducing rules duplication. The change also adds a few code generation tests for amd64 and 386. Change-Id: I566d48186643bd722a4c0137fe94e513b8b20e36 Reviewed-on: https://go-review.googlesource.com/68450 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-10-06 09:30:57 +00:00
Ilya Tocar	6b8a3c8889	cmd/compile/internal/amd64: add SETccmem Combine setcc and store of result into setcc that writes directly to memory. Triggers 200+ times in go tool. Fixes #21630 Change-Id: Iafa22607426f4120140c88fae4b9aecb46e0bba8 Reviewed-on: https://go-review.googlesource.com/67950 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-10-05 20:53:28 +00:00
Michael Munday	7582494e06	cmd/compile: add s390x intrinsics for Ceil, Floor, Round and Trunc Ceil, Floor and Trunc are pre-existing intrinsics. Round is a new function and has been added as an intrinsic in this CL. All of the functions can be implemented as a single 'LOAD FP INTEGER' instruction, FIDBR, on s390x. name old time/op new time/op delta Ceil 2.34ns ± 0% 0.85ns ± 0% -63.74% (p=0.000 n=5+4) Floor 2.33ns ± 0% 0.85ns ± 1% -63.35% (p=0.008 n=5+5) Round 4.23ns ± 0% 0.85ns ± 0% -79.89% (p=0.000 n=5+4) Trunc 2.35ns ± 0% 0.85ns ± 0% -63.83% (p=0.029 n=4+4) Change-Id: Idee7ba24a2899d12bf9afee4eedd6b4aaad3c510 Reviewed-on: https://go-review.googlesource.com/63890 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-09-20 10:01:35 +00:00
Michael Munday	95b146e8eb	cmd/compile: improve floating point constant propagation Add generic rules to propagate floating point constants through comparisons and integer conversions. These new rules seldom trigger in the standard library so there is no performance change, however I think it is worth adding them anyway for completeness. Change-Id: I9db5222746508a2996f1cafb72f4e0cf2541de07 Reviewed-on: https://go-review.googlesource.com/63795 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-09-14 23:08:33 +00:00
Lynn Boger	fa3fe2e3c6	cmd/compile, math/bits: add rotate rules to PPC64.rules This adds rules to match the code in math/bits RotateLeft, RotateLeft32, and RotateLef64 to allow them to be inlined. The rules are complicated because the code in these function use different types, and the non-const version of these shifts generate Mask and Carry instructions that become subexpressions during the match process. Also adds a testcase to asm_test.go. Improvement in math/bits: BenchmarkRotateLeft-16 1.57 1.32 -15.92% BenchmarkRotateLeft32-16 1.60 1.37 -14.37% BenchmarkRotateLeft64-16 1.57 1.32 -15.92% Updates #21390 Change-Id: Ib6f17669ecc9cab54f18d690be27e2225ca654a4 Reviewed-on: https://go-review.googlesource.com/59932 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2017-09-11 20:44:22 +00:00
Michael Munday	9da29b687f	cmd/compile: propagate constants through math.Float{32,64}{,from}bits This CL adds generic SSA rules to propagate constants through raw bits conversions between floats and integers. This allows constants to propagate through some math functions. For example, math.Copysign(0, -1) is now constant folded to a load of -0.0. Requires a fix to the ARM assembler which loaded -0.0 as +0.0. Change-Id: I52649a4691077c7414f19d17bb599a6743c23ac2 Reviewed-on: https://go-review.googlesource.com/62250 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2017-09-08 17:24:03 +00:00
Keith Randall	aed1c119fd	cmd/compile: fix assembly test Bad merge, missed changing to keyed literal structs. Bug introduced in CL 56252 Change-Id: I55cccff4990bd25e6387f6c90919ee5866900d7f Reviewed-on: https://go-review.googlesource.com/61290 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Martin Möhrmann <moehrmann@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-09-03 16:24:24 +00:00
Cholerae Hu	fb165eaffd	cmd/compile: combine xn - yn into (x-y)*n Do the similar thing to CL 55143 to reduce IMUL. Change-Id: I1bd38f618058e3cd74fac181f003610ea13f2294 Reviewed-on: https://go-review.googlesource.com/56252 Run-TryBot: Emmanuel Odeke <emm.odeke@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-09-03 14:29:38 +00:00
Cherry Zhang	7846500a5a	cmd/compile: remove redundant constant shift rules Normal shift rules plus constant folding are enough to generate efficient shift-by-constant instructions. Add test to make sure we don't generate comparisons for constant shifts. TODO: there are still constant shift rules on PPC64. If they are removed, the constant folding rules are not enough to remove all the test and mask stuff for constant shifts. Leave them in for now. Fixes #20663. Change-Id: I724cc324aa8607762d0c8aacf9bfa641bda5c2a1 Reviewed-on: https://go-review.googlesource.com/60330 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-08-31 02:08:48 +00:00
Keith Randall	2b079c3c04	cmd/compile: use keyed struct for asm tests Just to make it clearer which regexps are positive and which regexps are negative. Change-Id: Ia190e89be28048fcae2491506f552afad90a5f85 Reviewed-on: https://go-review.googlesource.com/59490 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Martin Möhrmann <moehrmann@google.com> Reviewed-by: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-08-28 17:34:25 +00:00
David du Colombier	adbfdfe377	cmd/compile: don't use MOVOstore for move on plan9/amd64 The SSA compiler currently generates MOVOstore instructions to optimize 16 bytes moves on AMD64 architecture. However, we can't use the MOVOstore instruction on Plan 9, because floating point operations are not allowed in the note handler. We rely on the useSSE flag to disable the use of the MOVOstore instruction on Plan 9 and replace it by two MOVQstore instructions. Fixes #21625 Change-Id: Idfefcceadccafe1752b059b5fe113ce566c0e71c Reviewed-on: https://go-review.googlesource.com/59171 Run-TryBot: David du Colombier <0intro@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>	2017-08-28 16:21:28 +00:00
Ilya Tocar	9c99512d18	cmd/compile/internal/ssa: combine consecutive loads and stores on amd64 Sometimes (often for calls) we generate code like this: MOVQ (addr),AX MOVQ 8(addr),BX MOVQ AX,(otheraddr) MOVQ BX,8(otheraddr) Replace it with MOVUPS (addr),X0 MOVUPS X0,(otheraddr) For completeness do the same for 8,16,32-bit loads/stores too. Shaves 1% from code sections of go tool. /localdisk/itocar/golang/bin/go 10293917 go_old 10334877 [40960 bytes] read-only data = 682 bytes (0.040769%) global text (code) = 38961 bytes (1.036503%) Total difference 39643 bytes (0.674628%) Updates #6853 Change-Id: I1f0d2f60273a63a079b58927cd1c4e3429d2e7ae Reviewed-on: https://go-review.googlesource.com/57130 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-08-25 20:05:17 +00:00
Keith Randall	fb05948d9e	cmd/compile,math: improve code generation for math.Abs Implement int reg <-> fp reg moves on amd64. If we see a load to int reg followed by an int->fp move, then we can just load to the fp reg instead. Same for stores. math.Abs is now: MOVQ "".x+8(SP), AX SHLQ $1, AX SHRQ $1, AX MOVQ AX, "".~r1+16(SP) math.Copysign is now: MOVQ "".x+8(SP), AX SHLQ $1, AX SHRQ $1, AX MOVQ "".y+16(SP), CX SHRQ $63, CX SHLQ $63, CX ORQ CX, AX MOVQ AX, "".~r2+24(SP) math.Float64bits is now: MOVSD "".x+8(SP), X0 MOVSD X0, "".~r1+16(SP) (it would be nicer to use a non-SSE reg for this, nothing is perfect) And due to the fix for #21440, the inlined version of these improve as well. name old time/op new time/op delta Abs 1.38ns ± 5% 0.89ns ±10% -35.54% (p=0.000 n=10+10) Copysign 1.56ns ± 7% 1.35ns ± 6% -13.77% (p=0.000 n=9+10) Fixes #13095 Change-Id: Ibd7f2792412a6668608780b0688a77062e1f1499 Reviewed-on: https://go-review.googlesource.com/58732 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>	2017-08-25 19:15:01 +00:00
Michael Munday	744ebfde04	cmd/compile: eliminate stores to unread auto variables This is a crude compiler pass to eliminate stores to auto variables that are only ever written to. Eliminates an unnecessary store to x from the following code: func f() int { var x := 1 return *(&x) } Fixes #19765. Change-Id: If2c63a8ae67b8c590b6e0cc98a9610939a3eeffa Reviewed-on: https://go-review.googlesource.com/38746 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-08-24 16:53:56 +00:00
Alberto Donizetti	8bca7ef607	cmd/compile: support placeholder name '$' in code generation tests This change adds to the code-generation harness in asm_test.go support for the use of a '$' placeholder name for test functions. A few of uninformative function names are also changed to use the placeholder, to confirm that the change works as expected. Fixes #21500 Change-Id: Iba168bd85efc9822253305d003b06682cf8a6c5c Reviewed-on: https://go-review.googlesource.com/57292 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-08-22 19:42:32 +00:00
Ilya Tocar	da34ddf24b	cmd/compile/internal/ssa: combine more const stores We already combine const stores up-to MOVQstoreconst. Combine 2 64-bit stores of const zero into 1 sse store of 128-bit zero. Shaves significant (>1%) amount of code from go tool: /localdisk/itocar/golang/bin/go 10334877 go_old 10388125 [53248 bytes] global text (code) = 51041 bytes (1.343944%) read-only data = 663 bytes (0.039617%) Total difference 51704 bytes (0.873981%) Change-Id: I7bc40968023c3a69f379b10fbb433cdb11364f1b Reviewed-on: https://go-review.googlesource.com/56250 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Giovanni Bajo <rasky@develer.com> Reviewed-by: Keith Randall <khr@golang.org>	2017-08-17 17:40:40 +00:00
Alberto Donizetti	a0453a180f	cmd/compile: combine xn + yn into (x+y)n There are a few cases where this can be useful. Apart from the obvious (and silly) 100n + 200n where we generate one IMUL instead of two, consider: 15n + 31n Currently, the compiler strength-reduces both imuls, generating: 0x0000 00000 MOVQ "".n+8(SP), AX 0x0005 00005 MOVQ AX, CX 0x0008 00008 SHLQ $4, AX 0x000c 00012 SUBQ CX, AX 0x000f 00015 MOVQ CX, DX 0x0012 00018 SHLQ $5, CX 0x0016 00022 SUBQ DX, CX 0x0019 00025 ADDQ CX, AX 0x001c 00028 MOVQ AX, "".~r1+16(SP) 0x0021 00033 RET But combining the imuls is both faster and shorter: 0x0000 00000 MOVQ "".n+8(SP), AX 0x0005 00005 IMULQ $46, AX 0x0009 00009 MOVQ AX, "".~r1+16(SP) 0x000e 00014 RET even without strength-reduction. Moreover, consider: 5n + 7(n+1) + 11(n+2) We already have a rule that rewrites 7(n+1) into 7n+7, so the generated code (without imuls merging) looks like this: 0x0000 00000 MOVQ "".n+8(SP), AX 0x0005 00005 LEAQ (AX)(AX4), CX 0x0009 00009 MOVQ AX, DX 0x000c 00012 NEGQ AX 0x000f 00015 LEAQ (AX)(DX8), AX 0x0013 00019 ADDQ CX, AX 0x0016 00022 LEAQ (DX)(CX2), CX 0x001a 00026 LEAQ 29(AX)(CX1), AX 0x001f 00031 MOVQ AX, "".~r1+16(SP) But with imuls merging, the 5n, 7n and 11n factors get merged, and the generated code looks like this: 0x0000 00000 MOVQ "".n+8(SP), AX 0x0005 00005 IMULQ $23, AX 0x0009 00009 ADDQ $29, AX 0x000d 00013 MOVQ AX, "".~r1+16(SP) 0x0012 00018 RET Which is both faster and shorter; that's also the exact same code that clang and the intel c compiler generate for the above expression. Change-Id: Ib4d5503f05d2f2efe31a1be14e2fe6cac33730a9 Reviewed-on: https://go-review.googlesource.com/55143 Reviewed-by: Keith Randall <khr@golang.org>	2017-08-16 16:51:59 +00:00
Cherry Zhang	f20944de78	cmd/compile: set/unset base register for better assembly print For address of an auto or arg, on all non-x86 architectures the assembler backend encodes the actual SP offset in the instruction but leaves the offset in Prog unchanged. When the assembly is printed in compile -S, it shows an offset relative to pseudo FP/SP with an actual hardware SP base register (e.g. R13 on ARM). This is confusing. Unset the base register if it is indeed SP, so the assembly output is consistent. If the base register isn't SP, it should be an error and the error output contains the actual base register. For address loading instructions, the base register isn't set in the compiler on non-x86 architectures. Set it. Normally it is SP and will be unset in the change mentioned above for printing. If it is not, it will be an error and the error output contains the actual base register. No change in generated binary, only printed assembly. Passes "go build -a -toolexec 'toolstash -cmp' std cmd" on all architectures. Fixes #21064. Change-Id: Ifafe8d5f9b437efbe824b63b3cbc2f5f6cdc1fd5 Reviewed-on: https://go-review.googlesource.com/49432 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2017-08-02 12:24:02 +00:00
Ilya Tocar	3bdc2f3abf	cmd/compile/internal/gc: speed-up small array comparison Currently we inline array comparisons for arrays with at most 4 elements. Compare arrays with small size, but more than 4 elements (e. g. [16]byte) with larger compares. This provides very slightly smaller binaries, and results in faster code. ArrayEqual-6 7.41ns ± 0% 3.17ns ± 0% -57.15% (p=0.000 n=10+10) For go tool: global text (code) = -559 bytes (-0.014566%) This also helps mapaccess1_faststr, and maps in general: MapDelete/Str/1-6 195ns ± 1% 186ns ± 2% -4.47% (p=0.000 n=10+10) MapDelete/Str/2-6 211ns ± 1% 177ns ± 1% -16.01% (p=0.000 n=10+10) MapDelete/Str/4-6 225ns ± 1% 183ns ± 1% -18.49% (p=0.000 n=8+10) MapStringKeysEight_16-6 31.3ns ± 0% 28.6ns ± 0% -8.63% (p=0.000 n=6+9) MapStringKeysEight_32-6 29.2ns ± 0% 27.6ns ± 0% -5.45% (p=0.000 n=10+10) MapStringKeysEight_64-6 29.1ns ± 1% 27.5ns ± 0% -5.46% (p=0.000 n=10+10) MapStringKeysEight_1M-6 29.1ns ± 1% 27.6ns ± 0% -5.49% (p=0.000 n=10+10) Change-Id: I9ec98e41b233031e0e96c4e13d86a324f628ed4a Reviewed-on: https://go-review.googlesource.com/40771 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-06-01 15:46:16 +00:00

1 2

91 commits