mirror of
				https://github.com/golang/go.git
				synced 2025-10-31 16:50:58 +00:00 
			
		
		
		
	
	
		
			12 commits
		
	
	
	| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|  Michael Munday | 854e892ce1 | cmd/compile: optimize shift pairs and masks on s390x Optimize combinations of left and right shifts by a constant value into a 'rotate then insert selected bits [into zero]' instruction. Use the same instruction for contiguous masks since it has some benefits over 'and immediate' (not restricted to 32-bits, does not overwrite source register). To keep the complexity of this change under control I've only implemented 64 bit operations for now. There are a lot more optimizations that can be done with this instruction family. However, since their function overlaps with other instructions we need to be somewhat careful not to break existing optimization rules by creating optimization dead ends. This is particularly true of the load/store merging rules which contain lots of zero extensions and shifts. This CL does interfere with the store merging rules when an operand is shifted left before it is stored: binary.BigEndian.PutUint64(b, x << 1) This is unfortunate but it's not critical and somewhat complex so I plan to fix that in a follow up CL. file before after Δ % addr2line 4117446 4117282 -164 -0.004% api 4945184 4942752 -2432 -0.049% asm 4998079 4991891 -6188 -0.124% buildid 2685158 2684074 -1084 -0.040% cgo 4553732 4553394 -338 -0.007% compile 19294446 19245070 -49376 -0.256% cover 4897105 4891319 -5786 -0.118% dist 3544389 3542785 -1604 -0.045% doc 3926795 3927617 +822 +0.021% fix 3302958 3293868 -9090 -0.275% link 6546274 6543456 -2818 -0.043% nm 4102021 4100825 -1196 -0.029% objdump 4542431 4548483 +6052 +0.133% pack 2482465 2416389 -66076 -2.662% pprof 13366541 13363915 -2626 -0.020% test2json 2829007 2761515 -67492 -2.386% trace 10216164 10219684 +3520 +0.034% vet 6773956 6773572 -384 -0.006% total 107124151 106917891 -206260 -0.193% Change-Id: I7591cce41e06867ba10a745daae9333513062746 Reviewed-on: https://go-review.googlesource.com/c/go/+/233317 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Trust: Michael Munday <mike.munday@ibm.com> | ||
|  diaxu01 | 01aad9ea93 | cmd/compile: Optimize ARM64's code with EON This patch fuses pattern '(MVN (XOR x y))' into '(EON x y)'. Change-Id: I269c98ce198d51a4945ce8bd0e1024acbd1b7609 Reviewed-on: https://go-review.googlesource.com/c/go/+/239638 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> | ||
|  Josh Bleecher Snyder | 50b11318fe | cmd/compile: use oneBit instead of isPowerOfTwo in bit optimization This optimization works on any integer with exactly one bit set. This is identical to being a power of two, except in the most negative number. Use oneBit instead. The rule now triggers in a few more places in std+cmd, in packages encoding/asn1, crypto/elliptic, and vendor/golang.org/x/crypto/cryptobyte. This change obviates the need for CL 222479 by doing this optimization consistently in the compiler. Change-Id: I983c6235290fdc634fda5e11b10f1f8ce041272f Reviewed-on: https://go-review.googlesource.com/c/go/+/229124 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> | ||
|  LE Manh Cuong | c5f142fa9f | cmd/compile: optimize bitset tests The assembly output for x & c == c, where c is power of 2: MOVQ "".set+8(SP), AX ANDQ $8, AX CMPQ AX, $8 SETEQ "".~r2+24(SP) With optimization using bitset: MOVQ "".set+8(SP), AX BTL $3, AX SETCS "".~r2+24(SP) output less than 1 instruction. However, there is no speed improvement: name old time/op new time/op delta AllBitSet-8 0.35ns ± 0% 0.35ns ± 0% ~ (all equal) Fixes #31904 Change-Id: I5dca4e410bf45716ed2145e3473979ec997e35d4 Reviewed-on: https://go-review.googlesource.com/c/go/+/175957 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> | ||
|  Ben Shi | c6bf9a8109 | cmd/compile: optimize AMD64's bit wise operation Currently "arr[idx] |= 0x80" is compiled to MOVLload->BTSL->MOVLstore. And this CL optimizes it to a single BTSLconstmodify. Other bit wise operations with a direct memory operand are also implemented. 1. The size of the executable bin/go decreases about 4KB, and the total size of pkg/linux_amd64 (excluding cmd/compile) decreases about 0.6KB. 2. There a little improvement in the go1 benchmark test (excluding noise). name old time/op new time/op delta BinaryTree17-4 2.66s ± 4% 2.66s ± 3% ~ (p=0.596 n=49+49) Fannkuch11-4 2.38s ± 2% 2.32s ± 2% -2.69% (p=0.000 n=50+50) FmtFprintfEmpty-4 42.7ns ± 4% 43.2ns ± 7% +1.31% (p=0.009 n=50+50) FmtFprintfString-4 71.0ns ± 5% 72.0ns ± 3% +1.33% (p=0.000 n=50+50) FmtFprintfInt-4 80.7ns ± 4% 80.6ns ± 3% ~ (p=0.931 n=50+50) FmtFprintfIntInt-4 125ns ± 3% 126ns ± 4% ~ (p=0.051 n=50+50) FmtFprintfPrefixedInt-4 158ns ± 1% 142ns ± 3% -9.84% (p=0.000 n=36+50) FmtFprintfFloat-4 215ns ± 4% 212ns ± 4% -1.23% (p=0.002 n=50+50) FmtManyArgs-4 519ns ± 3% 510ns ± 3% -1.77% (p=0.000 n=50+50) GobDecode-4 6.49ms ± 6% 6.52ms ± 5% ~ (p=0.866 n=50+50) GobEncode-4 5.93ms ± 8% 6.01ms ± 7% ~ (p=0.076 n=50+50) Gzip-4 222ms ± 4% 224ms ± 8% +0.80% (p=0.001 n=50+50) Gunzip-4 36.6ms ± 5% 36.4ms ± 4% ~ (p=0.093 n=50+50) HTTPClientServer-4 59.1µs ± 1% 58.9µs ± 2% -0.24% (p=0.039 n=49+48) JSONEncode-4 9.23ms ± 4% 9.21ms ± 5% ~ (p=0.244 n=50+50) JSONDecode-4 48.8ms ± 4% 48.7ms ± 4% ~ (p=0.653 n=50+50) Mandelbrot200-4 3.81ms ± 4% 3.80ms ± 3% ~ (p=0.834 n=50+50) GoParse-4 3.20ms ± 5% 3.19ms ± 5% ~ (p=0.494 n=50+50) RegexpMatchEasy0_32-4 78.1ns ± 2% 77.4ns ± 3% -0.86% (p=0.005 n=50+50) RegexpMatchEasy0_1K-4 233ns ± 3% 233ns ± 3% ~ (p=0.074 n=50+50) RegexpMatchEasy1_32-4 74.2ns ± 3% 73.4ns ± 3% -1.06% (p=0.000 n=50+50) RegexpMatchEasy1_1K-4 369ns ± 2% 364ns ± 4% -1.41% (p=0.000 n=36+50) RegexpMatchMedium_32-4 109ns ± 4% 107ns ± 3% -2.06% (p=0.001 n=50+50) RegexpMatchMedium_1K-4 31.5µs ± 3% 30.8µs ± 3% -2.20% (p=0.000 n=50+50) RegexpMatchHard_32-4 1.57µs ± 3% 1.56µs ± 2% -0.57% (p=0.016 n=50+50) RegexpMatchHard_1K-4 47.4µs ± 4% 47.0µs ± 3% -0.82% (p=0.008 n=50+50) Revcomp-4 414ms ± 7% 412ms ± 7% ~ (p=0.285 n=50+50) Template-4 64.3ms ± 4% 62.7ms ± 3% -2.44% (p=0.000 n=50+50) TimeParse-4 316ns ± 3% 313ns ± 3% ~ (p=0.122 n=50+50) TimeFormat-4 291ns ± 3% 293ns ± 3% +0.80% (p=0.001 n=50+50) [Geo mean] 46.5µs 46.2µs -0.81% name old speed new speed delta GobDecode-4 118MB/s ± 6% 118MB/s ± 5% ~ (p=0.863 n=50+50) GobEncode-4 130MB/s ± 9% 128MB/s ± 8% ~ (p=0.076 n=50+50) Gzip-4 87.4MB/s ± 4% 86.8MB/s ± 7% -0.78% (p=0.002 n=50+50) Gunzip-4 531MB/s ± 5% 533MB/s ± 4% ~ (p=0.093 n=50+50) JSONEncode-4 210MB/s ± 4% 211MB/s ± 5% ~ (p=0.247 n=50+50) JSONDecode-4 39.8MB/s ± 4% 39.9MB/s ± 4% ~ (p=0.654 n=50+50) GoParse-4 18.1MB/s ± 5% 18.2MB/s ± 5% ~ (p=0.493 n=50+50) RegexpMatchEasy0_32-4 410MB/s ± 2% 413MB/s ± 3% +0.86% (p=0.004 n=50+50) RegexpMatchEasy0_1K-4 4.39GB/s ± 3% 4.38GB/s ± 3% ~ (p=0.063 n=50+50) RegexpMatchEasy1_32-4 432MB/s ± 3% 436MB/s ± 3% +1.07% (p=0.000 n=50+50) RegexpMatchEasy1_1K-4 2.77GB/s ± 2% 2.81GB/s ± 4% +1.46% (p=0.000 n=36+50) RegexpMatchMedium_32-4 9.16MB/s ± 3% 9.35MB/s ± 4% +2.09% (p=0.001 n=50+50) RegexpMatchMedium_1K-4 32.5MB/s ± 3% 33.2MB/s ± 3% +2.25% (p=0.000 n=50+50) RegexpMatchHard_32-4 20.4MB/s ± 3% 20.5MB/s ± 2% +0.56% (p=0.017 n=50+50) RegexpMatchHard_1K-4 21.6MB/s ± 4% 21.8MB/s ± 3% +0.83% (p=0.008 n=50+50) Revcomp-4 613MB/s ± 4% 618MB/s ± 7% ~ (p=0.152 n=48+50) Template-4 30.2MB/s ± 4% 30.9MB/s ± 3% +2.49% (p=0.000 n=50+50) [Geo mean] 127MB/s 128MB/s +0.64% Change-Id: If405198283855d75697f66cf894b2bef458f620e Reviewed-on: https://go-review.googlesource.com/135422 Reviewed-by: Keith Randall <khr@golang.org> | ||
|  Ben Shi | 9f2411894b | cmd/compile: optimize arm's bit operation BFC (Bit Field Clear) was introduced in ARMv7, which can simplify ANDconst and BICconst. And this CL implements that optimization. 1. The total size of pkg/android_arm decreases about 3KB, excluding cmd/compile/. 2. There is no regression in the go1 benchmark result, and some cases (FmtFprintfEmpty-4 and RegexpMatchMedium_32-4) even get slight improvement. name old time/op new time/op delta BinaryTree17-4 25.3s ± 1% 25.2s ± 1% ~ (p=0.072 n=30+29) Fannkuch11-4 13.3s ± 0% 13.3s ± 0% +0.13% (p=0.000 n=30+26) FmtFprintfEmpty-4 407ns ± 0% 394ns ± 0% -3.19% (p=0.000 n=26+28) FmtFprintfString-4 664ns ± 0% 662ns ± 0% -0.22% (p=0.000 n=30+30) FmtFprintfInt-4 712ns ± 0% 706ns ± 0% -0.79% (p=0.000 n=30+30) FmtFprintfIntInt-4 1.06µs ± 0% 1.05µs ± 0% -0.38% (p=0.000 n=30+30) FmtFprintfPrefixedInt-4 1.16µs ± 0% 1.16µs ± 0% -0.13% (p=0.000 n=30+29) FmtFprintfFloat-4 2.24µs ± 0% 2.23µs ± 0% -0.51% (p=0.000 n=29+21) FmtManyArgs-4 4.09µs ± 0% 4.06µs ± 0% -0.83% (p=0.000 n=28+30) GobDecode-4 55.0ms ± 5% 55.4ms ± 5% ~ (p=0.307 n=30+30) GobEncode-4 51.2ms ± 1% 51.9ms ± 1% +1.23% (p=0.000 n=29+30) Gzip-4 2.64s ± 0% 2.60s ± 0% -1.35% (p=0.000 n=30+29) Gunzip-4 309ms ± 0% 308ms ± 0% -0.27% (p=0.000 n=30+30) HTTPClientServer-4 1.03ms ± 5% 1.02ms ± 4% ~ (p=0.117 n=30+29) JSONEncode-4 101ms ± 2% 101ms ± 2% ~ (p=0.338 n=29+29) JSONDecode-4 383ms ± 2% 382ms ± 2% ~ (p=0.751 n=26+30) Mandelbrot200-4 18.4ms ± 0% 18.4ms ± 0% -0.10% (p=0.000 n=29+29) GoParse-4 22.6ms ± 0% 22.5ms ± 0% -0.39% (p=0.000 n=30+30) RegexpMatchEasy0_32-4 761ns ± 0% 750ns ± 0% -1.47% (p=0.000 n=26+29) RegexpMatchEasy0_1K-4 4.33µs ± 0% 4.34µs ± 0% +0.27% (p=0.000 n=25+28) RegexpMatchEasy1_32-4 809ns ± 0% 795ns ± 0% -1.74% (p=0.000 n=27+25) RegexpMatchEasy1_1K-4 5.54µs ± 0% 5.53µs ± 0% -0.18% (p=0.000 n=29+29) RegexpMatchMedium_32-4 1.11µs ± 0% 1.08µs ± 0% -2.78% (p=0.000 n=27+29) RegexpMatchMedium_1K-4 255µs ± 0% 255µs ± 0% -0.02% (p=0.029 n=30+30) RegexpMatchHard_32-4 14.7µs ± 0% 14.7µs ± 0% -0.28% (p=0.000 n=30+29) RegexpMatchHard_1K-4 439µs ± 0% 439µs ± 0% ~ (p=0.907 n=23+27) Revcomp-4 41.9ms ± 1% 41.9ms ± 1% ~ (p=0.230 n=28+30) Template-4 522ms ± 1% 528ms ± 1% +1.25% (p=0.000 n=30+30) TimeParse-4 3.34µs ± 0% 3.35µs ± 0% +0.23% (p=0.000 n=30+27) TimeFormat-4 6.06µs ± 0% 6.13µs ± 0% +1.08% (p=0.000 n=29+29) [Geo mean] 384µs 382µs -0.37% name old speed new speed delta GobDecode-4 14.0MB/s ± 5% 13.9MB/s ± 5% ~ (p=0.308 n=30+30) GobEncode-4 15.0MB/s ± 1% 14.8MB/s ± 1% -1.22% (p=0.000 n=29+30) Gzip-4 7.36MB/s ± 0% 7.46MB/s ± 0% +1.35% (p=0.000 n=30+30) Gunzip-4 62.8MB/s ± 0% 63.0MB/s ± 0% +0.27% (p=0.000 n=30+30) JSONEncode-4 19.2MB/s ± 2% 19.2MB/s ± 2% ~ (p=0.312 n=29+29) JSONDecode-4 5.05MB/s ± 3% 5.08MB/s ± 2% ~ (p=0.356 n=29+30) GoParse-4 2.56MB/s ± 0% 2.57MB/s ± 0% +0.39% (p=0.000 n=23+27) RegexpMatchEasy0_32-4 42.0MB/s ± 0% 42.6MB/s ± 0% +1.50% (p=0.000 n=26+28) RegexpMatchEasy0_1K-4 236MB/s ± 0% 236MB/s ± 0% -0.27% (p=0.000 n=25+28) RegexpMatchEasy1_32-4 39.6MB/s ± 0% 40.2MB/s ± 0% +1.73% (p=0.000 n=27+27) RegexpMatchEasy1_1K-4 185MB/s ± 0% 185MB/s ± 0% +0.18% (p=0.000 n=29+29) RegexpMatchMedium_32-4 900kB/s ± 0% 920kB/s ± 0% +2.22% (p=0.000 n=29+29) RegexpMatchMedium_1K-4 4.02MB/s ± 0% 4.02MB/s ± 0% +0.07% (p=0.004 n=30+27) RegexpMatchHard_32-4 2.17MB/s ± 0% 2.18MB/s ± 0% +0.46% (p=0.000 n=30+26) RegexpMatchHard_1K-4 2.33MB/s ± 0% 2.33MB/s ± 0% ~ (all equal) Revcomp-4 60.6MB/s ± 1% 60.7MB/s ± 1% ~ (p=0.207 n=28+30) Template-4 3.72MB/s ± 1% 3.67MB/s ± 1% -1.23% (p=0.000 n=30+30) [Geo mean] 12.9MB/s 12.9MB/s +0.29% Change-Id: I07f497f8bb476c950dc555491d00c9066fb64a4e Reviewed-on: https://go-review.googlesource.com/134232 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> | ||
|  Ben Shi | 0a382e0b7f | cmd/compile: optimize ARMv7 code "AND $0xffff0000, Rx" will be encoded to 12 bytes. 1. MOVWload from the constant pool to Rtmp 2. AND Rtmp, Rx 3. a 4-byte item in the constant pool It can be simplified to 8 bytes on ARMv7, since ARMv7 has "MOVW $imm-16, Rx". 1. MOVW $0xffff, Rtmp 2. BIC Rtmp, Rx The above optimization also applies to BICconst, ADDconst and SUBconst. 1. The total size of pkg/android_arm (excluding cmd/compile) decreases about 2KB. 2. The go1 benchmark shows no regression, exlcuding noise. name old time/op new time/op delta BinaryTree17-4 25.5s ± 1% 25.2s ± 1% -0.85% (p=0.000 n=30+30) Fannkuch11-4 13.3s ± 0% 13.3s ± 0% +0.16% (p=0.000 n=24+25) FmtFprintfEmpty-4 397ns ± 0% 394ns ± 0% -0.64% (p=0.000 n=30+30) FmtFprintfString-4 679ns ± 0% 678ns ± 0% ~ (p=0.093 n=30+29) FmtFprintfInt-4 708ns ± 0% 707ns ± 0% -0.19% (p=0.000 n=27+28) FmtFprintfIntInt-4 1.05µs ± 0% 1.05µs ± 0% -0.07% (p=0.001 n=18+30) FmtFprintfPrefixedInt-4 1.16µs ± 0% 1.15µs ± 0% -0.41% (p=0.000 n=29+30) FmtFprintfFloat-4 2.26µs ± 0% 2.23µs ± 1% -1.40% (p=0.000 n=30+30) FmtManyArgs-4 3.96µs ± 0% 3.95µs ± 0% -0.29% (p=0.000 n=29+30) GobDecode-4 52.9ms ± 2% 53.4ms ± 2% +0.92% (p=0.004 n=28+30) GobEncode-4 49.7ms ± 2% 49.8ms ± 2% ~ (p=0.890 n=30+26) Gzip-4 2.61s ± 0% 2.60s ± 0% -0.36% (p=0.000 n=29+29) Gunzip-4 312ms ± 0% 311ms ± 0% -0.13% (p=0.000 n=30+28) HTTPClientServer-4 1.02ms ± 8% 1.00ms ± 7% ~ (p=0.224 n=29+26) JSONEncode-4 125ms ± 1% 124ms ± 3% -1.05% (p=0.000 n=25+30) JSONDecode-4 432ms ± 1% 436ms ± 2% ~ (p=0.277 n=26+30) Mandelbrot200-4 18.4ms ± 0% 18.4ms ± 0% +0.02% (p=0.001 n=28+25) GoParse-4 22.4ms ± 1% 22.3ms ± 1% -0.41% (p=0.000 n=28+28) RegexpMatchEasy0_32-4 697ns ± 0% 706ns ± 0% +1.23% (p=0.000 n=19+30) RegexpMatchEasy0_1K-4 4.27µs ± 0% 4.26µs ± 0% -0.06% (p=0.000 n=30+30) RegexpMatchEasy1_32-4 741ns ± 0% 735ns ± 0% -0.86% (p=0.000 n=26+30) RegexpMatchEasy1_1K-4 5.49µs ± 0% 5.49µs ± 0% -0.03% (p=0.023 n=25+30) RegexpMatchMedium_32-4 1.05µs ± 2% 1.04µs ± 2% ~ (p=0.893 n=30+30) RegexpMatchMedium_1K-4 261µs ± 0% 261µs ± 0% -0.11% (p=0.000 n=29+30) RegexpMatchHard_32-4 14.9µs ± 0% 14.9µs ± 0% -0.36% (p=0.000 n=23+29) RegexpMatchHard_1K-4 446µs ± 0% 445µs ± 0% -0.17% (p=0.000 n=30+29) Revcomp-4 41.6ms ± 1% 41.7ms ± 1% +0.27% (p=0.040 n=28+30) Template-4 531ms ± 0% 532ms ± 1% ~ (p=0.059 n=30+30) TimeParse-4 3.40µs ± 0% 3.33µs ± 0% -2.02% (p=0.000 n=30+30) TimeFormat-4 6.14µs ± 0% 6.11µs ± 0% -0.45% (p=0.000 n=27+29) [Geo mean] 384µs 383µs -0.27% name old speed new speed delta GobDecode-4 14.5MB/s ± 2% 14.4MB/s ± 2% -0.90% (p=0.005 n=28+30) GobEncode-4 15.4MB/s ± 2% 15.4MB/s ± 2% ~ (p=0.741 n=30+25) Gzip-4 7.44MB/s ± 0% 7.47MB/s ± 1% +0.37% (p=0.000 n=25+30) Gunzip-4 62.3MB/s ± 0% 62.4MB/s ± 0% +0.13% (p=0.000 n=30+28) JSONEncode-4 15.5MB/s ± 1% 15.6MB/s ± 3% +1.07% (p=0.000 n=25+30) JSONDecode-4 4.48MB/s ± 0% 4.46MB/s ± 2% ~ (p=0.655 n=23+30) GoParse-4 2.58MB/s ± 1% 2.59MB/s ± 1% +0.42% (p=0.000 n=28+29) RegexpMatchEasy0_32-4 45.9MB/s ± 0% 45.3MB/s ± 0% -1.23% (p=0.000 n=28+30) RegexpMatchEasy0_1K-4 240MB/s ± 0% 240MB/s ± 0% +0.07% (p=0.000 n=30+30) RegexpMatchEasy1_32-4 43.2MB/s ± 0% 43.5MB/s ± 0% +0.85% (p=0.000 n=30+28) RegexpMatchEasy1_1K-4 186MB/s ± 0% 186MB/s ± 0% +0.03% (p=0.026 n=25+30) RegexpMatchMedium_32-4 955kB/s ± 2% 960kB/s ± 2% ~ (p=0.084 n=30+30) RegexpMatchMedium_1K-4 3.92MB/s ± 0% 3.93MB/s ± 0% +0.14% (p=0.000 n=29+30) RegexpMatchHard_32-4 2.14MB/s ± 0% 2.15MB/s ± 0% +0.31% (p=0.000 n=30+26) RegexpMatchHard_1K-4 2.30MB/s ± 0% 2.30MB/s ± 0% ~ (all equal) Revcomp-4 61.1MB/s ± 1% 60.9MB/s ± 1% -0.27% (p=0.039 n=28+30) Template-4 3.66MB/s ± 0% 3.65MB/s ± 1% -0.14% (p=0.045 n=30+30) [Geo mean] 12.8MB/s 12.8MB/s +0.04% Change-Id: I02370e2584b4c041fddd324c97628fd6f0c12183 Reviewed-on: https://go-review.googlesource.com/123179 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> | ||
|  Ben Shi | b75c5c5992 | cmd/compile: optimize AMD64 with more read-modify-write operations 6 more operations which do read-modify-write with a constant source operand are added. 1. The total size of pkg/linux_amd64 decreases about 3KB, excluding cmd/compile. 2. The go1 benckmark shows a slight improvement. name old time/op new time/op delta BinaryTree17-4 2.61s ± 4% 2.67s ± 2% +2.26% (p=0.000 n=30+29) Fannkuch11-4 2.39s ± 2% 2.32s ± 2% -2.67% (p=0.000 n=30+30) FmtFprintfEmpty-4 44.0ns ± 4% 41.7ns ± 4% -5.15% (p=0.000 n=30+30) FmtFprintfString-4 74.2ns ± 4% 72.3ns ± 4% -2.59% (p=0.000 n=30+30) FmtFprintfInt-4 81.7ns ± 3% 78.8ns ± 4% -3.54% (p=0.000 n=27+30) FmtFprintfIntInt-4 130ns ± 4% 124ns ± 5% -4.60% (p=0.000 n=30+30) FmtFprintfPrefixedInt-4 154ns ± 3% 152ns ± 3% -1.13% (p=0.012 n=30+30) FmtFprintfFloat-4 215ns ± 4% 212ns ± 5% -1.56% (p=0.002 n=30+30) FmtManyArgs-4 522ns ± 3% 512ns ± 3% -1.84% (p=0.001 n=30+30) GobDecode-4 6.42ms ± 5% 6.49ms ± 7% ~ (p=0.070 n=30+30) GobEncode-4 6.07ms ± 8% 5.98ms ± 8% ~ (p=0.150 n=30+30) Gzip-4 236ms ± 4% 223ms ± 4% -5.57% (p=0.000 n=30+30) Gunzip-4 37.4ms ± 3% 36.7ms ± 4% -2.03% (p=0.000 n=30+30) HTTPClientServer-4 58.7µs ± 1% 58.5µs ± 2% -0.37% (p=0.018 n=30+29) JSONEncode-4 12.0ms ± 4% 12.1ms ± 3% ~ (p=0.112 n=30+30) JSONDecode-4 54.5ms ± 3% 55.5ms ± 4% +1.80% (p=0.006 n=30+30) Mandelbrot200-4 3.78ms ± 4% 3.78ms ± 4% ~ (p=0.173 n=30+30) GoParse-4 3.16ms ± 5% 3.22ms ± 5% +1.75% (p=0.010 n=30+30) RegexpMatchEasy0_32-4 76.6ns ± 1% 75.9ns ± 3% ~ (p=0.672 n=25+30) RegexpMatchEasy0_1K-4 252ns ± 3% 253ns ± 3% +0.57% (p=0.027 n=30+30) RegexpMatchEasy1_32-4 69.8ns ± 4% 70.2ns ± 6% ~ (p=0.539 n=30+30) RegexpMatchEasy1_1K-4 374ns ± 3% 373ns ± 5% ~ (p=0.263 n=30+30) RegexpMatchMedium_32-4 107ns ± 4% 109ns ± 3% ~ (p=0.067 n=30+30) RegexpMatchMedium_1K-4 33.9µs ± 5% 34.1µs ± 4% ~ (p=0.297 n=30+30) RegexpMatchHard_32-4 1.54µs ± 3% 1.56µs ± 4% +1.43% (p=0.002 n=30+30) RegexpMatchHard_1K-4 46.6µs ± 3% 47.0µs ± 3% ~ (p=0.055 n=30+30) Revcomp-4 411ms ± 6% 407ms ± 6% ~ (p=0.219 n=30+30) Template-4 66.8ms ± 3% 64.8ms ± 5% -3.01% (p=0.000 n=30+30) TimeParse-4 312ns ± 2% 319ns ± 3% +2.50% (p=0.000 n=30+30) TimeFormat-4 296ns ± 5% 299ns ± 3% +0.93% (p=0.005 n=30+30) [Geo mean] 47.5µs 47.1µs -0.75% name old speed new speed delta GobDecode-4 120MB/s ± 5% 118MB/s ± 6% ~ (p=0.072 n=30+30) GobEncode-4 127MB/s ± 8% 129MB/s ± 8% ~ (p=0.150 n=30+30) Gzip-4 82.1MB/s ± 4% 87.0MB/s ± 4% +5.90% (p=0.000 n=30+30) Gunzip-4 519MB/s ± 4% 529MB/s ± 4% +2.07% (p=0.001 n=30+30) JSONEncode-4 162MB/s ± 4% 161MB/s ± 3% ~ (p=0.110 n=30+30) JSONDecode-4 35.6MB/s ± 3% 35.0MB/s ± 4% -1.77% (p=0.007 n=30+30) GoParse-4 18.3MB/s ± 4% 18.0MB/s ± 4% -1.72% (p=0.009 n=30+30) RegexpMatchEasy0_32-4 418MB/s ± 1% 422MB/s ± 3% ~ (p=0.645 n=25+30) RegexpMatchEasy0_1K-4 4.06GB/s ± 3% 4.04GB/s ± 3% -0.57% (p=0.033 n=30+30) RegexpMatchEasy1_32-4 459MB/s ± 4% 456MB/s ± 6% ~ (p=0.530 n=30+30) RegexpMatchEasy1_1K-4 2.73GB/s ± 3% 2.75GB/s ± 5% ~ (p=0.279 n=30+30) RegexpMatchMedium_32-4 9.28MB/s ± 5% 9.18MB/s ± 4% ~ (p=0.086 n=30+30) RegexpMatchMedium_1K-4 30.2MB/s ± 4% 30.0MB/s ± 4% ~ (p=0.300 n=30+30) RegexpMatchHard_32-4 20.8MB/s ± 3% 20.5MB/s ± 4% -1.41% (p=0.002 n=30+30) RegexpMatchHard_1K-4 22.0MB/s ± 3% 21.8MB/s ± 3% ~ (p=0.051 n=30+30) Revcomp-4 619MB/s ± 7% 625MB/s ± 7% ~ (p=0.219 n=30+30) Template-4 29.0MB/s ± 3% 29.9MB/s ± 4% +3.11% (p=0.000 n=30+30) [Geo mean] 123MB/s 123MB/s +0.28% Change-Id: I850652cfd53329c1af804b7f57f4393d8097bb0d Reviewed-on: https://go-review.googlesource.com/121135 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ilya Tocar <ilya.tocar@intel.com> | ||
|  Alberto Donizetti | 188e2bf897 | test/codegen: port arm64 BIC/EON/ORN and masking tests And delete them from asm_test. Change-Id: I24f421b87e8cb4770c887a6dfd58eacd0088947d Reviewed-on: https://go-review.googlesource.com/106056 Reviewed-by: Keith Randall <khr@golang.org> | ||
|  Giovanni Bajo | 79112707bb | cmd/compile: add patterns for bit set/clear/complement on amd64 This patch completes implementation of BT(Q|L), and adds support
for BT(S|R|C)(Q|L).
Example of code changes from time.(*Time).addSec:
        if t.wall&hasMonotonic != 0 {
  0x1073465               488b08                  MOVQ 0(AX), CX
  0x1073468               4889ca                  MOVQ CX, DX
  0x107346b               48c1e93f                SHRQ $0x3f, CX
  0x107346f               48c1e13f                SHLQ $0x3f, CX
  0x1073473               48f7c1ffffffff          TESTQ $-0x1, CX
  0x107347a               746b                    JE 0x10734e7
        if t.wall&hasMonotonic != 0 {
  0x1073435               488b08                  MOVQ 0(AX), CX
  0x1073438               480fbae13f              BTQ $0x3f, CX
  0x107343d               7363                    JAE 0x10734a2
Another example:
                        t.wall = t.wall&nsecMask | uint64(dsec)<<nsecShift | hasMonotonic
  0x10734c8               4881e1ffffff3f          ANDQ $0x3fffffff, CX
  0x10734cf               48c1e61e                SHLQ $0x1e, SI
  0x10734d3               4809ce                  ORQ CX, SI
  0x10734d6               48b90000000000000080    MOVQ $0x8000000000000000, CX
  0x10734e0               4809f1                  ORQ SI, CX
  0x10734e3               488908                  MOVQ CX, 0(AX)
                        t.wall = t.wall&nsecMask | uint64(dsec)<<nsecShift | hasMonotonic
  0x107348b		4881e2ffffff3f		ANDQ $0x3fffffff, DX
  0x1073492		48c1e61e		SHLQ $0x1e, SI
  0x1073496		4809f2			ORQ SI, DX
  0x1073499		480fbaea3f		BTSQ $0x3f, DX
  0x107349e		488910			MOVQ DX, 0(AX)
Go1 benchmarks seem unaffected, and I would be surprised
otherwise:
name                     old time/op    new time/op     delta
BinaryTree17-4              2.64s ± 4%      2.56s ± 9%  -2.92%  (p=0.008 n=9+9)
Fannkuch11-4                2.90s ± 1%      2.95s ± 3%  +1.76%  (p=0.010 n=10+9)
FmtFprintfEmpty-4          35.3ns ± 1%     34.5ns ± 2%  -2.34%  (p=0.004 n=9+8)
FmtFprintfString-4         57.0ns ± 1%     58.4ns ± 5%  +2.52%  (p=0.029 n=9+10)
FmtFprintfInt-4            59.8ns ± 3%     59.8ns ± 6%    ~     (p=0.565 n=10+10)
FmtFprintfIntInt-4         93.9ns ± 3%     91.2ns ± 5%  -2.94%  (p=0.014 n=10+9)
FmtFprintfPrefixedInt-4     107ns ± 6%      104ns ± 6%    ~     (p=0.099 n=10+10)
FmtFprintfFloat-4           187ns ± 3%      188ns ± 3%    ~     (p=0.505 n=10+9)
FmtManyArgs-4               410ns ± 1%      415ns ± 6%    ~     (p=0.649 n=8+10)
GobDecode-4                5.30ms ± 3%     5.27ms ± 3%    ~     (p=0.436 n=10+10)
GobEncode-4                4.62ms ± 5%     4.47ms ± 2%  -3.24%  (p=0.001 n=9+10)
Gzip-4                      197ms ± 4%      193ms ± 3%    ~     (p=0.123 n=10+10)
Gunzip-4                   30.4ms ± 3%     30.1ms ± 3%    ~     (p=0.481 n=10+10)
HTTPClientServer-4         76.3µs ± 1%     76.0µs ± 1%    ~     (p=0.236 n=8+9)
JSONEncode-4               10.5ms ± 9%     10.3ms ± 3%    ~     (p=0.280 n=10+10)
JSONDecode-4               42.3ms ±10%     41.3ms ± 2%    ~     (p=0.053 n=9+10)
Mandelbrot200-4            3.80ms ± 2%     3.72ms ± 2%  -2.15%  (p=0.001 n=9+10)
GoParse-4                  2.88ms ±10%     2.81ms ± 2%    ~     (p=0.247 n=10+10)
RegexpMatchEasy0_32-4      69.5ns ± 4%     68.6ns ± 2%    ~     (p=0.171 n=10+10)
RegexpMatchEasy0_1K-4       165ns ± 3%      162ns ± 3%    ~     (p=0.137 n=10+10)
RegexpMatchEasy1_32-4      65.7ns ± 6%     64.4ns ± 2%  -2.02%  (p=0.037 n=10+10)
RegexpMatchEasy1_1K-4       278ns ± 2%      279ns ± 3%    ~     (p=0.991 n=8+9)
RegexpMatchMedium_32-4     99.3ns ± 3%     98.5ns ± 4%    ~     (p=0.457 n=10+9)
RegexpMatchMedium_1K-4     30.1µs ± 1%     30.4µs ± 2%    ~     (p=0.173 n=8+10)
RegexpMatchHard_32-4       1.40µs ± 2%     1.41µs ± 4%    ~     (p=0.565 n=10+10)
RegexpMatchHard_1K-4       42.5µs ± 1%     41.5µs ± 3%  -2.13%  (p=0.002 n=8+9)
Revcomp-4                   332ms ± 4%      328ms ± 5%    ~     (p=0.720 n=9+10)
Template-4                 48.3ms ± 2%     49.6ms ± 3%  +2.56%  (p=0.002 n=8+10)
TimeParse-4                 252ns ± 2%      249ns ± 3%    ~     (p=0.116 n=9+10)
TimeFormat-4                262ns ± 4%      252ns ± 3%  -4.01%  (p=0.000 n=9+10)
name                     old speed      new speed       delta
GobDecode-4               145MB/s ± 3%    146MB/s ± 3%    ~     (p=0.436 n=10+10)
GobEncode-4               166MB/s ± 5%    172MB/s ± 2%  +3.28%  (p=0.001 n=9+10)
Gzip-4                   98.6MB/s ± 4%  100.4MB/s ± 3%    ~     (p=0.123 n=10+10)
Gunzip-4                  639MB/s ± 3%    645MB/s ± 3%    ~     (p=0.481 n=10+10)
JSONEncode-4              185MB/s ± 8%    189MB/s ± 3%    ~     (p=0.280 n=10+10)
JSONDecode-4             46.0MB/s ± 9%   47.0MB/s ± 2%  +2.21%  (p=0.046 n=9+10)
GoParse-4                20.1MB/s ± 9%   20.6MB/s ± 2%    ~     (p=0.239 n=10+10)
RegexpMatchEasy0_32-4     460MB/s ± 4%    467MB/s ± 2%    ~     (p=0.165 n=10+10)
RegexpMatchEasy0_1K-4    6.19GB/s ± 3%   6.28GB/s ± 3%    ~     (p=0.165 n=10+10)
RegexpMatchEasy1_32-4     487MB/s ± 5%    497MB/s ± 2%  +2.00%  (p=0.043 n=10+10)
RegexpMatchEasy1_1K-4    3.67GB/s ± 2%   3.67GB/s ± 3%    ~     (p=0.963 n=8+9)
RegexpMatchMedium_32-4   10.1MB/s ± 3%   10.1MB/s ± 4%    ~     (p=0.435 n=10+9)
RegexpMatchMedium_1K-4   34.0MB/s ± 1%   33.7MB/s ± 2%    ~     (p=0.173 n=8+10)
RegexpMatchHard_32-4     22.9MB/s ± 2%   22.7MB/s ± 4%    ~     (p=0.565 n=10+10)
RegexpMatchHard_1K-4     24.0MB/s ± 3%   24.7MB/s ± 3%  +2.64%  (p=0.001 n=9+9)
Revcomp-4                 766MB/s ± 4%    775MB/s ± 5%    ~     (p=0.720 n=9+10)
Template-4               40.2MB/s ± 2%   39.2MB/s ± 3%  -2.47%  (p=0.002 n=8+10)
The rules match ~1800 times during all.bash.
Fixes #18943
Change-Id: I64be1ada34e89c486dfd935bf429b35652117ed4
Reviewed-on: https://go-review.googlesource.com/94766
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org> | ||
|  Alberto Donizetti | 644b2dafc2 | test/codegen: add copyright headers to new codegen files Change-Id: I9fe6572d1043ef9ee09c0925059ded554ad24c6b Reviewed-on: https://go-review.googlesource.com/98215 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> | ||
|  Giovanni Bajo | c9438cb198 | test: add support for code generation tests (asmcheck) The top-level test harness is modified to support a new kind of test: "asmcheck". This is meant to replace asm_test.go as an easier and more readable way to test code generation. I've added a couple of codegen tests to get initial feedback on the syntax. I've created them under a common "codegen" subdirectory, so that it's easier to run them all with "go run run.go -v codegen". The asmcheck syntax allows to insert line comments that can specify a regular expression to match in the assembly code, for multiple architectures (the testsuite will automatically build each testfile multiple times, one per mentioned architecture). Negative matches are unsupported for now, so this cannot fully replace asm_test yet. Change-Id: Ifdbba389f01d55e63e73c99e5f5449e642101d55 Reviewed-on: https://go-review.googlesource.com/97355 Run-TryBot: Giovanni Bajo <rasky@develer.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Alberto Donizetti <alb.donizetti@gmail.com> |