go/src/cmd/compile/internal/arm/ssa.go

// Copyright 2016 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

package arm

import (
	"fmt"
	"math"
	"math/bits"

	"cmd/compile/internal/gc"
	"cmd/compile/internal/logopt"
	"cmd/compile/internal/ssa"
	"cmd/compile/internal/types"
	"cmd/internal/obj"
	"cmd/internal/obj/arm"
	"cmd/internal/objabi"
)

// loadByType returns the load instruction of the given type.
func loadByType(t *types.Type) obj.As {
	if t.IsFloat() {
		switch t.Size() {
		case 4:
			return arm.AMOVF
		case 8:
			return arm.AMOVD
		}
	} else {
		switch t.Size() {
		case 1:
			if t.IsSigned() {
				return arm.AMOVB
			} else {
				return arm.AMOVBU
			}
		case 2:
			if t.IsSigned() {
				return arm.AMOVH
			} else {
				return arm.AMOVHU
			}
		case 4:
			return arm.AMOVW
		}
	}
	panic("bad load type")
}

// storeByType returns the store instruction of the given type.
func storeByType(t *types.Type) obj.As {
	if t.IsFloat() {
		switch t.Size() {
		case 4:
			return arm.AMOVF
		case 8:
			return arm.AMOVD
		}
	} else {
		switch t.Size() {
		case 1:
			return arm.AMOVB
		case 2:
			return arm.AMOVH
		case 4:
			return arm.AMOVW
		}
	}
	panic("bad store type")
}

// shift type is used as Offset in obj.TYPE_SHIFT operands to encode shifted register operands
type shift int64

// copied from ../../../internal/obj/util.go:/TYPE_SHIFT
func (v shift) String() string {
	op := "<<>>->@>"[((v>>5)&3)<<1:]
	if v&(1<<4) != 0 {
		// register shift
		return fmt.Sprintf("R%d%c%cR%d", v&15, op[0], op[1], (v>>8)&15)
	} else {
		// constant shift
		return fmt.Sprintf("R%d%c%c%d", v&15, op[0], op[1], (v>>7)&31)
	}
}

// makeshift encodes a register shifted by a constant
func makeshift(reg int16, typ int64, s int64) shift {
	return shift(int64(reg&0xf) | typ | (s&31)<<7)
}

// genshift generates a Prog for r = r0 op (r1 shifted by n)
func genshift(s *gc.SSAGenState, as obj.As, r0, r1, r int16, typ int64, n int64) *obj.Prog {
	p := s.Prog(as)
	p.From.Type = obj.TYPE_SHIFT
	p.From.Offset = int64(makeshift(r1, typ, n))
	p.Reg = r0
	if r != 0 {
		p.To.Type = obj.TYPE_REG
		p.To.Reg = r
	}
	return p
}

// makeregshift encodes a register shifted by a register
func makeregshift(r1 int16, typ int64, r2 int16) shift {
	return shift(int64(r1&0xf) | typ | int64(r2&0xf)<<8 | 1<<4)
}

// genregshift generates a Prog for r = r0 op (r1 shifted by r2)
func genregshift(s *gc.SSAGenState, as obj.As, r0, r1, r2, r int16, typ int64) *obj.Prog {
	p := s.Prog(as)
	p.From.Type = obj.TYPE_SHIFT
	p.From.Offset = int64(makeregshift(r1, typ, r2))
	p.Reg = r0
	if r != 0 {
		p.To.Type = obj.TYPE_REG
		p.To.Reg = r
	}
	return p
}

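// A standalone sketch of the shift operand encoding used by makeshift,
// makeregshift and shift.String. The bit layout is read off the decoder:
// bits 0-3 hold the register, bit 4 selects a register shift, bits 5-6 hold
// the shift kind, bits 7-11 a constant amount, and bits 8-11 a shift register.
// The shiftLL/shiftLR/shiftAR/shiftRR constants are hypothetical stand-ins;
// that they match arm.SHIFT_LL and friends is an assumption, and plain
// register numbers 0-15 are used instead of the obj/arm register constants.

```go
package main

import "fmt"

type shift int64

// Shift kinds, placed in bits 5-6 as the String decoder expects.
const (
	shiftLL int64 = 0 << 5 // logical left, printed as "<<"
	shiftLR int64 = 1 << 5 // logical right, printed as ">>"
	shiftAR int64 = 2 << 5 // arithmetic right, printed as "->"
	shiftRR int64 = 3 << 5 // rotate right, printed as "@>"
)

func (v shift) String() string {
	op := "<<>>->@>"[((v>>5)&3)<<1:]
	if v&(1<<4) != 0 {
		// register shift
		return fmt.Sprintf("R%d%c%cR%d", v&15, op[0], op[1], (v>>8)&15)
	}
	// constant shift
	return fmt.Sprintf("R%d%c%c%d", v&15, op[0], op[1], (v>>7)&31)
}

// makeshift encodes a register shifted by a constant.
func makeshift(reg int16, typ int64, s int64) shift {
	return shift(int64(reg&0xf) | typ | (s&31)<<7)
}

// makeregshift encodes a register shifted by a register.
func makeregshift(r1 int16, typ int64, r2 int16) shift {
	return shift(int64(r1&0xf) | typ | int64(r2&0xf)<<8 | 1<<4)
}

func main() {
	fmt.Println(makeshift(1, shiftLL, 3))    // R1 shifted left by the constant 3
	fmt.Println(makeregshift(2, shiftAR, 5)) // R2 shifted arithmetically right by R5
}
```

// Round-tripping through String is a convenient way to check an encoding by
// eye before it is stored in Prog.From.Offset.
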
// find a (lsb, width) pair for BFC
// lsb must be in [0, 31], width must be in [1, 32 - lsb]
// return (0xffffffff, 0) if v is not a binary like 0...01...10...0
func getBFC(v uint32) (uint32, uint32) {
	var m, l uint32
	// BFC is not applicable with zero
	if v == 0 {
		return 0xffffffff, 0
	}
	// find the lowest set bit, for example l=2 for 0x3ffffffc
	l = uint32(bits.TrailingZeros32(v))
	// m-1 represents the highest set bit index, for example m=30 for 0x3ffffffc
	m = 32 - uint32(bits.LeadingZeros32(v))
	// check if v is a binary like 0...01...10...0
	if (1<<m)-(1<<l) == v {
		// it must be m > l for non-zero v
		return l, m - l
	}
	// invalid
	return 0xffffffff, 0
}

func ssaGenValue(s *gc.SSAGenState, v *ssa.Value) {
	switch v.Op {
	case ssa.OpCopy, ssa.OpARMMOVWreg:
		if v.Type.IsMemory() {
			return
		}
		x := v.Args[0].Reg()
		y := v.Reg()
		if x == y {
			return
		}
		as := arm.AMOVW
		if v.Type.IsFloat() {
			switch v.Type.Size() {
			case 4:
				as = arm.AMOVF
			case 8:
				as = arm.AMOVD
			default:
				panic("bad float size")
			}
		}
		p := s.Prog(as)
		p.From.Type = obj.TYPE_REG
		p.From.Reg = x
		p.To.Type = obj.TYPE_REG
		p.To.Reg = y
	case ssa.OpARMMOVWnop:
		if v.Reg() != v.Args[0].Reg() {
			v.Fatalf("input[0] and output not in same register %s", v.LongString())
		}
		// nothing to do
	case ssa.OpLoadReg:
		if v.Type.IsFlags() {
			v.Fatalf("load flags not implemented: %v", v.LongString())
			return
		}
		p := s.Prog(loadByType(v.Type))
		gc.AddrAuto(&p.From, v.Args[0])
		p.To.Type = obj.TYPE_REG
		p.To.Reg = v.Reg()
	case ssa.OpStoreReg:
		if v.Type.IsFlags() {
			v.Fatalf("store flags not implemented: %v", v.LongString())
			return
		}
		p := s.Prog(storeByType(v.Type))
		p.From.Type = obj.TYPE_REG
		p.From.Reg = v.Args[0].Reg()
		gc.AddrAuto(&p.To, v)
	case ssa.OpARMADD,
		ssa.OpARMADC,
		ssa.OpARMSUB,
		ssa.OpARMSBC,
		ssa.OpARMRSB,
		ssa.OpARMAND,
		ssa.OpARMOR,
		ssa.OpARMXOR,
		ssa.OpARMBIC,
		ssa.OpARMMUL,
		ssa.OpARMADDF,
		ssa.OpARMADDD,
		ssa.OpARMSUBF,
		ssa.OpARMSUBD,
		ssa.OpARMSLL,
		ssa.OpARMSRL,
		ssa.OpARMSRA,
		ssa.OpARMMULF,
		ssa.OpARMMULD,
ssa.OpARMNMULF,
ssa.OpARMNMULD,
ssa.OpARMDIVF,
ssa.OpARMDIVD:
r := v.Reg()
r1 := v.Args[0].Reg()
r2 := v.Args[1].Reg()
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = r2
p.Reg = r1
p.To.Type = obj.TYPE_REG
p.To.Reg = r
case ssa.OpARMSRR:
genregshift(s, arm.AMOVW, 0, v.Args[0].Reg(), v.Args[1].Reg(), v.Reg(), arm.SHIFT_RR)
case ssa.OpARMMULAF, ssa.OpARMMULAD, ssa.OpARMMULSF, ssa.OpARMMULSD, ssa.OpARMFMULAD:
r := v.Reg()
r0 := v.Args[0].Reg()
r1 := v.Args[1].Reg()
r2 := v.Args[2].Reg()
if r != r0 {
v.Fatalf("result and addend are not in the same register: %v", v.LongString())
}
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = r2
p.Reg = r1
p.To.Type = obj.TYPE_REG
p.To.Reg = r
case ssa.OpARMADDS,
ssa.OpARMSUBS:
r := v.Reg0()
r1 := v.Args[0].Reg()
r2 := v.Args[1].Reg()
p := s.Prog(v.Op.Asm())
p.Scond = arm.C_SBIT
p.From.Type = obj.TYPE_REG
p.From.Reg = r2
p.Reg = r1
p.To.Type = obj.TYPE_REG
p.To.Reg = r
case ssa.OpARMSRAcond:
// ARM shift instructions use only the low-order byte of the shift amount,
// so generate conditional instructions to deal with large shifts.
// The flags are already set.
// SRA.HS $31, Rarg0, Rdst // shift 31 bits to get the sign bit
// SRA.LO Rarg1, Rarg0, Rdst
r := v.Reg()
r1 := v.Args[0].Reg()
r2 := v.Args[1].Reg()
p := s.Prog(arm.ASRA)
p.Scond = arm.C_SCOND_HS
p.From.Type = obj.TYPE_CONST
p.From.Offset = 31
p.Reg = r1
p.To.Type = obj.TYPE_REG
p.To.Reg = r
p = s.Prog(arm.ASRA)
p.Scond = arm.C_SCOND_LO
p.From.Type = obj.TYPE_REG
p.From.Reg = r2
p.Reg = r1
p.To.Type = obj.TYPE_REG
p.To.Reg = r
case ssa.OpARMBFX, ssa.OpARMBFXU:
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_CONST
p.From.Offset = v.AuxInt >> 8
p.SetFrom3(obj.Addr{Type: obj.TYPE_CONST, Offset: v.AuxInt & 0xff})
p.Reg = v.Args[0].Reg()
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
case ssa.OpARMANDconst, ssa.OpARMBICconst:
// Try to optimize ANDconst and BICconst to BFC, which saves bytes and ticks.
// BFC is only available on ARMv7, and its result and source are in the same register.
if objabi.GOARM == 7 && v.Reg() == v.Args[0].Reg() {
var val uint32
if v.Op == ssa.OpARMANDconst {
val = ^uint32(v.AuxInt)
} else { // BICconst
val = uint32(v.AuxInt)
}
lsb, width := getBFC(val)
// omit BFC for ARM's imm12
if 8 < width && width < 24 {
p := s.Prog(arm.ABFC)
p.From.Type = obj.TYPE_CONST
p.From.Offset = int64(width)
p.SetFrom3(obj.Addr{Type: obj.TYPE_CONST, Offset: int64(lsb)})
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
break
}
}
// fall back to ordinary form
fallthrough
case ssa.OpARMADDconst,
ssa.OpARMADCconst,
ssa.OpARMSUBconst,
ssa.OpARMSBCconst,
ssa.OpARMRSBconst,
ssa.OpARMRSCconst,
ssa.OpARMORconst,
ssa.OpARMXORconst,
ssa.OpARMSLLconst,
ssa.OpARMSRLconst,
ssa.OpARMSRAconst:
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_CONST
p.From.Offset = v.AuxInt
p.Reg = v.Args[0].Reg()
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
case ssa.OpARMADDSconst,
ssa.OpARMSUBSconst,
ssa.OpARMRSBSconst:
p := s.Prog(v.Op.Asm())
p.Scond = arm.C_SBIT
p.From.Type = obj.TYPE_CONST
p.From.Offset = v.AuxInt
p.Reg = v.Args[0].Reg()
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg0()
case ssa.OpARMSRRconst:
genshift(s, arm.AMOVW, 0, v.Args[0].Reg(), v.Reg(), arm.SHIFT_RR, v.AuxInt)
case ssa.OpARMADDshiftLL,
ssa.OpARMADCshiftLL,
ssa.OpARMSUBshiftLL,
ssa.OpARMSBCshiftLL,
ssa.OpARMRSBshiftLL,
ssa.OpARMRSCshiftLL,
ssa.OpARMANDshiftLL,
ssa.OpARMORshiftLL,
ssa.OpARMXORshiftLL,
ssa.OpARMBICshiftLL:
genshift(s, v.Op.Asm(), v.Args[0].Reg(), v.Args[1].Reg(), v.Reg(), arm.SHIFT_LL, v.AuxInt)
case ssa.OpARMADDSshiftLL,
ssa.OpARMSUBSshiftLL,
ssa.OpARMRSBSshiftLL:
p := genshift(s, v.Op.Asm(), v.Args[0].Reg(), v.Args[1].Reg(), v.Reg0(), arm.SHIFT_LL, v.AuxInt)
p.Scond = arm.C_SBIT
case ssa.OpARMADDshiftRL,
ssa.OpARMADCshiftRL,
ssa.OpARMSUBshiftRL,
ssa.OpARMSBCshiftRL,
ssa.OpARMRSBshiftRL,
ssa.OpARMRSCshiftRL,
ssa.OpARMANDshiftRL,
ssa.OpARMORshiftRL,
ssa.OpARMXORshiftRL,
ssa.OpARMBICshiftRL:
genshift(s, v.Op.Asm(), v.Args[0].Reg(), v.Args[1].Reg(), v.Reg(), arm.SHIFT_LR, v.AuxInt)
case ssa.OpARMADDSshiftRL,
ssa.OpARMSUBSshiftRL,
ssa.OpARMRSBSshiftRL:
p := genshift(s, v.Op.Asm(), v.Args[0].Reg(), v.Args[1].Reg(), v.Reg0(), arm.SHIFT_LR, v.AuxInt)
p.Scond = arm.C_SBIT
case ssa.OpARMADDshiftRA,
ssa.OpARMADCshiftRA,
ssa.OpARMSUBshiftRA,
ssa.OpARMSBCshiftRA,
ssa.OpARMRSBshiftRA,
ssa.OpARMRSCshiftRA,
ssa.OpARMANDshiftRA,
ssa.OpARMORshiftRA,
ssa.OpARMXORshiftRA,
ssa.OpARMBICshiftRA:
genshift(s, v.Op.Asm(), v.Args[0].Reg(), v.Args[1].Reg(), v.Reg(), arm.SHIFT_AR, v.AuxInt)
case ssa.OpARMADDSshiftRA,
ssa.OpARMSUBSshiftRA,
ssa.OpARMRSBSshiftRA:
p := genshift(s, v.Op.Asm(), v.Args[0].Reg(), v.Args[1].Reg(), v.Reg0(), arm.SHIFT_AR, v.AuxInt)
p.Scond = arm.C_SBIT
case ssa.OpARMXORshiftRR:
genshift(s, v.Op.Asm(), v.Args[0].Reg(), v.Args[1].Reg(), v.Reg(), arm.SHIFT_RR, v.AuxInt)
case ssa.OpARMMVNshiftLL:
genshift(s, v.Op.Asm(), 0, v.Args[0].Reg(), v.Reg(), arm.SHIFT_LL, v.AuxInt)
case ssa.OpARMMVNshiftRL:
genshift(s, v.Op.Asm(), 0, v.Args[0].Reg(), v.Reg(), arm.SHIFT_LR, v.AuxInt)
case ssa.OpARMMVNshiftRA:
genshift(s, v.Op.Asm(), 0, v.Args[0].Reg(), v.Reg(), arm.SHIFT_AR, v.AuxInt)
case ssa.OpARMMVNshiftLLreg:
genregshift(s, v.Op.Asm(), 0, v.Args[0].Reg(), v.Args[1].Reg(), v.Reg(), arm.SHIFT_LL)
case ssa.OpARMMVNshiftRLreg:
genregshift(s, v.Op.Asm(), 0, v.Args[0].Reg(), v.Args[1].Reg(), v.Reg(), arm.SHIFT_LR)
case ssa.OpARMMVNshiftRAreg:
genregshift(s, v.Op.Asm(), 0, v.Args[0].Reg(), v.Args[1].Reg(), v.Reg(), arm.SHIFT_AR)
case ssa.OpARMADDshiftLLreg,
ssa.OpARMADCshiftLLreg,
ssa.OpARMSUBshiftLLreg,
ssa.OpARMSBCshiftLLreg,
ssa.OpARMRSBshiftLLreg,
ssa.OpARMRSCshiftLLreg,
ssa.OpARMANDshiftLLreg,
ssa.OpARMORshiftLLreg,
ssa.OpARMXORshiftLLreg,
ssa.OpARMBICshiftLLreg:
genregshift(s, v.Op.Asm(), v.Args[0].Reg(), v.Args[1].Reg(), v.Args[2].Reg(), v.Reg(), arm.SHIFT_LL)
case ssa.OpARMADDSshiftLLreg,
ssa.OpARMSUBSshiftLLreg,
ssa.OpARMRSBSshiftLLreg:
p := genregshift(s, v.Op.Asm(), v.Args[0].Reg(), v.Args[1].Reg(), v.Args[2].Reg(), v.Reg0(), arm.SHIFT_LL)
p.Scond = arm.C_SBIT
case ssa.OpARMADDshiftRLreg,
ssa.OpARMADCshiftRLreg,
ssa.OpARMSUBshiftRLreg,
ssa.OpARMSBCshiftRLreg,
ssa.OpARMRSBshiftRLreg,
ssa.OpARMRSCshiftRLreg,
ssa.OpARMANDshiftRLreg,
ssa.OpARMORshiftRLreg,
ssa.OpARMXORshiftRLreg,
ssa.OpARMBICshiftRLreg:
genregshift(s, v.Op.Asm(), v.Args[0].Reg(), v.Args[1].Reg(), v.Args[2].Reg(), v.Reg(), arm.SHIFT_LR)
case ssa.OpARMADDSshiftRLreg,
ssa.OpARMSUBSshiftRLreg,
ssa.OpARMRSBSshiftRLreg:
p := genregshift(s, v.Op.Asm(), v.Args[0].Reg(), v.Args[1].Reg(), v.Args[2].Reg(), v.Reg0(), arm.SHIFT_LR)
p.Scond = arm.C_SBIT
case ssa.OpARMADDshiftRAreg,
ssa.OpARMADCshiftRAreg,
ssa.OpARMSUBshiftRAreg,
ssa.OpARMSBCshiftRAreg,
ssa.OpARMRSBshiftRAreg,
ssa.OpARMRSCshiftRAreg,
ssa.OpARMANDshiftRAreg,
ssa.OpARMORshiftRAreg,
ssa.OpARMXORshiftRAreg,
ssa.OpARMBICshiftRAreg:
genregshift(s, v.Op.Asm(), v.Args[0].Reg(), v.Args[1].Reg(), v.Args[2].Reg(), v.Reg(), arm.SHIFT_AR)
case ssa.OpARMADDSshiftRAreg,
ssa.OpARMSUBSshiftRAreg,
ssa.OpARMRSBSshiftRAreg:
p := genregshift(s, v.Op.Asm(), v.Args[0].Reg(), v.Args[1].Reg(), v.Args[2].Reg(), v.Reg0(), arm.SHIFT_AR)
p.Scond = arm.C_SBIT
case ssa.OpARMHMUL,
ssa.OpARMHMULU:
// 32-bit high multiplication
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = v.Args[0].Reg()
p.Reg = v.Args[1].Reg()
p.To.Type = obj.TYPE_REGREG
p.To.Reg = v.Reg()
p.To.Offset = arm.REGTMP // discard the low 32 bits into the tmp register
case ssa.OpARMMULLU:
// 32-bit multiplication with a 64-bit result: high 32 bits in out0, low 32 bits in out1
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = v.Args[0].Reg()
p.Reg = v.Args[1].Reg()
p.To.Type = obj.TYPE_REGREG
p.To.Reg = v.Reg0() // high 32 bits
p.To.Offset = int64(v.Reg1()) // low 32 bits
case ssa.OpARMMULA, ssa.OpARMMULS:
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = v.Args[0].Reg()
p.Reg = v.Args[1].Reg()
p.To.Type = obj.TYPE_REGREG2
p.To.Reg = v.Reg() // result
p.To.Offset = int64(v.Args[2].Reg()) // addend
case ssa.OpARMMOVWconst:
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_CONST
p.From.Offset = v.AuxInt
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
case ssa.OpARMMOVFconst,
ssa.OpARMMOVDconst:
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_FCONST
p.From.Val = math.Float64frombits(uint64(v.AuxInt))
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
case ssa.OpARMCMP,
ssa.OpARMCMN,
ssa.OpARMTST,
ssa.OpARMTEQ,
ssa.OpARMCMPF,
ssa.OpARMCMPD:
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
// Special layout in ARM assembly:
// compared to x86, the operands of ARM's CMP are reversed.
p.From.Reg = v.Args[1].Reg()
p.Reg = v.Args[0].Reg()
case ssa.OpARMCMPconst,
ssa.OpARMCMNconst,
ssa.OpARMTSTconst,
ssa.OpARMTEQconst:
// Special layout in ARM assembly
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_CONST
p.From.Offset = v.AuxInt
p.Reg = v.Args[0].Reg()
case ssa.OpARMCMPF0,
ssa.OpARMCMPD0:
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = v.Args[0].Reg()
2017-10-02 03:09:28 +00:00
case ssa.OpARMCMPshiftLL, ssa.OpARMCMNshiftLL, ssa.OpARMTSTshiftLL, ssa.OpARMTEQshiftLL:
genshift(s, v.Op.Asm(), v.Args[0].Reg(), v.Args[1].Reg(), 0, arm.SHIFT_LL, v.AuxInt)
case ssa.OpARMCMPshiftRL, ssa.OpARMCMNshiftRL, ssa.OpARMTSTshiftRL, ssa.OpARMTEQshiftRL:
genshift(s, v.Op.Asm(), v.Args[0].Reg(), v.Args[1].Reg(), 0, arm.SHIFT_LR, v.AuxInt)
case ssa.OpARMCMPshiftRA, ssa.OpARMCMNshiftRA, ssa.OpARMTSTshiftRA, ssa.OpARMTEQshiftRA:
genshift(s, v.Op.Asm(), v.Args[0].Reg(), v.Args[1].Reg(), 0, arm.SHIFT_AR, v.AuxInt)
case ssa.OpARMCMPshiftLLreg, ssa.OpARMCMNshiftLLreg, ssa.OpARMTSTshiftLLreg, ssa.OpARMTEQshiftLLreg:
genregshift(s, v.Op.Asm(), v.Args[0].Reg(), v.Args[1].Reg(), v.Args[2].Reg(), 0, arm.SHIFT_LL)
case ssa.OpARMCMPshiftRLreg, ssa.OpARMCMNshiftRLreg, ssa.OpARMTSTshiftRLreg, ssa.OpARMTEQshiftRLreg:
genregshift(s, v.Op.Asm(), v.Args[0].Reg(), v.Args[1].Reg(), v.Args[2].Reg(), 0, arm.SHIFT_LR)
case ssa.OpARMCMPshiftRAreg, ssa.OpARMCMNshiftRAreg, ssa.OpARMTSTshiftRAreg, ssa.OpARMTEQshiftRAreg:
genregshift(s, v.Op.Asm(), v.Args[0].Reg(), v.Args[1].Reg(), v.Args[2].Reg(), 0, arm.SHIFT_AR)
case ssa.OpARMMOVWaddr:
p := s.Prog(arm.AMOVW)
p.From.Type = obj.TYPE_ADDR
p.From.Reg = v.Args[0].Reg()
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
var wantreg string
// MOVW $sym+off(base), R
// the assembler expands it as the following:
// - base is SP: add constant offset to SP (R13)
// when constant is large, tmp register (R11) may be used
// - base is SB: load external address from constant pool (use relocation)
switch v.Aux.(type) {
default:
v.Fatalf("aux is of unknown type %T", v.Aux)
case *obj.LSym:
wantreg = "SB"
gc.AddAux(&p.From, v)
case *gc.Node:
wantreg = "SP"
gc.AddAux(&p.From, v)
case nil:
// No sym, just MOVW $off(SP), R
wantreg = "SP"
p.From.Offset = v.AuxInt
}
if reg := v.Args[0].RegName(); reg != wantreg {
v.Fatalf("bad reg %s for symbol type %T, want %s", reg, v.Aux, wantreg)
}
case ssa.OpARMMOVBload,
ssa.OpARMMOVBUload,
ssa.OpARMMOVHload,
ssa.OpARMMOVHUload,
ssa.OpARMMOVWload,
ssa.OpARMMOVFload,
ssa.OpARMMOVDload:
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_MEM
p.From.Reg = v.Args[0].Reg()
gc.AddAux(&p.From, v)
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
case ssa.OpARMMOVBstore,
ssa.OpARMMOVHstore,
ssa.OpARMMOVWstore,
ssa.OpARMMOVFstore,
ssa.OpARMMOVDstore:
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = v.Args[1].Reg()
p.To.Type = obj.TYPE_MEM
p.To.Reg = v.Args[0].Reg()
gc.AddAux(&p.To, v)
cmd/compile: optimize ARM with more efficient MOVB/MOVBU/MOVH/MOVHU Like the indexed MOVW (MOVWloadidx/MOVWstoreidx) used in current ARM backend, the indexed MOVB/MOVBU/MOVH/MOVHU can also be used to generate further optimized ARM code. My patch implements this optimization. Here are some contrast test results against the original go compiler. 1. The total size of all .a files in pkg/ shrinks by 0.03%. 2. The compilecmp benchmark shows a little decline. name old time/op new time/op delta Template 2.35s ± 1% 2.37s ± 3% +0.94% (p=0.006 n=19+19) Unicode 1.33s ± 3% 1.33s ± 2% ~ (p=0.158 n=20+18) GoTypes 7.86s ± 2% 7.84s ± 1% ~ (p=0.284 n=19+18) Compiler 37.5s ± 1% 37.7s ± 2% ~ (p=0.101 n=20+19) SSA 83.4s ± 2% 83.6s ± 2% ~ (p=0.231 n=20+20) Flate 1.46s ± 2% 1.45s ± 1% ~ (p=0.097 n=20+17) GoParser 1.86s ± 2% 1.86s ± 4% ~ (p=0.738 n=20+20) Reflect 5.10s ± 1% 5.11s ± 1% ~ (p=0.290 n=20+18) Tar 1.78s ± 2% 1.77s ± 2% ~ (p=0.166 n=19+20) XML 2.61s ± 2% 2.61s ± 2% ~ (p=0.665 n=19+19) [Geo mean] 4.67s 4.68s +0.16% name old user-time/op new user-time/op delta Template 2.79s ± 3% 2.80s ± 2% ~ (p=0.662 n=20+20) Unicode 1.62s ± 3% 1.64s ± 4% ~ (p=0.252 n=20+20) GoTypes 9.58s ± 2% 9.62s ± 2% ~ (p=0.250 n=20+20) Compiler 46.2s ± 1% 46.2s ± 1% ~ (p=0.602 n=20+19) SSA 108s ± 1% 108s ± 2% ~ (p=0.242 n=18+20) Flate 1.69s ± 3% 1.69s ± 4% ~ (p=0.470 n=20+20) GoParser 2.16s ± 3% 2.20s ± 4% +1.70% (p=0.005 n=19+20) Reflect 6.02s ± 2% 6.02s ± 2% ~ (p=0.700 n=20+17) Tar 2.11s ± 2% 2.11s ± 3% ~ (p=0.480 n=18+20) XML 3.07s ± 2% 3.11s ± 4% +1.50% (p=0.043 n=20+20) [Geo mean] 5.61s 5.64s +0.55% name old text-bytes new text-bytes delta HelloSize 586kB ± 0% 586kB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 5.46kB ± 0% 5.46kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 72.9kB ± 0% 72.9kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.03MB ± 0% 1.03MB ± 0% ~ (all equal) 3. 
The go1 benchmark shows improvement totally, and even more than 10% improvement in the test case Revcomp. name old time/op new time/op delta BinaryTree17-4 42.0s ± 1% 41.5s ± 1% -1.32% (p=0.000 n=39+40) Fannkuch11-4 24.1s ± 1% 23.6s ± 0% -2.38% (p=0.000 n=40+40) FmtFprintfEmpty-4 843ns ± 0% 839ns ± 1% -0.46% (p=0.000 n=33+40) FmtFprintfString-4 1.44µs ± 1% 1.37µs ± 1% -5.48% (p=0.000 n=40+35) FmtFprintfInt-4 1.44µs ± 1% 1.41µs ± 2% -1.50% (p=0.000 n=40+40) FmtFprintfIntInt-4 2.07µs ± 1% 2.06µs ± 0% -0.78% (p=0.000 n=40+40) FmtFprintfPrefixedInt-4 2.50µs ± 1% 2.33µs ± 1% -6.85% (p=0.000 n=40+40) FmtFprintfFloat-4 4.36µs ± 1% 4.34µs ± 0% -0.39% (p=0.017 n=40+40) FmtManyArgs-4 8.11µs ± 0% 8.00µs ± 0% -1.37% (p=0.000 n=40+40) GobDecode-4 105ms ± 2% 103ms ± 2% -2.17% (p=0.000 n=39+39) GobEncode-4 90.1ms ± 2% 88.6ms ± 1% -1.67% (p=0.000 n=40+39) Gzip-4 4.18s ± 1% 4.09s ± 1% -2.03% (p=0.000 n=40+40) Gunzip-4 608ms ± 1% 603ms ± 1% -0.86% (p=0.000 n=40+34) HTTPClientServer-4 674µs ± 3% 661µs ± 2% -1.82% (p=0.000 n=40+39) JSONEncode-4 256ms ± 1% 243ms ± 0% -5.11% (p=0.000 n=39+31) JSONDecode-4 915ms ± 1% 904ms ± 1% -1.18% (p=0.000 n=40+36) Mandelbrot200-4 49.2ms ± 0% 49.3ms ± 0% ~ (p=0.254 n=34+40) GoParse-4 46.9ms ± 2% 46.9ms ± 1% ~ (p=0.737 n=40+39) RegexpMatchEasy0_32-4 1.28µs ± 1% 1.27µs ± 1% -0.71% (p=0.000 n=40+40) RegexpMatchEasy0_1K-4 7.86µs ± 4% 7.67µs ± 4% -2.46% (p=0.000 n=38+40) RegexpMatchEasy1_32-4 1.28µs ± 1% 1.28µs ± 1% -0.54% (p=0.000 n=40+40) RegexpMatchEasy1_1K-4 10.4µs ± 2% 10.3µs ± 2% -0.88% (p=0.003 n=40+39) RegexpMatchMedium_32-4 2.05µs ± 0% 2.04µs ± 0% -0.34% (p=0.000 n=40+33) RegexpMatchMedium_1K-4 541µs ± 1% 535µs ± 1% -1.02% (p=0.000 n=40+38) RegexpMatchHard_32-4 29.3µs ± 1% 29.1µs ± 1% -0.51% (p=0.000 n=40+40) RegexpMatchHard_1K-4 881µs ± 1% 871µs ± 1% -1.15% (p=0.000 n=40+40) Revcomp-4 81.7ms ± 2% 67.5ms ± 2% -17.37% (p=0.000 n=39+39) Template-4 1.05s ± 1% 1.08s ± 2% +3.67% (p=0.000 n=40+40) TimeParse-4 7.24µs ± 1% 7.09µs ± 1% -2.13% (p=0.000 
n=40+40) TimeFormat-4 13.2µs ± 1% 13.1µs ± 0% -0.31% (p=0.007 n=40+31) [Geo mean] 733µs 718µs -2.03% name old speed new speed delta GobDecode-4 7.28MB/s ± 2% 7.44MB/s ± 2% +2.23% (p=0.000 n=39+39) GobEncode-4 8.52MB/s ± 2% 8.67MB/s ± 1% +1.70% (p=0.000 n=40+39) Gzip-4 4.65MB/s ± 1% 4.74MB/s ± 1% +1.94% (p=0.000 n=37+40) Gunzip-4 31.9MB/s ± 1% 32.2MB/s ± 1% +0.90% (p=0.000 n=40+36) JSONEncode-4 7.57MB/s ± 1% 7.98MB/s ± 0% +5.41% (p=0.000 n=40+31) JSONDecode-4 2.12MB/s ± 1% 2.15MB/s ± 1% +1.23% (p=0.000 n=40+40) GoParse-4 1.23MB/s ± 1% 1.23MB/s ± 1% ~ (p=0.769 n=39+40) RegexpMatchEasy0_32-4 25.0MB/s ± 1% 25.2MB/s ± 1% +0.71% (p=0.000 n=40+40) RegexpMatchEasy0_1K-4 130MB/s ± 5% 134MB/s ± 4% +2.53% (p=0.000 n=38+40) RegexpMatchEasy1_32-4 24.9MB/s ± 1% 25.1MB/s ± 1% +0.55% (p=0.000 n=40+40) RegexpMatchEasy1_1K-4 98.5MB/s ± 2% 99.4MB/s ± 2% +0.88% (p=0.003 n=40+39) RegexpMatchMedium_32-4 490kB/s ± 0% 490kB/s ± 0% ~ (all equal) RegexpMatchMedium_1K-4 1.89MB/s ± 1% 1.91MB/s ± 1% +1.02% (p=0.000 n=40+38) RegexpMatchHard_32-4 1.10MB/s ± 1% 1.10MB/s ± 0% +0.41% (p=0.000 n=40+33) RegexpMatchHard_1K-4 1.16MB/s ± 1% 1.17MB/s ± 1% +1.21% (p=0.000 n=40+40) Revcomp-4 31.1MB/s ± 2% 37.6MB/s ± 2% +21.03% (p=0.000 n=39+39) Template-4 1.86MB/s ± 1% 1.79MB/s ± 1% -3.51% (p=0.000 n=40+38) [Geo mean] 6.66MB/s 6.80MB/s +2.13% fixes #21492 Change-Id: Ia26e7ca393f0a5f31de240e8ff9a220453ca7e0d Reviewed-on: https://go-review.googlesource.com/58450 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-08-24 10:51:34 +00:00
case ssa.OpARMMOVWloadidx, ssa.OpARMMOVBUloadidx, ssa.OpARMMOVBloadidx, ssa.OpARMMOVHUloadidx, ssa.OpARMMOVHloadidx:
// this is just shift 0 bits
fallthrough
case ssa.OpARMMOVWloadshiftLL:
p := genshift(s, v.Op.Asm(), 0, v.Args[1].Reg(), v.Reg(), arm.SHIFT_LL, v.AuxInt)
p.From.Reg = v.Args[0].Reg()
case ssa.OpARMMOVWloadshiftRL:
p := genshift(s, v.Op.Asm(), 0, v.Args[1].Reg(), v.Reg(), arm.SHIFT_LR, v.AuxInt)
p.From.Reg = v.Args[0].Reg()
case ssa.OpARMMOVWloadshiftRA:
p := genshift(s, v.Op.Asm(), 0, v.Args[1].Reg(), v.Reg(), arm.SHIFT_AR, v.AuxInt)
p.From.Reg = v.Args[0].Reg()
case ssa.OpARMMOVWstoreidx, ssa.OpARMMOVBstoreidx, ssa.OpARMMOVHstoreidx:
// this is just shift 0 bits
fallthrough
case ssa.OpARMMOVWstoreshiftLL:
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = v.Args[2].Reg()
p.To.Type = obj.TYPE_SHIFT
p.To.Reg = v.Args[0].Reg()
p.To.Offset = int64(makeshift(v.Args[1].Reg(), arm.SHIFT_LL, v.AuxInt))
case ssa.OpARMMOVWstoreshiftRL:
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = v.Args[2].Reg()
p.To.Type = obj.TYPE_SHIFT
p.To.Reg = v.Args[0].Reg()
p.To.Offset = int64(makeshift(v.Args[1].Reg(), arm.SHIFT_LR, v.AuxInt))
case ssa.OpARMMOVWstoreshiftRA:
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = v.Args[2].Reg()
p.To.Type = obj.TYPE_SHIFT
p.To.Reg = v.Args[0].Reg()
p.To.Offset = int64(makeshift(v.Args[1].Reg(), arm.SHIFT_AR, v.AuxInt))
case ssa.OpARMMOVBreg,
ssa.OpARMMOVBUreg,
ssa.OpARMMOVHreg,
ssa.OpARMMOVHUreg:
a := v.Args[0]
for a.Op == ssa.OpCopy || a.Op == ssa.OpARMMOVWreg || a.Op == ssa.OpARMMOVWnop {
a = a.Args[0]
}
if a.Op == ssa.OpLoadReg {
t := a.Type
switch {
case v.Op == ssa.OpARMMOVBreg && t.Size() == 1 && t.IsSigned(),
v.Op == ssa.OpARMMOVBUreg && t.Size() == 1 && !t.IsSigned(),
v.Op == ssa.OpARMMOVHreg && t.Size() == 2 && t.IsSigned(),
v.Op == ssa.OpARMMOVHUreg && t.Size() == 2 && !t.IsSigned():
// arg is a proper-typed load, already zero/sign-extended, don't extend again
if v.Reg() == v.Args[0].Reg() {
return
}
p := s.Prog(arm.AMOVW)
p.From.Type = obj.TYPE_REG
p.From.Reg = v.Args[0].Reg()
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
return
default:
}
}
cmd/compile: optimize MOVBS/MOVBU/MOVHS/MOVHU on ARMv6 and ARMv7 MOVBS/MOVBU/MOVHS/MOVHU can be optimized with a single instruction on ARMv6 and ARMv7, instead of a pair of left/right shifts. The benchmark tests show big improvement in special cases and a little improvement in total. 1. A special case gets about 29% improvement. name old time/op new time/op delta TypePro-4 3.81ms ± 1% 2.71ms ± 1% -28.97% (p=0.000 n=26+25) The source code of this case can be found at https://github.com/benshi001/ugo1/blob/master/typepromotion_test.go 2. There is a little improvement in the go1 benchmark, excluding the noise. name old time/op new time/op delta BinaryTree17-4 42.1s ± 3% 42.1s ± 2% ~ (p=0.883 n=28+30) Fannkuch11-4 24.3s ± 4% 24.7s ± 7% +1.64% (p=0.026 n=30+30) FmtFprintfEmpty-4 833ns ± 2% 835ns ± 2% ~ (p=0.371 n=26+28) FmtFprintfString-4 1.36µs ± 3% 1.35µs ± 1% ~ (p=0.202 n=26+23) FmtFprintfInt-4 1.42µs ± 3% 1.43µs ± 1% +0.66% (p=0.000 n=26+27) FmtFprintfIntInt-4 2.10µs ± 1% 2.10µs ± 2% ~ (p=0.104 n=25+26) FmtFprintfPrefixedInt-4 2.37µs ± 2% 2.33µs ± 1% -1.75% (p=0.000 n=25+28) FmtFprintfFloat-4 4.50µs ± 0% 4.37µs ± 1% -2.81% (p=0.000 n=23+25) FmtManyArgs-4 8.08µs ± 0% 8.13µs ± 3% ~ (p=0.160 n=23+26) GobDecode-4 102ms ± 4% 103ms ± 4% +1.08% (p=0.001 n=28+26) GobEncode-4 96.0ms ± 2% 95.2ms ± 3% -0.81% (p=0.000 n=24+25) Gzip-4 4.17s ± 3% 4.11s ± 2% -1.45% (p=0.000 n=25+25) Gunzip-4 597ms ± 2% 594ms ± 2% -0.57% (p=0.000 n=24+26) HTTPClientServer-4 708µs ± 4% 708µs ± 4% ~ (p=0.852 n=28+28) JSONEncode-4 241ms ± 1% 245ms ± 3% +1.62% (p=0.000 n=27+28) JSONDecode-4 906ms ± 3% 889ms ± 3% -1.85% (p=0.000 n=23+24) Mandelbrot200-4 41.8ms ± 1% 41.8ms ± 1% ~ (p=0.929 n=25+24) GoParse-4 47.1ms ± 2% 45.3ms ± 4% -3.80% (p=0.000 n=28+24) RegexpMatchEasy0_32-4 1.27µs ± 2% 1.28µs ± 1% +0.77% (p=0.000 n=26+28) RegexpMatchEasy0_1K-4 8.08µs ± 9% 7.83µs ±10% -3.10% (p=0.012 n=26+26) RegexpMatchEasy1_32-4 1.29µs ± 5% 1.29µs ± 2% ~ (p=0.301 n=26+29) RegexpMatchEasy1_1K-4 10.5µs ± 4% 10.3µs ± 5% 
-1.95% (p=0.003 n=26+26) RegexpMatchMedium_32-4 1.94µs ± 1% 1.95µs ± 1% ~ (p=0.251 n=24+27) RegexpMatchMedium_1K-4 502µs ± 2% 502µs ± 2% ~ (p=0.336 n=25+28) RegexpMatchHard_32-4 26.7µs ± 1% 26.6µs ± 3% ~ (p=0.454 n=27+26) RegexpMatchHard_1K-4 801µs ± 3% 799µs ± 2% ~ (p=0.097 n=24+26) Revcomp-4 73.5ms ± 5% 73.2ms ± 3% ~ (p=0.240 n=26+26) Template-4 1.07s ± 2% 1.05s ± 1% -2.39% (p=0.000 n=26+24) TimeParse-4 6.87µs ± 1% 6.85µs ± 1% ~ (p=0.094 n=28+23) TimeFormat-4 13.4µs ± 1% 13.4µs ± 1% ~ (p=0.664 n=25+29) [Geo mean] 717µs 713µs -0.54% name old speed new speed delta GobDecode-4 7.52MB/s ± 4% 7.44MB/s ± 4% -1.10% (p=0.001 n=28+26) GobEncode-4 7.99MB/s ± 2% 8.06MB/s ± 3% +0.81% (p=0.000 n=24+25) Gzip-4 4.66MB/s ± 3% 4.72MB/s ± 2% +1.43% (p=0.000 n=25+25) Gunzip-4 32.5MB/s ± 2% 32.7MB/s ± 2% +0.56% (p=0.001 n=24+26) JSONEncode-4 8.04MB/s ± 1% 7.92MB/s ± 3% -1.59% (p=0.000 n=27+28) JSONDecode-4 2.14MB/s ± 3% 2.18MB/s ± 3% +1.90% (p=0.000 n=23+24) GoParse-4 1.23MB/s ± 3% 1.28MB/s ± 4% +4.23% (p=0.000 n=30+24) RegexpMatchEasy0_32-4 25.2MB/s ± 2% 25.0MB/s ± 1% -0.76% (p=0.000 n=26+28) RegexpMatchEasy0_1K-4 127MB/s ± 8% 131MB/s ± 9% +3.29% (p=0.012 n=26+26) RegexpMatchEasy1_32-4 24.8MB/s ± 5% 24.8MB/s ± 2% ~ (p=0.339 n=26+29) RegexpMatchEasy1_1K-4 97.9MB/s ± 4% 99.8MB/s ± 5% +1.98% (p=0.004 n=26+26) RegexpMatchMedium_32-4 514kB/s ± 3% 515kB/s ± 3% ~ (p=0.391 n=28+28) RegexpMatchMedium_1K-4 2.04MB/s ± 2% 2.04MB/s ± 2% ~ (p=0.517 n=25+28) RegexpMatchHard_32-4 1.20MB/s ± 3% 1.20MB/s ± 3% ~ (p=0.203 n=28+28) RegexpMatchHard_1K-4 1.28MB/s ± 3% 1.28MB/s ± 2% ~ (p=0.499 n=24+26) Revcomp-4 34.6MB/s ± 4% 34.7MB/s ± 3% ~ (p=0.245 n=26+26) Template-4 1.81MB/s ± 2% 1.85MB/s ± 3% +2.30% (p=0.000 n=26+25) [Geo mean] 6.82MB/s 6.88MB/s +0.84% fixes #20653 Change-Id: Ief0d6e726e517e51ae511325b21ee72598e759ff Reviewed-on: https://go-review.googlesource.com/71992 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot 
<gobot@golang.org>
2017-10-20 03:50:15 +00:00
if objabi.GOARM >= 6 {
// generate more efficient "MOVB/MOVBU/MOVH/MOVHU Reg@>0, Reg" on ARMv6 & ARMv7
genshift(s, v.Op.Asm(), 0, v.Args[0].Reg(), v.Reg(), arm.SHIFT_RR, 0)
return
}
fallthrough
case ssa.OpARMMVN,
ssa.OpARMCLZ,
ssa.OpARMREV,
ssa.OpARMREV16,
ssa.OpARMRBIT,
ssa.OpARMSQRTD,
ssa.OpARMNEGF,
ssa.OpARMNEGD,
cmd/compile: optimize ARM's math.Abs This CL optimizes math.Abs to an inline ABSD instruction on ARM. The benchmark results of src/math/ show big improvements. name old time/op new time/op delta Acos-4 181ns ± 0% 182ns ± 0% +0.30% (p=0.000 n=40+40) Acosh-4 202ns ± 0% 202ns ± 0% ~ (all equal) Asin-4 163ns ± 0% 163ns ± 0% ~ (all equal) Asinh-4 242ns ± 0% 242ns ± 0% ~ (all equal) Atan-4 120ns ± 0% 121ns ± 0% +0.83% (p=0.000 n=40+40) Atanh-4 202ns ± 0% 202ns ± 0% ~ (all equal) Atan2-4 173ns ± 0% 173ns ± 0% ~ (all equal) Cbrt-4 1.06µs ± 0% 1.06µs ± 0% +0.09% (p=0.000 n=39+37) Ceil-4 72.9ns ± 0% 72.8ns ± 0% ~ (p=0.237 n=40+40) Copysign-4 13.2ns ± 0% 13.2ns ± 0% ~ (all equal) Cos-4 193ns ± 0% 183ns ± 0% -5.18% (p=0.000 n=40+40) Cosh-4 254ns ± 0% 239ns ± 0% -5.91% (p=0.000 n=40+40) Erf-4 112ns ± 0% 112ns ± 0% ~ (all equal) Erfc-4 117ns ± 0% 117ns ± 0% ~ (all equal) Erfinv-4 127ns ± 0% 127ns ± 1% ~ (p=0.492 n=40+40) Erfcinv-4 128ns ± 0% 128ns ± 0% ~ (all equal) Exp-4 212ns ± 0% 206ns ± 0% -3.05% (p=0.000 n=40+40) ExpGo-4 216ns ± 0% 209ns ± 0% -3.24% (p=0.000 n=40+40) Expm1-4 142ns ± 0% 142ns ± 0% ~ (all equal) Exp2-4 191ns ± 0% 184ns ± 0% -3.45% (p=0.000 n=40+40) Exp2Go-4 194ns ± 0% 187ns ± 0% -3.61% (p=0.000 n=40+40) Abs-4 14.4ns ± 0% 6.3ns ± 0% -56.39% (p=0.000 n=38+39) Dim-4 12.6ns ± 0% 12.6ns ± 0% ~ (all equal) Floor-4 49.6ns ± 0% 49.6ns ± 0% ~ (all equal) Max-4 27.6ns ± 0% 27.6ns ± 0% ~ (all equal) Min-4 27.0ns ± 0% 27.0ns ± 0% ~ (all equal) Mod-4 349ns ± 0% 305ns ± 1% -12.55% (p=0.000 n=33+40) Frexp-4 54.0ns ± 0% 47.1ns ± 0% -12.78% (p=0.000 n=38+38) Gamma-4 242ns ± 0% 234ns ± 0% -3.16% (p=0.000 n=36+40) Hypot-4 84.8ns ± 0% 67.8ns ± 0% -20.05% (p=0.000 n=31+35) HypotGo-4 88.5ns ± 0% 71.6ns ± 0% -19.12% (p=0.000 n=40+38) Ilogb-4 45.8ns ± 0% 38.9ns ± 0% -15.12% (p=0.000 n=40+32) J0-4 821ns ± 0% 802ns ± 0% -2.33% (p=0.000 n=33+40) J1-4 816ns ± 0% 807ns ± 0% -1.05% (p=0.000 n=40+29) Jn-4 1.67µs ± 0% 1.65µs ± 0% -1.45% (p=0.000 n=40+39) Ldexp-4 61.5ns ± 0% 54.6ns ± 0% 
-11.27% (p=0.000 n=40+32) Lgamma-4 188ns ± 0% 188ns ± 0% ~ (all equal) Log-4 154ns ± 0% 147ns ± 0% -4.78% (p=0.000 n=40+40) Logb-4 50.9ns ± 0% 42.7ns ± 0% -16.11% (p=0.000 n=34+39) Log1p-4 160ns ± 0% 159ns ± 0% ~ (p=0.828 n=40+40) Log10-4 173ns ± 0% 166ns ± 0% -4.05% (p=0.000 n=40+40) Log2-4 65.3ns ± 0% 58.4ns ± 0% -10.57% (p=0.000 n=37+37) Modf-4 36.4ns ± 0% 36.4ns ± 0% ~ (all equal) Nextafter32-4 36.4ns ± 0% 36.4ns ± 0% ~ (all equal) Nextafter64-4 32.7ns ± 0% 32.6ns ± 0% ~ (p=0.375 n=40+40) PowInt-4 300ns ± 0% 277ns ± 0% -7.78% (p=0.000 n=40+40) PowFrac-4 676ns ± 0% 635ns ± 0% -6.00% (p=0.000 n=40+35) Pow10Pos-4 17.6ns ± 0% 17.6ns ± 0% ~ (all equal) Pow10Neg-4 22.0ns ± 0% 22.0ns ± 0% ~ (all equal) Round-4 30.1ns ± 0% 30.1ns ± 0% ~ (all equal) RoundToEven-4 38.9ns ± 0% 38.9ns ± 0% ~ (all equal) Remainder-4 291ns ± 0% 263ns ± 0% -9.62% (p=0.000 n=40+40) Signbit-4 11.3ns ± 0% 11.3ns ± 0% ~ (all equal) Sin-4 185ns ± 0% 185ns ± 0% ~ (all equal) Sincos-4 230ns ± 0% 230ns ± 0% ~ (all equal) Sinh-4 253ns ± 0% 246ns ± 0% -2.77% (p=0.000 n=39+39) SqrtIndirect-4 41.4ns ± 0% 41.4ns ± 0% ~ (all equal) SqrtLatency-4 13.8ns ± 0% 13.8ns ± 0% ~ (all equal) SqrtIndirectLatency-4 37.0ns ± 0% 37.0ns ± 0% ~ (p=0.632 n=40+40) SqrtGoLatency-4 911ns ± 0% 911ns ± 0% +0.08% (p=0.000 n=40+40) SqrtPrime-4 13.2µs ± 0% 13.2µs ± 0% +0.01% (p=0.038 n=38+40) Tan-4 205ns ± 0% 205ns ± 0% ~ (all equal) Tanh-4 264ns ± 0% 247ns ± 0% -6.44% (p=0.000 n=39+32) Trunc-4 45.2ns ± 0% 45.2ns ± 0% ~ (all equal) Y0-4 796ns ± 0% 792ns ± 0% -0.55% (p=0.000 n=35+40) Y1-4 804ns ± 0% 797ns ± 0% -0.82% (p=0.000 n=24+40) Yn-4 1.64µs ± 0% 1.62µs ± 0% -1.27% (p=0.000 n=40+39) Float64bits-4 8.16ns ± 0% 8.16ns ± 0% +0.04% (p=0.000 n=35+40) Float64frombits-4 10.7ns ± 0% 10.7ns ± 0% ~ (all equal) Float32bits-4 7.53ns ± 0% 7.53ns ± 0% ~ (p=0.760 n=40+40) Float32frombits-4 6.91ns ± 0% 6.91ns ± 0% -0.04% (p=0.002 n=32+38) [Geo mean] 111ns 106ns -3.98% Change-Id: I54f4fd7f5160db020b430b556bde59cc0fdb996d Reviewed-on: 
https://go-review.googlesource.com/c/go/+/188678 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-08-02 02:41:59 +00:00
ssa.OpARMABSD,
ssa.OpARMMOVWF,
ssa.OpARMMOVWD,
ssa.OpARMMOVFW,
ssa.OpARMMOVDW,
ssa.OpARMMOVFD,
ssa.OpARMMOVDF:
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = v.Args[0].Reg()
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
case ssa.OpARMMOVWUF,
ssa.OpARMMOVWUD,
ssa.OpARMMOVFWU,
ssa.OpARMMOVDWU:
p := s.Prog(v.Op.Asm())
p.Scond = arm.C_UBIT
p.From.Type = obj.TYPE_REG
p.From.Reg = v.Args[0].Reg()
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
case ssa.OpARMCMOVWHSconst:
p := s.Prog(arm.AMOVW)
p.Scond = arm.C_SCOND_HS
p.From.Type = obj.TYPE_CONST
p.From.Offset = v.AuxInt
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
case ssa.OpARMCMOVWLSconst:
p := s.Prog(arm.AMOVW)
p.Scond = arm.C_SCOND_LS
p.From.Type = obj.TYPE_CONST
p.From.Offset = v.AuxInt
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
case ssa.OpARMCALLstatic, ssa.OpARMCALLclosure, ssa.OpARMCALLinter:
s.Call(v)
case ssa.OpARMCALLudiv:
p := s.Prog(obj.ACALL)
p.To.Type = obj.TYPE_MEM
p.To.Name = obj.NAME_EXTERN
p.To.Sym = gc.Udiv
case ssa.OpARMLoweredWB:
p := s.Prog(obj.ACALL)
p.To.Type = obj.TYPE_MEM
p.To.Name = obj.NAME_EXTERN
p.To.Sym = v.Aux.(*obj.LSym)
case ssa.OpARMLoweredPanicBoundsA, ssa.OpARMLoweredPanicBoundsB, ssa.OpARMLoweredPanicBoundsC:
p := s.Prog(obj.ACALL)
p.To.Type = obj.TYPE_MEM
p.To.Name = obj.NAME_EXTERN
p.To.Sym = gc.BoundsCheckFunc[v.AuxInt]
s.UseArgs(8) // space used in callee args area by assembly stubs
case ssa.OpARMLoweredPanicExtendA, ssa.OpARMLoweredPanicExtendB, ssa.OpARMLoweredPanicExtendC:
p := s.Prog(obj.ACALL)
p.To.Type = obj.TYPE_MEM
p.To.Name = obj.NAME_EXTERN
p.To.Sym = gc.ExtendCheckFunc[v.AuxInt]
s.UseArgs(12) // space used in callee args area by assembly stubs
case ssa.OpARMDUFFZERO:
p := s.Prog(obj.ADUFFZERO)
p.To.Type = obj.TYPE_MEM
p.To.Name = obj.NAME_EXTERN
p.To.Sym = gc.Duffzero
p.To.Offset = v.AuxInt
case ssa.OpARMDUFFCOPY:
p := s.Prog(obj.ADUFFCOPY)
p.To.Type = obj.TYPE_MEM
p.To.Name = obj.NAME_EXTERN
p.To.Sym = gc.Duffcopy
p.To.Offset = v.AuxInt
case ssa.OpARMLoweredNilCheck:
// Issue a load which will fault if arg is nil.
p := s.Prog(arm.AMOVB)
p.From.Type = obj.TYPE_MEM
p.From.Reg = v.Args[0].Reg()
gc.AddAux(&p.From, v)
p.To.Type = obj.TYPE_REG
p.To.Reg = arm.REGTMP
if logopt.Enabled() {
logopt.LogOpt(v.Pos, "nilcheck", "genssa", v.Block.Func.Name)
}
[dev.regabi] cmd/compile: clean up debug flag (-d) handling [generated] The debug table is not as haphazard as flags, but there are still a few mismatches between command-line names and variable names. This CL moves them all into a consistent home (var Debug, like var Flag). Code updated automatically using the rf command below. A followup CL will make a few manual cleanups, leaving this CL completely automated and easier to regenerate during merge conflicts. [git-generate] cd src/cmd/compile/internal/gc rf ' add main.go var Debug struct{} mv Debug_append Debug.Append mv Debug_checkptr Debug.Checkptr mv Debug_closure Debug.Closure mv Debug_compilelater Debug.CompileLater mv disable_checknil Debug.DisableNil mv debug_dclstack Debug.DclStack mv Debug_gcprog Debug.GCProg mv Debug_libfuzzer Debug.Libfuzzer mv Debug_checknil Debug.Nil mv Debug_panic Debug.Panic mv Debug_slice Debug.Slice mv Debug_typeassert Debug.TypeAssert mv Debug_wb Debug.WB mv Debug_export Debug.Export mv Debug_pctab Debug.PCTab mv Debug_locationlist Debug.LocationLists mv Debug_typecheckinl Debug.TypecheckInl mv Debug_gendwarfinl Debug.DwarfInl mv Debug_softfloat Debug.SoftFloat mv Debug_defer Debug.Defer mv Debug_dumpptrs Debug.DumpPtrs mv flag.go:/parse.-d/-1,/unknown.debug/+2 parseDebug mv debugtab Debug parseDebug \ debugHelpHeader debugHelpFooter \ debug.go # Remove //go:generate line copied from main.go rm debug.go:/go:generate/-+ ' Change-Id: I625761ca5659be4052f7161a83baa00df75cca91 Reviewed-on: https://go-review.googlesource.com/c/go/+/272246 Trust: Russ Cox <rsc@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-11-16 01:17:25 -05:00
if gc.Debug.Nil != 0 && v.Pos.Line() > 1 { // v.Pos.Line()==1 in generated wrappers
gc.Warnl(v.Pos, "generated nil check")
}
case ssa.OpARMLoweredZero:
// MOVW.P Rarg2, 4(R1)
// CMP Rarg1, R1
// BLE -2(PC)
// arg1 is the address of the last element to zero
// arg2 is known to be zero
// auxint is alignment
var sz int64
var mov obj.As
switch {
case v.AuxInt%4 == 0:
sz = 4
mov = arm.AMOVW
case v.AuxInt%2 == 0:
sz = 2
mov = arm.AMOVH
default:
sz = 1
mov = arm.AMOVB
}
p := s.Prog(mov)
p.Scond = arm.C_PBIT
p.From.Type = obj.TYPE_REG
p.From.Reg = v.Args[2].Reg()
p.To.Type = obj.TYPE_MEM
p.To.Reg = arm.REG_R1
p.To.Offset = sz
p2 := s.Prog(arm.ACMP)
p2.From.Type = obj.TYPE_REG
p2.From.Reg = v.Args[1].Reg()
p2.Reg = arm.REG_R1
p3 := s.Prog(arm.ABLE)
p3.To.Type = obj.TYPE_BRANCH
gc.Patch(p3, p)
case ssa.OpARMLoweredMove:
// MOVW.P 4(R1), Rtmp
// MOVW.P Rtmp, 4(R2)
// CMP Rarg2, R1
// BLE -3(PC)
// arg2 is the address of the last element of src
// auxint is alignment
var sz int64
var mov obj.As
switch {
case v.AuxInt%4 == 0:
sz = 4
mov = arm.AMOVW
case v.AuxInt%2 == 0:
sz = 2
mov = arm.AMOVH
default:
sz = 1
mov = arm.AMOVB
}
p := s.Prog(mov)
p.Scond = arm.C_PBIT
p.From.Type = obj.TYPE_MEM
p.From.Reg = arm.REG_R1
p.From.Offset = sz
p.To.Type = obj.TYPE_REG
p.To.Reg = arm.REGTMP
p2 := s.Prog(mov)
p2.Scond = arm.C_PBIT
p2.From.Type = obj.TYPE_REG
p2.From.Reg = arm.REGTMP
p2.To.Type = obj.TYPE_MEM
p2.To.Reg = arm.REG_R2
p2.To.Offset = sz
p3 := s.Prog(arm.ACMP)
p3.From.Type = obj.TYPE_REG
p3.From.Reg = v.Args[2].Reg()
p3.Reg = arm.REG_R1
p4 := s.Prog(arm.ABLE)
p4.To.Type = obj.TYPE_BRANCH
gc.Patch(p4, p)
case ssa.OpARMEqual,
ssa.OpARMNotEqual,
ssa.OpARMLessThan,
ssa.OpARMLessEqual,
ssa.OpARMGreaterThan,
ssa.OpARMGreaterEqual,
ssa.OpARMLessThanU,
ssa.OpARMLessEqualU,
ssa.OpARMGreaterThanU,
ssa.OpARMGreaterEqualU:
// generate boolean values
// use conditional move
p := s.Prog(arm.AMOVW)
p.From.Type = obj.TYPE_CONST
p.From.Offset = 0
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
p = s.Prog(arm.AMOVW)
p.Scond = condBits[v.Op]
p.From.Type = obj.TYPE_CONST
p.From.Offset = 1
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
case ssa.OpARMLoweredGetClosurePtr:
// Closure pointer is R7 (arm.REGCTXT).
gc.CheckLoweredGetClosurePtr(v)
case ssa.OpARMLoweredGetCallerSP:
// caller's SP is FixedFrameSize below the address of the first arg
p := s.Prog(arm.AMOVW)
p.From.Type = obj.TYPE_ADDR
p.From.Offset = -gc.Ctxt.FixedFrameSize()
p.From.Name = obj.NAME_PARAM
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
case ssa.OpARMLoweredGetCallerPC:
p := s.Prog(obj.AGETCALLERPC)
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
case ssa.OpARMFlagConstant:
v.Fatalf("FlagConstant op should never make it to codegen %v", v.LongString())
case ssa.OpARMInvertFlags:
v.Fatalf("InvertFlags should never make it to codegen %v", v.LongString())
case ssa.OpClobber:
// TODO: implement for clobberdead experiment. Nop is ok for now.
default:
v.Fatalf("genValue not implemented: %s", v.LongString())
}
}
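The LoweredZero and LoweredMove cases above choose the widest store that the alignment in AuxInt allows, then emit a post-incrementing loop that runs until the pointer passes the address of the last element. The following is a minimal standalone sketch of that logic, not compiler code: `storeWidth` and `zeroLoop` are hypothetical names, memory is modeled as a byte slice, and offsets stand in for register values. It assumes the comment's `CMP Rarg1, R1` / `BLE` pair means "repeat while R1 <= Rarg1".

```go
package main

import "fmt"

// storeWidth mirrors the switch on v.AuxInt in the cases above:
// use the widest access the alignment permits.
// (Hypothetical helper; it returns the mnemonic as a string for display.)
func storeWidth(align int64) (size int64, mnemonic string) {
	switch {
	case align%4 == 0:
		return 4, "MOVW"
	case align%2 == 0:
		return 2, "MOVH"
	default:
		return 1, "MOVB"
	}
}

// zeroLoop models the emitted loop on a byte slice:
//
//	MOVx.P Rzero, sz(R1)   // store sz zero bytes, then R1 += sz
//	CMP    Rlast, R1
//	BLE    loop            // assumed: repeat while R1 <= Rlast
//
// start and last are byte offsets into mem; last is the offset of the
// last element to zero. The body always runs at least once.
func zeroLoop(mem []byte, start, last, align int64) (iterations int) {
	sz, _ := storeWidth(align)
	r1 := start
	for {
		for i := int64(0); i < sz; i++ {
			mem[r1+i] = 0
		}
		r1 += sz // post-increment, as with the .P suffix
		iterations++
		if r1 > last {
			break
		}
	}
	return iterations
}

func main() {
	mem := make([]byte, 16)
	for i := range mem {
		mem[i] = 0xFF
	}
	n := zeroLoop(mem, 4, 8, 4)
	fmt.Println(n, mem)
}
```

Because the loop tests after the store, it suits cases where at least one element is known to need zeroing; the sketch preserves that do-while shape rather than guarding the first iteration.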
var condBits = map[ssa.Op]uint8{
ssa.OpARMEqual: arm.C_SCOND_EQ,
ssa.OpARMNotEqual: arm.C_SCOND_NE,
ssa.OpARMLessThan: arm.C_SCOND_LT,
ssa.OpARMLessThanU: arm.C_SCOND_LO,
ssa.OpARMLessEqual: arm.C_SCOND_LE,
ssa.OpARMLessEqualU: arm.C_SCOND_LS,
ssa.OpARMGreaterThan: arm.C_SCOND_GT,
ssa.OpARMGreaterThanU: arm.C_SCOND_HI,
ssa.OpARMGreaterEqual: arm.C_SCOND_GE,
ssa.OpARMGreaterEqualU: arm.C_SCOND_HS,
}
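The condBits table above feeds the boolean-producing case (OpARMEqual and friends), which emits an unconditional `MOVW $0, R` followed by a conditional `MOVW $1, R`. As a rough illustration of why signed and unsigned comparisons need different condition codes, here is a small standalone model. Everything in it is an assumption made for the sketch: the `flags` type, `cmpFlags`, `conds`, and `materialize` are hypothetical names, and `cmpFlags(a, b)` simply sets NZCV for `a - b` without modeling the assembler's operand order.

```go
package main

import "fmt"

// flags holds the ARM condition flags.
type flags struct{ N, Z, C, V bool }

// cmpFlags computes NZCV for the subtraction a - b.
func cmpFlags(a, b uint32) flags {
	r := a - b
	return flags{
		N: r>>31 == 1,             // result is negative
		Z: r == 0,                 // result is zero
		C: a >= b,                 // no borrow (unsigned a >= b)
		V: ((a^b)&(a^r))>>31 == 1, // signed overflow
	}
}

// conds maps ARM condition mnemonics to their flag predicates.
var conds = map[string]func(flags) bool{
	"EQ": func(f flags) bool { return f.Z },
	"NE": func(f flags) bool { return !f.Z },
	"LT": func(f flags) bool { return f.N != f.V },
	"LE": func(f flags) bool { return f.Z || f.N != f.V },
	"GT": func(f flags) bool { return !f.Z && f.N == f.V },
	"GE": func(f flags) bool { return f.N == f.V },
	"LO": func(f flags) bool { return !f.C },
	"LS": func(f flags) bool { return !f.C || f.Z },
	"HI": func(f flags) bool { return f.C && !f.Z },
	"HS": func(f flags) bool { return f.C },
}

// materialize models the two-instruction sequence:
//
//	MOVW      $0, R
//	MOVW.cond $1, R
func materialize(cond string, a, b uint32) uint32 {
	r := uint32(0)
	if conds[cond](cmpFlags(a, b)) {
		r = 1
	}
	return r
}

func main() {
	// 0xFFFFFFFF is -1 when read as signed, but the largest value when unsigned.
	fmt.Println(materialize("LT", 0xFFFFFFFF, 1)) // signed less-than
	fmt.Println(materialize("LO", 0xFFFFFFFF, 1)) // unsigned less-than
}
```

The same bit pattern compares as smaller under LT and as larger under LO, which is why the table keeps separate entries such as OpARMLessThan and OpARMLessThanU.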
var blockJump = map[ssa.BlockKind]struct {
asm, invasm obj.As
}{
cmd/compile: ARM comparisons with 0 incorrect on overflow Some ARM rewriting rules convert 'comparing to zero' conditions of if statements to a simplified version utilizing CMN and CMP instructions to branch over condition flags, in order to save one Add or Sub caculation. Such optimizations lead to wrong branching in case an overflow/underflow occurs when executing CMN or CMP. Fix the issue by introducing new block opcodes that don't honor the overflow/underflow flag: Block-Op Meaning ARM condition codes 1. LTnoov less than MI 2. GEnoov greater than or equal PL 3. LEnoov less than or equal MI || EQ 4. GTnoov greater than NEQ & PL The patch also adds a few test cases to cover scenarios that are specific to ARM and fine-tunes the code generation tests for 'x-const'. For more details please refer to the previous fix on 64-bit ARM: https://go-review.googlesource.com/c/go/+/233097 Go1 perf, 'old' is the non-optimized version, that is removing all concerned rewriting rules. name old time/op new time/op delta BinaryTree17-8 7.73s ± 0% 7.81s ± 0% +0.97% (p=0.000 n=7+8) Fannkuch11-8 7.06s ± 0% 7.00s ± 0% -0.83% (p=0.000 n=8+8) FmtFprintfEmpty-8 181ns ± 1% 183ns ± 1% +1.31% (p=0.001 n=8+8) FmtFprintfString-8 319ns ± 1% 325ns ± 2% +1.71% (p=0.009 n=7+8) FmtFprintfInt-8 358ns ± 1% 359ns ± 1% ~ (p=0.293 n=7+7) FmtFprintfIntInt-8 459ns ± 3% 456ns ± 1% ~ (p=0.869 n=8+8) FmtFprintfPrefixedInt-8 535ns ± 4% 538ns ± 4% ~ (p=0.572 n=8+8) FmtFprintfFloat-8 1.01µs ± 2% 1.01µs ± 2% ~ (p=0.625 n=8+8) FmtManyArgs-8 1.93µs ± 2% 1.93µs ± 1% ~ (p=0.979 n=8+7) GobDecode-8 16.1ms ± 1% 16.5ms ± 1% +2.32% (p=0.000 n=8+8) GobEncode-8 15.9ms ± 0% 15.8ms ± 1% -1.00% (p=0.000 n=8+7) Gzip-8 690ms ± 1% 670ms ± 0% -2.90% (p=0.000 n=8+8) Gunzip-8 109ms ± 1% 109ms ± 1% ~ (p=0.694 n=7+8) HTTPClientServer-8 149µs ± 3% 146µs ± 2% -1.70% (p=0.028 n=8+8) JSONEncode-8 50.5ms ± 1% 49.2ms ± 0% -2.60% (p=0.001 n=7+7) JSONDecode-8 135ms ± 2% 137ms ± 1% ~ (p=0.054 n=8+7) Mandelbrot200-8 951ms ± 0% 952ms ± 0% ~ 
(p=0.852 n=6+8) GoParse-8 9.47ms ± 1% 9.66ms ± 1% +2.01% (p=0.000 n=8+8) RegexpMatchEasy0_32-8 288ns ± 2% 277ns ± 2% -3.61% (p=0.000 n=8+8) RegexpMatchEasy0_1K-8 1.66µs ± 1% 1.69µs ± 2% +2.21% (p=0.001 n=7+7) RegexpMatchEasy1_32-8 334ns ± 1% 305ns ± 2% -8.86% (p=0.000 n=8+8) RegexpMatchEasy1_1K-8 2.14µs ± 2% 2.15µs ± 0% ~ (p=0.099 n=8+8) RegexpMatchMedium_32-8 13.3ns ± 1% 13.3ns ± 0% ~ (p=1.000 n=7+7) RegexpMatchMedium_1K-8 81.1µs ± 3% 80.7µs ± 1% ~ (p=0.955 n=7+8) RegexpMatchHard_32-8 4.26µs ± 0% 4.26µs ± 0% ~ (p=0.933 n=7+8) RegexpMatchHard_1K-8 124µs ± 0% 124µs ± 0% +0.31% (p=0.000 n=8+8) Revcomp-8 14.7ms ± 2% 14.5ms ± 1% -1.66% (p=0.003 n=8+8) Template-8 197ms ± 2% 200ms ± 3% +1.62% (p=0.021 n=8+8) TimeParse-8 1.33µs ± 1% 1.30µs ± 1% -1.86% (p=0.002 n=8+8) TimeFormat-8 3.04µs ± 1% 3.02µs ± 0% -0.60% (p=0.000 n=8+8) name old speed new speed delta GobDecode-8 47.6MB/s ± 1% 46.5MB/s ± 1% -2.28% (p=0.000 n=8+8) GobEncode-8 48.1MB/s ± 0% 48.6MB/s ± 1% +1.02% (p=0.000 n=8+7) Gzip-8 28.1MB/s ± 1% 29.0MB/s ± 0% +2.97% (p=0.000 n=8+8) Gunzip-8 178MB/s ± 1% 179MB/s ± 2% ~ (p=0.694 n=7+8) JSONEncode-8 38.4MB/s ± 1% 39.4MB/s ± 0% +2.67% (p=0.001 n=7+7) JSONDecode-8 14.3MB/s ± 2% 14.2MB/s ± 1% -0.81% (p=0.043 n=8+7) GoParse-8 6.12MB/s ± 1% 5.99MB/s ± 1% -2.00% (p=0.000 n=8+8) RegexpMatchEasy0_32-8 111MB/s ± 2% 115MB/s ± 2% +3.77% (p=0.000 n=8+8) RegexpMatchEasy0_1K-8 618MB/s ± 1% 604MB/s ± 2% -2.16% (p=0.001 n=7+7) RegexpMatchEasy1_32-8 95.7MB/s ± 1% 105.1MB/s ± 2% +9.76% (p=0.000 n=8+8) RegexpMatchEasy1_1K-8 479MB/s ± 2% 477MB/s ± 0% ~ (p=0.105 n=8+8) RegexpMatchMedium_32-8 75.2MB/s ± 1% 75.2MB/s ± 0% ~ (p=0.247 n=7+7) RegexpMatchMedium_1K-8 12.6MB/s ± 3% 12.7MB/s ± 1% ~ (p=0.538 n=7+8) RegexpMatchHard_32-8 7.52MB/s ± 0% 7.52MB/s ± 0% ~ (p=0.968 n=7+8) RegexpMatchHard_1K-8 8.26MB/s ± 0% 8.24MB/s ± 0% -0.30% (p=0.001 n=8+8) Revcomp-8 173MB/s ± 2% 176MB/s ± 1% +1.68% (p=0.003 n=8+8) Template-8 9.85MB/s ± 2% 9.69MB/s ± 3% -1.59% (p=0.021 n=8+8) Fixes #39303 Updates #38740 
Change-Id: I0a5f87bfda679f66414c0041ace2ca2e28363f36 Reviewed-on: https://go-review.googlesource.com/c/go/+/236637 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-06-01 11:01:14 +00:00
	ssa.BlockARMEQ:     {arm.ABEQ, arm.ABNE},
	ssa.BlockARMNE:     {arm.ABNE, arm.ABEQ},
	ssa.BlockARMLT:     {arm.ABLT, arm.ABGE},
	ssa.BlockARMGE:     {arm.ABGE, arm.ABLT},
	ssa.BlockARMLE:     {arm.ABLE, arm.ABGT},
	ssa.BlockARMGT:     {arm.ABGT, arm.ABLE},
	ssa.BlockARMULT:    {arm.ABLO, arm.ABHS},
	ssa.BlockARMUGE:    {arm.ABHS, arm.ABLO},
	ssa.BlockARMUGT:    {arm.ABHI, arm.ABLS},
	ssa.BlockARMULE:    {arm.ABLS, arm.ABHI},
	ssa.BlockARMLTnoov: {arm.ABMI, arm.ABPL},
	ssa.BlockARMGEnoov: {arm.ABPL, arm.ABMI},
}

// To model a 'LEnoov' ('<=' without overflow checking) branching
var leJumps = [2][2]gc.IndexJump{
	{{Jump: arm.ABEQ, Index: 0}, {Jump: arm.ABPL, Index: 1}}, // next == b.Succs[0]
	{{Jump: arm.ABMI, Index: 0}, {Jump: arm.ABEQ, Index: 0}}, // next == b.Succs[1]
}

// To model a 'GTnoov' ('>' without overflow checking) branching
var gtJumps = [2][2]gc.IndexJump{
	{{Jump: arm.ABMI, Index: 1}, {Jump: arm.ABEQ, Index: 1}}, // next == b.Succs[0]
	{{Jump: arm.ABEQ, Index: 1}, {Jump: arm.ABPL, Index: 0}}, // next == b.Succs[1]
}
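
The two tables above can be checked against the intended conditions (LEnoov is MI || EQ, GTnoov is neither MI nor EQ) with a small standalone model. This is a sketch under assumptions, not the compiler's `CombJump`: the `cond`, `indexJump`, `taken`, `successor`, and `consistent` names are hypothetical. It assumes the two jumps in a row are emitted in order and that control falls through to the `next` successor when neither is taken.

```go
package main

import "fmt"

// cond is a hypothetical stand-in for the ARM branch opcodes used in the tables.
type cond int

const (
	beq cond = iota // taken when Z is set
	bmi             // taken when N is set
	bpl             // taken when N is clear
)

// indexJump mirrors the shape of the table entries: a branch plus the
// index of the successor it targets.
type indexJump struct {
	jump  cond
	index int
}

var leJumps = [2][2]indexJump{
	{{beq, 0}, {bpl, 1}}, // next == Succs[0]
	{{bmi, 0}, {beq, 0}}, // next == Succs[1]
}

var gtJumps = [2][2]indexJump{
	{{bmi, 1}, {beq, 1}}, // next == Succs[0]
	{{beq, 1}, {bpl, 0}}, // next == Succs[1]
}

// taken reports whether branch c is taken for flags N and Z.
func taken(c cond, n, z bool) bool {
	switch c {
	case beq:
		return z
	case bmi:
		return n
	default: // bpl
		return !n
	}
}

// successor returns the successor index reached when row `next` of the table
// is emitted and the block falls through to Succs[next].
func successor(table *[2][2]indexJump, next int, n, z bool) int {
	for _, j := range table[next] {
		if taken(j.jump, n, z) {
			return j.index
		}
	}
	return next // neither jump taken: fall through
}

// consistent checks both tables against the intended conditions for every
// flag combination and both layouts.
func consistent() bool {
	for _, n := range []bool{false, true} {
		for _, z := range []bool{false, true} {
			for next := 0; next < 2; next++ {
				wantLE, wantGT := 1, 1
				if n || z {
					wantLE = 0 // LEnoov holds: go to Succs[0]
				} else {
					wantGT = 0 // GTnoov holds: go to Succs[0]
				}
				if successor(&leJumps, next, n, z) != wantLE ||
					successor(&gtJumps, next, n, z) != wantGT {
					return false
				}
			}
		}
	}
	return true
}

func main() {
	fmt.Println(consistent()) // true
}
```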

func ssaGenBlock(s *gc.SSAGenState, b, next *ssa.Block) {
	switch b.Kind {
	case ssa.BlockPlain:
		if b.Succs[0].Block() != next {
			p := s.Prog(obj.AJMP)
			p.To.Type = obj.TYPE_BRANCH
			s.Branches = append(s.Branches, gc.Branch{P: p, B: b.Succs[0].Block()})
		}

	case ssa.BlockDefer:
		// defer returns in R0:
		// 0 if we should continue executing
		// 1 if we should jump to deferreturn call
		p := s.Prog(arm.ACMP)
		p.From.Type = obj.TYPE_CONST
		p.From.Offset = 0
		p.Reg = arm.REG_R0
		p = s.Prog(arm.ABNE)
		p.To.Type = obj.TYPE_BRANCH
		s.Branches = append(s.Branches, gc.Branch{P: p, B: b.Succs[1].Block()})
		if b.Succs[0].Block() != next {
			p := s.Prog(obj.AJMP)
			p.To.Type = obj.TYPE_BRANCH
			s.Branches = append(s.Branches, gc.Branch{P: p, B: b.Succs[0].Block()})
		}

	case ssa.BlockExit:

	case ssa.BlockRet:
		s.Prog(obj.ARET)

	case ssa.BlockRetJmp:
		p := s.Prog(obj.ARET)
		p.To.Type = obj.TYPE_MEM
		p.To.Name = obj.NAME_EXTERN
		p.To.Sym = b.Aux.(*obj.LSym)

	case ssa.BlockARMEQ, ssa.BlockARMNE,
		ssa.BlockARMLT, ssa.BlockARMGE,
		ssa.BlockARMLE, ssa.BlockARMGT,
		ssa.BlockARMULT, ssa.BlockARMUGT,
		ssa.BlockARMULE, ssa.BlockARMUGE,
		ssa.BlockARMLTnoov, ssa.BlockARMGEnoov:
		jmp := blockJump[b.Kind]
		switch next {
		case b.Succs[0].Block():
			s.Br(jmp.invasm, b.Succs[1].Block())
		case b.Succs[1].Block():
			s.Br(jmp.asm, b.Succs[0].Block())
		default:
			if b.Likely != ssa.BranchUnlikely {
				s.Br(jmp.asm, b.Succs[0].Block())
				s.Br(obj.AJMP, b.Succs[1].Block())
			} else {
				s.Br(jmp.invasm, b.Succs[1].Block())
				s.Br(obj.AJMP, b.Succs[0].Block())
			}
		}

	case ssa.BlockARMLEnoov:
		s.CombJump(b, next, &leJumps)

	case ssa.BlockARMGTnoov:
		s.CombJump(b, next, &gtJumps)

	default:
cmd/compile: allow multiple SSA block control values Control values are used to choose which successor of a block is jumped to. Typically a control value takes the form of a 'flags' value that represents the result of a comparison. Some architectures however use a variable in a register as a control value. Up until now we have managed with a single control value per block. However some architectures (e.g. s390x and riscv64) have combined compare-and-branch instructions that take two variables in registers as parameters. To generate these instructions we need to support 2 control values per block. This CL allows up to 2 control values to be used in a block in order to support the addition of compare-and-branch instructions. I have implemented s390x compare-and-branch instructions in a different CL. Passes toolstash-check -all. Results of compilebench: name old time/op new time/op delta Template 208ms ± 1% 209ms ± 1% ~ (p=0.289 n=20+20) Unicode 83.7ms ± 1% 83.3ms ± 3% -0.49% (p=0.017 n=18+18) GoTypes 748ms ± 1% 748ms ± 0% ~ (p=0.460 n=20+18) Compiler 3.47s ± 1% 3.48s ± 1% ~ (p=0.070 n=19+18) SSA 11.5s ± 1% 11.7s ± 1% +1.64% (p=0.000 n=19+18) Flate 130ms ± 1% 130ms ± 1% ~ (p=0.588 n=19+20) GoParser 160ms ± 1% 161ms ± 1% ~ (p=0.211 n=20+20) Reflect 465ms ± 1% 467ms ± 1% +0.42% (p=0.007 n=20+20) Tar 184ms ± 1% 185ms ± 2% ~ (p=0.087 n=18+20) XML 253ms ± 1% 253ms ± 1% ~ (p=0.377 n=20+18) LinkCompiler 769ms ± 2% 774ms ± 2% ~ (p=0.070 n=19+19) ExternalLinkCompiler 3.59s ±11% 3.68s ± 6% ~ (p=0.072 n=20+20) LinkWithoutDebugCompiler 446ms ± 5% 454ms ± 3% +1.79% (p=0.002 n=19+20) StdCmd 26.0s ± 2% 26.0s ± 2% ~ (p=0.799 n=20+20) name old user-time/op new user-time/op delta Template 238ms ± 5% 240ms ± 5% ~ (p=0.142 n=20+20) Unicode 105ms ±11% 106ms ±10% ~ (p=0.512 n=20+20) GoTypes 876ms ± 2% 873ms ± 4% ~ (p=0.647 n=20+19) Compiler 4.17s ± 2% 4.19s ± 1% ~ (p=0.093 n=20+18) SSA 13.9s ± 1% 14.1s ± 1% +1.45% (p=0.000 n=18+18) Flate 145ms ±13% 146ms ± 5% ~ (p=0.851 n=20+18) GoParser 
185ms ± 5% 188ms ± 7% ~ (p=0.174 n=20+20) Reflect 534ms ± 3% 538ms ± 2% ~ (p=0.105 n=20+18) Tar 215ms ± 4% 211ms ± 9% ~ (p=0.079 n=19+20) XML 295ms ± 6% 295ms ± 5% ~ (p=0.968 n=20+20) LinkCompiler 832ms ± 4% 837ms ± 7% ~ (p=0.707 n=17+20) ExternalLinkCompiler 1.58s ± 8% 1.60s ± 4% ~ (p=0.296 n=20+19) LinkWithoutDebugCompiler 478ms ±12% 489ms ±10% ~ (p=0.429 n=20+20) name old object-bytes new object-bytes delta Template 559kB ± 0% 559kB ± 0% ~ (all equal) Unicode 216kB ± 0% 216kB ± 0% ~ (all equal) GoTypes 2.03MB ± 0% 2.03MB ± 0% ~ (all equal) Compiler 8.07MB ± 0% 8.07MB ± 0% -0.06% (p=0.000 n=20+20) SSA 27.1MB ± 0% 27.3MB ± 0% +0.89% (p=0.000 n=20+20) Flate 343kB ± 0% 343kB ± 0% ~ (all equal) GoParser 441kB ± 0% 441kB ± 0% ~ (all equal) Reflect 1.36MB ± 0% 1.36MB ± 0% ~ (all equal) Tar 487kB ± 0% 487kB ± 0% ~ (all equal) XML 632kB ± 0% 632kB ± 0% ~ (all equal) name old export-bytes new export-bytes delta Template 18.5kB ± 0% 18.5kB ± 0% ~ (all equal) Unicode 7.92kB ± 0% 7.92kB ± 0% ~ (all equal) GoTypes 35.0kB ± 0% 35.0kB ± 0% ~ (all equal) Compiler 109kB ± 0% 110kB ± 0% +0.72% (p=0.000 n=20+20) SSA 137kB ± 0% 138kB ± 0% +0.58% (p=0.000 n=20+20) Flate 4.89kB ± 0% 4.89kB ± 0% ~ (all equal) GoParser 8.49kB ± 0% 8.49kB ± 0% ~ (all equal) Reflect 11.4kB ± 0% 11.4kB ± 0% ~ (all equal) Tar 10.5kB ± 0% 10.5kB ± 0% ~ (all equal) XML 16.7kB ± 0% 16.7kB ± 0% ~ (all equal) name old text-bytes new text-bytes delta HelloSize 761kB ± 0% 761kB ± 0% ~ (all equal) CmdGoSize 10.8MB ± 0% 10.8MB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 10.7kB ± 0% 10.7kB ± 0% ~ (all equal) CmdGoSize 312kB ± 0% 312kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 122kB ± 0% 122kB ± 0% ~ (all equal) CmdGoSize 146kB ± 0% 146kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.13MB ± 0% 1.13MB ± 0% ~ (all equal) CmdGoSize 15.1MB ± 0% 15.1MB ± 0% ~ (all equal) Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f Reviewed-on: 
https://go-review.googlesource.com/c/go/+/196557 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-12 20:19:58 +01:00
		b.Fatalf("branch not implemented: %s", b.LongString())
	}
}
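
The branch selection in the conditional-block case of `ssaGenBlock` can be illustrated with a standalone sketch. It is a simplified model, not the compiler's code: the `emit` function, the string mnemonics, and the `succ0`/`succ1` labels are hypothetical, and `next` is encoded as 0, 1, or -1 (neither successor follows the block) purely for illustration.

```go
package main

import "fmt"

// emit returns the branches a two-way block needs, given the condition's
// branch (asm), its inverse (invasm), which successor is laid out next
// (0, 1, or -1 for neither), and whether the branch is marked unlikely.
func emit(asm, invasm string, next int, unlikely bool) []string {
	switch next {
	case 0:
		// Succs[0] follows directly: only jump away when the condition fails.
		return []string{invasm + " succ1"}
	case 1:
		// Succs[1] follows directly: only jump when the condition holds.
		return []string{asm + " succ0"}
	default:
		// Neither successor follows: a conditional branch plus an
		// unconditional jump, with the conditional one aimed at the
		// likelier target.
		if !unlikely {
			return []string{asm + " succ0", "JMP succ1"}
		}
		return []string{invasm + " succ1", "JMP succ0"}
	}
}

func main() {
	fmt.Println(emit("BLT", "BGE", 0, false))  // [BGE succ1]
	fmt.Println(emit("BLT", "BGE", 1, false))  // [BLT succ0]
	fmt.Println(emit("BLT", "BGE", -1, false)) // [BLT succ0 JMP succ1]
	fmt.Println(emit("BLT", "BGE", -1, true))  // [BGE succ1 JMP succ0]
}
```

Storing both `asm` and `invasm` in the table lets the generator emit a single branch whenever one successor is the fallthrough block.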