go/src/cmd/compile/internal/arm/ssa.go

1115 lines
32 KiB
Go
Raw Normal View History

// Copyright 2016 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
package arm
import (
"fmt"
"internal/buildcfg"
"math"
cmd/compile: optimize arm's bit operation BFC (Bit Field Clear) was introduced in ARMv7, which can simplify ANDconst and BICconst. And this CL implements that optimization. 1. The total size of pkg/android_arm decreases about 3KB, excluding cmd/compile/. 2. There is no regression in the go1 benchmark result, and some cases (FmtFprintfEmpty-4 and RegexpMatchMedium_32-4) even get slight improvement. name old time/op new time/op delta BinaryTree17-4 25.3s ± 1% 25.2s ± 1% ~ (p=0.072 n=30+29) Fannkuch11-4 13.3s ± 0% 13.3s ± 0% +0.13% (p=0.000 n=30+26) FmtFprintfEmpty-4 407ns ± 0% 394ns ± 0% -3.19% (p=0.000 n=26+28) FmtFprintfString-4 664ns ± 0% 662ns ± 0% -0.22% (p=0.000 n=30+30) FmtFprintfInt-4 712ns ± 0% 706ns ± 0% -0.79% (p=0.000 n=30+30) FmtFprintfIntInt-4 1.06µs ± 0% 1.05µs ± 0% -0.38% (p=0.000 n=30+30) FmtFprintfPrefixedInt-4 1.16µs ± 0% 1.16µs ± 0% -0.13% (p=0.000 n=30+29) FmtFprintfFloat-4 2.24µs ± 0% 2.23µs ± 0% -0.51% (p=0.000 n=29+21) FmtManyArgs-4 4.09µs ± 0% 4.06µs ± 0% -0.83% (p=0.000 n=28+30) GobDecode-4 55.0ms ± 5% 55.4ms ± 5% ~ (p=0.307 n=30+30) GobEncode-4 51.2ms ± 1% 51.9ms ± 1% +1.23% (p=0.000 n=29+30) Gzip-4 2.64s ± 0% 2.60s ± 0% -1.35% (p=0.000 n=30+29) Gunzip-4 309ms ± 0% 308ms ± 0% -0.27% (p=0.000 n=30+30) HTTPClientServer-4 1.03ms ± 5% 1.02ms ± 4% ~ (p=0.117 n=30+29) JSONEncode-4 101ms ± 2% 101ms ± 2% ~ (p=0.338 n=29+29) JSONDecode-4 383ms ± 2% 382ms ± 2% ~ (p=0.751 n=26+30) Mandelbrot200-4 18.4ms ± 0% 18.4ms ± 0% -0.10% (p=0.000 n=29+29) GoParse-4 22.6ms ± 0% 22.5ms ± 0% -0.39% (p=0.000 n=30+30) RegexpMatchEasy0_32-4 761ns ± 0% 750ns ± 0% -1.47% (p=0.000 n=26+29) RegexpMatchEasy0_1K-4 4.33µs ± 0% 4.34µs ± 0% +0.27% (p=0.000 n=25+28) RegexpMatchEasy1_32-4 809ns ± 0% 795ns ± 0% -1.74% (p=0.000 n=27+25) RegexpMatchEasy1_1K-4 5.54µs ± 0% 5.53µs ± 0% -0.18% (p=0.000 n=29+29) RegexpMatchMedium_32-4 1.11µs ± 0% 1.08µs ± 0% -2.78% (p=0.000 n=27+29) RegexpMatchMedium_1K-4 255µs ± 0% 255µs ± 0% -0.02% (p=0.029 n=30+30) RegexpMatchHard_32-4 14.7µs ± 0% 14.7µs ± 0% -0.28% (p=0.000 n=30+29) RegexpMatchHard_1K-4 439µs ± 0% 439µs ± 0% ~ (p=0.907 n=23+27) Revcomp-4 41.9ms ± 1% 41.9ms ± 1% ~ (p=0.230 n=28+30) Template-4 522ms ± 1% 528ms ± 1% +1.25% (p=0.000 n=30+30) TimeParse-4 3.34µs ± 0% 3.35µs ± 0% +0.23% (p=0.000 n=30+27) TimeFormat-4 6.06µs ± 0% 6.13µs ± 0% +1.08% (p=0.000 n=29+29) [Geo mean] 384µs 382µs -0.37% name old speed new speed delta GobDecode-4 14.0MB/s ± 5% 13.9MB/s ± 5% ~ (p=0.308 n=30+30) GobEncode-4 15.0MB/s ± 1% 14.8MB/s ± 1% -1.22% (p=0.000 n=29+30) Gzip-4 7.36MB/s ± 0% 7.46MB/s ± 0% +1.35% (p=0.000 n=30+30) Gunzip-4 62.8MB/s ± 0% 63.0MB/s ± 0% +0.27% (p=0.000 n=30+30) JSONEncode-4 19.2MB/s ± 2% 19.2MB/s ± 2% ~ (p=0.312 n=29+29) JSONDecode-4 5.05MB/s ± 3% 5.08MB/s ± 2% ~ (p=0.356 n=29+30) GoParse-4 2.56MB/s ± 0% 2.57MB/s ± 0% +0.39% (p=0.000 n=23+27) RegexpMatchEasy0_32-4 42.0MB/s ± 0% 42.6MB/s ± 0% +1.50% (p=0.000 n=26+28) RegexpMatchEasy0_1K-4 236MB/s ± 0% 236MB/s ± 0% -0.27% (p=0.000 n=25+28) RegexpMatchEasy1_32-4 39.6MB/s ± 0% 40.2MB/s ± 0% +1.73% (p=0.000 n=27+27) RegexpMatchEasy1_1K-4 185MB/s ± 0% 185MB/s ± 0% +0.18% (p=0.000 n=29+29) RegexpMatchMedium_32-4 900kB/s ± 0% 920kB/s ± 0% +2.22% (p=0.000 n=29+29) RegexpMatchMedium_1K-4 4.02MB/s ± 0% 4.02MB/s ± 0% +0.07% (p=0.004 n=30+27) RegexpMatchHard_32-4 2.17MB/s ± 0% 2.18MB/s ± 0% +0.46% (p=0.000 n=30+26) RegexpMatchHard_1K-4 2.33MB/s ± 0% 2.33MB/s ± 0% ~ (all equal) Revcomp-4 60.6MB/s ± 1% 60.7MB/s ± 1% ~ (p=0.207 n=28+30) Template-4 3.72MB/s ± 1% 3.67MB/s ± 1% -1.23% (p=0.000 n=30+30) [Geo mean] 12.9MB/s 12.9MB/s +0.29% Change-Id: I07f497f8bb476c950dc555491d00c9066fb64a4e Reviewed-on: https://go-review.googlesource.com/134232 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-09-10 08:29:52 +00:00
"math/bits"
"cmd/compile/internal/base"
[dev.regabi] cmd/compile: introduce cmd/compile/internal/ir [generated] If we want to break up package gc at all, we will need to move the compiler IR it defines into a separate package that can be imported by packages that gc itself imports. This CL does that. It also removes the TINT8 etc aliases so that all code is clear about which package things are coming from. This CL is automatically generated by the script below. See the comments in the script for details about the changes. [git-generate] cd src/cmd/compile/internal/gc rf ' # These names were never fully qualified # when the types package was added. # Do it now, to avoid confusion about where they live. inline -rm \ Txxx \ TINT8 \ TUINT8 \ TINT16 \ TUINT16 \ TINT32 \ TUINT32 \ TINT64 \ TUINT64 \ TINT \ TUINT \ TUINTPTR \ TCOMPLEX64 \ TCOMPLEX128 \ TFLOAT32 \ TFLOAT64 \ TBOOL \ TPTR \ TFUNC \ TSLICE \ TARRAY \ TSTRUCT \ TCHAN \ TMAP \ TINTER \ TFORW \ TANY \ TSTRING \ TUNSAFEPTR \ TIDEAL \ TNIL \ TBLANK \ TFUNCARGS \ TCHANARGS \ NTYPE \ BADWIDTH # esc.go and escape.go do not need to be split. # Append esc.go onto the end of escape.go. mv esc.go escape.go # Pull out the type format installation from func Main, # so it can be carried into package ir. mv Main:/Sconv.=/-0,/TypeLinkSym/-1 InstallTypeFormats # Names that need to be exported for use by code left in gc. mv Isconst IsConst mv asNode AsNode mv asNodes AsNodes mv asTypesNode AsTypesNode mv basicnames BasicTypeNames mv builtinpkg BuiltinPkg mv consttype ConstType mv dumplist DumpList mv fdumplist FDumpList mv fmtMode FmtMode mv goopnames OpNames mv inspect Inspect mv inspectList InspectList mv localpkg LocalPkg mv nblank BlankNode mv numImport NumImport mv opprec OpPrec mv origSym OrigSym mv stmtwithinit StmtWithInit mv dump DumpAny mv fdump FDumpAny mv nod Nod mv nodl NodAt mv newname NewName mv newnamel NewNameAt mv assertRepresents AssertValidTypeForConst mv represents ValidTypeForConst mv nodlit NewLiteral # Types and fields that need to be exported for use by gc. mv nowritebarrierrecCallSym SymAndPos mv SymAndPos.lineno SymAndPos.Pos mv SymAndPos.target SymAndPos.Sym mv Func.lsym Func.LSym mv Func.setWBPos Func.SetWBPos mv Func.numReturns Func.NumReturns mv Func.numDefers Func.NumDefers mv Func.nwbrCalls Func.NWBRCalls # initLSym is an algorithm left behind in gc, # not an operation on Func itself. mv Func.initLSym initLSym mv nodeQueue NodeQueue mv NodeQueue.empty NodeQueue.Empty mv NodeQueue.popLeft NodeQueue.PopLeft mv NodeQueue.pushRight NodeQueue.PushRight # Many methods on Node are actually algorithms that # would apply to any node implementation. # Those become plain functions. mv Node.funcname FuncName mv Node.isBlank IsBlank mv Node.isGoConst isGoConst mv Node.isNil IsNil mv Node.isParamHeapCopy isParamHeapCopy mv Node.isParamStackCopy isParamStackCopy mv Node.isSimpleName isSimpleName mv Node.mayBeShared MayBeShared mv Node.pkgFuncName PkgFuncName mv Node.backingArrayPtrLen backingArrayPtrLen mv Node.isterminating isTermNode mv Node.labeledControl labeledControl mv Nodes.isterminating isTermNodes mv Nodes.sigerr fmtSignature mv Node.MethodName methodExprName mv Node.MethodFunc methodExprFunc mv Node.IsMethod IsMethod # Every node will need to implement RawCopy; # Copy and SepCopy algorithms will use it. mv Node.rawcopy Node.RawCopy mv Node.copy Copy mv Node.sepcopy SepCopy # Extract Node.Format method body into func FmtNode, # but leave method wrapper behind. mv Node.Format:0,$ FmtNode # Formatting helpers that will apply to all node implementations. mv Node.Line Line mv Node.exprfmt exprFmt mv Node.jconv jconvFmt mv Node.modeString modeString mv Node.nconv nconvFmt mv Node.nodedump nodeDumpFmt mv Node.nodefmt nodeFmt mv Node.stmtfmt stmtFmt # Constant support needed for code moving to ir. mv okforconst OKForConst mv vconv FmtConst mv int64Val Int64Val mv float64Val Float64Val mv Node.ValueInterface ConstValue # Organize code into files. mv LocalPkg BuiltinPkg ir.go mv NumImport InstallTypeFormats Line fmt.go mv syntax.go Nod NodAt NewNameAt Class Pxxx PragmaFlag Nointerface SymAndPos \ AsNode AsTypesNode BlankNode OrigSym \ Node.SliceBounds Node.SetSliceBounds Op.IsSlice3 \ IsConst Node.Int64Val Node.CanInt64 Node.Uint64Val Node.BoolVal Node.StringVal \ Node.RawCopy SepCopy Copy \ IsNil IsBlank IsMethod \ Node.Typ Node.StorageClass node.go mv ConstType ConstValue Int64Val Float64Val AssertValidTypeForConst ValidTypeForConst NewLiteral idealType OKForConst val.go # Move files to new ir package. mv bitset.go class_string.go dump.go fmt.go \ ir.go node.go op_string.go val.go \ sizeof_test.go cmd/compile/internal/ir ' : # fix mkbuiltin.go to generate the changes made to builtin.go during rf sed -i '' ' s/\[T/[types.T/g s/\*Node/*ir.Node/g /internal\/types/c \ fmt.Fprintln(&b, `import (`) \ fmt.Fprintln(&b, ` "cmd/compile/internal/ir"`) \ fmt.Fprintln(&b, ` "cmd/compile/internal/types"`) \ fmt.Fprintln(&b, `)`) ' mkbuiltin.go gofmt -w mkbuiltin.go : # update cmd/dist to add internal/ir cd ../../../dist sed -i '' '/compile.internal.gc/a\ "cmd/compile/internal/ir", ' buildtool.go gofmt -w buildtool.go : # update cmd/compile TestFormats cd ../.. go install std cmd cd cmd/compile go test -u || go test # first one updates but fails; second passes Change-Id: I5f7caf6b20629b51970279e81231a3574d5b51db Reviewed-on: https://go-review.googlesource.com/c/go/+/273008 Trust: Russ Cox <rsc@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-11-19 21:09:22 -05:00
"cmd/compile/internal/ir"
"cmd/compile/internal/logopt"
"cmd/compile/internal/ssa"
[dev.regabi] cmd/compile: split out package ssagen [generated] [git-generate] cd src/cmd/compile/internal/gc rf ' # maxOpenDefers is declared in ssa.go but used only by walk. mv maxOpenDefers walk.go # gc.Arch -> ssagen.Arch # It is not as nice but will do for now. mv Arch ArchInfo mv thearch Arch mv Arch ArchInfo arch.go # Pull dwarf out of pgen.go. mv debuginfo declPos createDwarfVars preInliningDcls \ createSimpleVars createSimpleVar \ createComplexVars createComplexVar \ dwarf.go # Pull high-level compilation out of pgen.go, # leaving only the SSA code. mv compilequeue funccompile compile compilenow \ compileFunctions isInlinableButNotInlined \ initLSym \ compile.go mv BoundsCheckFunc GCWriteBarrierReg ssa.go mv largeStack largeStackFrames CheckLargeStacks pgen.go # All that is left in dcl.go is the nowritebarrierrecCheck mv dcl.go nowb.go # Export API and unexport non-API. mv initssaconfig InitConfig mv isIntrinsicCall IsIntrinsicCall mv ssaDumpInline DumpInline mv initSSATables InitTables mv initSSAEnv InitEnv mv compileSSA Compile mv stackOffset StackOffset mv canSSAType TypeOK mv SSAGenState State mv FwdRefAux fwdRefAux mv cgoSymABIs CgoSymABIs mv readSymABIs ReadSymABIs mv initLSym InitLSym mv useABIWrapGen symabiDefs CgoSymABIs ReadSymABIs InitLSym selectLSym makeABIWrapper setupTextLSym abi.go mv arch.go abi.go nowb.go phi.go pgen.go pgen_test.go ssa.go cmd/compile/internal/ssagen ' rm go.go gsubr.go Change-Id: I47fad6cbf1d1e583fd9139003a08401d7cd048a1 Reviewed-on: https://go-review.googlesource.com/c/go/+/279476 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-12-23 00:57:10 -05:00
"cmd/compile/internal/ssagen"
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
"cmd/compile/internal/types"
"cmd/internal/obj"
"cmd/internal/obj/arm"
"internal/abi"
)
// loadByType returns the load instruction of the given type.
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
func loadByType(t *types.Type) obj.As {
if t.IsFloat() {
switch t.Size() {
case 4:
return arm.AMOVF
case 8:
return arm.AMOVD
}
} else {
switch t.Size() {
case 1:
if t.IsSigned() {
return arm.AMOVB
} else {
return arm.AMOVBU
}
case 2:
if t.IsSigned() {
return arm.AMOVH
} else {
return arm.AMOVHU
}
case 4:
return arm.AMOVW
}
}
panic("bad load type")
}
// storeByType returns the store instruction of the given type.
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
func storeByType(t *types.Type) obj.As {
if t.IsFloat() {
switch t.Size() {
case 4:
return arm.AMOVF
case 8:
return arm.AMOVD
}
} else {
switch t.Size() {
case 1:
return arm.AMOVB
case 2:
return arm.AMOVH
case 4:
return arm.AMOVW
}
}
panic("bad store type")
}
// shift type is used as Offset in obj.TYPE_SHIFT operands to encode shifted register operands.
type shift int64
// copied from ../../../internal/obj/util.go:/TYPE_SHIFT
func (v shift) String() string {
op := "<<>>->@>"[((v>>5)&3)<<1:]
if v&(1<<4) != 0 {
// register shift
return fmt.Sprintf("R%d%c%cR%d", v&15, op[0], op[1], (v>>8)&15)
} else {
// constant shift
return fmt.Sprintf("R%d%c%c%d", v&15, op[0], op[1], (v>>7)&31)
}
}
// makeshift encodes a register shifted by a constant.
func makeshift(v *ssa.Value, reg int16, typ int64, s int64) shift {
if s < 0 || s >= 32 {
v.Fatalf("shift out of range: %d", s)
}
return shift(int64(reg&0xf) | typ | (s&31)<<7)
}
// genshift generates a Prog for r = r0 op (r1 shifted by n).
func genshift(s *ssagen.State, v *ssa.Value, as obj.As, r0, r1, r int16, typ int64, n int64) *obj.Prog {
p := s.Prog(as)
p.From.Type = obj.TYPE_SHIFT
p.From.Offset = int64(makeshift(v, r1, typ, n))
p.Reg = r0
if r != 0 {
p.To.Type = obj.TYPE_REG
p.To.Reg = r
}
return p
}
// makeregshift encodes a register shifted by a register.
func makeregshift(r1 int16, typ int64, r2 int16) shift {
return shift(int64(r1&0xf) | typ | int64(r2&0xf)<<8 | 1<<4)
}
// genregshift generates a Prog for r = r0 op (r1 shifted by r2).
[dev.regabi] cmd/compile: split out package ssagen [generated] [git-generate] cd src/cmd/compile/internal/gc rf ' # maxOpenDefers is declared in ssa.go but used only by walk. mv maxOpenDefers walk.go # gc.Arch -> ssagen.Arch # It is not as nice but will do for now. mv Arch ArchInfo mv thearch Arch mv Arch ArchInfo arch.go # Pull dwarf out of pgen.go. mv debuginfo declPos createDwarfVars preInliningDcls \ createSimpleVars createSimpleVar \ createComplexVars createComplexVar \ dwarf.go # Pull high-level compilation out of pgen.go, # leaving only the SSA code. mv compilequeue funccompile compile compilenow \ compileFunctions isInlinableButNotInlined \ initLSym \ compile.go mv BoundsCheckFunc GCWriteBarrierReg ssa.go mv largeStack largeStackFrames CheckLargeStacks pgen.go # All that is left in dcl.go is the nowritebarrierrecCheck mv dcl.go nowb.go # Export API and unexport non-API. mv initssaconfig InitConfig mv isIntrinsicCall IsIntrinsicCall mv ssaDumpInline DumpInline mv initSSATables InitTables mv initSSAEnv InitEnv mv compileSSA Compile mv stackOffset StackOffset mv canSSAType TypeOK mv SSAGenState State mv FwdRefAux fwdRefAux mv cgoSymABIs CgoSymABIs mv readSymABIs ReadSymABIs mv initLSym InitLSym mv useABIWrapGen symabiDefs CgoSymABIs ReadSymABIs InitLSym selectLSym makeABIWrapper setupTextLSym abi.go mv arch.go abi.go nowb.go phi.go pgen.go pgen_test.go ssa.go cmd/compile/internal/ssagen ' rm go.go gsubr.go Change-Id: I47fad6cbf1d1e583fd9139003a08401d7cd048a1 Reviewed-on: https://go-review.googlesource.com/c/go/+/279476 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-12-23 00:57:10 -05:00
func genregshift(s *ssagen.State, as obj.As, r0, r1, r2, r int16, typ int64) *obj.Prog {
p := s.Prog(as)
p.From.Type = obj.TYPE_SHIFT
p.From.Offset = int64(makeregshift(r1, typ, r2))
p.Reg = r0
if r != 0 {
p.To.Type = obj.TYPE_REG
p.To.Reg = r
}
return p
}
cmd/compile: optimize arm's bit operation BFC (Bit Field Clear) was introduced in ARMv7, which can simplify ANDconst and BICconst. And this CL implements that optimization. 1. The total size of pkg/android_arm decreases about 3KB, excluding cmd/compile/. 2. There is no regression in the go1 benchmark result, and some cases (FmtFprintfEmpty-4 and RegexpMatchMedium_32-4) even get slight improvement. name old time/op new time/op delta BinaryTree17-4 25.3s ± 1% 25.2s ± 1% ~ (p=0.072 n=30+29) Fannkuch11-4 13.3s ± 0% 13.3s ± 0% +0.13% (p=0.000 n=30+26) FmtFprintfEmpty-4 407ns ± 0% 394ns ± 0% -3.19% (p=0.000 n=26+28) FmtFprintfString-4 664ns ± 0% 662ns ± 0% -0.22% (p=0.000 n=30+30) FmtFprintfInt-4 712ns ± 0% 706ns ± 0% -0.79% (p=0.000 n=30+30) FmtFprintfIntInt-4 1.06µs ± 0% 1.05µs ± 0% -0.38% (p=0.000 n=30+30) FmtFprintfPrefixedInt-4 1.16µs ± 0% 1.16µs ± 0% -0.13% (p=0.000 n=30+29) FmtFprintfFloat-4 2.24µs ± 0% 2.23µs ± 0% -0.51% (p=0.000 n=29+21) FmtManyArgs-4 4.09µs ± 0% 4.06µs ± 0% -0.83% (p=0.000 n=28+30) GobDecode-4 55.0ms ± 5% 55.4ms ± 5% ~ (p=0.307 n=30+30) GobEncode-4 51.2ms ± 1% 51.9ms ± 1% +1.23% (p=0.000 n=29+30) Gzip-4 2.64s ± 0% 2.60s ± 0% -1.35% (p=0.000 n=30+29) Gunzip-4 309ms ± 0% 308ms ± 0% -0.27% (p=0.000 n=30+30) HTTPClientServer-4 1.03ms ± 5% 1.02ms ± 4% ~ (p=0.117 n=30+29) JSONEncode-4 101ms ± 2% 101ms ± 2% ~ (p=0.338 n=29+29) JSONDecode-4 383ms ± 2% 382ms ± 2% ~ (p=0.751 n=26+30) Mandelbrot200-4 18.4ms ± 0% 18.4ms ± 0% -0.10% (p=0.000 n=29+29) GoParse-4 22.6ms ± 0% 22.5ms ± 0% -0.39% (p=0.000 n=30+30) RegexpMatchEasy0_32-4 761ns ± 0% 750ns ± 0% -1.47% (p=0.000 n=26+29) RegexpMatchEasy0_1K-4 4.33µs ± 0% 4.34µs ± 0% +0.27% (p=0.000 n=25+28) RegexpMatchEasy1_32-4 809ns ± 0% 795ns ± 0% -1.74% (p=0.000 n=27+25) RegexpMatchEasy1_1K-4 5.54µs ± 0% 5.53µs ± 0% -0.18% (p=0.000 n=29+29) RegexpMatchMedium_32-4 1.11µs ± 0% 1.08µs ± 0% -2.78% (p=0.000 n=27+29) RegexpMatchMedium_1K-4 255µs ± 0% 255µs ± 0% -0.02% (p=0.029 n=30+30) RegexpMatchHard_32-4 14.7µs ± 0% 14.7µs ± 0% -0.28% (p=0.000 n=30+29) RegexpMatchHard_1K-4 439µs ± 0% 439µs ± 0% ~ (p=0.907 n=23+27) Revcomp-4 41.9ms ± 1% 41.9ms ± 1% ~ (p=0.230 n=28+30) Template-4 522ms ± 1% 528ms ± 1% +1.25% (p=0.000 n=30+30) TimeParse-4 3.34µs ± 0% 3.35µs ± 0% +0.23% (p=0.000 n=30+27) TimeFormat-4 6.06µs ± 0% 6.13µs ± 0% +1.08% (p=0.000 n=29+29) [Geo mean] 384µs 382µs -0.37% name old speed new speed delta GobDecode-4 14.0MB/s ± 5% 13.9MB/s ± 5% ~ (p=0.308 n=30+30) GobEncode-4 15.0MB/s ± 1% 14.8MB/s ± 1% -1.22% (p=0.000 n=29+30) Gzip-4 7.36MB/s ± 0% 7.46MB/s ± 0% +1.35% (p=0.000 n=30+30) Gunzip-4 62.8MB/s ± 0% 63.0MB/s ± 0% +0.27% (p=0.000 n=30+30) JSONEncode-4 19.2MB/s ± 2% 19.2MB/s ± 2% ~ (p=0.312 n=29+29) JSONDecode-4 5.05MB/s ± 3% 5.08MB/s ± 2% ~ (p=0.356 n=29+30) GoParse-4 2.56MB/s ± 0% 2.57MB/s ± 0% +0.39% (p=0.000 n=23+27) RegexpMatchEasy0_32-4 42.0MB/s ± 0% 42.6MB/s ± 0% +1.50% (p=0.000 n=26+28) RegexpMatchEasy0_1K-4 236MB/s ± 0% 236MB/s ± 0% -0.27% (p=0.000 n=25+28) RegexpMatchEasy1_32-4 39.6MB/s ± 0% 40.2MB/s ± 0% +1.73% (p=0.000 n=27+27) RegexpMatchEasy1_1K-4 185MB/s ± 0% 185MB/s ± 0% +0.18% (p=0.000 n=29+29) RegexpMatchMedium_32-4 900kB/s ± 0% 920kB/s ± 0% +2.22% (p=0.000 n=29+29) RegexpMatchMedium_1K-4 4.02MB/s ± 0% 4.02MB/s ± 0% +0.07% (p=0.004 n=30+27) RegexpMatchHard_32-4 2.17MB/s ± 0% 2.18MB/s ± 0% +0.46% (p=0.000 n=30+26) RegexpMatchHard_1K-4 2.33MB/s ± 0% 2.33MB/s ± 0% ~ (all equal) Revcomp-4 60.6MB/s ± 1% 60.7MB/s ± 1% ~ (p=0.207 n=28+30) Template-4 3.72MB/s ± 1% 3.67MB/s ± 1% -1.23% (p=0.000 n=30+30) [Geo mean] 12.9MB/s 12.9MB/s +0.29% Change-Id: I07f497f8bb476c950dc555491d00c9066fb64a4e Reviewed-on: https://go-review.googlesource.com/134232 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-09-10 08:29:52 +00:00
// find a (lsb, width) pair for BFC
// lsb must be in [0, 31], width must be in [1, 32 - lsb]
// return (0xffffffff, 0) if v is not a binary like 0...01...10...0
func getBFC(v uint32) (uint32, uint32) {
var m, l uint32
// BFC is not applicable with zero
if v == 0 {
return 0xffffffff, 0
}
// find the lowest set bit, for example l=2 for 0x3ffffffc
l = uint32(bits.TrailingZeros32(v))
// m-1 represents the highest set bit index, for example m=30 for 0x3ffffffc
m = 32 - uint32(bits.LeadingZeros32(v))
// check if v is a binary like 0...01...10...0
if (1<<m)-(1<<l) == v {
// it must be m > l for non-zero v
return l, m - l
}
// invalid
return 0xffffffff, 0
}
[dev.regabi] cmd/compile: split out package ssagen [generated] [git-generate] cd src/cmd/compile/internal/gc rf ' # maxOpenDefers is declared in ssa.go but used only by walk. mv maxOpenDefers walk.go # gc.Arch -> ssagen.Arch # It is not as nice but will do for now. mv Arch ArchInfo mv thearch Arch mv Arch ArchInfo arch.go # Pull dwarf out of pgen.go. mv debuginfo declPos createDwarfVars preInliningDcls \ createSimpleVars createSimpleVar \ createComplexVars createComplexVar \ dwarf.go # Pull high-level compilation out of pgen.go, # leaving only the SSA code. mv compilequeue funccompile compile compilenow \ compileFunctions isInlinableButNotInlined \ initLSym \ compile.go mv BoundsCheckFunc GCWriteBarrierReg ssa.go mv largeStack largeStackFrames CheckLargeStacks pgen.go # All that is left in dcl.go is the nowritebarrierrecCheck mv dcl.go nowb.go # Export API and unexport non-API. mv initssaconfig InitConfig mv isIntrinsicCall IsIntrinsicCall mv ssaDumpInline DumpInline mv initSSATables InitTables mv initSSAEnv InitEnv mv compileSSA Compile mv stackOffset StackOffset mv canSSAType TypeOK mv SSAGenState State mv FwdRefAux fwdRefAux mv cgoSymABIs CgoSymABIs mv readSymABIs ReadSymABIs mv initLSym InitLSym mv useABIWrapGen symabiDefs CgoSymABIs ReadSymABIs InitLSym selectLSym makeABIWrapper setupTextLSym abi.go mv arch.go abi.go nowb.go phi.go pgen.go pgen_test.go ssa.go cmd/compile/internal/ssagen ' rm go.go gsubr.go Change-Id: I47fad6cbf1d1e583fd9139003a08401d7cd048a1 Reviewed-on: https://go-review.googlesource.com/c/go/+/279476 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-12-23 00:57:10 -05:00
func ssaGenValue(s *ssagen.State, v *ssa.Value) {
switch v.Op {
case ssa.OpCopy, ssa.OpARMMOVWreg:
if v.Type.IsMemory() {
return
}
x := v.Args[0].Reg()
y := v.Reg()
if x == y {
return
}
as := arm.AMOVW
if v.Type.IsFloat() {
switch v.Type.Size() {
case 4:
as = arm.AMOVF
case 8:
as = arm.AMOVD
default:
panic("bad float size")
}
}
p := s.Prog(as)
p.From.Type = obj.TYPE_REG
p.From.Reg = x
p.To.Type = obj.TYPE_REG
p.To.Reg = y
case ssa.OpARMMOVWnop:
// nothing to do
case ssa.OpLoadReg:
if v.Type.IsFlags() {
v.Fatalf("load flags not implemented: %v", v.LongString())
return
}
p := s.Prog(loadByType(v.Type))
[dev.regabi] cmd/compile: split out package ssagen [generated] [git-generate] cd src/cmd/compile/internal/gc rf ' # maxOpenDefers is declared in ssa.go but used only by walk. mv maxOpenDefers walk.go # gc.Arch -> ssagen.Arch # It is not as nice but will do for now. mv Arch ArchInfo mv thearch Arch mv Arch ArchInfo arch.go # Pull dwarf out of pgen.go. mv debuginfo declPos createDwarfVars preInliningDcls \ createSimpleVars createSimpleVar \ createComplexVars createComplexVar \ dwarf.go # Pull high-level compilation out of pgen.go, # leaving only the SSA code. mv compilequeue funccompile compile compilenow \ compileFunctions isInlinableButNotInlined \ initLSym \ compile.go mv BoundsCheckFunc GCWriteBarrierReg ssa.go mv largeStack largeStackFrames CheckLargeStacks pgen.go # All that is left in dcl.go is the nowritebarrierrecCheck mv dcl.go nowb.go # Export API and unexport non-API. mv initssaconfig InitConfig mv isIntrinsicCall IsIntrinsicCall mv ssaDumpInline DumpInline mv initSSATables InitTables mv initSSAEnv InitEnv mv compileSSA Compile mv stackOffset StackOffset mv canSSAType TypeOK mv SSAGenState State mv FwdRefAux fwdRefAux mv cgoSymABIs CgoSymABIs mv readSymABIs ReadSymABIs mv initLSym InitLSym mv useABIWrapGen symabiDefs CgoSymABIs ReadSymABIs InitLSym selectLSym makeABIWrapper setupTextLSym abi.go mv arch.go abi.go nowb.go phi.go pgen.go pgen_test.go ssa.go cmd/compile/internal/ssagen ' rm go.go gsubr.go Change-Id: I47fad6cbf1d1e583fd9139003a08401d7cd048a1 Reviewed-on: https://go-review.googlesource.com/c/go/+/279476 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-12-23 00:57:10 -05:00
ssagen.AddrAuto(&p.From, v.Args[0])
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
case ssa.OpStoreReg:
if v.Type.IsFlags() {
v.Fatalf("store flags not implemented: %v", v.LongString())
return
}
p := s.Prog(storeByType(v.Type))
p.From.Type = obj.TYPE_REG
p.From.Reg = v.Args[0].Reg()
[dev.regabi] cmd/compile: split out package ssagen [generated] [git-generate] cd src/cmd/compile/internal/gc rf ' # maxOpenDefers is declared in ssa.go but used only by walk. mv maxOpenDefers walk.go # gc.Arch -> ssagen.Arch # It is not as nice but will do for now. mv Arch ArchInfo mv thearch Arch mv Arch ArchInfo arch.go # Pull dwarf out of pgen.go. mv debuginfo declPos createDwarfVars preInliningDcls \ createSimpleVars createSimpleVar \ createComplexVars createComplexVar \ dwarf.go # Pull high-level compilation out of pgen.go, # leaving only the SSA code. mv compilequeue funccompile compile compilenow \ compileFunctions isInlinableButNotInlined \ initLSym \ compile.go mv BoundsCheckFunc GCWriteBarrierReg ssa.go mv largeStack largeStackFrames CheckLargeStacks pgen.go # All that is left in dcl.go is the nowritebarrierrecCheck mv dcl.go nowb.go # Export API and unexport non-API. mv initssaconfig InitConfig mv isIntrinsicCall IsIntrinsicCall mv ssaDumpInline DumpInline mv initSSATables InitTables mv initSSAEnv InitEnv mv compileSSA Compile mv stackOffset StackOffset mv canSSAType TypeOK mv SSAGenState State mv FwdRefAux fwdRefAux mv cgoSymABIs CgoSymABIs mv readSymABIs ReadSymABIs mv initLSym InitLSym mv useABIWrapGen symabiDefs CgoSymABIs ReadSymABIs InitLSym selectLSym makeABIWrapper setupTextLSym abi.go mv arch.go abi.go nowb.go phi.go pgen.go pgen_test.go ssa.go cmd/compile/internal/ssagen ' rm go.go gsubr.go Change-Id: I47fad6cbf1d1e583fd9139003a08401d7cd048a1 Reviewed-on: https://go-review.googlesource.com/c/go/+/279476 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-12-23 00:57:10 -05:00
ssagen.AddrAuto(&p.To, v)
case ssa.OpARMADD,
ssa.OpARMADC,
ssa.OpARMSUB,
ssa.OpARMSBC,
ssa.OpARMRSB,
ssa.OpARMAND,
ssa.OpARMOR,
ssa.OpARMXOR,
ssa.OpARMBIC,
ssa.OpARMMUL,
ssa.OpARMADDF,
ssa.OpARMADDD,
ssa.OpARMSUBF,
ssa.OpARMSUBD,
ssa.OpARMSLL,
ssa.OpARMSRL,
ssa.OpARMSRA,
ssa.OpARMMULF,
ssa.OpARMMULD,
cmd/compile: optimize ARM code with NMULF/NMULD NMULF and NMULD are efficient FP instructions, and the go compiler can use them to generate better code. The benchmark tests of my patch did not show general change, but big improvement in special cases. 1.A special test case improved 12.6%. https://github.com/benshi001/ugo1/blob/master/fpmul_test.go name old time/op new time/op delta FPMul-4 398µs ± 1% 348µs ± 1% -12.64% (p=0.000 n=40+40) 2. the compilecmp test showed little change. name old time/op new time/op delta Template 2.30s ± 1% 2.31s ± 1% ~ (p=0.754 n=17+19) Unicode 1.31s ± 3% 1.32s ± 5% ~ (p=0.265 n=20+20) GoTypes 7.73s ± 2% 7.73s ± 1% ~ (p=0.925 n=20+20) Compiler 37.0s ± 1% 37.3s ± 2% +0.79% (p=0.002 n=19+20) SSA 83.8s ± 4% 83.5s ± 2% ~ (p=0.964 n=20+17) Flate 1.43s ± 2% 1.44s ± 1% ~ (p=0.602 n=20+20) GoParser 1.82s ± 2% 1.81s ± 2% ~ (p=0.141 n=19+20) Reflect 5.08s ± 2% 5.08s ± 3% ~ (p=0.835 n=20+19) Tar 2.36s ± 1% 2.35s ± 1% ~ (p=0.195 n=18+17) XML 2.57s ± 2% 2.56s ± 1% ~ (p=0.283 n=20+17) [Geo mean] 4.74s 4.75s +0.05% name old user-time/op new user-time/op delta Template 2.75s ± 2% 2.75s ± 0% ~ (p=0.620 n=20+15) Unicode 1.59s ± 4% 1.60s ± 4% ~ (p=0.479 n=20+19) GoTypes 9.48s ± 1% 9.47s ± 1% ~ (p=0.743 n=20+20) Compiler 45.7s ± 1% 45.7s ± 1% ~ (p=0.482 n=19+20) SSA 109s ± 1% 109s ± 2% ~ (p=0.800 n=18+20) Flate 1.67s ± 3% 1.67s ± 3% ~ (p=0.598 n=19+18) GoParser 2.15s ± 4% 2.13s ± 3% ~ (p=0.153 n=20+20) Reflect 5.95s ± 2% 5.95s ± 2% ~ (p=0.961 n=19+20) Tar 2.93s ± 2% 2.92s ± 3% ~ (p=0.242 n=20+19) XML 3.02s ± 3% 3.04s ± 3% ~ (p=0.233 n=19+18) [Geo mean] 5.74s 5.74s -0.04% name old text-bytes new text-bytes delta HelloSize 588kB ± 0% 588kB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 5.46kB ± 0% 5.46kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 72.9kB ± 0% 72.9kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.03MB ± 0% 1.03MB ± 0% ~ (all equal) 3. The go1 benchmark showed little change in total. name old time/op new time/op delta BinaryTree17-4 41.8s ± 1% 41.8s ± 1% ~ (p=0.388 n=40+39) Fannkuch11-4 24.1s ± 1% 24.1s ± 1% ~ (p=0.077 n=40+40) FmtFprintfEmpty-4 834ns ± 1% 831ns ± 1% -0.31% (p=0.002 n=40+37) FmtFprintfString-4 1.34µs ± 1% 1.34µs ± 0% ~ (p=0.387 n=40+40) FmtFprintfInt-4 1.44µs ± 1% 1.44µs ± 1% ~ (p=0.421 n=40+40) FmtFprintfIntInt-4 2.09µs ± 0% 2.09µs ± 1% ~ (p=0.589 n=40+39) FmtFprintfPrefixedInt-4 2.32µs ± 1% 2.33µs ± 1% +0.15% (p=0.001 n=40+40) FmtFprintfFloat-4 4.51µs ± 0% 4.44µs ± 1% -1.50% (p=0.000 n=40+40) FmtManyArgs-4 7.94µs ± 0% 7.97µs ± 0% +0.36% (p=0.001 n=32+40) GobDecode-4 104ms ± 1% 102ms ± 2% -1.27% (p=0.000 n=39+37) GobEncode-4 90.5ms ± 1% 90.9ms ± 2% +0.40% (p=0.006 n=37+40) Gzip-4 4.10s ± 2% 4.08s ± 1% -0.30% (p=0.004 n=40+40) Gunzip-4 603ms ± 0% 602ms ± 1% ~ (p=0.303 n=37+40) HTTPClientServer-4 672µs ± 3% 658µs ± 2% -2.08% (p=0.000 n=39+37) JSONEncode-4 238ms ± 1% 239ms ± 0% +0.26% (p=0.001 n=40+25) JSONDecode-4 884ms ± 1% 885ms ± 1% +0.16% (p=0.012 n=40+40) Mandelbrot200-4 49.3ms ± 0% 49.3ms ± 0% ~ (p=0.588 n=40+38) GoParse-4 46.3ms ± 1% 46.4ms ± 2% ~ (p=0.487 n=40+40) RegexpMatchEasy0_32-4 1.28µs ± 1% 1.28µs ± 0% +0.12% (p=0.003 n=40+40) RegexpMatchEasy0_1K-4 7.78µs ± 5% 7.78µs ± 4% ~ (p=0.825 n=40+40) RegexpMatchEasy1_32-4 1.29µs ± 1% 1.29µs ± 0% ~ (p=0.659 n=40+40) RegexpMatchEasy1_1K-4 10.3µs ± 3% 10.4µs ± 2% ~ (p=0.266 n=40+40) RegexpMatchMedium_32-4 2.05µs ± 1% 2.05µs ± 0% -0.18% (p=0.002 n=40+28) RegexpMatchMedium_1K-4 533µs ± 1% 534µs ± 1% ~ (p=0.397 n=37+40) RegexpMatchHard_32-4 28.9µs ± 1% 28.9µs ± 1% -0.22% (p=0.002 n=40+40) RegexpMatchHard_1K-4 868µs ± 1% 870µs ± 1% +0.21% (p=0.015 n=40+40) Revcomp-4 67.3ms ± 1% 67.2ms ± 2% ~ (p=0.262 n=38+39) Template-4 1.07s ± 1% 1.07s ± 1% ~ (p=0.276 n=40+40) TimeParse-4 7.16µs ± 1% 7.16µs ± 1% ~ (p=0.610 n=39+40) TimeFormat-4 13.3µs ± 1% 13.3µs ± 1% ~ (p=0.617 n=38+40) [Geo mean] 720µs 719µs -0.13% name old speed new speed delta GobDecode-4 7.39MB/s ± 1% 7.49MB/s ± 2% +1.25% (p=0.000 n=39+38) GobEncode-4 8.48MB/s ± 1% 8.45MB/s ± 2% -0.40% (p=0.005 n=37+40) Gzip-4 4.74MB/s ± 2% 4.75MB/s ± 1% +0.30% (p=0.018 n=40+40) Gunzip-4 32.2MB/s ± 0% 32.2MB/s ± 1% ~ (p=0.272 n=36+40) JSONEncode-4 8.15MB/s ± 1% 8.13MB/s ± 0% -0.26% (p=0.003 n=40+25) JSONDecode-4 2.19MB/s ± 1% 2.19MB/s ± 1% ~ (p=0.676 n=40+40) GoParse-4 1.25MB/s ± 2% 1.25MB/s ± 2% ~ (p=0.823 n=40+40) RegexpMatchEasy0_32-4 25.1MB/s ± 1% 25.1MB/s ± 0% -0.12% (p=0.006 n=40+40) RegexpMatchEasy0_1K-4 132MB/s ± 5% 132MB/s ± 5% ~ (p=0.821 n=40+40) RegexpMatchEasy1_32-4 24.7MB/s ± 1% 24.7MB/s ± 0% ~ (p=0.630 n=40+40) RegexpMatchEasy1_1K-4 99.1MB/s ± 3% 98.8MB/s ± 2% ~ (p=0.268 n=40+40) RegexpMatchMedium_32-4 487kB/s ± 2% 490kB/s ± 0% +0.51% (p=0.001 n=40+40) RegexpMatchMedium_1K-4 1.92MB/s ± 1% 1.92MB/s ± 1% ~ (p=0.208 n=39+40) RegexpMatchHard_32-4 1.11MB/s ± 1% 1.11MB/s ± 0% +0.36% (p=0.000 n=40+33) RegexpMatchHard_1K-4 1.18MB/s ± 1% 1.18MB/s ± 1% ~ (p=0.207 n=40+37) Revcomp-4 37.8MB/s ± 1% 37.8MB/s ± 2% ~ (p=0.276 n=38+39) Template-4 1.82MB/s ± 1% 1.81MB/s ± 1% ~ (p=0.122 n=38+40) [Geo mean] 6.81MB/s 6.81MB/s +0.06% fixes #19843 Change-Id: Ief3a0c2b15f59d40c7b40f2784eeb71196685b59 Reviewed-on: https://go-review.googlesource.com/61150 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-09-02 08:14:08 +00:00
ssa.OpARMNMULF,
ssa.OpARMNMULD,
ssa.OpARMDIVF,
ssa.OpARMDIVD:
r := v.Reg()
r1 := v.Args[0].Reg()
r2 := v.Args[1].Reg()
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = r2
p.Reg = r1
p.To.Type = obj.TYPE_REG
p.To.Reg = r
case ssa.OpARMSRR:
genregshift(s, arm.AMOVW, 0, v.Args[0].Reg(), v.Args[1].Reg(), v.Reg(), arm.SHIFT_RR)
case ssa.OpARMMULAF, ssa.OpARMMULAD, ssa.OpARMMULSF, ssa.OpARMMULSD, ssa.OpARMFMULAD:
cmd/compile: optimize ARM code with MULAF/MULSF/MULAD/MULSD The go compiler can generate better ARM code with those more efficient FP instructions. And there is little improvement in total but big improvement in special cases. 1. The size of pkg/linux_arm/math.a shrinks by 2.4%. 2. there is neither improvement nor regression in compilecmp benchmark. name old time/op new time/op delta Template 2.32s ± 2% 2.32s ± 1% ~ (p=1.000 n=9+10) Unicode 1.32s ± 4% 1.32s ± 4% ~ (p=0.912 n=10+10) GoTypes 7.76s ± 1% 7.79s ± 1% ~ (p=0.447 n=9+10) Compiler 37.4s ± 2% 37.2s ± 2% ~ (p=0.218 n=10+10) SSA 84.8s ± 2% 85.0s ± 1% ~ (p=0.604 n=10+9) Flate 1.45s ± 2% 1.44s ± 2% ~ (p=0.075 n=10+10) GoParser 1.82s ± 1% 1.81s ± 1% ~ (p=0.190 n=10+10) Reflect 5.06s ± 1% 5.05s ± 1% ~ (p=0.315 n=10+9) Tar 2.37s ± 1% 2.37s ± 2% ~ (p=0.912 n=10+10) XML 2.56s ± 1% 2.58s ± 2% ~ (p=0.089 n=10+10) [Geo mean] 4.77s 4.77s -0.08% name old user-time/op new user-time/op delta Template 2.74s ± 2% 2.75s ± 2% ~ (p=0.856 n=9+10) Unicode 1.61s ± 4% 1.62s ± 3% ~ (p=0.693 n=10+10) GoTypes 9.55s ± 1% 9.49s ± 2% ~ (p=0.056 n=9+10) Compiler 45.9s ± 1% 45.8s ± 1% ~ (p=0.345 n=9+10) SSA 110s ± 1% 110s ± 1% ~ (p=0.763 n=9+10) Flate 1.68s ± 2% 1.68s ± 3% ~ (p=0.616 n=10+10) GoParser 2.14s ± 4% 2.14s ± 1% ~ (p=0.825 n=10+9) Reflect 5.95s ± 1% 5.97s ± 3% ~ (p=0.951 n=9+10) Tar 2.94s ± 3% 2.93s ± 2% ~ (p=0.359 n=10+10) XML 3.03s ± 3% 3.07s ± 6% ~ (p=0.166 n=10+10) [Geo mean] 5.76s 5.77s +0.12% name old text-bytes new text-bytes delta HelloSize 588kB ± 0% 588kB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 5.46kB ± 0% 5.46kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 72.9kB ± 0% 72.9kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.03MB ± 0% 1.03MB ± 0% ~ (all equal) 3. The performance of Mandelbrot200 improves 15%, though little improvement in total. name old time/op new time/op delta BinaryTree17-4 41.7s ± 1% 41.7s ± 1% ~ (p=0.264 n=29+23) Fannkuch11-4 24.2s ± 0% 24.1s ± 1% -0.13% (p=0.050 n=30+30) FmtFprintfEmpty-4 826ns ± 1% 824ns ± 1% -0.24% (p=0.038 n=25+30) FmtFprintfString-4 1.38µs ± 1% 1.38µs ± 0% -0.42% (p=0.000 n=27+25) FmtFprintfInt-4 1.46µs ± 1% 1.46µs ± 0% ~ (p=0.060 n=30+23) FmtFprintfIntInt-4 2.11µs ± 1% 2.08µs ± 0% -1.04% (p=0.000 n=30+30) FmtFprintfPrefixedInt-4 2.23µs ± 1% 2.22µs ± 1% -0.51% (p=0.000 n=30+30) FmtFprintfFloat-4 4.49µs ± 1% 4.48µs ± 1% -0.22% (p=0.004 n=26+30) FmtManyArgs-4 8.06µs ± 1% 8.12µs ± 1% +0.68% (p=0.000 n=25+30) GobDecode-4 104ms ± 1% 104ms ± 2% ~ (p=0.362 n=29+29) GobEncode-4 92.9ms ± 1% 92.8ms ± 2% ~ (p=0.786 n=30+30) Gzip-4 4.12s ± 1% 4.12s ± 1% ~ (p=0.314 n=30+30) Gunzip-4 602ms ± 1% 603ms ± 1% ~ (p=0.164 n=30+30) HTTPClientServer-4 659µs ± 1% 655µs ± 2% -0.64% (p=0.006 n=25+28) JSONEncode-4 234ms ± 1% 235ms ± 1% +0.29% (p=0.050 n=30+30) JSONDecode-4 912ms ± 0% 911ms ± 0% ~ (p=0.385 n=18+24) Mandelbrot200-4 49.2ms ± 0% 41.7ms ± 0% -15.35% (p=0.000 n=25+27) GoParse-4 46.3ms ± 1% 46.3ms ± 2% ~ (p=0.572 n=30+30) RegexpMatchEasy0_32-4 1.29µs ± 1% 1.27µs ± 0% -1.59% (p=0.000 n=30+30) RegexpMatchEasy0_1K-4 7.62µs ± 4% 7.71µs ± 3% ~ (p=0.074 n=30+30) RegexpMatchEasy1_32-4 1.31µs ± 0% 1.30µs ± 1% -0.71% (p=0.000 n=23+30) RegexpMatchEasy1_1K-4 10.3µs ± 3% 10.3µs ± 5% ~ (p=0.105 n=30+30) RegexpMatchMedium_32-4 2.06µs ± 1% 2.06µs ± 1% ~ (p=0.100 n=30+30) RegexpMatchMedium_1K-4 533µs ± 1% 534µs ± 1% ~ (p=0.254 n=29+30) RegexpMatchHard_32-4 28.9µs ± 0% 28.9µs ± 0% ~ (p=0.154 n=30+30) RegexpMatchHard_1K-4 868µs ± 1% 867µs ± 0% ~ (p=0.729 n=30+23) Revcomp-4 66.9ms ± 1% 67.2ms ± 2% ~ (p=0.102 n=28+29) Template-4 1.07s ± 1% 1.06s ± 1% -0.53% (p=0.000 n=30+30) TimeParse-4 7.07µs ± 1% 7.01µs ± 0% -0.85% (p=0.000 n=30+25) TimeFormat-4 13.1µs ± 0% 13.2µs ± 1% +0.77% (p=0.000 n=27+27) [Geo mean] 721µs 716µs -0.70% name old speed new speed delta GobDecode-4 7.38MB/s ± 1% 7.37MB/s ± 2% ~ (p=0.399 n=29+29) GobEncode-4 8.26MB/s ± 1% 8.27MB/s ± 2% ~ (p=0.790 n=30+30) Gzip-4 4.71MB/s ± 1% 4.71MB/s ± 1% ~ (p=0.885 n=30+30) Gunzip-4 32.2MB/s ± 1% 32.2MB/s ± 1% ~ (p=0.190 n=30+30) JSONEncode-4 8.28MB/s ± 1% 8.25MB/s ± 1% ~ (p=0.053 n=30+30) JSONDecode-4 2.13MB/s ± 0% 2.12MB/s ± 1% ~ (p=0.072 n=18+30) GoParse-4 1.25MB/s ± 1% 1.25MB/s ± 2% ~ (p=0.863 n=30+30) RegexpMatchEasy0_32-4 24.8MB/s ± 0% 25.2MB/s ± 1% +1.61% (p=0.000 n=30+30) RegexpMatchEasy0_1K-4 134MB/s ± 4% 133MB/s ± 3% ~ (p=0.074 n=30+30) RegexpMatchEasy1_32-4 24.5MB/s ± 0% 24.6MB/s ± 1% +0.72% (p=0.000 n=23+30) RegexpMatchEasy1_1K-4 99.1MB/s ± 3% 99.8MB/s ± 5% ~ (p=0.105 n=30+30) RegexpMatchMedium_32-4 483kB/s ± 1% 487kB/s ± 1% +0.83% (p=0.002 n=30+30) RegexpMatchMedium_1K-4 1.92MB/s ± 1% 1.92MB/s ± 1% ~ (p=0.058 n=30+30) RegexpMatchHard_32-4 1.10MB/s ± 0% 1.11MB/s ± 0% ~ (p=0.804 n=30+30) RegexpMatchHard_1K-4 1.18MB/s ± 0% 1.18MB/s ± 0% ~ (all equal) Revcomp-4 38.0MB/s ± 1% 37.8MB/s ± 2% ~ (p=0.098 n=28+29) Template-4 1.82MB/s ± 1% 1.83MB/s ± 1% +0.55% (p=0.000 n=29+29) [Geo mean] 6.79MB/s 6.79MB/s +0.09% Change-Id: Ia91991c2c5c59c5df712de85a83b13a21c0a554b Reviewed-on: https://go-review.googlesource.com/63770 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-09-14 06:52:51 +00:00
r := v.Reg()
r0 := v.Args[0].Reg()
r1 := v.Args[1].Reg()
r2 := v.Args[2].Reg()
if r != r0 {
v.Fatalf("result and addend are not in the same register: %v", v.LongString())
}
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = r2
p.Reg = r1
p.To.Type = obj.TYPE_REG
p.To.Reg = r
case ssa.OpARMADDS,
ssa.OpARMSUBS:
r := v.Reg0()
r1 := v.Args[0].Reg()
r2 := v.Args[1].Reg()
p := s.Prog(v.Op.Asm())
p.Scond = arm.C_SBIT
p.From.Type = obj.TYPE_REG
p.From.Reg = r2
p.Reg = r1
p.To.Type = obj.TYPE_REG
p.To.Reg = r
case ssa.OpARMSRAcond:
// ARM shift instructions uses only the low-order byte of the shift amount
// generate conditional instructions to deal with large shifts
// flag is already set
// SRA.HS $31, Rarg0, Rdst // shift 31 bits to get the sign bit
// SRA.LO Rarg1, Rarg0, Rdst
r := v.Reg()
r1 := v.Args[0].Reg()
r2 := v.Args[1].Reg()
p := s.Prog(arm.ASRA)
p.Scond = arm.C_SCOND_HS
p.From.Type = obj.TYPE_CONST
p.From.Offset = 31
p.Reg = r1
p.To.Type = obj.TYPE_REG
p.To.Reg = r
p = s.Prog(arm.ASRA)
p.Scond = arm.C_SCOND_LO
p.From.Type = obj.TYPE_REG
p.From.Reg = r2
p.Reg = r1
p.To.Type = obj.TYPE_REG
p.To.Reg = r
cmd/compile: optimized ARM code with BFX/BFXU BFX&BFXU were introduced in ARMv6T2. A single BFX or BFXU is more efficiently than a pair of left-shift/right-shift in bit field extraction. This patch implements this optimization. And the benchmark tests show big improvement in special cases and little change in total. 1. There is big improvement in a special test case. name old time/op new time/op delta BFX-4 665µs ± 1% 595µs ± 0% -10.61% (p=0.000 n=20+20) (The test case: https://github.com/benshi001/ugo1/blob/master/bfx_test.go) 2. The compilecmp benchmark shows no regression. name old time/op new time/op delta Template 2.33s ± 2% 2.34s ± 2% ~ (p=0.356 n=9+10) Unicode 1.32s ± 2% 1.30s ± 2% ~ (p=0.139 n=9+8) GoTypes 7.77s ± 1% 7.76s ± 1% ~ (p=0.780 n=10+9) Compiler 37.3s ± 1% 37.1s ± 1% ~ (p=0.211 n=10+9) SSA 84.3s ± 2% 84.3s ± 2% ~ (p=0.842 n=10+9) Flate 1.45s ± 1% 1.45s ± 3% ~ (p=0.853 n=10+10) GoParser 1.83s ± 2% 1.83s ± 2% ~ (p=0.739 n=10+10) Reflect 5.08s ± 2% 5.09s ± 2% ~ (p=0.720 n=9+10) Tar 2.44s ± 1% 2.44s ± 2% ~ (p=0.684 n=10+10) XML 2.62s ± 2% 2.62s ± 2% ~ (p=0.529 n=10+10) [Geo mean] 4.80s 4.79s -0.06% name old user-time/op new user-time/op delta Template 2.76s ± 2% 2.75s ± 3% ~ (p=0.893 n=10+10) Unicode 1.63s ± 1% 1.60s ± 1% -2.07% (p=0.000 n=8+9) GoTypes 9.54s ± 1% 9.52s ± 1% ~ (p=0.215 n=10+10) Compiler 46.0s ± 1% 46.0s ± 1% ~ (p=0.853 n=10+10) SSA 110s ± 1% 110s ± 1% ~ (p=0.838 n=10+10) Flate 1.69s ± 3% 1.69s ± 5% ~ (p=0.957 n=10+10) GoParser 2.15s ± 2% 2.15s ± 2% ~ (p=0.749 n=10+10) Reflect 6.03s ± 1% 5.99s ± 2% ~ (p=0.060 n=9+10) Tar 3.02s ± 2% 2.99s ± 2% ~ (p=0.214 n=10+10) XML 3.10s ± 2% 3.08s ± 2% ~ (p=0.732 n=9+10) [Geo mean] 5.82s 5.79s -0.41% name old text-bytes new text-bytes delta HelloSize 589kB ± 0% 589kB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 5.46kB ± 0% 5.46kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 76.9kB ± 0% 76.9kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.03MB ± 0% 1.03MB ± 0% ~ (all equal) 3. The go1 benchmark shows little change in total. (excluding noise) name old time/op new time/op delta BinaryTree17-4 41.5s ± 1% 41.6s ± 1% ~ (p=0.373 n=30+26) Fannkuch11-4 23.6s ± 1% 23.6s ± 1% +0.28% (p=0.003 n=29+30) FmtFprintfEmpty-4 826ns ± 1% 827ns ± 1% ~ (p=0.155 n=30+30) FmtFprintfString-4 1.35µs ± 1% 1.35µs ± 1% ~ (p=0.499 n=30+30) FmtFprintfInt-4 1.43µs ± 1% 1.41µs ± 1% -1.19% (p=0.000 n=30+30) FmtFprintfIntInt-4 2.15µs ± 1% 2.11µs ± 1% -1.78% (p=0.000 n=30+30) FmtFprintfPrefixedInt-4 2.21µs ± 1% 2.21µs ± 1% ~ (p=0.881 n=30+30) FmtFprintfFloat-4 4.41µs ± 1% 4.44µs ± 0% +0.64% (p=0.000 n=30+30) FmtManyArgs-4 8.06µs ± 1% 8.06µs ± 0% ~ (p=0.871 n=30+30) GobDecode-4 103ms ± 1% 104ms ± 2% +0.54% (p=0.013 n=28+29) GobEncode-4 92.4ms ± 1% 92.6ms ± 1% ~ (p=0.447 n=30+29) Gzip-4 4.17s ± 1% 4.06s ± 1% -2.56% (p=0.000 n=29+30) Gunzip-4 603ms ± 1% 602ms ± 1% ~ (p=0.423 n=30+30) HTTPClientServer-4 688µs ± 2% 674µs ± 3% -2.09% (p=0.000 n=29+30) JSONEncode-4 237ms ± 1% 237ms ± 1% ~ (p=0.061 n=29+30) JSONDecode-4 907ms ± 1% 910ms ± 1% ~ (p=0.061 n=30+30) Mandelbrot200-4 41.7ms ± 0% 41.7ms ± 0% +0.19% (p=0.000 n=24+20) GoParse-4 45.7ms ± 2% 45.5ms ± 2% -0.29% (p=0.005 n=30+30) RegexpMatchEasy0_32-4 1.27µs ± 0% 1.27µs ± 0% +0.12% (p=0.031 n=30+30) RegexpMatchEasy0_1K-4 7.77µs ± 4% 7.73µs ± 3% ~ (p=0.169 n=30+30) RegexpMatchEasy1_32-4 1.29µs ± 1% 1.29µs ± 1% ~ (p=0.126 n=30+30) RegexpMatchEasy1_1K-4 10.4µs ± 3% 10.3µs ± 2% -1.32% (p=0.004 n=30+29) RegexpMatchMedium_32-4 2.06µs ± 0% 2.06µs ± 0% ~ (p=0.071 n=30+30) RegexpMatchMedium_1K-4 531µs ± 1% 530µs ± 0% ~ (p=0.121 n=30+23) RegexpMatchHard_32-4 28.7µs ± 1% 28.6µs ± 1% -0.21% (p=0.001 n=30+27) RegexpMatchHard_1K-4 860µs ± 1% 857µs ± 1% ~ (p=0.105 n=30+27) Revcomp-4 67.3ms ± 2% 67.3ms ± 2% ~ (p=0.805 n=29+29) Template-4 1.08s ± 1% 1.08s ± 1% ~ (p=0.260 n=30+30) TimeParse-4 7.04µs ± 0% 7.04µs ± 0% ~ (p=0.315 n=30+30) TimeFormat-4 13.2µs ± 1% 13.2µs ± 1% ~ (p=0.077 n=30+30) [Geo mean] 715µs 713µs -0.30% name old speed new speed delta GobDecode-4 7.42MB/s ± 1% 7.38MB/s ± 2% -0.54% (p=0.011 n=28+29) GobEncode-4 8.30MB/s ± 1% 8.29MB/s ± 1% ~ (p=0.484 n=30+29) Gzip-4 4.65MB/s ± 2% 4.78MB/s ± 1% +2.73% (p=0.000 n=30+30) Gunzip-4 32.2MB/s ± 1% 32.2MB/s ± 1% ~ (p=0.357 n=30+30) JSONEncode-4 8.18MB/s ± 1% 8.19MB/s ± 1% ~ (p=0.052 n=29+30) JSONDecode-4 2.14MB/s ± 1% 2.13MB/s ± 1% ~ (p=0.074 n=30+29) GoParse-4 1.27MB/s ± 1% 1.27MB/s ± 2% ~ (p=0.618 n=24+30) RegexpMatchEasy0_32-4 25.2MB/s ± 0% 25.2MB/s ± 0% -0.12% (p=0.031 n=30+30) RegexpMatchEasy0_1K-4 132MB/s ± 5% 132MB/s ± 2% ~ (p=0.171 n=30+30) RegexpMatchEasy1_32-4 24.8MB/s ± 1% 24.9MB/s ± 1% ~ (p=0.106 n=30+30) RegexpMatchEasy1_1K-4 98.4MB/s ± 3% 99.6MB/s ± 4% +1.19% (p=0.011 n=30+30) RegexpMatchMedium_32-4 483kB/s ± 1% 484kB/s ± 1% ~ (p=0.426 n=30+30) RegexpMatchMedium_1K-4 1.93MB/s ± 1% 1.93MB/s ± 0% ~ (p=0.157 n=30+17) RegexpMatchHard_32-4 1.12MB/s ± 1% 1.12MB/s ± 0% +0.33% (p=0.001 n=30+24) RegexpMatchHard_1K-4 1.19MB/s ± 1% 1.19MB/s ± 1% ~ (p=0.290 n=30+30) Revcomp-4 37.8MB/s ± 2% 37.8MB/s ± 1% ~ (p=0.815 n=29+29) Template-4 1.80MB/s ± 1% 1.80MB/s ± 1% ~ (p=0.586 n=30+30) [Geo mean] 6.80MB/s 6.81MB/s +0.25% fixes #20966 Change-Id: Idb5567bbe988c875315b8c98c128957cd474ccc5 Reviewed-on: https://go-review.googlesource.com/64950 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com>
2017-09-20 08:48:34 +00:00
case ssa.OpARMBFX, ssa.OpARMBFXU:
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_CONST
p.From.Offset = v.AuxInt >> 8
p.AddRestSourceConst(v.AuxInt & 0xff)
cmd/compile: optimized ARM code with BFX/BFXU BFX&BFXU were introduced in ARMv6T2. A single BFX or BFXU is more efficiently than a pair of left-shift/right-shift in bit field extraction. This patch implements this optimization. And the benchmark tests show big improvement in special cases and little change in total. 1. There is big improvement in a special test case. name old time/op new time/op delta BFX-4 665µs ± 1% 595µs ± 0% -10.61% (p=0.000 n=20+20) (The test case: https://github.com/benshi001/ugo1/blob/master/bfx_test.go) 2. The compilecmp benchmark shows no regression. name old time/op new time/op delta Template 2.33s ± 2% 2.34s ± 2% ~ (p=0.356 n=9+10) Unicode 1.32s ± 2% 1.30s ± 2% ~ (p=0.139 n=9+8) GoTypes 7.77s ± 1% 7.76s ± 1% ~ (p=0.780 n=10+9) Compiler 37.3s ± 1% 37.1s ± 1% ~ (p=0.211 n=10+9) SSA 84.3s ± 2% 84.3s ± 2% ~ (p=0.842 n=10+9) Flate 1.45s ± 1% 1.45s ± 3% ~ (p=0.853 n=10+10) GoParser 1.83s ± 2% 1.83s ± 2% ~ (p=0.739 n=10+10) Reflect 5.08s ± 2% 5.09s ± 2% ~ (p=0.720 n=9+10) Tar 2.44s ± 1% 2.44s ± 2% ~ (p=0.684 n=10+10) XML 2.62s ± 2% 2.62s ± 2% ~ (p=0.529 n=10+10) [Geo mean] 4.80s 4.79s -0.06% name old user-time/op new user-time/op delta Template 2.76s ± 2% 2.75s ± 3% ~ (p=0.893 n=10+10) Unicode 1.63s ± 1% 1.60s ± 1% -2.07% (p=0.000 n=8+9) GoTypes 9.54s ± 1% 9.52s ± 1% ~ (p=0.215 n=10+10) Compiler 46.0s ± 1% 46.0s ± 1% ~ (p=0.853 n=10+10) SSA 110s ± 1% 110s ± 1% ~ (p=0.838 n=10+10) Flate 1.69s ± 3% 1.69s ± 5% ~ (p=0.957 n=10+10) GoParser 2.15s ± 2% 2.15s ± 2% ~ (p=0.749 n=10+10) Reflect 6.03s ± 1% 5.99s ± 2% ~ (p=0.060 n=9+10) Tar 3.02s ± 2% 2.99s ± 2% ~ (p=0.214 n=10+10) XML 3.10s ± 2% 3.08s ± 2% ~ (p=0.732 n=9+10) [Geo mean] 5.82s 5.79s -0.41% name old text-bytes new text-bytes delta HelloSize 589kB ± 0% 589kB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 5.46kB ± 0% 5.46kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 76.9kB ± 0% 76.9kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.03MB ± 0% 1.03MB ± 0% ~ (all equal) 3. The go1 benchmark shows little change in total. (excluding noise) name old time/op new time/op delta BinaryTree17-4 41.5s ± 1% 41.6s ± 1% ~ (p=0.373 n=30+26) Fannkuch11-4 23.6s ± 1% 23.6s ± 1% +0.28% (p=0.003 n=29+30) FmtFprintfEmpty-4 826ns ± 1% 827ns ± 1% ~ (p=0.155 n=30+30) FmtFprintfString-4 1.35µs ± 1% 1.35µs ± 1% ~ (p=0.499 n=30+30) FmtFprintfInt-4 1.43µs ± 1% 1.41µs ± 1% -1.19% (p=0.000 n=30+30) FmtFprintfIntInt-4 2.15µs ± 1% 2.11µs ± 1% -1.78% (p=0.000 n=30+30) FmtFprintfPrefixedInt-4 2.21µs ± 1% 2.21µs ± 1% ~ (p=0.881 n=30+30) FmtFprintfFloat-4 4.41µs ± 1% 4.44µs ± 0% +0.64% (p=0.000 n=30+30) FmtManyArgs-4 8.06µs ± 1% 8.06µs ± 0% ~ (p=0.871 n=30+30) GobDecode-4 103ms ± 1% 104ms ± 2% +0.54% (p=0.013 n=28+29) GobEncode-4 92.4ms ± 1% 92.6ms ± 1% ~ (p=0.447 n=30+29) Gzip-4 4.17s ± 1% 4.06s ± 1% -2.56% (p=0.000 n=29+30) Gunzip-4 603ms ± 1% 602ms ± 1% ~ (p=0.423 n=30+30) HTTPClientServer-4 688µs ± 2% 674µs ± 3% -2.09% (p=0.000 n=29+30) JSONEncode-4 237ms ± 1% 237ms ± 1% ~ (p=0.061 n=29+30) JSONDecode-4 907ms ± 1% 910ms ± 1% ~ (p=0.061 n=30+30) Mandelbrot200-4 41.7ms ± 0% 41.7ms ± 0% +0.19% (p=0.000 n=24+20) GoParse-4 45.7ms ± 2% 45.5ms ± 2% -0.29% (p=0.005 n=30+30) RegexpMatchEasy0_32-4 1.27µs ± 0% 1.27µs ± 0% +0.12% (p=0.031 n=30+30) RegexpMatchEasy0_1K-4 7.77µs ± 4% 7.73µs ± 3% ~ (p=0.169 n=30+30) RegexpMatchEasy1_32-4 1.29µs ± 1% 1.29µs ± 1% ~ (p=0.126 n=30+30) RegexpMatchEasy1_1K-4 10.4µs ± 3% 10.3µs ± 2% -1.32% (p=0.004 n=30+29) RegexpMatchMedium_32-4 2.06µs ± 0% 2.06µs ± 0% ~ (p=0.071 n=30+30) RegexpMatchMedium_1K-4 531µs ± 1% 530µs ± 0% ~ (p=0.121 n=30+23) RegexpMatchHard_32-4 28.7µs ± 1% 28.6µs ± 1% -0.21% (p=0.001 n=30+27) RegexpMatchHard_1K-4 860µs ± 1% 857µs ± 1% ~ (p=0.105 n=30+27) Revcomp-4 67.3ms ± 2% 67.3ms ± 2% ~ (p=0.805 n=29+29) Template-4 1.08s ± 1% 1.08s ± 1% ~ (p=0.260 n=30+30) TimeParse-4 7.04µs ± 0% 7.04µs ± 0% ~ (p=0.315 n=30+30) TimeFormat-4 13.2µs ± 1% 13.2µs ± 1% ~ (p=0.077 n=30+30) [Geo mean] 715µs 713µs -0.30% name old speed new speed delta GobDecode-4 7.42MB/s ± 1% 7.38MB/s ± 2% -0.54% (p=0.011 n=28+29) GobEncode-4 8.30MB/s ± 1% 8.29MB/s ± 1% ~ (p=0.484 n=30+29) Gzip-4 4.65MB/s ± 2% 4.78MB/s ± 1% +2.73% (p=0.000 n=30+30) Gunzip-4 32.2MB/s ± 1% 32.2MB/s ± 1% ~ (p=0.357 n=30+30) JSONEncode-4 8.18MB/s ± 1% 8.19MB/s ± 1% ~ (p=0.052 n=29+30) JSONDecode-4 2.14MB/s ± 1% 2.13MB/s ± 1% ~ (p=0.074 n=30+29) GoParse-4 1.27MB/s ± 1% 1.27MB/s ± 2% ~ (p=0.618 n=24+30) RegexpMatchEasy0_32-4 25.2MB/s ± 0% 25.2MB/s ± 0% -0.12% (p=0.031 n=30+30) RegexpMatchEasy0_1K-4 132MB/s ± 5% 132MB/s ± 2% ~ (p=0.171 n=30+30) RegexpMatchEasy1_32-4 24.8MB/s ± 1% 24.9MB/s ± 1% ~ (p=0.106 n=30+30) RegexpMatchEasy1_1K-4 98.4MB/s ± 3% 99.6MB/s ± 4% +1.19% (p=0.011 n=30+30) RegexpMatchMedium_32-4 483kB/s ± 1% 484kB/s ± 1% ~ (p=0.426 n=30+30) RegexpMatchMedium_1K-4 1.93MB/s ± 1% 1.93MB/s ± 0% ~ (p=0.157 n=30+17) RegexpMatchHard_32-4 1.12MB/s ± 1% 1.12MB/s ± 0% +0.33% (p=0.001 n=30+24) RegexpMatchHard_1K-4 1.19MB/s ± 1% 1.19MB/s ± 1% ~ (p=0.290 n=30+30) Revcomp-4 37.8MB/s ± 2% 37.8MB/s ± 1% ~ (p=0.815 n=29+29) Template-4 1.80MB/s ± 1% 1.80MB/s ± 1% ~ (p=0.586 n=30+30) [Geo mean] 6.80MB/s 6.81MB/s +0.25% fixes #20966 Change-Id: Idb5567bbe988c875315b8c98c128957cd474ccc5 Reviewed-on: https://go-review.googlesource.com/64950 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com>
2017-09-20 08:48:34 +00:00
p.Reg = v.Args[0].Reg()
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
cmd/compile: optimize arm's bit operation BFC (Bit Field Clear) was introduced in ARMv7, which can simplify ANDconst and BICconst. And this CL implements that optimization. 1. The total size of pkg/android_arm decreases about 3KB, excluding cmd/compile/. 2. There is no regression in the go1 benchmark result, and some cases (FmtFprintfEmpty-4 and RegexpMatchMedium_32-4) even get slight improvement. name old time/op new time/op delta BinaryTree17-4 25.3s ± 1% 25.2s ± 1% ~ (p=0.072 n=30+29) Fannkuch11-4 13.3s ± 0% 13.3s ± 0% +0.13% (p=0.000 n=30+26) FmtFprintfEmpty-4 407ns ± 0% 394ns ± 0% -3.19% (p=0.000 n=26+28) FmtFprintfString-4 664ns ± 0% 662ns ± 0% -0.22% (p=0.000 n=30+30) FmtFprintfInt-4 712ns ± 0% 706ns ± 0% -0.79% (p=0.000 n=30+30) FmtFprintfIntInt-4 1.06µs ± 0% 1.05µs ± 0% -0.38% (p=0.000 n=30+30) FmtFprintfPrefixedInt-4 1.16µs ± 0% 1.16µs ± 0% -0.13% (p=0.000 n=30+29) FmtFprintfFloat-4 2.24µs ± 0% 2.23µs ± 0% -0.51% (p=0.000 n=29+21) FmtManyArgs-4 4.09µs ± 0% 4.06µs ± 0% -0.83% (p=0.000 n=28+30) GobDecode-4 55.0ms ± 5% 55.4ms ± 5% ~ (p=0.307 n=30+30) GobEncode-4 51.2ms ± 1% 51.9ms ± 1% +1.23% (p=0.000 n=29+30) Gzip-4 2.64s ± 0% 2.60s ± 0% -1.35% (p=0.000 n=30+29) Gunzip-4 309ms ± 0% 308ms ± 0% -0.27% (p=0.000 n=30+30) HTTPClientServer-4 1.03ms ± 5% 1.02ms ± 4% ~ (p=0.117 n=30+29) JSONEncode-4 101ms ± 2% 101ms ± 2% ~ (p=0.338 n=29+29) JSONDecode-4 383ms ± 2% 382ms ± 2% ~ (p=0.751 n=26+30) Mandelbrot200-4 18.4ms ± 0% 18.4ms ± 0% -0.10% (p=0.000 n=29+29) GoParse-4 22.6ms ± 0% 22.5ms ± 0% -0.39% (p=0.000 n=30+30) RegexpMatchEasy0_32-4 761ns ± 0% 750ns ± 0% -1.47% (p=0.000 n=26+29) RegexpMatchEasy0_1K-4 4.33µs ± 0% 4.34µs ± 0% +0.27% (p=0.000 n=25+28) RegexpMatchEasy1_32-4 809ns ± 0% 795ns ± 0% -1.74% (p=0.000 n=27+25) RegexpMatchEasy1_1K-4 5.54µs ± 0% 5.53µs ± 0% -0.18% (p=0.000 n=29+29) RegexpMatchMedium_32-4 1.11µs ± 0% 1.08µs ± 0% -2.78% (p=0.000 n=27+29) RegexpMatchMedium_1K-4 255µs ± 0% 255µs ± 0% -0.02% (p=0.029 n=30+30) RegexpMatchHard_32-4 14.7µs ± 0% 14.7µs ± 0% -0.28% (p=0.000 n=30+29) RegexpMatchHard_1K-4 439µs ± 0% 439µs ± 0% ~ (p=0.907 n=23+27) Revcomp-4 41.9ms ± 1% 41.9ms ± 1% ~ (p=0.230 n=28+30) Template-4 522ms ± 1% 528ms ± 1% +1.25% (p=0.000 n=30+30) TimeParse-4 3.34µs ± 0% 3.35µs ± 0% +0.23% (p=0.000 n=30+27) TimeFormat-4 6.06µs ± 0% 6.13µs ± 0% +1.08% (p=0.000 n=29+29) [Geo mean] 384µs 382µs -0.37% name old speed new speed delta GobDecode-4 14.0MB/s ± 5% 13.9MB/s ± 5% ~ (p=0.308 n=30+30) GobEncode-4 15.0MB/s ± 1% 14.8MB/s ± 1% -1.22% (p=0.000 n=29+30) Gzip-4 7.36MB/s ± 0% 7.46MB/s ± 0% +1.35% (p=0.000 n=30+30) Gunzip-4 62.8MB/s ± 0% 63.0MB/s ± 0% +0.27% (p=0.000 n=30+30) JSONEncode-4 19.2MB/s ± 2% 19.2MB/s ± 2% ~ (p=0.312 n=29+29) JSONDecode-4 5.05MB/s ± 3% 5.08MB/s ± 2% ~ (p=0.356 n=29+30) GoParse-4 2.56MB/s ± 0% 2.57MB/s ± 0% +0.39% (p=0.000 n=23+27) RegexpMatchEasy0_32-4 42.0MB/s ± 0% 42.6MB/s ± 0% +1.50% (p=0.000 n=26+28) RegexpMatchEasy0_1K-4 236MB/s ± 0% 236MB/s ± 0% -0.27% (p=0.000 n=25+28) RegexpMatchEasy1_32-4 39.6MB/s ± 0% 40.2MB/s ± 0% +1.73% (p=0.000 n=27+27) RegexpMatchEasy1_1K-4 185MB/s ± 0% 185MB/s ± 0% +0.18% (p=0.000 n=29+29) RegexpMatchMedium_32-4 900kB/s ± 0% 920kB/s ± 0% +2.22% (p=0.000 n=29+29) RegexpMatchMedium_1K-4 4.02MB/s ± 0% 4.02MB/s ± 0% +0.07% (p=0.004 n=30+27) RegexpMatchHard_32-4 2.17MB/s ± 0% 2.18MB/s ± 0% +0.46% (p=0.000 n=30+26) RegexpMatchHard_1K-4 2.33MB/s ± 0% 2.33MB/s ± 0% ~ (all equal) Revcomp-4 60.6MB/s ± 1% 60.7MB/s ± 1% ~ (p=0.207 n=28+30) Template-4 3.72MB/s ± 1% 3.67MB/s ± 1% -1.23% (p=0.000 n=30+30) [Geo mean] 12.9MB/s 12.9MB/s +0.29% Change-Id: I07f497f8bb476c950dc555491d00c9066fb64a4e Reviewed-on: https://go-review.googlesource.com/134232 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-09-10 08:29:52 +00:00
case ssa.OpARMANDconst, ssa.OpARMBICconst:
// try to optimize ANDconst and BICconst to BFC, which saves bytes and ticks
// BFC is only available on ARMv7, and its result and source are in the same register
if buildcfg.GOARM.Version == 7 && v.Reg() == v.Args[0].Reg() {
cmd/compile: optimize arm's bit operation BFC (Bit Field Clear) was introduced in ARMv7, which can simplify ANDconst and BICconst. And this CL implements that optimization. 1. The total size of pkg/android_arm decreases about 3KB, excluding cmd/compile/. 2. There is no regression in the go1 benchmark result, and some cases (FmtFprintfEmpty-4 and RegexpMatchMedium_32-4) even get slight improvement. name old time/op new time/op delta BinaryTree17-4 25.3s ± 1% 25.2s ± 1% ~ (p=0.072 n=30+29) Fannkuch11-4 13.3s ± 0% 13.3s ± 0% +0.13% (p=0.000 n=30+26) FmtFprintfEmpty-4 407ns ± 0% 394ns ± 0% -3.19% (p=0.000 n=26+28) FmtFprintfString-4 664ns ± 0% 662ns ± 0% -0.22% (p=0.000 n=30+30) FmtFprintfInt-4 712ns ± 0% 706ns ± 0% -0.79% (p=0.000 n=30+30) FmtFprintfIntInt-4 1.06µs ± 0% 1.05µs ± 0% -0.38% (p=0.000 n=30+30) FmtFprintfPrefixedInt-4 1.16µs ± 0% 1.16µs ± 0% -0.13% (p=0.000 n=30+29) FmtFprintfFloat-4 2.24µs ± 0% 2.23µs ± 0% -0.51% (p=0.000 n=29+21) FmtManyArgs-4 4.09µs ± 0% 4.06µs ± 0% -0.83% (p=0.000 n=28+30) GobDecode-4 55.0ms ± 5% 55.4ms ± 5% ~ (p=0.307 n=30+30) GobEncode-4 51.2ms ± 1% 51.9ms ± 1% +1.23% (p=0.000 n=29+30) Gzip-4 2.64s ± 0% 2.60s ± 0% -1.35% (p=0.000 n=30+29) Gunzip-4 309ms ± 0% 308ms ± 0% -0.27% (p=0.000 n=30+30) HTTPClientServer-4 1.03ms ± 5% 1.02ms ± 4% ~ (p=0.117 n=30+29) JSONEncode-4 101ms ± 2% 101ms ± 2% ~ (p=0.338 n=29+29) JSONDecode-4 383ms ± 2% 382ms ± 2% ~ (p=0.751 n=26+30) Mandelbrot200-4 18.4ms ± 0% 18.4ms ± 0% -0.10% (p=0.000 n=29+29) GoParse-4 22.6ms ± 0% 22.5ms ± 0% -0.39% (p=0.000 n=30+30) RegexpMatchEasy0_32-4 761ns ± 0% 750ns ± 0% -1.47% (p=0.000 n=26+29) RegexpMatchEasy0_1K-4 4.33µs ± 0% 4.34µs ± 0% +0.27% (p=0.000 n=25+28) RegexpMatchEasy1_32-4 809ns ± 0% 795ns ± 0% -1.74% (p=0.000 n=27+25) RegexpMatchEasy1_1K-4 5.54µs ± 0% 5.53µs ± 0% -0.18% (p=0.000 n=29+29) RegexpMatchMedium_32-4 1.11µs ± 0% 1.08µs ± 0% -2.78% (p=0.000 n=27+29) RegexpMatchMedium_1K-4 255µs ± 0% 255µs ± 0% -0.02% (p=0.029 n=30+30) RegexpMatchHard_32-4 14.7µs ± 0% 14.7µs ± 0% -0.28% (p=0.000 n=30+29) RegexpMatchHard_1K-4 439µs ± 0% 439µs ± 0% ~ (p=0.907 n=23+27) Revcomp-4 41.9ms ± 1% 41.9ms ± 1% ~ (p=0.230 n=28+30) Template-4 522ms ± 1% 528ms ± 1% +1.25% (p=0.000 n=30+30) TimeParse-4 3.34µs ± 0% 3.35µs ± 0% +0.23% (p=0.000 n=30+27) TimeFormat-4 6.06µs ± 0% 6.13µs ± 0% +1.08% (p=0.000 n=29+29) [Geo mean] 384µs 382µs -0.37% name old speed new speed delta GobDecode-4 14.0MB/s ± 5% 13.9MB/s ± 5% ~ (p=0.308 n=30+30) GobEncode-4 15.0MB/s ± 1% 14.8MB/s ± 1% -1.22% (p=0.000 n=29+30) Gzip-4 7.36MB/s ± 0% 7.46MB/s ± 0% +1.35% (p=0.000 n=30+30) Gunzip-4 62.8MB/s ± 0% 63.0MB/s ± 0% +0.27% (p=0.000 n=30+30) JSONEncode-4 19.2MB/s ± 2% 19.2MB/s ± 2% ~ (p=0.312 n=29+29) JSONDecode-4 5.05MB/s ± 3% 5.08MB/s ± 2% ~ (p=0.356 n=29+30) GoParse-4 2.56MB/s ± 0% 2.57MB/s ± 0% +0.39% (p=0.000 n=23+27) RegexpMatchEasy0_32-4 42.0MB/s ± 0% 42.6MB/s ± 0% +1.50% (p=0.000 n=26+28) RegexpMatchEasy0_1K-4 236MB/s ± 0% 236MB/s ± 0% -0.27% (p=0.000 n=25+28) RegexpMatchEasy1_32-4 39.6MB/s ± 0% 40.2MB/s ± 0% +1.73% (p=0.000 n=27+27) RegexpMatchEasy1_1K-4 185MB/s ± 0% 185MB/s ± 0% +0.18% (p=0.000 n=29+29) RegexpMatchMedium_32-4 900kB/s ± 0% 920kB/s ± 0% +2.22% (p=0.000 n=29+29) RegexpMatchMedium_1K-4 4.02MB/s ± 0% 4.02MB/s ± 0% +0.07% (p=0.004 n=30+27) RegexpMatchHard_32-4 2.17MB/s ± 0% 2.18MB/s ± 0% +0.46% (p=0.000 n=30+26) RegexpMatchHard_1K-4 2.33MB/s ± 0% 2.33MB/s ± 0% ~ (all equal) Revcomp-4 60.6MB/s ± 1% 60.7MB/s ± 1% ~ (p=0.207 n=28+30) Template-4 3.72MB/s ± 1% 3.67MB/s ± 1% -1.23% (p=0.000 n=30+30) [Geo mean] 12.9MB/s 12.9MB/s +0.29% Change-Id: I07f497f8bb476c950dc555491d00c9066fb64a4e Reviewed-on: https://go-review.googlesource.com/134232 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-09-10 08:29:52 +00:00
var val uint32
if v.Op == ssa.OpARMANDconst {
val = ^uint32(v.AuxInt)
} else { // BICconst
val = uint32(v.AuxInt)
}
lsb, width := getBFC(val)
// omit BFC for ARM's imm12
if 8 < width && width < 24 {
p := s.Prog(arm.ABFC)
p.From.Type = obj.TYPE_CONST
p.From.Offset = int64(width)
p.AddRestSourceConst(int64(lsb))
cmd/compile: optimize arm's bit operation BFC (Bit Field Clear) was introduced in ARMv7, which can simplify ANDconst and BICconst. And this CL implements that optimization. 1. The total size of pkg/android_arm decreases about 3KB, excluding cmd/compile/. 2. There is no regression in the go1 benchmark result, and some cases (FmtFprintfEmpty-4 and RegexpMatchMedium_32-4) even get slight improvement. name old time/op new time/op delta BinaryTree17-4 25.3s ± 1% 25.2s ± 1% ~ (p=0.072 n=30+29) Fannkuch11-4 13.3s ± 0% 13.3s ± 0% +0.13% (p=0.000 n=30+26) FmtFprintfEmpty-4 407ns ± 0% 394ns ± 0% -3.19% (p=0.000 n=26+28) FmtFprintfString-4 664ns ± 0% 662ns ± 0% -0.22% (p=0.000 n=30+30) FmtFprintfInt-4 712ns ± 0% 706ns ± 0% -0.79% (p=0.000 n=30+30) FmtFprintfIntInt-4 1.06µs ± 0% 1.05µs ± 0% -0.38% (p=0.000 n=30+30) FmtFprintfPrefixedInt-4 1.16µs ± 0% 1.16µs ± 0% -0.13% (p=0.000 n=30+29) FmtFprintfFloat-4 2.24µs ± 0% 2.23µs ± 0% -0.51% (p=0.000 n=29+21) FmtManyArgs-4 4.09µs ± 0% 4.06µs ± 0% -0.83% (p=0.000 n=28+30) GobDecode-4 55.0ms ± 5% 55.4ms ± 5% ~ (p=0.307 n=30+30) GobEncode-4 51.2ms ± 1% 51.9ms ± 1% +1.23% (p=0.000 n=29+30) Gzip-4 2.64s ± 0% 2.60s ± 0% -1.35% (p=0.000 n=30+29) Gunzip-4 309ms ± 0% 308ms ± 0% -0.27% (p=0.000 n=30+30) HTTPClientServer-4 1.03ms ± 5% 1.02ms ± 4% ~ (p=0.117 n=30+29) JSONEncode-4 101ms ± 2% 101ms ± 2% ~ (p=0.338 n=29+29) JSONDecode-4 383ms ± 2% 382ms ± 2% ~ (p=0.751 n=26+30) Mandelbrot200-4 18.4ms ± 0% 18.4ms ± 0% -0.10% (p=0.000 n=29+29) GoParse-4 22.6ms ± 0% 22.5ms ± 0% -0.39% (p=0.000 n=30+30) RegexpMatchEasy0_32-4 761ns ± 0% 750ns ± 0% -1.47% (p=0.000 n=26+29) RegexpMatchEasy0_1K-4 4.33µs ± 0% 4.34µs ± 0% +0.27% (p=0.000 n=25+28) RegexpMatchEasy1_32-4 809ns ± 0% 795ns ± 0% -1.74% (p=0.000 n=27+25) RegexpMatchEasy1_1K-4 5.54µs ± 0% 5.53µs ± 0% -0.18% (p=0.000 n=29+29) RegexpMatchMedium_32-4 1.11µs ± 0% 1.08µs ± 0% -2.78% (p=0.000 n=27+29) RegexpMatchMedium_1K-4 255µs ± 0% 255µs ± 0% -0.02% (p=0.029 n=30+30) RegexpMatchHard_32-4 14.7µs ± 0% 14.7µs ± 0% -0.28% (p=0.000 n=30+29) RegexpMatchHard_1K-4 439µs ± 0% 439µs ± 0% ~ (p=0.907 n=23+27) Revcomp-4 41.9ms ± 1% 41.9ms ± 1% ~ (p=0.230 n=28+30) Template-4 522ms ± 1% 528ms ± 1% +1.25% (p=0.000 n=30+30) TimeParse-4 3.34µs ± 0% 3.35µs ± 0% +0.23% (p=0.000 n=30+27) TimeFormat-4 6.06µs ± 0% 6.13µs ± 0% +1.08% (p=0.000 n=29+29) [Geo mean] 384µs 382µs -0.37% name old speed new speed delta GobDecode-4 14.0MB/s ± 5% 13.9MB/s ± 5% ~ (p=0.308 n=30+30) GobEncode-4 15.0MB/s ± 1% 14.8MB/s ± 1% -1.22% (p=0.000 n=29+30) Gzip-4 7.36MB/s ± 0% 7.46MB/s ± 0% +1.35% (p=0.000 n=30+30) Gunzip-4 62.8MB/s ± 0% 63.0MB/s ± 0% +0.27% (p=0.000 n=30+30) JSONEncode-4 19.2MB/s ± 2% 19.2MB/s ± 2% ~ (p=0.312 n=29+29) JSONDecode-4 5.05MB/s ± 3% 5.08MB/s ± 2% ~ (p=0.356 n=29+30) GoParse-4 2.56MB/s ± 0% 2.57MB/s ± 0% +0.39% (p=0.000 n=23+27) RegexpMatchEasy0_32-4 42.0MB/s ± 0% 42.6MB/s ± 0% +1.50% (p=0.000 n=26+28) RegexpMatchEasy0_1K-4 236MB/s ± 0% 236MB/s ± 0% -0.27% (p=0.000 n=25+28) RegexpMatchEasy1_32-4 39.6MB/s ± 0% 40.2MB/s ± 0% +1.73% (p=0.000 n=27+27) RegexpMatchEasy1_1K-4 185MB/s ± 0% 185MB/s ± 0% +0.18% (p=0.000 n=29+29) RegexpMatchMedium_32-4 900kB/s ± 0% 920kB/s ± 0% +2.22% (p=0.000 n=29+29) RegexpMatchMedium_1K-4 4.02MB/s ± 0% 4.02MB/s ± 0% +0.07% (p=0.004 n=30+27) RegexpMatchHard_32-4 2.17MB/s ± 0% 2.18MB/s ± 0% +0.46% (p=0.000 n=30+26) RegexpMatchHard_1K-4 2.33MB/s ± 0% 2.33MB/s ± 0% ~ (all equal) Revcomp-4 60.6MB/s ± 1% 60.7MB/s ± 1% ~ (p=0.207 n=28+30) Template-4 3.72MB/s ± 1% 3.67MB/s ± 1% -1.23% (p=0.000 n=30+30) [Geo mean] 12.9MB/s 12.9MB/s +0.29% Change-Id: I07f497f8bb476c950dc555491d00c9066fb64a4e Reviewed-on: https://go-review.googlesource.com/134232 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-09-10 08:29:52 +00:00
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
break
}
}
// fall back to ordinary form
fallthrough
case ssa.OpARMADDconst,
ssa.OpARMADCconst,
ssa.OpARMSUBconst,
ssa.OpARMSBCconst,
ssa.OpARMRSBconst,
ssa.OpARMRSCconst,
ssa.OpARMORconst,
ssa.OpARMXORconst,
ssa.OpARMSLLconst,
ssa.OpARMSRLconst,
ssa.OpARMSRAconst:
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_CONST
p.From.Offset = v.AuxInt
p.Reg = v.Args[0].Reg()
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
case ssa.OpARMADDSconst,
ssa.OpARMSUBSconst,
ssa.OpARMRSBSconst:
p := s.Prog(v.Op.Asm())
p.Scond = arm.C_SBIT
p.From.Type = obj.TYPE_CONST
p.From.Offset = v.AuxInt
p.Reg = v.Args[0].Reg()
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg0()
case ssa.OpARMSRRconst:
genshift(s, v, arm.AMOVW, 0, v.Args[0].Reg(), v.Reg(), arm.SHIFT_RR, v.AuxInt)
case ssa.OpARMADDshiftLL,
ssa.OpARMADCshiftLL,
ssa.OpARMSUBshiftLL,
ssa.OpARMSBCshiftLL,
ssa.OpARMRSBshiftLL,
ssa.OpARMRSCshiftLL,
ssa.OpARMANDshiftLL,
ssa.OpARMORshiftLL,
ssa.OpARMXORshiftLL,
ssa.OpARMBICshiftLL:
genshift(s, v, v.Op.Asm(), v.Args[0].Reg(), v.Args[1].Reg(), v.Reg(), arm.SHIFT_LL, v.AuxInt)
case ssa.OpARMADDSshiftLL,
ssa.OpARMSUBSshiftLL,
ssa.OpARMRSBSshiftLL:
p := genshift(s, v, v.Op.Asm(), v.Args[0].Reg(), v.Args[1].Reg(), v.Reg0(), arm.SHIFT_LL, v.AuxInt)
p.Scond = arm.C_SBIT
case ssa.OpARMADDshiftRL,
ssa.OpARMADCshiftRL,
ssa.OpARMSUBshiftRL,
ssa.OpARMSBCshiftRL,
ssa.OpARMRSBshiftRL,
ssa.OpARMRSCshiftRL,
ssa.OpARMANDshiftRL,
ssa.OpARMORshiftRL,
ssa.OpARMXORshiftRL,
ssa.OpARMBICshiftRL:
genshift(s, v, v.Op.Asm(), v.Args[0].Reg(), v.Args[1].Reg(), v.Reg(), arm.SHIFT_LR, v.AuxInt)
case ssa.OpARMADDSshiftRL,
ssa.OpARMSUBSshiftRL,
ssa.OpARMRSBSshiftRL:
p := genshift(s, v, v.Op.Asm(), v.Args[0].Reg(), v.Args[1].Reg(), v.Reg0(), arm.SHIFT_LR, v.AuxInt)
p.Scond = arm.C_SBIT
case ssa.OpARMADDshiftRA,
ssa.OpARMADCshiftRA,
ssa.OpARMSUBshiftRA,
ssa.OpARMSBCshiftRA,
ssa.OpARMRSBshiftRA,
ssa.OpARMRSCshiftRA,
ssa.OpARMANDshiftRA,
ssa.OpARMORshiftRA,
ssa.OpARMXORshiftRA,
ssa.OpARMBICshiftRA:
genshift(s, v, v.Op.Asm(), v.Args[0].Reg(), v.Args[1].Reg(), v.Reg(), arm.SHIFT_AR, v.AuxInt)
case ssa.OpARMADDSshiftRA,
ssa.OpARMSUBSshiftRA,
ssa.OpARMRSBSshiftRA:
p := genshift(s, v, v.Op.Asm(), v.Args[0].Reg(), v.Args[1].Reg(), v.Reg0(), arm.SHIFT_AR, v.AuxInt)
p.Scond = arm.C_SBIT
case ssa.OpARMXORshiftRR:
genshift(s, v, v.Op.Asm(), v.Args[0].Reg(), v.Args[1].Reg(), v.Reg(), arm.SHIFT_RR, v.AuxInt)
case ssa.OpARMMVNshiftLL:
genshift(s, v, v.Op.Asm(), 0, v.Args[0].Reg(), v.Reg(), arm.SHIFT_LL, v.AuxInt)
case ssa.OpARMMVNshiftRL:
genshift(s, v, v.Op.Asm(), 0, v.Args[0].Reg(), v.Reg(), arm.SHIFT_LR, v.AuxInt)
case ssa.OpARMMVNshiftRA:
genshift(s, v, v.Op.Asm(), 0, v.Args[0].Reg(), v.Reg(), arm.SHIFT_AR, v.AuxInt)
case ssa.OpARMMVNshiftLLreg:
genregshift(s, v.Op.Asm(), 0, v.Args[0].Reg(), v.Args[1].Reg(), v.Reg(), arm.SHIFT_LL)
case ssa.OpARMMVNshiftRLreg:
genregshift(s, v.Op.Asm(), 0, v.Args[0].Reg(), v.Args[1].Reg(), v.Reg(), arm.SHIFT_LR)
case ssa.OpARMMVNshiftRAreg:
genregshift(s, v.Op.Asm(), 0, v.Args[0].Reg(), v.Args[1].Reg(), v.Reg(), arm.SHIFT_AR)
case ssa.OpARMADDshiftLLreg,
ssa.OpARMADCshiftLLreg,
ssa.OpARMSUBshiftLLreg,
ssa.OpARMSBCshiftLLreg,
ssa.OpARMRSBshiftLLreg,
ssa.OpARMRSCshiftLLreg,
ssa.OpARMANDshiftLLreg,
ssa.OpARMORshiftLLreg,
ssa.OpARMXORshiftLLreg,
ssa.OpARMBICshiftLLreg:
genregshift(s, v.Op.Asm(), v.Args[0].Reg(), v.Args[1].Reg(), v.Args[2].Reg(), v.Reg(), arm.SHIFT_LL)
case ssa.OpARMADDSshiftLLreg,
ssa.OpARMSUBSshiftLLreg,
ssa.OpARMRSBSshiftLLreg:
p := genregshift(s, v.Op.Asm(), v.Args[0].Reg(), v.Args[1].Reg(), v.Args[2].Reg(), v.Reg0(), arm.SHIFT_LL)
p.Scond = arm.C_SBIT
case ssa.OpARMADDshiftRLreg,
ssa.OpARMADCshiftRLreg,
ssa.OpARMSUBshiftRLreg,
ssa.OpARMSBCshiftRLreg,
ssa.OpARMRSBshiftRLreg,
ssa.OpARMRSCshiftRLreg,
ssa.OpARMANDshiftRLreg,
ssa.OpARMORshiftRLreg,
ssa.OpARMXORshiftRLreg,
ssa.OpARMBICshiftRLreg:
genregshift(s, v.Op.Asm(), v.Args[0].Reg(), v.Args[1].Reg(), v.Args[2].Reg(), v.Reg(), arm.SHIFT_LR)
case ssa.OpARMADDSshiftRLreg,
ssa.OpARMSUBSshiftRLreg,
ssa.OpARMRSBSshiftRLreg:
p := genregshift(s, v.Op.Asm(), v.Args[0].Reg(), v.Args[1].Reg(), v.Args[2].Reg(), v.Reg0(), arm.SHIFT_LR)
p.Scond = arm.C_SBIT
case ssa.OpARMADDshiftRAreg,
ssa.OpARMADCshiftRAreg,
ssa.OpARMSUBshiftRAreg,
ssa.OpARMSBCshiftRAreg,
ssa.OpARMRSBshiftRAreg,
ssa.OpARMRSCshiftRAreg,
ssa.OpARMANDshiftRAreg,
ssa.OpARMORshiftRAreg,
ssa.OpARMXORshiftRAreg,
ssa.OpARMBICshiftRAreg:
genregshift(s, v.Op.Asm(), v.Args[0].Reg(), v.Args[1].Reg(), v.Args[2].Reg(), v.Reg(), arm.SHIFT_AR)
case ssa.OpARMADDSshiftRAreg,
ssa.OpARMSUBSshiftRAreg,
ssa.OpARMRSBSshiftRAreg:
p := genregshift(s, v.Op.Asm(), v.Args[0].Reg(), v.Args[1].Reg(), v.Args[2].Reg(), v.Reg0(), arm.SHIFT_AR)
p.Scond = arm.C_SBIT
case ssa.OpARMHMUL,
ssa.OpARMHMULU:
// 32-bit high multiplication
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = v.Args[0].Reg()
p.Reg = v.Args[1].Reg()
p.To.Type = obj.TYPE_REGREG
p.To.Reg = v.Reg()
p.To.Offset = arm.REGTMP // throw away low 32-bit into tmp register
case ssa.OpARMMULLU:
// 32-bit multiplication, results 64-bit, high 32-bit in out0, low 32-bit in out1
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = v.Args[0].Reg()
p.Reg = v.Args[1].Reg()
p.To.Type = obj.TYPE_REGREG
p.To.Reg = v.Reg0() // high 32-bit
p.To.Offset = int64(v.Reg1()) // low 32-bit
cmd/compile: optimize ARM with MULS MULS was introduced in ARMv7 and corresponding to MULA. This patch duplicated all MULA related SSA rules with MULS. Here was the contrast test result against the original go compiler. There was no improvement in total, but big improvement in special cases. 1. A specific test case accelerated 18.62%. (https://github.com/benshi001/ugo1/blob/master/mulsub_test.go) name old time/op new time/op delta MulSub-4 270µs ± 0% 219µs ± 0% -18.62% (p=0.000 n=35+40) 2. Total size of all .a files in pkg/ shrank by 0.002%. 3. The compilecmp benchmark showed no decline. name old time/op new time/op delta Template 2.37s ± 3% 2.36s ± 1% ~ (p=0.233 n=19+18) Unicode 1.32s ± 2% 1.34s ± 5% +1.32% (p=0.011 n=20+18) GoTypes 7.88s ± 1% 7.87s ± 1% ~ (p=0.758 n=20+20) Compiler 37.5s ± 1% 37.6s ± 1% ~ (p=0.194 n=20+19) SSA 83.7s ± 2% 83.5s ± 2% ~ (p=0.569 n=20+19) Flate 1.46s ± 3% 1.45s ± 1% ~ (p=0.619 n=20+17) GoParser 1.87s ± 2% 1.85s ± 1% -0.58% (p=0.048 n=20+18) Reflect 5.10s ± 2% 5.11s ± 2% ~ (p=0.365 n=19+20) Tar 1.78s ± 2% 1.78s ± 2% ~ (p=0.531 n=19+20) XML 2.62s ± 1% 2.61s ± 2% ~ (p=0.057 n=17+19) [Geo mean] 4.68s 4.67s -0.07% name old user-time/op new user-time/op delta Template 2.80s ± 1% 2.79s ± 2% ~ (p=0.686 n=17+20) Unicode 1.61s ± 4% 1.63s ± 6% ~ (p=0.222 n=20+20) GoTypes 9.59s ± 1% 9.60s ± 1% ~ (p=0.482 n=17+20) Compiler 46.1s ± 1% 46.2s ± 1% ~ (p=0.373 n=20+18) SSA 108s ± 1% 108s ± 2% ~ (p=0.784 n=20+20) Flate 1.68s ± 3% 1.69s ± 3% ~ (p=0.335 n=20+19) GoParser 2.20s ± 4% 2.19s ± 2% ~ (p=0.844 n=20+18) Reflect 5.97s ± 3% 6.01s ± 2% ~ (p=0.184 n=20+20) Tar 2.11s ± 2% 2.11s ± 4% ~ (p=0.961 n=19+20) XML 3.07s ± 1% 3.07s ± 3% ~ (p=0.786 n=16+19) [Geo mean] 5.61s 5.62s +0.19% name old text-bytes new text-bytes delta HelloSize 586kB ± 0% 586kB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 5.46kB ± 0% 5.46kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 72.9kB ± 0% 72.9kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.03MB ± 0% 1.03MB ± 0% ~ (all equal) 4. The go1 benchmark showed no decline in total. name old time/op new time/op delta BinaryTree17-4 41.7s ± 1% 41.7s ± 1% ~ (p=0.966 n=40+40) Fannkuch11-4 23.6s ± 0% 23.6s ± 1% -0.23% (p=0.000 n=40+40) FmtFprintfEmpty-4 844ns ± 1% 834ns ± 1% -1.23% (p=0.000 n=40+40) FmtFprintfString-4 1.39µs ± 1% 1.40µs ± 1% +0.71% (p=0.000 n=40+40) FmtFprintfInt-4 1.44µs ± 1% 1.45µs ± 1% +0.70% (p=0.000 n=40+40) FmtFprintfIntInt-4 2.10µs ± 1% 2.10µs ± 1% +0.30% (p=0.000 n=40+40) FmtFprintfPrefixedInt-4 2.49µs ± 0% 2.50µs ± 1% +0.66% (p=0.000 n=32+40) FmtFprintfFloat-4 4.42µs ± 1% 4.46µs ± 2% +0.94% (p=0.000 n=40+40) FmtManyArgs-4 8.31µs ± 1% 8.22µs ± 1% -1.09% (p=0.000 n=40+40) GobDecode-4 105ms ± 1% 102ms ± 1% -2.30% (p=0.000 n=39+39) GobEncode-4 90.2ms ± 1% 88.7ms ± 1% -1.66% (p=0.000 n=40+39) Gzip-4 4.17s ± 1% 4.16s ± 1% ~ (p=0.785 n=40+40) Gunzip-4 608ms ± 1% 608ms ± 1% ~ (p=0.481 n=40+40) HTTPClientServer-4 697µs ± 2% 684µs ± 3% -1.89% (p=0.000 n=37+40) JSONEncode-4 255ms ± 1% 256ms ± 1% +0.35% (p=0.000 n=40+40) JSONDecode-4 920ms ± 1% 926ms ± 1% +0.64% (p=0.000 n=40+39) Mandelbrot200-4 49.3ms ± 1% 49.3ms ± 0% +0.07% (p=0.005 n=40+40) GoParse-4 46.8ms ± 2% 46.7ms ± 1% ~ (p=1.000 n=40+40) RegexpMatchEasy0_32-4 1.27µs ± 0% 1.27µs ± 1% ~ (p=0.057 n=40+40) RegexpMatchEasy0_1K-4 7.97µs ± 7% 7.92µs ± 5% ~ (p=0.094 n=40+40) RegexpMatchEasy1_32-4 1.28µs ± 1% 1.28µs ± 1% ~ (p=0.406 n=40+40) RegexpMatchEasy1_1K-4 10.5µs ± 4% 10.5µs ± 3% ~ (p=0.855 n=40+40) RegexpMatchMedium_32-4 2.04µs ± 0% 2.04µs ± 1% -0.22% (p=0.000 n=39+40) RegexpMatchMedium_1K-4 541µs ± 0% 540µs ± 1% -0.25% (p=0.000 n=40+38) RegexpMatchHard_32-4 29.3µs ± 1% 29.3µs ± 0% ~ (p=0.149 n=40+40) RegexpMatchHard_1K-4 878µs ± 1% 880µs ± 0% +0.14% (p=0.005 n=36+35) Revcomp-4 81.8ms ± 2% 81.4ms ± 2% -0.43% (p=0.015 n=38+39) Template-4 1.05s ± 1% 1.05s ± 1% ~ (p=0.302 n=40+35) TimeParse-4 7.18µs ± 1% 7.26µs ± 1% +1.05% (p=0.000 n=40+36) TimeFormat-4 13.1µs ± 1% 13.1µs ± 1% ~ (p=0.698 n=37+40) [Geo mean] 733µs 732µs -0.16% name old speed new speed delta GobDecode-4 7.34MB/s ± 1% 7.51MB/s ± 1% +2.36% (p=0.000 n=39+39) GobEncode-4 8.51MB/s ± 1% 8.65MB/s ± 1% +1.69% (p=0.000 n=40+39) Gzip-4 4.66MB/s ± 1% 4.66MB/s ± 1% ~ (p=0.783 n=40+40) Gunzip-4 31.9MB/s ± 1% 31.9MB/s ± 1% ~ (p=0.466 n=40+40) JSONEncode-4 7.61MB/s ± 1% 7.58MB/s ± 1% -0.35% (p=0.001 n=40+40) JSONDecode-4 2.11MB/s ± 1% 2.10MB/s ± 1% -0.52% (p=0.000 n=38+39) GoParse-4 1.24MB/s ± 2% 1.24MB/s ± 1% ~ (p=0.556 n=40+39) RegexpMatchEasy0_32-4 25.1MB/s ± 0% 25.1MB/s ± 1% ~ (p=0.064 n=40+40) RegexpMatchEasy0_1K-4 129MB/s ± 8% 129MB/s ± 5% ~ (p=0.094 n=40+40) RegexpMatchEasy1_32-4 25.0MB/s ± 1% 25.1MB/s ± 1% ~ (p=0.331 n=40+40) RegexpMatchEasy1_1K-4 97.7MB/s ± 4% 97.8MB/s ± 3% ~ (p=0.851 n=40+40) RegexpMatchMedium_32-4 490kB/s ± 0% 490kB/s ± 0% ~ (all equal) RegexpMatchMedium_1K-4 1.89MB/s ± 0% 1.90MB/s ± 1% +0.12% (p=0.031 n=40+40) RegexpMatchHard_32-4 1.09MB/s ± 1% 1.09MB/s ± 1% ~ (p=0.597 n=40+40) RegexpMatchHard_1K-4 1.16MB/s ± 1% 1.16MB/s ± 1% ~ (p=0.565 n=40+35) Revcomp-4 31.1MB/s ± 2% 31.2MB/s ± 2% +0.44% (p=0.018 n=38+39) Template-4 1.85MB/s ± 1% 1.85MB/s ± 1% ~ (p=0.873 n=40+40) [Geo mean] 6.66MB/s 6.67MB/s +0.26% Change-Id: Icc972d8a78ea06c32c3aa15733ff0537c82c2dc7 Reviewed-on: https://go-review.googlesource.com/58950 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com>
2017-08-25 12:07:01 +00:00
case ssa.OpARMMULA, ssa.OpARMMULS:
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = v.Args[0].Reg()
p.Reg = v.Args[1].Reg()
p.To.Type = obj.TYPE_REGREG2
p.To.Reg = v.Reg() // result
p.To.Offset = int64(v.Args[2].Reg()) // addend
case ssa.OpARMMOVWconst:
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_CONST
p.From.Offset = v.AuxInt
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
case ssa.OpARMMOVFconst,
ssa.OpARMMOVDconst:
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_FCONST
p.From.Val = math.Float64frombits(uint64(v.AuxInt))
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
case ssa.OpARMCMP,
ssa.OpARMCMN,
ssa.OpARMTST,
ssa.OpARMTEQ,
ssa.OpARMCMPF,
ssa.OpARMCMPD:
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
// Special layout in ARM assembly
// Comparing to x86, the operands of ARM's CMP are reversed.
p.From.Reg = v.Args[1].Reg()
p.Reg = v.Args[0].Reg()
case ssa.OpARMCMPconst,
ssa.OpARMCMNconst,
ssa.OpARMTSTconst,
ssa.OpARMTEQconst:
// Special layout in ARM assembly
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_CONST
p.From.Offset = v.AuxInt
p.Reg = v.Args[0].Reg()
case ssa.OpARMCMPF0,
ssa.OpARMCMPD0:
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = v.Args[0].Reg()
cmd/compile: optimize ARM code with CMN/TST/TEQ CMN/TST/TEQ were supported since ARMv4, which can be used to simplify comparisons. This patch implements the optimization and here are the benchmark results. 1. A special test case got 18.21% improvement. name old time/op new time/op delta TSTTEQ-4 806µs ± 1% 659µs ± 0% -18.21% (p=0.000 n=20+18) (https://github.com/benshi001/ugo1/blob/master/tstteq_test.go) 2. There is no regression in the compilecmp benchmark. name old time/op new time/op delta Template 2.31s ± 1% 2.30s ± 1% ~ (p=0.661 n=10+9) Unicode 1.32s ± 3% 1.32s ± 5% ~ (p=0.280 n=10+10) GoTypes 7.69s ± 1% 7.65s ± 0% -0.52% (p=0.027 n=10+8) Compiler 36.5s ± 1% 36.4s ± 1% ~ (p=0.546 n=9+9) SSA 85.1s ± 2% 84.9s ± 1% ~ (p=0.529 n=10+10) Flate 1.43s ± 2% 1.43s ± 2% ~ (p=0.661 n=10+9) GoParser 1.81s ± 2% 1.81s ± 1% ~ (p=0.796 n=10+10) Reflect 5.10s ± 2% 5.09s ± 1% ~ (p=0.853 n=10+10) Tar 2.47s ± 1% 2.48s ± 1% ~ (p=0.123 n=10+10) XML 2.59s ± 1% 2.58s ± 1% ~ (p=0.853 n=10+10) [Geo mean] 4.78s 4.77s -0.17% name old user-time/op new user-time/op delta Template 2.72s ± 3% 2.73s ± 2% ~ (p=0.928 n=10+10) Unicode 1.58s ± 4% 1.60s ± 1% ~ (p=0.087 n=10+9) GoTypes 9.41s ± 2% 9.36s ± 1% ~ (p=0.060 n=10+10) Compiler 44.4s ± 2% 44.2s ± 2% ~ (p=0.289 n=10+10) SSA 110s ± 2% 110s ± 1% ~ (p=0.739 n=10+10) Flate 1.67s ± 2% 1.63s ± 3% ~ (p=0.063 n=10+10) GoParser 2.12s ± 1% 2.12s ± 2% ~ (p=0.840 n=10+10) Reflect 5.94s ± 1% 5.98s ± 1% ~ (p=0.063 n=9+10) Tar 3.01s ± 2% 3.02s ± 2% ~ (p=0.584 n=10+10) XML 3.04s ± 3% 3.02s ± 2% ~ (p=0.696 n=10+10) [Geo mean] 5.73s 5.72s -0.20% name old text-bytes new text-bytes delta HelloSize 579kB ± 0% 579kB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 5.46kB ± 0% 5.46kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 72.8kB ± 0% 72.8kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.03MB ± 0% 1.03MB ± 0% ~ (all equal) 3. There is little change in the go1 benchmark (excluding the noise). name old time/op new time/op delta BinaryTree17-4 40.3s ± 1% 40.6s ± 1% +0.80% (p=0.000 n=30+30) Fannkuch11-4 24.2s ± 1% 24.1s ± 0% ~ (p=0.093 n=30+30) FmtFprintfEmpty-4 834ns ± 0% 826ns ± 0% -0.93% (p=0.000 n=29+24) FmtFprintfString-4 1.39µs ± 1% 1.36µs ± 0% -2.02% (p=0.000 n=30+30) FmtFprintfInt-4 1.43µs ± 1% 1.44µs ± 1% ~ (p=0.155 n=30+29) FmtFprintfIntInt-4 2.09µs ± 0% 2.11µs ± 0% +1.16% (p=0.000 n=28+30) FmtFprintfPrefixedInt-4 2.33µs ± 1% 2.36µs ± 0% +1.25% (p=0.000 n=30+30) FmtFprintfFloat-4 4.27µs ± 1% 4.32µs ± 1% +1.27% (p=0.000 n=30+30) FmtManyArgs-4 8.18µs ± 0% 8.14µs ± 0% -0.46% (p=0.000 n=25+27) GobDecode-4 101ms ± 1% 101ms ± 1% ~ (p=0.182 n=29+29) GobEncode-4 89.6ms ± 1% 87.8ms ± 2% -2.02% (p=0.000 n=30+29) Gzip-4 4.07s ± 1% 4.08s ± 1% ~ (p=0.173 n=30+27) Gunzip-4 602ms ± 1% 600ms ± 1% -0.29% (p=0.000 n=29+28) HTTPClientServer-4 679µs ± 4% 683µs ± 3% ~ (p=0.197 n=30+30) JSONEncode-4 241ms ± 1% 239ms ± 1% -0.84% (p=0.000 n=30+30) JSONDecode-4 903ms ± 1% 882ms ± 1% -2.33% (p=0.000 n=30+30) Mandelbrot200-4 41.8ms ± 0% 41.8ms ± 0% ~ (p=0.719 n=30+30) GoParse-4 45.5ms ± 1% 45.8ms ± 1% +0.52% (p=0.000 n=30+30) RegexpMatchEasy0_32-4 1.27µs ± 1% 1.27µs ± 0% -0.60% (p=0.000 n=30+30) RegexpMatchEasy0_1K-4 7.77µs ± 6% 7.69µs ± 4% -0.96% (p=0.040 n=30+30) RegexpMatchEasy1_32-4 1.29µs ± 1% 1.28µs ± 1% -0.54% (p=0.000 n=30+30) RegexpMatchEasy1_1K-4 10.3µs ± 6% 10.2µs ± 3% ~ (p=0.453 n=30+27) RegexpMatchMedium_32-4 1.98µs ± 1% 2.00µs ± 1% +0.85% (p=0.000 n=30+29) RegexpMatchMedium_1K-4 503µs ± 0% 503µs ± 1% ~ (p=0.752 n=30+30) RegexpMatchHard_32-4 27.1µs ± 1% 26.5µs ± 0% -1.96% (p=0.000 n=30+24) RegexpMatchHard_1K-4 809µs ± 1% 799µs ± 1% -1.29% (p=0.000 n=29+30) Revcomp-4 67.3ms ± 2% 67.2ms ± 1% ~ (p=0.265 n=29+29) Template-4 1.08s ± 1% 1.07s ± 0% -1.39% (p=0.000 n=30+22) TimeParse-4 6.93µs ± 1% 6.96µs ± 1% +0.40% (p=0.005 n=30+30) TimeFormat-4 13.3µs ± 0% 13.3µs ± 1% ~ (p=0.734 n=30+30) [Geo mean] 709µs 707µs -0.32% name old speed new speed delta GobDecode-4 7.59MB/s ± 1% 7.57MB/s ± 1% ~ (p=0.145 n=29+29) GobEncode-4 8.56MB/s ± 1% 8.74MB/s ± 1% +2.07% (p=0.000 n=30+29) Gzip-4 4.76MB/s ± 1% 4.75MB/s ± 1% -0.25% (p=0.037 n=30+30) Gunzip-4 32.2MB/s ± 1% 32.3MB/s ± 1% +0.29% (p=0.000 n=29+28) JSONEncode-4 8.04MB/s ± 1% 8.11MB/s ± 1% +0.85% (p=0.000 n=30+30) JSONDecode-4 2.15MB/s ± 1% 2.20MB/s ± 1% +2.29% (p=0.000 n=30+30) GoParse-4 1.27MB/s ± 1% 1.26MB/s ± 1% -0.73% (p=0.000 n=30+30) RegexpMatchEasy0_32-4 25.1MB/s ± 1% 25.3MB/s ± 0% +0.61% (p=0.000 n=30+30) RegexpMatchEasy0_1K-4 131MB/s ± 6% 133MB/s ± 4% +1.35% (p=0.009 n=28+30) RegexpMatchEasy1_32-4 24.9MB/s ± 1% 25.0MB/s ± 1% +0.54% (p=0.000 n=30+30) RegexpMatchEasy1_1K-4 99.2MB/s ± 6% 100.2MB/s ± 3% ~ (p=0.448 n=30+27) RegexpMatchMedium_32-4 503kB/s ± 1% 500kB/s ± 0% -0.66% (p=0.002 n=30+24) RegexpMatchMedium_1K-4 2.04MB/s ± 0% 2.04MB/s ± 1% ~ (p=0.358 n=30+30) RegexpMatchHard_32-4 1.18MB/s ± 1% 1.20MB/s ± 1% +1.75% (p=0.000 n=30+30) RegexpMatchHard_1K-4 1.26MB/s ± 1% 1.28MB/s ± 1% +1.42% (p=0.000 n=30+30) Revcomp-4 37.8MB/s ± 2% 37.8MB/s ± 1% ~ (p=0.266 n=29+29) Template-4 1.80MB/s ± 1% 1.82MB/s ± 1% +1.46% (p=0.000 n=30+30) [Geo mean] 6.91MB/s 6.96MB/s +0.70% fixes #21583 Change-Id: I24065a80588ccae7de3ad732a3cfb0026cf7e214 Reviewed-on: https://go-review.googlesource.com/67490 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-10-02 03:09:28 +00:00
case ssa.OpARMCMPshiftLL, ssa.OpARMCMNshiftLL, ssa.OpARMTSTshiftLL, ssa.OpARMTEQshiftLL:
genshift(s, v, v.Op.Asm(), v.Args[0].Reg(), v.Args[1].Reg(), 0, arm.SHIFT_LL, v.AuxInt)
cmd/compile: optimize ARM code with CMN/TST/TEQ CMN/TST/TEQ were supported since ARMv4, which can be used to simplify comparisons. This patch implements the optimization and here are the benchmark results. 1. A special test case got 18.21% improvement. name old time/op new time/op delta TSTTEQ-4 806µs ± 1% 659µs ± 0% -18.21% (p=0.000 n=20+18) (https://github.com/benshi001/ugo1/blob/master/tstteq_test.go) 2. There is no regression in the compilecmp benchmark. name old time/op new time/op delta Template 2.31s ± 1% 2.30s ± 1% ~ (p=0.661 n=10+9) Unicode 1.32s ± 3% 1.32s ± 5% ~ (p=0.280 n=10+10) GoTypes 7.69s ± 1% 7.65s ± 0% -0.52% (p=0.027 n=10+8) Compiler 36.5s ± 1% 36.4s ± 1% ~ (p=0.546 n=9+9) SSA 85.1s ± 2% 84.9s ± 1% ~ (p=0.529 n=10+10) Flate 1.43s ± 2% 1.43s ± 2% ~ (p=0.661 n=10+9) GoParser 1.81s ± 2% 1.81s ± 1% ~ (p=0.796 n=10+10) Reflect 5.10s ± 2% 5.09s ± 1% ~ (p=0.853 n=10+10) Tar 2.47s ± 1% 2.48s ± 1% ~ (p=0.123 n=10+10) XML 2.59s ± 1% 2.58s ± 1% ~ (p=0.853 n=10+10) [Geo mean] 4.78s 4.77s -0.17% name old user-time/op new user-time/op delta Template 2.72s ± 3% 2.73s ± 2% ~ (p=0.928 n=10+10) Unicode 1.58s ± 4% 1.60s ± 1% ~ (p=0.087 n=10+9) GoTypes 9.41s ± 2% 9.36s ± 1% ~ (p=0.060 n=10+10) Compiler 44.4s ± 2% 44.2s ± 2% ~ (p=0.289 n=10+10) SSA 110s ± 2% 110s ± 1% ~ (p=0.739 n=10+10) Flate 1.67s ± 2% 1.63s ± 3% ~ (p=0.063 n=10+10) GoParser 2.12s ± 1% 2.12s ± 2% ~ (p=0.840 n=10+10) Reflect 5.94s ± 1% 5.98s ± 1% ~ (p=0.063 n=9+10) Tar 3.01s ± 2% 3.02s ± 2% ~ (p=0.584 n=10+10) XML 3.04s ± 3% 3.02s ± 2% ~ (p=0.696 n=10+10) [Geo mean] 5.73s 5.72s -0.20% name old text-bytes new text-bytes delta HelloSize 579kB ± 0% 579kB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 5.46kB ± 0% 5.46kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 72.8kB ± 0% 72.8kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.03MB ± 0% 1.03MB ± 0% ~ (all equal) 3. There is little change in the go1 benchmark (excluding the noise). name old time/op new time/op delta BinaryTree17-4 40.3s ± 1% 40.6s ± 1% +0.80% (p=0.000 n=30+30) Fannkuch11-4 24.2s ± 1% 24.1s ± 0% ~ (p=0.093 n=30+30) FmtFprintfEmpty-4 834ns ± 0% 826ns ± 0% -0.93% (p=0.000 n=29+24) FmtFprintfString-4 1.39µs ± 1% 1.36µs ± 0% -2.02% (p=0.000 n=30+30) FmtFprintfInt-4 1.43µs ± 1% 1.44µs ± 1% ~ (p=0.155 n=30+29) FmtFprintfIntInt-4 2.09µs ± 0% 2.11µs ± 0% +1.16% (p=0.000 n=28+30) FmtFprintfPrefixedInt-4 2.33µs ± 1% 2.36µs ± 0% +1.25% (p=0.000 n=30+30) FmtFprintfFloat-4 4.27µs ± 1% 4.32µs ± 1% +1.27% (p=0.000 n=30+30) FmtManyArgs-4 8.18µs ± 0% 8.14µs ± 0% -0.46% (p=0.000 n=25+27) GobDecode-4 101ms ± 1% 101ms ± 1% ~ (p=0.182 n=29+29) GobEncode-4 89.6ms ± 1% 87.8ms ± 2% -2.02% (p=0.000 n=30+29) Gzip-4 4.07s ± 1% 4.08s ± 1% ~ (p=0.173 n=30+27) Gunzip-4 602ms ± 1% 600ms ± 1% -0.29% (p=0.000 n=29+28) HTTPClientServer-4 679µs ± 4% 683µs ± 3% ~ (p=0.197 n=30+30) JSONEncode-4 241ms ± 1% 239ms ± 1% -0.84% (p=0.000 n=30+30) JSONDecode-4 903ms ± 1% 882ms ± 1% -2.33% (p=0.000 n=30+30) Mandelbrot200-4 41.8ms ± 0% 41.8ms ± 0% ~ (p=0.719 n=30+30) GoParse-4 45.5ms ± 1% 45.8ms ± 1% +0.52% (p=0.000 n=30+30) RegexpMatchEasy0_32-4 1.27µs ± 1% 1.27µs ± 0% -0.60% (p=0.000 n=30+30) RegexpMatchEasy0_1K-4 7.77µs ± 6% 7.69µs ± 4% -0.96% (p=0.040 n=30+30) RegexpMatchEasy1_32-4 1.29µs ± 1% 1.28µs ± 1% -0.54% (p=0.000 n=30+30) RegexpMatchEasy1_1K-4 10.3µs ± 6% 10.2µs ± 3% ~ (p=0.453 n=30+27) RegexpMatchMedium_32-4 1.98µs ± 1% 2.00µs ± 1% +0.85% (p=0.000 n=30+29) RegexpMatchMedium_1K-4 503µs ± 0% 503µs ± 1% ~ (p=0.752 n=30+30) RegexpMatchHard_32-4 27.1µs ± 1% 26.5µs ± 0% -1.96% (p=0.000 n=30+24) RegexpMatchHard_1K-4 809µs ± 1% 799µs ± 1% -1.29% (p=0.000 n=29+30) Revcomp-4 67.3ms ± 2% 67.2ms ± 1% ~ (p=0.265 n=29+29) Template-4 1.08s ± 1% 1.07s ± 0% -1.39% (p=0.000 n=30+22) TimeParse-4 6.93µs ± 1% 6.96µs ± 1% +0.40% (p=0.005 n=30+30) TimeFormat-4 13.3µs ± 0% 13.3µs ± 1% ~ (p=0.734 n=30+30) [Geo mean] 709µs 707µs -0.32% name old speed new speed delta GobDecode-4 7.59MB/s ± 1% 7.57MB/s ± 1% ~ (p=0.145 n=29+29) GobEncode-4 8.56MB/s ± 1% 8.74MB/s ± 1% +2.07% (p=0.000 n=30+29) Gzip-4 4.76MB/s ± 1% 4.75MB/s ± 1% -0.25% (p=0.037 n=30+30) Gunzip-4 32.2MB/s ± 1% 32.3MB/s ± 1% +0.29% (p=0.000 n=29+28) JSONEncode-4 8.04MB/s ± 1% 8.11MB/s ± 1% +0.85% (p=0.000 n=30+30) JSONDecode-4 2.15MB/s ± 1% 2.20MB/s ± 1% +2.29% (p=0.000 n=30+30) GoParse-4 1.27MB/s ± 1% 1.26MB/s ± 1% -0.73% (p=0.000 n=30+30) RegexpMatchEasy0_32-4 25.1MB/s ± 1% 25.3MB/s ± 0% +0.61% (p=0.000 n=30+30) RegexpMatchEasy0_1K-4 131MB/s ± 6% 133MB/s ± 4% +1.35% (p=0.009 n=28+30) RegexpMatchEasy1_32-4 24.9MB/s ± 1% 25.0MB/s ± 1% +0.54% (p=0.000 n=30+30) RegexpMatchEasy1_1K-4 99.2MB/s ± 6% 100.2MB/s ± 3% ~ (p=0.448 n=30+27) RegexpMatchMedium_32-4 503kB/s ± 1% 500kB/s ± 0% -0.66% (p=0.002 n=30+24) RegexpMatchMedium_1K-4 2.04MB/s ± 0% 2.04MB/s ± 1% ~ (p=0.358 n=30+30) RegexpMatchHard_32-4 1.18MB/s ± 1% 1.20MB/s ± 1% +1.75% (p=0.000 n=30+30) RegexpMatchHard_1K-4 1.26MB/s ± 1% 1.28MB/s ± 1% +1.42% (p=0.000 n=30+30) Revcomp-4 37.8MB/s ± 2% 37.8MB/s ± 1% ~ (p=0.266 n=29+29) Template-4 1.80MB/s ± 1% 1.82MB/s ± 1% +1.46% (p=0.000 n=30+30) [Geo mean] 6.91MB/s 6.96MB/s +0.70% fixes #21583 Change-Id: I24065a80588ccae7de3ad732a3cfb0026cf7e214 Reviewed-on: https://go-review.googlesource.com/67490 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-10-02 03:09:28 +00:00
case ssa.OpARMCMPshiftRL, ssa.OpARMCMNshiftRL, ssa.OpARMTSTshiftRL, ssa.OpARMTEQshiftRL:
genshift(s, v, v.Op.Asm(), v.Args[0].Reg(), v.Args[1].Reg(), 0, arm.SHIFT_LR, v.AuxInt)
cmd/compile: optimize ARM code with CMN/TST/TEQ CMN/TST/TEQ were supported since ARMv4, which can be used to simplify comparisons. This patch implements the optimization and here are the benchmark results. 1. A special test case got 18.21% improvement. name old time/op new time/op delta TSTTEQ-4 806µs ± 1% 659µs ± 0% -18.21% (p=0.000 n=20+18) (https://github.com/benshi001/ugo1/blob/master/tstteq_test.go) 2. There is no regression in the compilecmp benchmark. name old time/op new time/op delta Template 2.31s ± 1% 2.30s ± 1% ~ (p=0.661 n=10+9) Unicode 1.32s ± 3% 1.32s ± 5% ~ (p=0.280 n=10+10) GoTypes 7.69s ± 1% 7.65s ± 0% -0.52% (p=0.027 n=10+8) Compiler 36.5s ± 1% 36.4s ± 1% ~ (p=0.546 n=9+9) SSA 85.1s ± 2% 84.9s ± 1% ~ (p=0.529 n=10+10) Flate 1.43s ± 2% 1.43s ± 2% ~ (p=0.661 n=10+9) GoParser 1.81s ± 2% 1.81s ± 1% ~ (p=0.796 n=10+10) Reflect 5.10s ± 2% 5.09s ± 1% ~ (p=0.853 n=10+10) Tar 2.47s ± 1% 2.48s ± 1% ~ (p=0.123 n=10+10) XML 2.59s ± 1% 2.58s ± 1% ~ (p=0.853 n=10+10) [Geo mean] 4.78s 4.77s -0.17% name old user-time/op new user-time/op delta Template 2.72s ± 3% 2.73s ± 2% ~ (p=0.928 n=10+10) Unicode 1.58s ± 4% 1.60s ± 1% ~ (p=0.087 n=10+9) GoTypes 9.41s ± 2% 9.36s ± 1% ~ (p=0.060 n=10+10) Compiler 44.4s ± 2% 44.2s ± 2% ~ (p=0.289 n=10+10) SSA 110s ± 2% 110s ± 1% ~ (p=0.739 n=10+10) Flate 1.67s ± 2% 1.63s ± 3% ~ (p=0.063 n=10+10) GoParser 2.12s ± 1% 2.12s ± 2% ~ (p=0.840 n=10+10) Reflect 5.94s ± 1% 5.98s ± 1% ~ (p=0.063 n=9+10) Tar 3.01s ± 2% 3.02s ± 2% ~ (p=0.584 n=10+10) XML 3.04s ± 3% 3.02s ± 2% ~ (p=0.696 n=10+10) [Geo mean] 5.73s 5.72s -0.20% name old text-bytes new text-bytes delta HelloSize 579kB ± 0% 579kB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 5.46kB ± 0% 5.46kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 72.8kB ± 0% 72.8kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.03MB ± 0% 1.03MB ± 0% ~ (all equal) 3. There is little change in the go1 benchmark (excluding the noise). name old time/op new time/op delta BinaryTree17-4 40.3s ± 1% 40.6s ± 1% +0.80% (p=0.000 n=30+30) Fannkuch11-4 24.2s ± 1% 24.1s ± 0% ~ (p=0.093 n=30+30) FmtFprintfEmpty-4 834ns ± 0% 826ns ± 0% -0.93% (p=0.000 n=29+24) FmtFprintfString-4 1.39µs ± 1% 1.36µs ± 0% -2.02% (p=0.000 n=30+30) FmtFprintfInt-4 1.43µs ± 1% 1.44µs ± 1% ~ (p=0.155 n=30+29) FmtFprintfIntInt-4 2.09µs ± 0% 2.11µs ± 0% +1.16% (p=0.000 n=28+30) FmtFprintfPrefixedInt-4 2.33µs ± 1% 2.36µs ± 0% +1.25% (p=0.000 n=30+30) FmtFprintfFloat-4 4.27µs ± 1% 4.32µs ± 1% +1.27% (p=0.000 n=30+30) FmtManyArgs-4 8.18µs ± 0% 8.14µs ± 0% -0.46% (p=0.000 n=25+27) GobDecode-4 101ms ± 1% 101ms ± 1% ~ (p=0.182 n=29+29) GobEncode-4 89.6ms ± 1% 87.8ms ± 2% -2.02% (p=0.000 n=30+29) Gzip-4 4.07s ± 1% 4.08s ± 1% ~ (p=0.173 n=30+27) Gunzip-4 602ms ± 1% 600ms ± 1% -0.29% (p=0.000 n=29+28) HTTPClientServer-4 679µs ± 4% 683µs ± 3% ~ (p=0.197 n=30+30) JSONEncode-4 241ms ± 1% 239ms ± 1% -0.84% (p=0.000 n=30+30) JSONDecode-4 903ms ± 1% 882ms ± 1% -2.33% (p=0.000 n=30+30) Mandelbrot200-4 41.8ms ± 0% 41.8ms ± 0% ~ (p=0.719 n=30+30) GoParse-4 45.5ms ± 1% 45.8ms ± 1% +0.52% (p=0.000 n=30+30) RegexpMatchEasy0_32-4 1.27µs ± 1% 1.27µs ± 0% -0.60% (p=0.000 n=30+30) RegexpMatchEasy0_1K-4 7.77µs ± 6% 7.69µs ± 4% -0.96% (p=0.040 n=30+30) RegexpMatchEasy1_32-4 1.29µs ± 1% 1.28µs ± 1% -0.54% (p=0.000 n=30+30) RegexpMatchEasy1_1K-4 10.3µs ± 6% 10.2µs ± 3% ~ (p=0.453 n=30+27) RegexpMatchMedium_32-4 1.98µs ± 1% 2.00µs ± 1% +0.85% (p=0.000 n=30+29) RegexpMatchMedium_1K-4 503µs ± 0% 503µs ± 1% ~ (p=0.752 n=30+30) RegexpMatchHard_32-4 27.1µs ± 1% 26.5µs ± 0% -1.96% (p=0.000 n=30+24) RegexpMatchHard_1K-4 809µs ± 1% 799µs ± 1% -1.29% (p=0.000 n=29+30) Revcomp-4 67.3ms ± 2% 67.2ms ± 1% ~ (p=0.265 n=29+29) Template-4 1.08s ± 1% 1.07s ± 0% -1.39% (p=0.000 n=30+22) TimeParse-4 6.93µs ± 1% 6.96µs ± 1% +0.40% (p=0.005 n=30+30) TimeFormat-4 13.3µs ± 0% 13.3µs ± 1% ~ (p=0.734 n=30+30) [Geo mean] 709µs 707µs -0.32% name old speed new speed delta GobDecode-4 7.59MB/s ± 1% 7.57MB/s ± 1% ~ (p=0.145 n=29+29) GobEncode-4 8.56MB/s ± 1% 8.74MB/s ± 1% +2.07% (p=0.000 n=30+29) Gzip-4 4.76MB/s ± 1% 4.75MB/s ± 1% -0.25% (p=0.037 n=30+30) Gunzip-4 32.2MB/s ± 1% 32.3MB/s ± 1% +0.29% (p=0.000 n=29+28) JSONEncode-4 8.04MB/s ± 1% 8.11MB/s ± 1% +0.85% (p=0.000 n=30+30) JSONDecode-4 2.15MB/s ± 1% 2.20MB/s ± 1% +2.29% (p=0.000 n=30+30) GoParse-4 1.27MB/s ± 1% 1.26MB/s ± 1% -0.73% (p=0.000 n=30+30) RegexpMatchEasy0_32-4 25.1MB/s ± 1% 25.3MB/s ± 0% +0.61% (p=0.000 n=30+30) RegexpMatchEasy0_1K-4 131MB/s ± 6% 133MB/s ± 4% +1.35% (p=0.009 n=28+30) RegexpMatchEasy1_32-4 24.9MB/s ± 1% 25.0MB/s ± 1% +0.54% (p=0.000 n=30+30) RegexpMatchEasy1_1K-4 99.2MB/s ± 6% 100.2MB/s ± 3% ~ (p=0.448 n=30+27) RegexpMatchMedium_32-4 503kB/s ± 1% 500kB/s ± 0% -0.66% (p=0.002 n=30+24) RegexpMatchMedium_1K-4 2.04MB/s ± 0% 2.04MB/s ± 1% ~ (p=0.358 n=30+30) RegexpMatchHard_32-4 1.18MB/s ± 1% 1.20MB/s ± 1% +1.75% (p=0.000 n=30+30) RegexpMatchHard_1K-4 1.26MB/s ± 1% 1.28MB/s ± 1% +1.42% (p=0.000 n=30+30) Revcomp-4 37.8MB/s ± 2% 37.8MB/s ± 1% ~ (p=0.266 n=29+29) Template-4 1.80MB/s ± 1% 1.82MB/s ± 1% +1.46% (p=0.000 n=30+30) [Geo mean] 6.91MB/s 6.96MB/s +0.70% fixes #21583 Change-Id: I24065a80588ccae7de3ad732a3cfb0026cf7e214 Reviewed-on: https://go-review.googlesource.com/67490 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-10-02 03:09:28 +00:00
case ssa.OpARMCMPshiftRA, ssa.OpARMCMNshiftRA, ssa.OpARMTSTshiftRA, ssa.OpARMTEQshiftRA:
genshift(s, v, v.Op.Asm(), v.Args[0].Reg(), v.Args[1].Reg(), 0, arm.SHIFT_AR, v.AuxInt)
cmd/compile: optimize ARM code with CMN/TST/TEQ CMN/TST/TEQ were supported since ARMv4, which can be used to simplify comparisons. This patch implements the optimization and here are the benchmark results. 1. A special test case got 18.21% improvement. name old time/op new time/op delta TSTTEQ-4 806µs ± 1% 659µs ± 0% -18.21% (p=0.000 n=20+18) (https://github.com/benshi001/ugo1/blob/master/tstteq_test.go) 2. There is no regression in the compilecmp benchmark. name old time/op new time/op delta Template 2.31s ± 1% 2.30s ± 1% ~ (p=0.661 n=10+9) Unicode 1.32s ± 3% 1.32s ± 5% ~ (p=0.280 n=10+10) GoTypes 7.69s ± 1% 7.65s ± 0% -0.52% (p=0.027 n=10+8) Compiler 36.5s ± 1% 36.4s ± 1% ~ (p=0.546 n=9+9) SSA 85.1s ± 2% 84.9s ± 1% ~ (p=0.529 n=10+10) Flate 1.43s ± 2% 1.43s ± 2% ~ (p=0.661 n=10+9) GoParser 1.81s ± 2% 1.81s ± 1% ~ (p=0.796 n=10+10) Reflect 5.10s ± 2% 5.09s ± 1% ~ (p=0.853 n=10+10) Tar 2.47s ± 1% 2.48s ± 1% ~ (p=0.123 n=10+10) XML 2.59s ± 1% 2.58s ± 1% ~ (p=0.853 n=10+10) [Geo mean] 4.78s 4.77s -0.17% name old user-time/op new user-time/op delta Template 2.72s ± 3% 2.73s ± 2% ~ (p=0.928 n=10+10) Unicode 1.58s ± 4% 1.60s ± 1% ~ (p=0.087 n=10+9) GoTypes 9.41s ± 2% 9.36s ± 1% ~ (p=0.060 n=10+10) Compiler 44.4s ± 2% 44.2s ± 2% ~ (p=0.289 n=10+10) SSA 110s ± 2% 110s ± 1% ~ (p=0.739 n=10+10) Flate 1.67s ± 2% 1.63s ± 3% ~ (p=0.063 n=10+10) GoParser 2.12s ± 1% 2.12s ± 2% ~ (p=0.840 n=10+10) Reflect 5.94s ± 1% 5.98s ± 1% ~ (p=0.063 n=9+10) Tar 3.01s ± 2% 3.02s ± 2% ~ (p=0.584 n=10+10) XML 3.04s ± 3% 3.02s ± 2% ~ (p=0.696 n=10+10) [Geo mean] 5.73s 5.72s -0.20% name old text-bytes new text-bytes delta HelloSize 579kB ± 0% 579kB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 5.46kB ± 0% 5.46kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 72.8kB ± 0% 72.8kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.03MB ± 0% 1.03MB ± 0% ~ (all equal) 3. There is little change in the go1 benchmark (excluding the noise). name old time/op new time/op delta BinaryTree17-4 40.3s ± 1% 40.6s ± 1% +0.80% (p=0.000 n=30+30) Fannkuch11-4 24.2s ± 1% 24.1s ± 0% ~ (p=0.093 n=30+30) FmtFprintfEmpty-4 834ns ± 0% 826ns ± 0% -0.93% (p=0.000 n=29+24) FmtFprintfString-4 1.39µs ± 1% 1.36µs ± 0% -2.02% (p=0.000 n=30+30) FmtFprintfInt-4 1.43µs ± 1% 1.44µs ± 1% ~ (p=0.155 n=30+29) FmtFprintfIntInt-4 2.09µs ± 0% 2.11µs ± 0% +1.16% (p=0.000 n=28+30) FmtFprintfPrefixedInt-4 2.33µs ± 1% 2.36µs ± 0% +1.25% (p=0.000 n=30+30) FmtFprintfFloat-4 4.27µs ± 1% 4.32µs ± 1% +1.27% (p=0.000 n=30+30) FmtManyArgs-4 8.18µs ± 0% 8.14µs ± 0% -0.46% (p=0.000 n=25+27) GobDecode-4 101ms ± 1% 101ms ± 1% ~ (p=0.182 n=29+29) GobEncode-4 89.6ms ± 1% 87.8ms ± 2% -2.02% (p=0.000 n=30+29) Gzip-4 4.07s ± 1% 4.08s ± 1% ~ (p=0.173 n=30+27) Gunzip-4 602ms ± 1% 600ms ± 1% -0.29% (p=0.000 n=29+28) HTTPClientServer-4 679µs ± 4% 683µs ± 3% ~ (p=0.197 n=30+30) JSONEncode-4 241ms ± 1% 239ms ± 1% -0.84% (p=0.000 n=30+30) JSONDecode-4 903ms ± 1% 882ms ± 1% -2.33% (p=0.000 n=30+30) Mandelbrot200-4 41.8ms ± 0% 41.8ms ± 0% ~ (p=0.719 n=30+30) GoParse-4 45.5ms ± 1% 45.8ms ± 1% +0.52% (p=0.000 n=30+30) RegexpMatchEasy0_32-4 1.27µs ± 1% 1.27µs ± 0% -0.60% (p=0.000 n=30+30) RegexpMatchEasy0_1K-4 7.77µs ± 6% 7.69µs ± 4% -0.96% (p=0.040 n=30+30) RegexpMatchEasy1_32-4 1.29µs ± 1% 1.28µs ± 1% -0.54% (p=0.000 n=30+30) RegexpMatchEasy1_1K-4 10.3µs ± 6% 10.2µs ± 3% ~ (p=0.453 n=30+27) RegexpMatchMedium_32-4 1.98µs ± 1% 2.00µs ± 1% +0.85% (p=0.000 n=30+29) RegexpMatchMedium_1K-4 503µs ± 0% 503µs ± 1% ~ (p=0.752 n=30+30) RegexpMatchHard_32-4 27.1µs ± 1% 26.5µs ± 0% -1.96% (p=0.000 n=30+24) RegexpMatchHard_1K-4 809µs ± 1% 799µs ± 1% -1.29% (p=0.000 n=29+30) Revcomp-4 67.3ms ± 2% 67.2ms ± 1% ~ (p=0.265 n=29+29) Template-4 1.08s ± 1% 1.07s ± 0% -1.39% (p=0.000 n=30+22) TimeParse-4 6.93µs ± 1% 6.96µs ± 1% +0.40% (p=0.005 n=30+30) TimeFormat-4 13.3µs ± 0% 13.3µs ± 1% ~ (p=0.734 n=30+30) [Geo mean] 709µs 707µs -0.32% name old speed new speed delta GobDecode-4 7.59MB/s ± 1% 7.57MB/s ± 1% ~ (p=0.145 n=29+29) GobEncode-4 8.56MB/s ± 1% 8.74MB/s ± 1% +2.07% (p=0.000 n=30+29) Gzip-4 4.76MB/s ± 1% 4.75MB/s ± 1% -0.25% (p=0.037 n=30+30) Gunzip-4 32.2MB/s ± 1% 32.3MB/s ± 1% +0.29% (p=0.000 n=29+28) JSONEncode-4 8.04MB/s ± 1% 8.11MB/s ± 1% +0.85% (p=0.000 n=30+30) JSONDecode-4 2.15MB/s ± 1% 2.20MB/s ± 1% +2.29% (p=0.000 n=30+30) GoParse-4 1.27MB/s ± 1% 1.26MB/s ± 1% -0.73% (p=0.000 n=30+30) RegexpMatchEasy0_32-4 25.1MB/s ± 1% 25.3MB/s ± 0% +0.61% (p=0.000 n=30+30) RegexpMatchEasy0_1K-4 131MB/s ± 6% 133MB/s ± 4% +1.35% (p=0.009 n=28+30) RegexpMatchEasy1_32-4 24.9MB/s ± 1% 25.0MB/s ± 1% +0.54% (p=0.000 n=30+30) RegexpMatchEasy1_1K-4 99.2MB/s ± 6% 100.2MB/s ± 3% ~ (p=0.448 n=30+27) RegexpMatchMedium_32-4 503kB/s ± 1% 500kB/s ± 0% -0.66% (p=0.002 n=30+24) RegexpMatchMedium_1K-4 2.04MB/s ± 0% 2.04MB/s ± 1% ~ (p=0.358 n=30+30) RegexpMatchHard_32-4 1.18MB/s ± 1% 1.20MB/s ± 1% +1.75% (p=0.000 n=30+30) RegexpMatchHard_1K-4 1.26MB/s ± 1% 1.28MB/s ± 1% +1.42% (p=0.000 n=30+30) Revcomp-4 37.8MB/s ± 2% 37.8MB/s ± 1% ~ (p=0.266 n=29+29) Template-4 1.80MB/s ± 1% 1.82MB/s ± 1% +1.46% (p=0.000 n=30+30) [Geo mean] 6.91MB/s 6.96MB/s +0.70% fixes #21583 Change-Id: I24065a80588ccae7de3ad732a3cfb0026cf7e214 Reviewed-on: https://go-review.googlesource.com/67490 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-10-02 03:09:28 +00:00
case ssa.OpARMCMPshiftLLreg, ssa.OpARMCMNshiftLLreg, ssa.OpARMTSTshiftLLreg, ssa.OpARMTEQshiftLLreg:
genregshift(s, v.Op.Asm(), v.Args[0].Reg(), v.Args[1].Reg(), v.Args[2].Reg(), 0, arm.SHIFT_LL)
cmd/compile: optimize ARM code with CMN/TST/TEQ CMN/TST/TEQ were supported since ARMv4, which can be used to simplify comparisons. This patch implements the optimization and here are the benchmark results. 1. A special test case got 18.21% improvement. name old time/op new time/op delta TSTTEQ-4 806µs ± 1% 659µs ± 0% -18.21% (p=0.000 n=20+18) (https://github.com/benshi001/ugo1/blob/master/tstteq_test.go) 2. There is no regression in the compilecmp benchmark. name old time/op new time/op delta Template 2.31s ± 1% 2.30s ± 1% ~ (p=0.661 n=10+9) Unicode 1.32s ± 3% 1.32s ± 5% ~ (p=0.280 n=10+10) GoTypes 7.69s ± 1% 7.65s ± 0% -0.52% (p=0.027 n=10+8) Compiler 36.5s ± 1% 36.4s ± 1% ~ (p=0.546 n=9+9) SSA 85.1s ± 2% 84.9s ± 1% ~ (p=0.529 n=10+10) Flate 1.43s ± 2% 1.43s ± 2% ~ (p=0.661 n=10+9) GoParser 1.81s ± 2% 1.81s ± 1% ~ (p=0.796 n=10+10) Reflect 5.10s ± 2% 5.09s ± 1% ~ (p=0.853 n=10+10) Tar 2.47s ± 1% 2.48s ± 1% ~ (p=0.123 n=10+10) XML 2.59s ± 1% 2.58s ± 1% ~ (p=0.853 n=10+10) [Geo mean] 4.78s 4.77s -0.17% name old user-time/op new user-time/op delta Template 2.72s ± 3% 2.73s ± 2% ~ (p=0.928 n=10+10) Unicode 1.58s ± 4% 1.60s ± 1% ~ (p=0.087 n=10+9) GoTypes 9.41s ± 2% 9.36s ± 1% ~ (p=0.060 n=10+10) Compiler 44.4s ± 2% 44.2s ± 2% ~ (p=0.289 n=10+10) SSA 110s ± 2% 110s ± 1% ~ (p=0.739 n=10+10) Flate 1.67s ± 2% 1.63s ± 3% ~ (p=0.063 n=10+10) GoParser 2.12s ± 1% 2.12s ± 2% ~ (p=0.840 n=10+10) Reflect 5.94s ± 1% 5.98s ± 1% ~ (p=0.063 n=9+10) Tar 3.01s ± 2% 3.02s ± 2% ~ (p=0.584 n=10+10) XML 3.04s ± 3% 3.02s ± 2% ~ (p=0.696 n=10+10) [Geo mean] 5.73s 5.72s -0.20% name old text-bytes new text-bytes delta HelloSize 579kB ± 0% 579kB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 5.46kB ± 0% 5.46kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 72.8kB ± 0% 72.8kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.03MB ± 0% 1.03MB ± 0% ~ (all equal) 3. There is little change in the go1 benchmark (excluding the noise). name old time/op new time/op delta BinaryTree17-4 40.3s ± 1% 40.6s ± 1% +0.80% (p=0.000 n=30+30) Fannkuch11-4 24.2s ± 1% 24.1s ± 0% ~ (p=0.093 n=30+30) FmtFprintfEmpty-4 834ns ± 0% 826ns ± 0% -0.93% (p=0.000 n=29+24) FmtFprintfString-4 1.39µs ± 1% 1.36µs ± 0% -2.02% (p=0.000 n=30+30) FmtFprintfInt-4 1.43µs ± 1% 1.44µs ± 1% ~ (p=0.155 n=30+29) FmtFprintfIntInt-4 2.09µs ± 0% 2.11µs ± 0% +1.16% (p=0.000 n=28+30) FmtFprintfPrefixedInt-4 2.33µs ± 1% 2.36µs ± 0% +1.25% (p=0.000 n=30+30) FmtFprintfFloat-4 4.27µs ± 1% 4.32µs ± 1% +1.27% (p=0.000 n=30+30) FmtManyArgs-4 8.18µs ± 0% 8.14µs ± 0% -0.46% (p=0.000 n=25+27) GobDecode-4 101ms ± 1% 101ms ± 1% ~ (p=0.182 n=29+29) GobEncode-4 89.6ms ± 1% 87.8ms ± 2% -2.02% (p=0.000 n=30+29) Gzip-4 4.07s ± 1% 4.08s ± 1% ~ (p=0.173 n=30+27) Gunzip-4 602ms ± 1% 600ms ± 1% -0.29% (p=0.000 n=29+28) HTTPClientServer-4 679µs ± 4% 683µs ± 3% ~ (p=0.197 n=30+30) JSONEncode-4 241ms ± 1% 239ms ± 1% -0.84% (p=0.000 n=30+30) JSONDecode-4 903ms ± 1% 882ms ± 1% -2.33% (p=0.000 n=30+30) Mandelbrot200-4 41.8ms ± 0% 41.8ms ± 0% ~ (p=0.719 n=30+30) GoParse-4 45.5ms ± 1% 45.8ms ± 1% +0.52% (p=0.000 n=30+30) RegexpMatchEasy0_32-4 1.27µs ± 1% 1.27µs ± 0% -0.60% (p=0.000 n=30+30) RegexpMatchEasy0_1K-4 7.77µs ± 6% 7.69µs ± 4% -0.96% (p=0.040 n=30+30) RegexpMatchEasy1_32-4 1.29µs ± 1% 1.28µs ± 1% -0.54% (p=0.000 n=30+30) RegexpMatchEasy1_1K-4 10.3µs ± 6% 10.2µs ± 3% ~ (p=0.453 n=30+27) RegexpMatchMedium_32-4 1.98µs ± 1% 2.00µs ± 1% +0.85% (p=0.000 n=30+29) RegexpMatchMedium_1K-4 503µs ± 0% 503µs ± 1% ~ (p=0.752 n=30+30) RegexpMatchHard_32-4 27.1µs ± 1% 26.5µs ± 0% -1.96% (p=0.000 n=30+24) RegexpMatchHard_1K-4 809µs ± 1% 799µs ± 1% -1.29% (p=0.000 n=29+30) Revcomp-4 67.3ms ± 2% 67.2ms ± 1% ~ (p=0.265 n=29+29) Template-4 1.08s ± 1% 1.07s ± 0% -1.39% (p=0.000 n=30+22) TimeParse-4 6.93µs ± 1% 6.96µs ± 1% +0.40% (p=0.005 n=30+30) TimeFormat-4 13.3µs ± 0% 13.3µs ± 1% ~ (p=0.734 n=30+30) [Geo mean] 709µs 707µs -0.32% name old speed new speed delta GobDecode-4 7.59MB/s ± 1% 7.57MB/s ± 1% ~ (p=0.145 n=29+29) GobEncode-4 8.56MB/s ± 1% 8.74MB/s ± 1% +2.07% (p=0.000 n=30+29) Gzip-4 4.76MB/s ± 1% 4.75MB/s ± 1% -0.25% (p=0.037 n=30+30) Gunzip-4 32.2MB/s ± 1% 32.3MB/s ± 1% +0.29% (p=0.000 n=29+28) JSONEncode-4 8.04MB/s ± 1% 8.11MB/s ± 1% +0.85% (p=0.000 n=30+30) JSONDecode-4 2.15MB/s ± 1% 2.20MB/s ± 1% +2.29% (p=0.000 n=30+30) GoParse-4 1.27MB/s ± 1% 1.26MB/s ± 1% -0.73% (p=0.000 n=30+30) RegexpMatchEasy0_32-4 25.1MB/s ± 1% 25.3MB/s ± 0% +0.61% (p=0.000 n=30+30) RegexpMatchEasy0_1K-4 131MB/s ± 6% 133MB/s ± 4% +1.35% (p=0.009 n=28+30) RegexpMatchEasy1_32-4 24.9MB/s ± 1% 25.0MB/s ± 1% +0.54% (p=0.000 n=30+30) RegexpMatchEasy1_1K-4 99.2MB/s ± 6% 100.2MB/s ± 3% ~ (p=0.448 n=30+27) RegexpMatchMedium_32-4 503kB/s ± 1% 500kB/s ± 0% -0.66% (p=0.002 n=30+24) RegexpMatchMedium_1K-4 2.04MB/s ± 0% 2.04MB/s ± 1% ~ (p=0.358 n=30+30) RegexpMatchHard_32-4 1.18MB/s ± 1% 1.20MB/s ± 1% +1.75% (p=0.000 n=30+30) RegexpMatchHard_1K-4 1.26MB/s ± 1% 1.28MB/s ± 1% +1.42% (p=0.000 n=30+30) Revcomp-4 37.8MB/s ± 2% 37.8MB/s ± 1% ~ (p=0.266 n=29+29) Template-4 1.80MB/s ± 1% 1.82MB/s ± 1% +1.46% (p=0.000 n=30+30) [Geo mean] 6.91MB/s 6.96MB/s +0.70% fixes #21583 Change-Id: I24065a80588ccae7de3ad732a3cfb0026cf7e214 Reviewed-on: https://go-review.googlesource.com/67490 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-10-02 03:09:28 +00:00
case ssa.OpARMCMPshiftRLreg, ssa.OpARMCMNshiftRLreg, ssa.OpARMTSTshiftRLreg, ssa.OpARMTEQshiftRLreg:
genregshift(s, v.Op.Asm(), v.Args[0].Reg(), v.Args[1].Reg(), v.Args[2].Reg(), 0, arm.SHIFT_LR)
cmd/compile: optimize ARM code with CMN/TST/TEQ CMN/TST/TEQ were supported since ARMv4, which can be used to simplify comparisons. This patch implements the optimization and here are the benchmark results. 1. A special test case got 18.21% improvement. name old time/op new time/op delta TSTTEQ-4 806µs ± 1% 659µs ± 0% -18.21% (p=0.000 n=20+18) (https://github.com/benshi001/ugo1/blob/master/tstteq_test.go) 2. There is no regression in the compilecmp benchmark. name old time/op new time/op delta Template 2.31s ± 1% 2.30s ± 1% ~ (p=0.661 n=10+9) Unicode 1.32s ± 3% 1.32s ± 5% ~ (p=0.280 n=10+10) GoTypes 7.69s ± 1% 7.65s ± 0% -0.52% (p=0.027 n=10+8) Compiler 36.5s ± 1% 36.4s ± 1% ~ (p=0.546 n=9+9) SSA 85.1s ± 2% 84.9s ± 1% ~ (p=0.529 n=10+10) Flate 1.43s ± 2% 1.43s ± 2% ~ (p=0.661 n=10+9) GoParser 1.81s ± 2% 1.81s ± 1% ~ (p=0.796 n=10+10) Reflect 5.10s ± 2% 5.09s ± 1% ~ (p=0.853 n=10+10) Tar 2.47s ± 1% 2.48s ± 1% ~ (p=0.123 n=10+10) XML 2.59s ± 1% 2.58s ± 1% ~ (p=0.853 n=10+10) [Geo mean] 4.78s 4.77s -0.17% name old user-time/op new user-time/op delta Template 2.72s ± 3% 2.73s ± 2% ~ (p=0.928 n=10+10) Unicode 1.58s ± 4% 1.60s ± 1% ~ (p=0.087 n=10+9) GoTypes 9.41s ± 2% 9.36s ± 1% ~ (p=0.060 n=10+10) Compiler 44.4s ± 2% 44.2s ± 2% ~ (p=0.289 n=10+10) SSA 110s ± 2% 110s ± 1% ~ (p=0.739 n=10+10) Flate 1.67s ± 2% 1.63s ± 3% ~ (p=0.063 n=10+10) GoParser 2.12s ± 1% 2.12s ± 2% ~ (p=0.840 n=10+10) Reflect 5.94s ± 1% 5.98s ± 1% ~ (p=0.063 n=9+10) Tar 3.01s ± 2% 3.02s ± 2% ~ (p=0.584 n=10+10) XML 3.04s ± 3% 3.02s ± 2% ~ (p=0.696 n=10+10) [Geo mean] 5.73s 5.72s -0.20% name old text-bytes new text-bytes delta HelloSize 579kB ± 0% 579kB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 5.46kB ± 0% 5.46kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 72.8kB ± 0% 72.8kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.03MB ± 0% 1.03MB ± 0% ~ (all equal) 3. There is little change in the go1 benchmark (excluding the noise). name old time/op new time/op delta BinaryTree17-4 40.3s ± 1% 40.6s ± 1% +0.80% (p=0.000 n=30+30) Fannkuch11-4 24.2s ± 1% 24.1s ± 0% ~ (p=0.093 n=30+30) FmtFprintfEmpty-4 834ns ± 0% 826ns ± 0% -0.93% (p=0.000 n=29+24) FmtFprintfString-4 1.39µs ± 1% 1.36µs ± 0% -2.02% (p=0.000 n=30+30) FmtFprintfInt-4 1.43µs ± 1% 1.44µs ± 1% ~ (p=0.155 n=30+29) FmtFprintfIntInt-4 2.09µs ± 0% 2.11µs ± 0% +1.16% (p=0.000 n=28+30) FmtFprintfPrefixedInt-4 2.33µs ± 1% 2.36µs ± 0% +1.25% (p=0.000 n=30+30) FmtFprintfFloat-4 4.27µs ± 1% 4.32µs ± 1% +1.27% (p=0.000 n=30+30) FmtManyArgs-4 8.18µs ± 0% 8.14µs ± 0% -0.46% (p=0.000 n=25+27) GobDecode-4 101ms ± 1% 101ms ± 1% ~ (p=0.182 n=29+29) GobEncode-4 89.6ms ± 1% 87.8ms ± 2% -2.02% (p=0.000 n=30+29) Gzip-4 4.07s ± 1% 4.08s ± 1% ~ (p=0.173 n=30+27) Gunzip-4 602ms ± 1% 600ms ± 1% -0.29% (p=0.000 n=29+28) HTTPClientServer-4 679µs ± 4% 683µs ± 3% ~ (p=0.197 n=30+30) JSONEncode-4 241ms ± 1% 239ms ± 1% -0.84% (p=0.000 n=30+30) JSONDecode-4 903ms ± 1% 882ms ± 1% -2.33% (p=0.000 n=30+30) Mandelbrot200-4 41.8ms ± 0% 41.8ms ± 0% ~ (p=0.719 n=30+30) GoParse-4 45.5ms ± 1% 45.8ms ± 1% +0.52% (p=0.000 n=30+30) RegexpMatchEasy0_32-4 1.27µs ± 1% 1.27µs ± 0% -0.60% (p=0.000 n=30+30) RegexpMatchEasy0_1K-4 7.77µs ± 6% 7.69µs ± 4% -0.96% (p=0.040 n=30+30) RegexpMatchEasy1_32-4 1.29µs ± 1% 1.28µs ± 1% -0.54% (p=0.000 n=30+30) RegexpMatchEasy1_1K-4 10.3µs ± 6% 10.2µs ± 3% ~ (p=0.453 n=30+27) RegexpMatchMedium_32-4 1.98µs ± 1% 2.00µs ± 1% +0.85% (p=0.000 n=30+29) RegexpMatchMedium_1K-4 503µs ± 0% 503µs ± 1% ~ (p=0.752 n=30+30) RegexpMatchHard_32-4 27.1µs ± 1% 26.5µs ± 0% -1.96% (p=0.000 n=30+24) RegexpMatchHard_1K-4 809µs ± 1% 799µs ± 1% -1.29% (p=0.000 n=29+30) Revcomp-4 67.3ms ± 2% 67.2ms ± 1% ~ (p=0.265 n=29+29) Template-4 1.08s ± 1% 1.07s ± 0% -1.39% (p=0.000 n=30+22) TimeParse-4 6.93µs ± 1% 6.96µs ± 1% +0.40% (p=0.005 n=30+30) TimeFormat-4 13.3µs ± 0% 13.3µs ± 1% ~ (p=0.734 n=30+30) [Geo mean] 709µs 707µs -0.32% name old speed new speed delta GobDecode-4 7.59MB/s ± 1% 7.57MB/s ± 1% ~ (p=0.145 n=29+29) GobEncode-4 8.56MB/s ± 1% 8.74MB/s ± 1% +2.07% (p=0.000 n=30+29) Gzip-4 4.76MB/s ± 1% 4.75MB/s ± 1% -0.25% (p=0.037 n=30+30) Gunzip-4 32.2MB/s ± 1% 32.3MB/s ± 1% +0.29% (p=0.000 n=29+28) JSONEncode-4 8.04MB/s ± 1% 8.11MB/s ± 1% +0.85% (p=0.000 n=30+30) JSONDecode-4 2.15MB/s ± 1% 2.20MB/s ± 1% +2.29% (p=0.000 n=30+30) GoParse-4 1.27MB/s ± 1% 1.26MB/s ± 1% -0.73% (p=0.000 n=30+30) RegexpMatchEasy0_32-4 25.1MB/s ± 1% 25.3MB/s ± 0% +0.61% (p=0.000 n=30+30) RegexpMatchEasy0_1K-4 131MB/s ± 6% 133MB/s ± 4% +1.35% (p=0.009 n=28+30) RegexpMatchEasy1_32-4 24.9MB/s ± 1% 25.0MB/s ± 1% +0.54% (p=0.000 n=30+30) RegexpMatchEasy1_1K-4 99.2MB/s ± 6% 100.2MB/s ± 3% ~ (p=0.448 n=30+27) RegexpMatchMedium_32-4 503kB/s ± 1% 500kB/s ± 0% -0.66% (p=0.002 n=30+24) RegexpMatchMedium_1K-4 2.04MB/s ± 0% 2.04MB/s ± 1% ~ (p=0.358 n=30+30) RegexpMatchHard_32-4 1.18MB/s ± 1% 1.20MB/s ± 1% +1.75% (p=0.000 n=30+30) RegexpMatchHard_1K-4 1.26MB/s ± 1% 1.28MB/s ± 1% +1.42% (p=0.000 n=30+30) Revcomp-4 37.8MB/s ± 2% 37.8MB/s ± 1% ~ (p=0.266 n=29+29) Template-4 1.80MB/s ± 1% 1.82MB/s ± 1% +1.46% (p=0.000 n=30+30) [Geo mean] 6.91MB/s 6.96MB/s +0.70% fixes #21583 Change-Id: I24065a80588ccae7de3ad732a3cfb0026cf7e214 Reviewed-on: https://go-review.googlesource.com/67490 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-10-02 03:09:28 +00:00
case ssa.OpARMCMPshiftRAreg, ssa.OpARMCMNshiftRAreg, ssa.OpARMTSTshiftRAreg, ssa.OpARMTEQshiftRAreg:
genregshift(s, v.Op.Asm(), v.Args[0].Reg(), v.Args[1].Reg(), v.Args[2].Reg(), 0, arm.SHIFT_AR)
case ssa.OpARMMOVWaddr:
p := s.Prog(arm.AMOVW)
p.From.Type = obj.TYPE_ADDR
p.From.Reg = v.Args[0].Reg()
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
var wantreg string
// MOVW $sym+off(base), R
// the assembler expands it as the following:
// - base is SP: add constant offset to SP (R13)
// when constant is large, tmp register (R11) may be used
// - base is SB: load external address from constant pool (use relocation)
switch v.Aux.(type) {
default:
v.Fatalf("aux is of unknown type %T", v.Aux)
case *obj.LSym:
wantreg = "SB"
[dev.regabi] cmd/compile: split out package ssagen [generated] [git-generate] cd src/cmd/compile/internal/gc rf ' # maxOpenDefers is declared in ssa.go but used only by walk. mv maxOpenDefers walk.go # gc.Arch -> ssagen.Arch # It is not as nice but will do for now. mv Arch ArchInfo mv thearch Arch mv Arch ArchInfo arch.go # Pull dwarf out of pgen.go. mv debuginfo declPos createDwarfVars preInliningDcls \ createSimpleVars createSimpleVar \ createComplexVars createComplexVar \ dwarf.go # Pull high-level compilation out of pgen.go, # leaving only the SSA code. mv compilequeue funccompile compile compilenow \ compileFunctions isInlinableButNotInlined \ initLSym \ compile.go mv BoundsCheckFunc GCWriteBarrierReg ssa.go mv largeStack largeStackFrames CheckLargeStacks pgen.go # All that is left in dcl.go is the nowritebarrierrecCheck mv dcl.go nowb.go # Export API and unexport non-API. mv initssaconfig InitConfig mv isIntrinsicCall IsIntrinsicCall mv ssaDumpInline DumpInline mv initSSATables InitTables mv initSSAEnv InitEnv mv compileSSA Compile mv stackOffset StackOffset mv canSSAType TypeOK mv SSAGenState State mv FwdRefAux fwdRefAux mv cgoSymABIs CgoSymABIs mv readSymABIs ReadSymABIs mv initLSym InitLSym mv useABIWrapGen symabiDefs CgoSymABIs ReadSymABIs InitLSym selectLSym makeABIWrapper setupTextLSym abi.go mv arch.go abi.go nowb.go phi.go pgen.go pgen_test.go ssa.go cmd/compile/internal/ssagen ' rm go.go gsubr.go Change-Id: I47fad6cbf1d1e583fd9139003a08401d7cd048a1 Reviewed-on: https://go-review.googlesource.com/c/go/+/279476 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-12-23 00:57:10 -05:00
ssagen.AddAux(&p.From, v)
case *ir.Name:
wantreg = "SP"
[dev.regabi] cmd/compile: split out package ssagen [generated] [git-generate] cd src/cmd/compile/internal/gc rf ' # maxOpenDefers is declared in ssa.go but used only by walk. mv maxOpenDefers walk.go # gc.Arch -> ssagen.Arch # It is not as nice but will do for now. mv Arch ArchInfo mv thearch Arch mv Arch ArchInfo arch.go # Pull dwarf out of pgen.go. mv debuginfo declPos createDwarfVars preInliningDcls \ createSimpleVars createSimpleVar \ createComplexVars createComplexVar \ dwarf.go # Pull high-level compilation out of pgen.go, # leaving only the SSA code. mv compilequeue funccompile compile compilenow \ compileFunctions isInlinableButNotInlined \ initLSym \ compile.go mv BoundsCheckFunc GCWriteBarrierReg ssa.go mv largeStack largeStackFrames CheckLargeStacks pgen.go # All that is left in dcl.go is the nowritebarrierrecCheck mv dcl.go nowb.go # Export API and unexport non-API. mv initssaconfig InitConfig mv isIntrinsicCall IsIntrinsicCall mv ssaDumpInline DumpInline mv initSSATables InitTables mv initSSAEnv InitEnv mv compileSSA Compile mv stackOffset StackOffset mv canSSAType TypeOK mv SSAGenState State mv FwdRefAux fwdRefAux mv cgoSymABIs CgoSymABIs mv readSymABIs ReadSymABIs mv initLSym InitLSym mv useABIWrapGen symabiDefs CgoSymABIs ReadSymABIs InitLSym selectLSym makeABIWrapper setupTextLSym abi.go mv arch.go abi.go nowb.go phi.go pgen.go pgen_test.go ssa.go cmd/compile/internal/ssagen ' rm go.go gsubr.go Change-Id: I47fad6cbf1d1e583fd9139003a08401d7cd048a1 Reviewed-on: https://go-review.googlesource.com/c/go/+/279476 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-12-23 00:57:10 -05:00
ssagen.AddAux(&p.From, v)
case nil:
// No sym, just MOVW $off(SP), R
wantreg = "SP"
p.From.Offset = v.AuxInt
}
if reg := v.Args[0].RegName(); reg != wantreg {
v.Fatalf("bad reg %s for symbol type %T, want %s", reg, v.Aux, wantreg)
}
case ssa.OpARMMOVBload,
ssa.OpARMMOVBUload,
ssa.OpARMMOVHload,
ssa.OpARMMOVHUload,
ssa.OpARMMOVWload,
ssa.OpARMMOVFload,
ssa.OpARMMOVDload:
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_MEM
p.From.Reg = v.Args[0].Reg()
[dev.regabi] cmd/compile: split out package ssagen [generated] [git-generate] cd src/cmd/compile/internal/gc rf ' # maxOpenDefers is declared in ssa.go but used only by walk. mv maxOpenDefers walk.go # gc.Arch -> ssagen.Arch # It is not as nice but will do for now. mv Arch ArchInfo mv thearch Arch mv Arch ArchInfo arch.go # Pull dwarf out of pgen.go. mv debuginfo declPos createDwarfVars preInliningDcls \ createSimpleVars createSimpleVar \ createComplexVars createComplexVar \ dwarf.go # Pull high-level compilation out of pgen.go, # leaving only the SSA code. mv compilequeue funccompile compile compilenow \ compileFunctions isInlinableButNotInlined \ initLSym \ compile.go mv BoundsCheckFunc GCWriteBarrierReg ssa.go mv largeStack largeStackFrames CheckLargeStacks pgen.go # All that is left in dcl.go is the nowritebarrierrecCheck mv dcl.go nowb.go # Export API and unexport non-API. mv initssaconfig InitConfig mv isIntrinsicCall IsIntrinsicCall mv ssaDumpInline DumpInline mv initSSATables InitTables mv initSSAEnv InitEnv mv compileSSA Compile mv stackOffset StackOffset mv canSSAType TypeOK mv SSAGenState State mv FwdRefAux fwdRefAux mv cgoSymABIs CgoSymABIs mv readSymABIs ReadSymABIs mv initLSym InitLSym mv useABIWrapGen symabiDefs CgoSymABIs ReadSymABIs InitLSym selectLSym makeABIWrapper setupTextLSym abi.go mv arch.go abi.go nowb.go phi.go pgen.go pgen_test.go ssa.go cmd/compile/internal/ssagen ' rm go.go gsubr.go Change-Id: I47fad6cbf1d1e583fd9139003a08401d7cd048a1 Reviewed-on: https://go-review.googlesource.com/c/go/+/279476 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-12-23 00:57:10 -05:00
ssagen.AddAux(&p.From, v)
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
case ssa.OpARMMOVBstore,
ssa.OpARMMOVHstore,
ssa.OpARMMOVWstore,
ssa.OpARMMOVFstore,
ssa.OpARMMOVDstore:
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = v.Args[1].Reg()
p.To.Type = obj.TYPE_MEM
p.To.Reg = v.Args[0].Reg()
[dev.regabi] cmd/compile: split out package ssagen [generated] [git-generate] cd src/cmd/compile/internal/gc rf ' # maxOpenDefers is declared in ssa.go but used only by walk. mv maxOpenDefers walk.go # gc.Arch -> ssagen.Arch # It is not as nice but will do for now. mv Arch ArchInfo mv thearch Arch mv Arch ArchInfo arch.go # Pull dwarf out of pgen.go. mv debuginfo declPos createDwarfVars preInliningDcls \ createSimpleVars createSimpleVar \ createComplexVars createComplexVar \ dwarf.go # Pull high-level compilation out of pgen.go, # leaving only the SSA code. mv compilequeue funccompile compile compilenow \ compileFunctions isInlinableButNotInlined \ initLSym \ compile.go mv BoundsCheckFunc GCWriteBarrierReg ssa.go mv largeStack largeStackFrames CheckLargeStacks pgen.go # All that is left in dcl.go is the nowritebarrierrecCheck mv dcl.go nowb.go # Export API and unexport non-API. mv initssaconfig InitConfig mv isIntrinsicCall IsIntrinsicCall mv ssaDumpInline DumpInline mv initSSATables InitTables mv initSSAEnv InitEnv mv compileSSA Compile mv stackOffset StackOffset mv canSSAType TypeOK mv SSAGenState State mv FwdRefAux fwdRefAux mv cgoSymABIs CgoSymABIs mv readSymABIs ReadSymABIs mv initLSym InitLSym mv useABIWrapGen symabiDefs CgoSymABIs ReadSymABIs InitLSym selectLSym makeABIWrapper setupTextLSym abi.go mv arch.go abi.go nowb.go phi.go pgen.go pgen_test.go ssa.go cmd/compile/internal/ssagen ' rm go.go gsubr.go Change-Id: I47fad6cbf1d1e583fd9139003a08401d7cd048a1 Reviewed-on: https://go-review.googlesource.com/c/go/+/279476 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-12-23 00:57:10 -05:00
ssagen.AddAux(&p.To, v)
cmd/compile: optimize ARM with more efficient MOVB/MOVBU/MOVH/MOVHU Like the indexed MOVW (MOVWloadidx/MOVWstoreidx) used in current ARM backend, the indexed MOVB/MOVBU/MOVH/MOVHU can also be used to generate further optimized ARM code. My patch implements this optimization. Here are some contrast test results against the original go compiler. 1. The total size of all .a files in pkg/ shrinks by 0.03%. 2. The compilecmp benchmark shows a little decline. name old time/op new time/op delta Template 2.35s ± 1% 2.37s ± 3% +0.94% (p=0.006 n=19+19) Unicode 1.33s ± 3% 1.33s ± 2% ~ (p=0.158 n=20+18) GoTypes 7.86s ± 2% 7.84s ± 1% ~ (p=0.284 n=19+18) Compiler 37.5s ± 1% 37.7s ± 2% ~ (p=0.101 n=20+19) SSA 83.4s ± 2% 83.6s ± 2% ~ (p=0.231 n=20+20) Flate 1.46s ± 2% 1.45s ± 1% ~ (p=0.097 n=20+17) GoParser 1.86s ± 2% 1.86s ± 4% ~ (p=0.738 n=20+20) Reflect 5.10s ± 1% 5.11s ± 1% ~ (p=0.290 n=20+18) Tar 1.78s ± 2% 1.77s ± 2% ~ (p=0.166 n=19+20) XML 2.61s ± 2% 2.61s ± 2% ~ (p=0.665 n=19+19) [Geo mean] 4.67s 4.68s +0.16% name old user-time/op new user-time/op delta Template 2.79s ± 3% 2.80s ± 2% ~ (p=0.662 n=20+20) Unicode 1.62s ± 3% 1.64s ± 4% ~ (p=0.252 n=20+20) GoTypes 9.58s ± 2% 9.62s ± 2% ~ (p=0.250 n=20+20) Compiler 46.2s ± 1% 46.2s ± 1% ~ (p=0.602 n=20+19) SSA 108s ± 1% 108s ± 2% ~ (p=0.242 n=18+20) Flate 1.69s ± 3% 1.69s ± 4% ~ (p=0.470 n=20+20) GoParser 2.16s ± 3% 2.20s ± 4% +1.70% (p=0.005 n=19+20) Reflect 6.02s ± 2% 6.02s ± 2% ~ (p=0.700 n=20+17) Tar 2.11s ± 2% 2.11s ± 3% ~ (p=0.480 n=18+20) XML 3.07s ± 2% 3.11s ± 4% +1.50% (p=0.043 n=20+20) [Geo mean] 5.61s 5.64s +0.55% name old text-bytes new text-bytes delta HelloSize 586kB ± 0% 586kB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 5.46kB ± 0% 5.46kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 72.9kB ± 0% 72.9kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.03MB ± 0% 1.03MB ± 0% ~ (all equal) 3. The go1 benchmark shows improvement totally, and even more than 10% improvement in the test case Revcomp. name old time/op new time/op delta BinaryTree17-4 42.0s ± 1% 41.5s ± 1% -1.32% (p=0.000 n=39+40) Fannkuch11-4 24.1s ± 1% 23.6s ± 0% -2.38% (p=0.000 n=40+40) FmtFprintfEmpty-4 843ns ± 0% 839ns ± 1% -0.46% (p=0.000 n=33+40) FmtFprintfString-4 1.44µs ± 1% 1.37µs ± 1% -5.48% (p=0.000 n=40+35) FmtFprintfInt-4 1.44µs ± 1% 1.41µs ± 2% -1.50% (p=0.000 n=40+40) FmtFprintfIntInt-4 2.07µs ± 1% 2.06µs ± 0% -0.78% (p=0.000 n=40+40) FmtFprintfPrefixedInt-4 2.50µs ± 1% 2.33µs ± 1% -6.85% (p=0.000 n=40+40) FmtFprintfFloat-4 4.36µs ± 1% 4.34µs ± 0% -0.39% (p=0.017 n=40+40) FmtManyArgs-4 8.11µs ± 0% 8.00µs ± 0% -1.37% (p=0.000 n=40+40) GobDecode-4 105ms ± 2% 103ms ± 2% -2.17% (p=0.000 n=39+39) GobEncode-4 90.1ms ± 2% 88.6ms ± 1% -1.67% (p=0.000 n=40+39) Gzip-4 4.18s ± 1% 4.09s ± 1% -2.03% (p=0.000 n=40+40) Gunzip-4 608ms ± 1% 603ms ± 1% -0.86% (p=0.000 n=40+34) HTTPClientServer-4 674µs ± 3% 661µs ± 2% -1.82% (p=0.000 n=40+39) JSONEncode-4 256ms ± 1% 243ms ± 0% -5.11% (p=0.000 n=39+31) JSONDecode-4 915ms ± 1% 904ms ± 1% -1.18% (p=0.000 n=40+36) Mandelbrot200-4 49.2ms ± 0% 49.3ms ± 0% ~ (p=0.254 n=34+40) GoParse-4 46.9ms ± 2% 46.9ms ± 1% ~ (p=0.737 n=40+39) RegexpMatchEasy0_32-4 1.28µs ± 1% 1.27µs ± 1% -0.71% (p=0.000 n=40+40) RegexpMatchEasy0_1K-4 7.86µs ± 4% 7.67µs ± 4% -2.46% (p=0.000 n=38+40) RegexpMatchEasy1_32-4 1.28µs ± 1% 1.28µs ± 1% -0.54% (p=0.000 n=40+40) RegexpMatchEasy1_1K-4 10.4µs ± 2% 10.3µs ± 2% -0.88% (p=0.003 n=40+39) RegexpMatchMedium_32-4 2.05µs ± 0% 2.04µs ± 0% -0.34% (p=0.000 n=40+33) RegexpMatchMedium_1K-4 541µs ± 1% 535µs ± 1% -1.02% (p=0.000 n=40+38) RegexpMatchHard_32-4 29.3µs ± 1% 29.1µs ± 1% -0.51% (p=0.000 n=40+40) RegexpMatchHard_1K-4 881µs ± 1% 871µs ± 1% -1.15% (p=0.000 n=40+40) Revcomp-4 81.7ms ± 2% 67.5ms ± 2% -17.37% (p=0.000 n=39+39) Template-4 1.05s ± 1% 1.08s ± 2% +3.67% (p=0.000 n=40+40) TimeParse-4 7.24µs ± 1% 7.09µs ± 1% -2.13% (p=0.000 n=40+40) TimeFormat-4 13.2µs ± 1% 13.1µs ± 0% -0.31% (p=0.007 n=40+31) [Geo mean] 733µs 718µs -2.03% name old speed new speed delta GobDecode-4 7.28MB/s ± 2% 7.44MB/s ± 2% +2.23% (p=0.000 n=39+39) GobEncode-4 8.52MB/s ± 2% 8.67MB/s ± 1% +1.70% (p=0.000 n=40+39) Gzip-4 4.65MB/s ± 1% 4.74MB/s ± 1% +1.94% (p=0.000 n=37+40) Gunzip-4 31.9MB/s ± 1% 32.2MB/s ± 1% +0.90% (p=0.000 n=40+36) JSONEncode-4 7.57MB/s ± 1% 7.98MB/s ± 0% +5.41% (p=0.000 n=40+31) JSONDecode-4 2.12MB/s ± 1% 2.15MB/s ± 1% +1.23% (p=0.000 n=40+40) GoParse-4 1.23MB/s ± 1% 1.23MB/s ± 1% ~ (p=0.769 n=39+40) RegexpMatchEasy0_32-4 25.0MB/s ± 1% 25.2MB/s ± 1% +0.71% (p=0.000 n=40+40) RegexpMatchEasy0_1K-4 130MB/s ± 5% 134MB/s ± 4% +2.53% (p=0.000 n=38+40) RegexpMatchEasy1_32-4 24.9MB/s ± 1% 25.1MB/s ± 1% +0.55% (p=0.000 n=40+40) RegexpMatchEasy1_1K-4 98.5MB/s ± 2% 99.4MB/s ± 2% +0.88% (p=0.003 n=40+39) RegexpMatchMedium_32-4 490kB/s ± 0% 490kB/s ± 0% ~ (all equal) RegexpMatchMedium_1K-4 1.89MB/s ± 1% 1.91MB/s ± 1% +1.02% (p=0.000 n=40+38) RegexpMatchHard_32-4 1.10MB/s ± 1% 1.10MB/s ± 0% +0.41% (p=0.000 n=40+33) RegexpMatchHard_1K-4 1.16MB/s ± 1% 1.17MB/s ± 1% +1.21% (p=0.000 n=40+40) Revcomp-4 31.1MB/s ± 2% 37.6MB/s ± 2% +21.03% (p=0.000 n=39+39) Template-4 1.86MB/s ± 1% 1.79MB/s ± 1% -3.51% (p=0.000 n=40+38) [Geo mean] 6.66MB/s 6.80MB/s +2.13% fixes #21492 Change-Id: Ia26e7ca393f0a5f31de240e8ff9a220453ca7e0d Reviewed-on: https://go-review.googlesource.com/58450 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-08-24 10:51:34 +00:00
case ssa.OpARMMOVWloadidx, ssa.OpARMMOVBUloadidx, ssa.OpARMMOVBloadidx, ssa.OpARMMOVHUloadidx, ssa.OpARMMOVHloadidx:
// this is just shift 0 bits
fallthrough
case ssa.OpARMMOVWloadshiftLL:
p := genshift(s, v, v.Op.Asm(), 0, v.Args[1].Reg(), v.Reg(), arm.SHIFT_LL, v.AuxInt)
p.From.Reg = v.Args[0].Reg()
case ssa.OpARMMOVWloadshiftRL:
p := genshift(s, v, v.Op.Asm(), 0, v.Args[1].Reg(), v.Reg(), arm.SHIFT_LR, v.AuxInt)
p.From.Reg = v.Args[0].Reg()
case ssa.OpARMMOVWloadshiftRA:
p := genshift(s, v, v.Op.Asm(), 0, v.Args[1].Reg(), v.Reg(), arm.SHIFT_AR, v.AuxInt)
p.From.Reg = v.Args[0].Reg()
cmd/compile: optimize ARM with more efficient MOVB/MOVBU/MOVH/MOVHU Like the indexed MOVW (MOVWloadidx/MOVWstoreidx) used in current ARM backend, the indexed MOVB/MOVBU/MOVH/MOVHU can also be used to generate further optimized ARM code. My patch implements this optimization. Here are some contrast test results against the original go compiler. 1. The total size of all .a files in pkg/ shrinks by 0.03%. 2. The compilecmp benchmark shows a little decline. name old time/op new time/op delta Template 2.35s ± 1% 2.37s ± 3% +0.94% (p=0.006 n=19+19) Unicode 1.33s ± 3% 1.33s ± 2% ~ (p=0.158 n=20+18) GoTypes 7.86s ± 2% 7.84s ± 1% ~ (p=0.284 n=19+18) Compiler 37.5s ± 1% 37.7s ± 2% ~ (p=0.101 n=20+19) SSA 83.4s ± 2% 83.6s ± 2% ~ (p=0.231 n=20+20) Flate 1.46s ± 2% 1.45s ± 1% ~ (p=0.097 n=20+17) GoParser 1.86s ± 2% 1.86s ± 4% ~ (p=0.738 n=20+20) Reflect 5.10s ± 1% 5.11s ± 1% ~ (p=0.290 n=20+18) Tar 1.78s ± 2% 1.77s ± 2% ~ (p=0.166 n=19+20) XML 2.61s ± 2% 2.61s ± 2% ~ (p=0.665 n=19+19) [Geo mean] 4.67s 4.68s +0.16% name old user-time/op new user-time/op delta Template 2.79s ± 3% 2.80s ± 2% ~ (p=0.662 n=20+20) Unicode 1.62s ± 3% 1.64s ± 4% ~ (p=0.252 n=20+20) GoTypes 9.58s ± 2% 9.62s ± 2% ~ (p=0.250 n=20+20) Compiler 46.2s ± 1% 46.2s ± 1% ~ (p=0.602 n=20+19) SSA 108s ± 1% 108s ± 2% ~ (p=0.242 n=18+20) Flate 1.69s ± 3% 1.69s ± 4% ~ (p=0.470 n=20+20) GoParser 2.16s ± 3% 2.20s ± 4% +1.70% (p=0.005 n=19+20) Reflect 6.02s ± 2% 6.02s ± 2% ~ (p=0.700 n=20+17) Tar 2.11s ± 2% 2.11s ± 3% ~ (p=0.480 n=18+20) XML 3.07s ± 2% 3.11s ± 4% +1.50% (p=0.043 n=20+20) [Geo mean] 5.61s 5.64s +0.55% name old text-bytes new text-bytes delta HelloSize 586kB ± 0% 586kB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 5.46kB ± 0% 5.46kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 72.9kB ± 0% 72.9kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.03MB ± 0% 1.03MB ± 0% ~ (all equal) 3. The go1 benchmark shows improvement totally, and even more than 10% improvement in the test case Revcomp. name old time/op new time/op delta BinaryTree17-4 42.0s ± 1% 41.5s ± 1% -1.32% (p=0.000 n=39+40) Fannkuch11-4 24.1s ± 1% 23.6s ± 0% -2.38% (p=0.000 n=40+40) FmtFprintfEmpty-4 843ns ± 0% 839ns ± 1% -0.46% (p=0.000 n=33+40) FmtFprintfString-4 1.44µs ± 1% 1.37µs ± 1% -5.48% (p=0.000 n=40+35) FmtFprintfInt-4 1.44µs ± 1% 1.41µs ± 2% -1.50% (p=0.000 n=40+40) FmtFprintfIntInt-4 2.07µs ± 1% 2.06µs ± 0% -0.78% (p=0.000 n=40+40) FmtFprintfPrefixedInt-4 2.50µs ± 1% 2.33µs ± 1% -6.85% (p=0.000 n=40+40) FmtFprintfFloat-4 4.36µs ± 1% 4.34µs ± 0% -0.39% (p=0.017 n=40+40) FmtManyArgs-4 8.11µs ± 0% 8.00µs ± 0% -1.37% (p=0.000 n=40+40) GobDecode-4 105ms ± 2% 103ms ± 2% -2.17% (p=0.000 n=39+39) GobEncode-4 90.1ms ± 2% 88.6ms ± 1% -1.67% (p=0.000 n=40+39) Gzip-4 4.18s ± 1% 4.09s ± 1% -2.03% (p=0.000 n=40+40) Gunzip-4 608ms ± 1% 603ms ± 1% -0.86% (p=0.000 n=40+34) HTTPClientServer-4 674µs ± 3% 661µs ± 2% -1.82% (p=0.000 n=40+39) JSONEncode-4 256ms ± 1% 243ms ± 0% -5.11% (p=0.000 n=39+31) JSONDecode-4 915ms ± 1% 904ms ± 1% -1.18% (p=0.000 n=40+36) Mandelbrot200-4 49.2ms ± 0% 49.3ms ± 0% ~ (p=0.254 n=34+40) GoParse-4 46.9ms ± 2% 46.9ms ± 1% ~ (p=0.737 n=40+39) RegexpMatchEasy0_32-4 1.28µs ± 1% 1.27µs ± 1% -0.71% (p=0.000 n=40+40) RegexpMatchEasy0_1K-4 7.86µs ± 4% 7.67µs ± 4% -2.46% (p=0.000 n=38+40) RegexpMatchEasy1_32-4 1.28µs ± 1% 1.28µs ± 1% -0.54% (p=0.000 n=40+40) RegexpMatchEasy1_1K-4 10.4µs ± 2% 10.3µs ± 2% -0.88% (p=0.003 n=40+39) RegexpMatchMedium_32-4 2.05µs ± 0% 2.04µs ± 0% -0.34% (p=0.000 n=40+33) RegexpMatchMedium_1K-4 541µs ± 1% 535µs ± 1% -1.02% (p=0.000 n=40+38) RegexpMatchHard_32-4 29.3µs ± 1% 29.1µs ± 1% -0.51% (p=0.000 n=40+40) RegexpMatchHard_1K-4 881µs ± 1% 871µs ± 1% -1.15% (p=0.000 n=40+40) Revcomp-4 81.7ms ± 2% 67.5ms ± 2% -17.37% (p=0.000 n=39+39) Template-4 1.05s ± 1% 1.08s ± 2% +3.67% (p=0.000 n=40+40) TimeParse-4 7.24µs ± 1% 7.09µs ± 1% -2.13% (p=0.000 n=40+40) TimeFormat-4 13.2µs ± 1% 13.1µs ± 0% -0.31% (p=0.007 n=40+31) [Geo mean] 733µs 718µs -2.03% name old speed new speed delta GobDecode-4 7.28MB/s ± 2% 7.44MB/s ± 2% +2.23% (p=0.000 n=39+39) GobEncode-4 8.52MB/s ± 2% 8.67MB/s ± 1% +1.70% (p=0.000 n=40+39) Gzip-4 4.65MB/s ± 1% 4.74MB/s ± 1% +1.94% (p=0.000 n=37+40) Gunzip-4 31.9MB/s ± 1% 32.2MB/s ± 1% +0.90% (p=0.000 n=40+36) JSONEncode-4 7.57MB/s ± 1% 7.98MB/s ± 0% +5.41% (p=0.000 n=40+31) JSONDecode-4 2.12MB/s ± 1% 2.15MB/s ± 1% +1.23% (p=0.000 n=40+40) GoParse-4 1.23MB/s ± 1% 1.23MB/s ± 1% ~ (p=0.769 n=39+40) RegexpMatchEasy0_32-4 25.0MB/s ± 1% 25.2MB/s ± 1% +0.71% (p=0.000 n=40+40) RegexpMatchEasy0_1K-4 130MB/s ± 5% 134MB/s ± 4% +2.53% (p=0.000 n=38+40) RegexpMatchEasy1_32-4 24.9MB/s ± 1% 25.1MB/s ± 1% +0.55% (p=0.000 n=40+40) RegexpMatchEasy1_1K-4 98.5MB/s ± 2% 99.4MB/s ± 2% +0.88% (p=0.003 n=40+39) RegexpMatchMedium_32-4 490kB/s ± 0% 490kB/s ± 0% ~ (all equal) RegexpMatchMedium_1K-4 1.89MB/s ± 1% 1.91MB/s ± 1% +1.02% (p=0.000 n=40+38) RegexpMatchHard_32-4 1.10MB/s ± 1% 1.10MB/s ± 0% +0.41% (p=0.000 n=40+33) RegexpMatchHard_1K-4 1.16MB/s ± 1% 1.17MB/s ± 1% +1.21% (p=0.000 n=40+40) Revcomp-4 31.1MB/s ± 2% 37.6MB/s ± 2% +21.03% (p=0.000 n=39+39) Template-4 1.86MB/s ± 1% 1.79MB/s ± 1% -3.51% (p=0.000 n=40+38) [Geo mean] 6.66MB/s 6.80MB/s +2.13% fixes #21492 Change-Id: Ia26e7ca393f0a5f31de240e8ff9a220453ca7e0d Reviewed-on: https://go-review.googlesource.com/58450 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-08-24 10:51:34 +00:00
case ssa.OpARMMOVWstoreidx, ssa.OpARMMOVBstoreidx, ssa.OpARMMOVHstoreidx:
// this is just shift 0 bits
fallthrough
case ssa.OpARMMOVWstoreshiftLL:
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = v.Args[2].Reg()
p.To.Type = obj.TYPE_SHIFT
p.To.Reg = v.Args[0].Reg()
p.To.Offset = int64(makeshift(v, v.Args[1].Reg(), arm.SHIFT_LL, v.AuxInt))
case ssa.OpARMMOVWstoreshiftRL:
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = v.Args[2].Reg()
p.To.Type = obj.TYPE_SHIFT
p.To.Reg = v.Args[0].Reg()
p.To.Offset = int64(makeshift(v, v.Args[1].Reg(), arm.SHIFT_LR, v.AuxInt))
case ssa.OpARMMOVWstoreshiftRA:
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = v.Args[2].Reg()
p.To.Type = obj.TYPE_SHIFT
p.To.Reg = v.Args[0].Reg()
p.To.Offset = int64(makeshift(v, v.Args[1].Reg(), arm.SHIFT_AR, v.AuxInt))
case ssa.OpARMMOVBreg,
ssa.OpARMMOVBUreg,
ssa.OpARMMOVHreg,
ssa.OpARMMOVHUreg:
a := v.Args[0]
for a.Op == ssa.OpCopy || a.Op == ssa.OpARMMOVWreg || a.Op == ssa.OpARMMOVWnop {
a = a.Args[0]
}
if a.Op == ssa.OpLoadReg {
t := a.Type
switch {
case v.Op == ssa.OpARMMOVBreg && t.Size() == 1 && t.IsSigned(),
v.Op == ssa.OpARMMOVBUreg && t.Size() == 1 && !t.IsSigned(),
v.Op == ssa.OpARMMOVHreg && t.Size() == 2 && t.IsSigned(),
v.Op == ssa.OpARMMOVHUreg && t.Size() == 2 && !t.IsSigned():
// arg is a proper-typed load, already zero/sign-extended, don't extend again
if v.Reg() == v.Args[0].Reg() {
return
}
p := s.Prog(arm.AMOVW)
p.From.Type = obj.TYPE_REG
p.From.Reg = v.Args[0].Reg()
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
return
default:
}
}
if buildcfg.GOARM.Version >= 6 {
cmd/compile: optimize MOVBS/MOVBU/MOVHS/MOVHU on ARMv6 and ARMv7 MOVBS/MOVBU/MOVHS/MOVHU can be optimized with a single instruction on ARMv6 and ARMv7, instead of a pair of left/right shifts. The benchmark tests show big improvement in special cases and a little improvement in total. 1. A special case gets about 29% improvement. name old time/op new time/op delta TypePro-4 3.81ms ± 1% 2.71ms ± 1% -28.97% (p=0.000 n=26+25) The source code of this case can be found at https://github.com/benshi001/ugo1/blob/master/typepromotion_test.go 2. There is a little improvement in the go1 benchmark, excluding the noise. name old time/op new time/op delta BinaryTree17-4 42.1s ± 3% 42.1s ± 2% ~ (p=0.883 n=28+30) Fannkuch11-4 24.3s ± 4% 24.7s ± 7% +1.64% (p=0.026 n=30+30) FmtFprintfEmpty-4 833ns ± 2% 835ns ± 2% ~ (p=0.371 n=26+28) FmtFprintfString-4 1.36µs ± 3% 1.35µs ± 1% ~ (p=0.202 n=26+23) FmtFprintfInt-4 1.42µs ± 3% 1.43µs ± 1% +0.66% (p=0.000 n=26+27) FmtFprintfIntInt-4 2.10µs ± 1% 2.10µs ± 2% ~ (p=0.104 n=25+26) FmtFprintfPrefixedInt-4 2.37µs ± 2% 2.33µs ± 1% -1.75% (p=0.000 n=25+28) FmtFprintfFloat-4 4.50µs ± 0% 4.37µs ± 1% -2.81% (p=0.000 n=23+25) FmtManyArgs-4 8.08µs ± 0% 8.13µs ± 3% ~ (p=0.160 n=23+26) GobDecode-4 102ms ± 4% 103ms ± 4% +1.08% (p=0.001 n=28+26) GobEncode-4 96.0ms ± 2% 95.2ms ± 3% -0.81% (p=0.000 n=24+25) Gzip-4 4.17s ± 3% 4.11s ± 2% -1.45% (p=0.000 n=25+25) Gunzip-4 597ms ± 2% 594ms ± 2% -0.57% (p=0.000 n=24+26) HTTPClientServer-4 708µs ± 4% 708µs ± 4% ~ (p=0.852 n=28+28) JSONEncode-4 241ms ± 1% 245ms ± 3% +1.62% (p=0.000 n=27+28) JSONDecode-4 906ms ± 3% 889ms ± 3% -1.85% (p=0.000 n=23+24) Mandelbrot200-4 41.8ms ± 1% 41.8ms ± 1% ~ (p=0.929 n=25+24) GoParse-4 47.1ms ± 2% 45.3ms ± 4% -3.80% (p=0.000 n=28+24) RegexpMatchEasy0_32-4 1.27µs ± 2% 1.28µs ± 1% +0.77% (p=0.000 n=26+28) RegexpMatchEasy0_1K-4 8.08µs ± 9% 7.83µs ±10% -3.10% (p=0.012 n=26+26) RegexpMatchEasy1_32-4 1.29µs ± 5% 1.29µs ± 2% ~ (p=0.301 n=26+29) RegexpMatchEasy1_1K-4 10.5µs ± 4% 10.3µs ± 5% -1.95% (p=0.003 n=26+26) RegexpMatchMedium_32-4 1.94µs ± 1% 1.95µs ± 1% ~ (p=0.251 n=24+27) RegexpMatchMedium_1K-4 502µs ± 2% 502µs ± 2% ~ (p=0.336 n=25+28) RegexpMatchHard_32-4 26.7µs ± 1% 26.6µs ± 3% ~ (p=0.454 n=27+26) RegexpMatchHard_1K-4 801µs ± 3% 799µs ± 2% ~ (p=0.097 n=24+26) Revcomp-4 73.5ms ± 5% 73.2ms ± 3% ~ (p=0.240 n=26+26) Template-4 1.07s ± 2% 1.05s ± 1% -2.39% (p=0.000 n=26+24) TimeParse-4 6.87µs ± 1% 6.85µs ± 1% ~ (p=0.094 n=28+23) TimeFormat-4 13.4µs ± 1% 13.4µs ± 1% ~ (p=0.664 n=25+29) [Geo mean] 717µs 713µs -0.54% name old speed new speed delta GobDecode-4 7.52MB/s ± 4% 7.44MB/s ± 4% -1.10% (p=0.001 n=28+26) GobEncode-4 7.99MB/s ± 2% 8.06MB/s ± 3% +0.81% (p=0.000 n=24+25) Gzip-4 4.66MB/s ± 3% 4.72MB/s ± 2% +1.43% (p=0.000 n=25+25) Gunzip-4 32.5MB/s ± 2% 32.7MB/s ± 2% +0.56% (p=0.001 n=24+26) JSONEncode-4 8.04MB/s ± 1% 7.92MB/s ± 3% -1.59% (p=0.000 n=27+28) JSONDecode-4 2.14MB/s ± 3% 2.18MB/s ± 3% +1.90% (p=0.000 n=23+24) GoParse-4 1.23MB/s ± 3% 1.28MB/s ± 4% +4.23% (p=0.000 n=30+24) RegexpMatchEasy0_32-4 25.2MB/s ± 2% 25.0MB/s ± 1% -0.76% (p=0.000 n=26+28) RegexpMatchEasy0_1K-4 127MB/s ± 8% 131MB/s ± 9% +3.29% (p=0.012 n=26+26) RegexpMatchEasy1_32-4 24.8MB/s ± 5% 24.8MB/s ± 2% ~ (p=0.339 n=26+29) RegexpMatchEasy1_1K-4 97.9MB/s ± 4% 99.8MB/s ± 5% +1.98% (p=0.004 n=26+26) RegexpMatchMedium_32-4 514kB/s ± 3% 515kB/s ± 3% ~ (p=0.391 n=28+28) RegexpMatchMedium_1K-4 2.04MB/s ± 2% 2.04MB/s ± 2% ~ (p=0.517 n=25+28) RegexpMatchHard_32-4 1.20MB/s ± 3% 1.20MB/s ± 3% ~ (p=0.203 n=28+28) RegexpMatchHard_1K-4 1.28MB/s ± 3% 1.28MB/s ± 2% ~ (p=0.499 n=24+26) Revcomp-4 34.6MB/s ± 4% 34.7MB/s ± 3% ~ (p=0.245 n=26+26) Template-4 1.81MB/s ± 2% 1.85MB/s ± 3% +2.30% (p=0.000 n=26+25) [Geo mean] 6.82MB/s 6.88MB/s +0.84% fixes #20653 Change-Id: Ief0d6e726e517e51ae511325b21ee72598e759ff Reviewed-on: https://go-review.googlesource.com/71992 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-10-20 03:50:15 +00:00
// generate more efficient "MOVB/MOVBU/MOVH/MOVHU Reg@>0, Reg" on ARMv6 & ARMv7
genshift(s, v, v.Op.Asm(), 0, v.Args[0].Reg(), v.Reg(), arm.SHIFT_RR, 0)
cmd/compile: optimize MOVBS/MOVBU/MOVHS/MOVHU on ARMv6 and ARMv7 MOVBS/MOVBU/MOVHS/MOVHU can be optimized with a single instruction on ARMv6 and ARMv7, instead of a pair of left/right shifts. The benchmark tests show big improvement in special cases and a little improvement in total. 1. A special case gets about 29% improvement. name old time/op new time/op delta TypePro-4 3.81ms ± 1% 2.71ms ± 1% -28.97% (p=0.000 n=26+25) The source code of this case can be found at https://github.com/benshi001/ugo1/blob/master/typepromotion_test.go 2. There is a little improvement in the go1 benchmark, excluding the noise. name old time/op new time/op delta BinaryTree17-4 42.1s ± 3% 42.1s ± 2% ~ (p=0.883 n=28+30) Fannkuch11-4 24.3s ± 4% 24.7s ± 7% +1.64% (p=0.026 n=30+30) FmtFprintfEmpty-4 833ns ± 2% 835ns ± 2% ~ (p=0.371 n=26+28) FmtFprintfString-4 1.36µs ± 3% 1.35µs ± 1% ~ (p=0.202 n=26+23) FmtFprintfInt-4 1.42µs ± 3% 1.43µs ± 1% +0.66% (p=0.000 n=26+27) FmtFprintfIntInt-4 2.10µs ± 1% 2.10µs ± 2% ~ (p=0.104 n=25+26) FmtFprintfPrefixedInt-4 2.37µs ± 2% 2.33µs ± 1% -1.75% (p=0.000 n=25+28) FmtFprintfFloat-4 4.50µs ± 0% 4.37µs ± 1% -2.81% (p=0.000 n=23+25) FmtManyArgs-4 8.08µs ± 0% 8.13µs ± 3% ~ (p=0.160 n=23+26) GobDecode-4 102ms ± 4% 103ms ± 4% +1.08% (p=0.001 n=28+26) GobEncode-4 96.0ms ± 2% 95.2ms ± 3% -0.81% (p=0.000 n=24+25) Gzip-4 4.17s ± 3% 4.11s ± 2% -1.45% (p=0.000 n=25+25) Gunzip-4 597ms ± 2% 594ms ± 2% -0.57% (p=0.000 n=24+26) HTTPClientServer-4 708µs ± 4% 708µs ± 4% ~ (p=0.852 n=28+28) JSONEncode-4 241ms ± 1% 245ms ± 3% +1.62% (p=0.000 n=27+28) JSONDecode-4 906ms ± 3% 889ms ± 3% -1.85% (p=0.000 n=23+24) Mandelbrot200-4 41.8ms ± 1% 41.8ms ± 1% ~ (p=0.929 n=25+24) GoParse-4 47.1ms ± 2% 45.3ms ± 4% -3.80% (p=0.000 n=28+24) RegexpMatchEasy0_32-4 1.27µs ± 2% 1.28µs ± 1% +0.77% (p=0.000 n=26+28) RegexpMatchEasy0_1K-4 8.08µs ± 9% 7.83µs ±10% -3.10% (p=0.012 n=26+26) RegexpMatchEasy1_32-4 1.29µs ± 5% 1.29µs ± 2% ~ (p=0.301 n=26+29) RegexpMatchEasy1_1K-4 10.5µs ± 4% 10.3µs ± 5% -1.95% (p=0.003 n=26+26) RegexpMatchMedium_32-4 1.94µs ± 1% 1.95µs ± 1% ~ (p=0.251 n=24+27) RegexpMatchMedium_1K-4 502µs ± 2% 502µs ± 2% ~ (p=0.336 n=25+28) RegexpMatchHard_32-4 26.7µs ± 1% 26.6µs ± 3% ~ (p=0.454 n=27+26) RegexpMatchHard_1K-4 801µs ± 3% 799µs ± 2% ~ (p=0.097 n=24+26) Revcomp-4 73.5ms ± 5% 73.2ms ± 3% ~ (p=0.240 n=26+26) Template-4 1.07s ± 2% 1.05s ± 1% -2.39% (p=0.000 n=26+24) TimeParse-4 6.87µs ± 1% 6.85µs ± 1% ~ (p=0.094 n=28+23) TimeFormat-4 13.4µs ± 1% 13.4µs ± 1% ~ (p=0.664 n=25+29) [Geo mean] 717µs 713µs -0.54% name old speed new speed delta GobDecode-4 7.52MB/s ± 4% 7.44MB/s ± 4% -1.10% (p=0.001 n=28+26) GobEncode-4 7.99MB/s ± 2% 8.06MB/s ± 3% +0.81% (p=0.000 n=24+25) Gzip-4 4.66MB/s ± 3% 4.72MB/s ± 2% +1.43% (p=0.000 n=25+25) Gunzip-4 32.5MB/s ± 2% 32.7MB/s ± 2% +0.56% (p=0.001 n=24+26) JSONEncode-4 8.04MB/s ± 1% 7.92MB/s ± 3% -1.59% (p=0.000 n=27+28) JSONDecode-4 2.14MB/s ± 3% 2.18MB/s ± 3% +1.90% (p=0.000 n=23+24) GoParse-4 1.23MB/s ± 3% 1.28MB/s ± 4% +4.23% (p=0.000 n=30+24) RegexpMatchEasy0_32-4 25.2MB/s ± 2% 25.0MB/s ± 1% -0.76% (p=0.000 n=26+28) RegexpMatchEasy0_1K-4 127MB/s ± 8% 131MB/s ± 9% +3.29% (p=0.012 n=26+26) RegexpMatchEasy1_32-4 24.8MB/s ± 5% 24.8MB/s ± 2% ~ (p=0.339 n=26+29) RegexpMatchEasy1_1K-4 97.9MB/s ± 4% 99.8MB/s ± 5% +1.98% (p=0.004 n=26+26) RegexpMatchMedium_32-4 514kB/s ± 3% 515kB/s ± 3% ~ (p=0.391 n=28+28) RegexpMatchMedium_1K-4 2.04MB/s ± 2% 2.04MB/s ± 2% ~ (p=0.517 n=25+28) RegexpMatchHard_32-4 1.20MB/s ± 3% 1.20MB/s ± 3% ~ (p=0.203 n=28+28) RegexpMatchHard_1K-4 1.28MB/s ± 3% 1.28MB/s ± 2% ~ (p=0.499 n=24+26) Revcomp-4 34.6MB/s ± 4% 34.7MB/s ± 3% ~ (p=0.245 n=26+26) Template-4 1.81MB/s ± 2% 1.85MB/s ± 3% +2.30% (p=0.000 n=26+25) [Geo mean] 6.82MB/s 6.88MB/s +0.84% fixes #20653 Change-Id: Ief0d6e726e517e51ae511325b21ee72598e759ff Reviewed-on: https://go-review.googlesource.com/71992 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-10-20 03:50:15 +00:00
return
}
fallthrough
case ssa.OpARMMVN,
ssa.OpARMCLZ,
ssa.OpARMREV,
ssa.OpARMREV16,
ssa.OpARMRBIT,
ssa.OpARMSQRTF,
ssa.OpARMSQRTD,
ssa.OpARMNEGF,
ssa.OpARMNEGD,
cmd/compile: optimize ARM's math.Abs This CL optimizes math.Abs to an inline ABSD instruction on ARM. The benchmark results of src/math/ show big improvements. name old time/op new time/op delta Acos-4 181ns ± 0% 182ns ± 0% +0.30% (p=0.000 n=40+40) Acosh-4 202ns ± 0% 202ns ± 0% ~ (all equal) Asin-4 163ns ± 0% 163ns ± 0% ~ (all equal) Asinh-4 242ns ± 0% 242ns ± 0% ~ (all equal) Atan-4 120ns ± 0% 121ns ± 0% +0.83% (p=0.000 n=40+40) Atanh-4 202ns ± 0% 202ns ± 0% ~ (all equal) Atan2-4 173ns ± 0% 173ns ± 0% ~ (all equal) Cbrt-4 1.06µs ± 0% 1.06µs ± 0% +0.09% (p=0.000 n=39+37) Ceil-4 72.9ns ± 0% 72.8ns ± 0% ~ (p=0.237 n=40+40) Copysign-4 13.2ns ± 0% 13.2ns ± 0% ~ (all equal) Cos-4 193ns ± 0% 183ns ± 0% -5.18% (p=0.000 n=40+40) Cosh-4 254ns ± 0% 239ns ± 0% -5.91% (p=0.000 n=40+40) Erf-4 112ns ± 0% 112ns ± 0% ~ (all equal) Erfc-4 117ns ± 0% 117ns ± 0% ~ (all equal) Erfinv-4 127ns ± 0% 127ns ± 1% ~ (p=0.492 n=40+40) Erfcinv-4 128ns ± 0% 128ns ± 0% ~ (all equal) Exp-4 212ns ± 0% 206ns ± 0% -3.05% (p=0.000 n=40+40) ExpGo-4 216ns ± 0% 209ns ± 0% -3.24% (p=0.000 n=40+40) Expm1-4 142ns ± 0% 142ns ± 0% ~ (all equal) Exp2-4 191ns ± 0% 184ns ± 0% -3.45% (p=0.000 n=40+40) Exp2Go-4 194ns ± 0% 187ns ± 0% -3.61% (p=0.000 n=40+40) Abs-4 14.4ns ± 0% 6.3ns ± 0% -56.39% (p=0.000 n=38+39) Dim-4 12.6ns ± 0% 12.6ns ± 0% ~ (all equal) Floor-4 49.6ns ± 0% 49.6ns ± 0% ~ (all equal) Max-4 27.6ns ± 0% 27.6ns ± 0% ~ (all equal) Min-4 27.0ns ± 0% 27.0ns ± 0% ~ (all equal) Mod-4 349ns ± 0% 305ns ± 1% -12.55% (p=0.000 n=33+40) Frexp-4 54.0ns ± 0% 47.1ns ± 0% -12.78% (p=0.000 n=38+38) Gamma-4 242ns ± 0% 234ns ± 0% -3.16% (p=0.000 n=36+40) Hypot-4 84.8ns ± 0% 67.8ns ± 0% -20.05% (p=0.000 n=31+35) HypotGo-4 88.5ns ± 0% 71.6ns ± 0% -19.12% (p=0.000 n=40+38) Ilogb-4 45.8ns ± 0% 38.9ns ± 0% -15.12% (p=0.000 n=40+32) J0-4 821ns ± 0% 802ns ± 0% -2.33% (p=0.000 n=33+40) J1-4 816ns ± 0% 807ns ± 0% -1.05% (p=0.000 n=40+29) Jn-4 1.67µs ± 0% 1.65µs ± 0% -1.45% (p=0.000 n=40+39) Ldexp-4 61.5ns ± 0% 54.6ns ± 0% -11.27% (p=0.000 n=40+32) Lgamma-4 188ns ± 0% 188ns ± 0% ~ (all equal) Log-4 154ns ± 0% 147ns ± 0% -4.78% (p=0.000 n=40+40) Logb-4 50.9ns ± 0% 42.7ns ± 0% -16.11% (p=0.000 n=34+39) Log1p-4 160ns ± 0% 159ns ± 0% ~ (p=0.828 n=40+40) Log10-4 173ns ± 0% 166ns ± 0% -4.05% (p=0.000 n=40+40) Log2-4 65.3ns ± 0% 58.4ns ± 0% -10.57% (p=0.000 n=37+37) Modf-4 36.4ns ± 0% 36.4ns ± 0% ~ (all equal) Nextafter32-4 36.4ns ± 0% 36.4ns ± 0% ~ (all equal) Nextafter64-4 32.7ns ± 0% 32.6ns ± 0% ~ (p=0.375 n=40+40) PowInt-4 300ns ± 0% 277ns ± 0% -7.78% (p=0.000 n=40+40) PowFrac-4 676ns ± 0% 635ns ± 0% -6.00% (p=0.000 n=40+35) Pow10Pos-4 17.6ns ± 0% 17.6ns ± 0% ~ (all equal) Pow10Neg-4 22.0ns ± 0% 22.0ns ± 0% ~ (all equal) Round-4 30.1ns ± 0% 30.1ns ± 0% ~ (all equal) RoundToEven-4 38.9ns ± 0% 38.9ns ± 0% ~ (all equal) Remainder-4 291ns ± 0% 263ns ± 0% -9.62% (p=0.000 n=40+40) Signbit-4 11.3ns ± 0% 11.3ns ± 0% ~ (all equal) Sin-4 185ns ± 0% 185ns ± 0% ~ (all equal) Sincos-4 230ns ± 0% 230ns ± 0% ~ (all equal) Sinh-4 253ns ± 0% 246ns ± 0% -2.77% (p=0.000 n=39+39) SqrtIndirect-4 41.4ns ± 0% 41.4ns ± 0% ~ (all equal) SqrtLatency-4 13.8ns ± 0% 13.8ns ± 0% ~ (all equal) SqrtIndirectLatency-4 37.0ns ± 0% 37.0ns ± 0% ~ (p=0.632 n=40+40) SqrtGoLatency-4 911ns ± 0% 911ns ± 0% +0.08% (p=0.000 n=40+40) SqrtPrime-4 13.2µs ± 0% 13.2µs ± 0% +0.01% (p=0.038 n=38+40) Tan-4 205ns ± 0% 205ns ± 0% ~ (all equal) Tanh-4 264ns ± 0% 247ns ± 0% -6.44% (p=0.000 n=39+32) Trunc-4 45.2ns ± 0% 45.2ns ± 0% ~ (all equal) Y0-4 796ns ± 0% 792ns ± 0% -0.55% (p=0.000 n=35+40) Y1-4 804ns ± 0% 797ns ± 0% -0.82% (p=0.000 n=24+40) Yn-4 1.64µs ± 0% 1.62µs ± 0% -1.27% (p=0.000 n=40+39) Float64bits-4 8.16ns ± 0% 8.16ns ± 0% +0.04% (p=0.000 n=35+40) Float64frombits-4 10.7ns ± 0% 10.7ns ± 0% ~ (all equal) Float32bits-4 7.53ns ± 0% 7.53ns ± 0% ~ (p=0.760 n=40+40) Float32frombits-4 6.91ns ± 0% 6.91ns ± 0% -0.04% (p=0.002 n=32+38) [Geo mean] 111ns 106ns -3.98% Change-Id: I54f4fd7f5160db020b430b556bde59cc0fdb996d Reviewed-on: https://go-review.googlesource.com/c/go/+/188678 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-08-02 02:41:59 +00:00
ssa.OpARMABSD,
ssa.OpARMMOVWF,
ssa.OpARMMOVWD,
ssa.OpARMMOVFW,
ssa.OpARMMOVDW,
ssa.OpARMMOVFD,
ssa.OpARMMOVDF:
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = v.Args[0].Reg()
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
case ssa.OpARMMOVWUF,
ssa.OpARMMOVWUD,
ssa.OpARMMOVFWU,
ssa.OpARMMOVDWU:
p := s.Prog(v.Op.Asm())
p.Scond = arm.C_UBIT
p.From.Type = obj.TYPE_REG
p.From.Reg = v.Args[0].Reg()
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
case ssa.OpARMCMOVWHSconst:
p := s.Prog(arm.AMOVW)
p.Scond = arm.C_SCOND_HS
p.From.Type = obj.TYPE_CONST
p.From.Offset = v.AuxInt
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
case ssa.OpARMCMOVWLSconst:
p := s.Prog(arm.AMOVW)
p.Scond = arm.C_SCOND_LS
p.From.Type = obj.TYPE_CONST
p.From.Offset = v.AuxInt
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
case ssa.OpARMCALLstatic, ssa.OpARMCALLclosure, ssa.OpARMCALLinter:
s.Call(v)
cmd/compile: restore tail call for method wrappers For certain type of method wrappers we used to generate a tail call. That was disabled in CL 307234 when register ABI is used, because with the current IR it was difficult to generate a tail call with the arguments in the right places. The problem was that the IR does not contain a CALL-like node with arguments; instead, it contains an OAS node that adjusts the receiver, than an OTAILCALL node that just contains the target, but no argument (with the assumption that the OAS node will put the adjusted receiver in the right place). With register ABI, putting arguments in registers are done in SSA. The assignment (OAS) doesn't put the receiver in register. This CL changes the IR of a tail call to take an actual OCALL node. Specifically, a tail call is represented as OTAILCALL (OCALL target args...) This way, the call target and args are connected through the OCALL node. So the call can be analyzed in SSA and the args can be passed in the right places. (Alternatively, we could have OTAILCALL node directly take the target and the args, without the OCALL node. Using an OCALL node is convenient as there are existing code that processes OCALL nodes which do not need to be changed. Also, a tail call is similar to ORETURN (OCALL target args...), except it doesn't preserve the frame. I did the former but I'm open to change.) The SSA representation is similar. Previously, the IR lowers to a Store the receiver then a BlockRetJmp which jumps to the target (without putting the arg in register). Now we use a TailCall op, which takes the target and the args. The call expansion pass and the register allocator handles TailCall pretty much like a StaticCall, and it will do the right ABI analysis and put the args in the right places. (Args other than the receiver are already in the right places. For register args it generates no code for them. For stack args currently it generates a self copy. I'll work on optimize that out.) BlockRetJmp is still used, signaling it is a tail call. The actual call is made in the TailCall op so BlockRetJmp generates no code (we could use BlockExit if we like). This slightly reduces binary size: old new cmd/go 14003088 13953936 cmd/link 6275552 6271456 Change-Id: I2d16d8d419fe1f17554916d317427383e17e27f0 Reviewed-on: https://go-review.googlesource.com/c/go/+/350145 Trust: Cherry Mui <cherryyz@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: David Chase <drchase@google.com>
2021-09-10 22:05:55 -04:00
case ssa.OpARMCALLtail:
s.TailCall(v)
case ssa.OpARMCALLudiv:
p := s.Prog(obj.ACALL)
p.To.Type = obj.TYPE_MEM
p.To.Name = obj.NAME_EXTERN
[dev.regabi] cmd/compile: group known symbols, packages, names [generated] There are a handful of pre-computed magic symbols known by package gc, and we need a place to store them. If we keep them together, the need for type *ir.Name means that package ir is the lowest package in the import hierarchy that they can go in. And package ir needs gopkg for methodSymSuffix (in a later CL), so they can't go any higher either, at least not all together. So package ir it is. Rather than dump them all into the top-level package ir namespace, however, we introduce global structs, Syms, Pkgs, and Names, and make the known symbols, packages, and names fields of those. [git-generate] cd src/cmd/compile/internal/gc rf ' add go.go:$ \ // Names holds known names. \ var Names struct{} \ \ // Syms holds known symbols. \ var Syms struct {} \ \ // Pkgs holds known packages. \ var Pkgs struct {} \ mv staticuint64s Names.Staticuint64s mv zerobase Names.Zerobase mv assertE2I Syms.AssertE2I mv assertE2I2 Syms.AssertE2I2 mv assertI2I Syms.AssertI2I mv assertI2I2 Syms.AssertI2I2 mv deferproc Syms.Deferproc mv deferprocStack Syms.DeferprocStack mv Deferreturn Syms.Deferreturn mv Duffcopy Syms.Duffcopy mv Duffzero Syms.Duffzero mv gcWriteBarrier Syms.GCWriteBarrier mv goschedguarded Syms.Goschedguarded mv growslice Syms.Growslice mv msanread Syms.Msanread mv msanwrite Syms.Msanwrite mv msanmove Syms.Msanmove mv newobject Syms.Newobject mv newproc Syms.Newproc mv panicdivide Syms.Panicdivide mv panicshift Syms.Panicshift mv panicdottypeE Syms.PanicdottypeE mv panicdottypeI Syms.PanicdottypeI mv panicnildottype Syms.Panicnildottype mv panicoverflow Syms.Panicoverflow mv raceread Syms.Raceread mv racereadrange Syms.Racereadrange mv racewrite Syms.Racewrite mv racewriterange Syms.Racewriterange mv SigPanic Syms.SigPanic mv typedmemclr Syms.Typedmemclr mv typedmemmove Syms.Typedmemmove mv Udiv Syms.Udiv mv writeBarrier Syms.WriteBarrier mv zerobaseSym Syms.Zerobase mv arm64HasATOMICS Syms.ARM64HasATOMICS mv armHasVFPv4 Syms.ARMHasVFPv4 mv x86HasFMA Syms.X86HasFMA mv x86HasPOPCNT Syms.X86HasPOPCNT mv x86HasSSE41 Syms.X86HasSSE41 mv WasmDiv Syms.WasmDiv mv WasmMove Syms.WasmMove mv WasmZero Syms.WasmZero mv WasmTruncS Syms.WasmTruncS mv WasmTruncU Syms.WasmTruncU mv gopkg Pkgs.Go mv itabpkg Pkgs.Itab mv itablinkpkg Pkgs.Itablink mv mappkg Pkgs.Map mv msanpkg Pkgs.Msan mv racepkg Pkgs.Race mv Runtimepkg Pkgs.Runtime mv trackpkg Pkgs.Track mv unsafepkg Pkgs.Unsafe mv Names Syms Pkgs symtab.go mv symtab.go cmd/compile/internal/ir ' Change-Id: Ic143862148569a3bcde8e70b26d75421aa2d00f3 Reviewed-on: https://go-review.googlesource.com/c/go/+/279235 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-12-23 00:10:25 -05:00
p.To.Sym = ir.Syms.Udiv
case ssa.OpARMLoweredWB:
p := s.Prog(obj.ACALL)
p.To.Type = obj.TYPE_MEM
p.To.Name = obj.NAME_EXTERN
// AuxInt encodes how many buffer entries we need.
p.To.Sym = ir.Syms.GCWriteBarrier[v.AuxInt-1]
case ssa.OpARMLoweredPanicBoundsRR, ssa.OpARMLoweredPanicBoundsRC, ssa.OpARMLoweredPanicBoundsCR, ssa.OpARMLoweredPanicBoundsCC,
ssa.OpARMLoweredPanicExtendRR, ssa.OpARMLoweredPanicExtendRC:
// Compute the constant we put in the PCData entry for this call.
code, signed := ssa.BoundsKind(v.AuxInt).Code()
xIsReg := false
yIsReg := false
xVal := 0
yVal := 0
extend := false
switch v.Op {
case ssa.OpARMLoweredPanicBoundsRR:
xIsReg = true
xVal = int(v.Args[0].Reg() - arm.REG_R0)
yIsReg = true
yVal = int(v.Args[1].Reg() - arm.REG_R0)
case ssa.OpARMLoweredPanicExtendRR:
extend = true
xIsReg = true
hi := int(v.Args[0].Reg() - arm.REG_R0)
lo := int(v.Args[1].Reg() - arm.REG_R0)
xVal = hi<<2 + lo // encode 2 register numbers
yIsReg = true
yVal = int(v.Args[2].Reg() - arm.REG_R0)
case ssa.OpARMLoweredPanicBoundsRC:
xIsReg = true
xVal = int(v.Args[0].Reg() - arm.REG_R0)
c := v.Aux.(ssa.PanicBoundsC).C
if c >= 0 && c <= abi.BoundsMaxConst {
yVal = int(c)
} else {
// Move constant to a register
yIsReg = true
if yVal == xVal {
yVal = 1
}
p := s.Prog(arm.AMOVW)
p.From.Type = obj.TYPE_CONST
p.From.Offset = c
p.To.Type = obj.TYPE_REG
p.To.Reg = arm.REG_R0 + int16(yVal)
}
case ssa.OpARMLoweredPanicExtendRC:
extend = true
xIsReg = true
hi := int(v.Args[0].Reg() - arm.REG_R0)
lo := int(v.Args[1].Reg() - arm.REG_R0)
xVal = hi<<2 + lo // encode 2 register numbers
c := v.Aux.(ssa.PanicBoundsC).C
if c >= 0 && c <= abi.BoundsMaxConst {
yVal = int(c)
} else {
// Move constant to a register
for yVal == hi || yVal == lo {
yVal++
}
p := s.Prog(arm.AMOVW)
p.From.Type = obj.TYPE_CONST
p.From.Offset = c
p.To.Type = obj.TYPE_REG
p.To.Reg = arm.REG_R0 + int16(yVal)
}
case ssa.OpARMLoweredPanicBoundsCR:
yIsReg = true
yVal := int(v.Args[0].Reg() - arm.REG_R0)
c := v.Aux.(ssa.PanicBoundsC).C
if c >= 0 && c <= abi.BoundsMaxConst {
xVal = int(c)
} else if signed && int64(int32(c)) == c || !signed && int64(uint32(c)) == c {
// Move constant to a register
xIsReg = true
if xVal == yVal {
xVal = 1
}
p := s.Prog(arm.AMOVW)
p.From.Type = obj.TYPE_CONST
p.From.Offset = c
p.To.Type = obj.TYPE_REG
p.To.Reg = arm.REG_R0 + int16(xVal)
} else {
// Move constant to two registers
extend = true
xIsReg = true
hi := 0
lo := 1
if hi == yVal {
hi = 2
}
if lo == yVal {
lo = 2
}
xVal = hi<<2 + lo
p := s.Prog(arm.AMOVW)
p.From.Type = obj.TYPE_CONST
p.From.Offset = c >> 32
p.To.Type = obj.TYPE_REG
p.To.Reg = arm.REG_R0 + int16(hi)
p = s.Prog(arm.AMOVW)
p.From.Type = obj.TYPE_CONST
p.From.Offset = int64(int32(c))
p.To.Type = obj.TYPE_REG
p.To.Reg = arm.REG_R0 + int16(lo)
}
case ssa.OpARMLoweredPanicBoundsCC:
c := v.Aux.(ssa.PanicBoundsCC).Cx
if c >= 0 && c <= abi.BoundsMaxConst {
xVal = int(c)
} else if signed && int64(int32(c)) == c || !signed && int64(uint32(c)) == c {
// Move constant to a register
xIsReg = true
p := s.Prog(arm.AMOVW)
p.From.Type = obj.TYPE_CONST
p.From.Offset = c
p.To.Type = obj.TYPE_REG
p.To.Reg = arm.REG_R0 + int16(xVal)
} else {
// Move constant to two registers
extend = true
xIsReg = true
hi := 0
lo := 1
xVal = hi<<2 + lo
p := s.Prog(arm.AMOVW)
p.From.Type = obj.TYPE_CONST
p.From.Offset = c >> 32
p.To.Type = obj.TYPE_REG
p.To.Reg = arm.REG_R0 + int16(hi)
p = s.Prog(arm.AMOVW)
p.From.Type = obj.TYPE_CONST
p.From.Offset = int64(int32(c))
p.To.Type = obj.TYPE_REG
p.To.Reg = arm.REG_R0 + int16(lo)
}
c = v.Aux.(ssa.PanicBoundsCC).Cy
if c >= 0 && c <= abi.BoundsMaxConst {
yVal = int(c)
} else {
// Move constant to a register
yIsReg = true
yVal = 2
p := s.Prog(arm.AMOVW)
p.From.Type = obj.TYPE_CONST
p.From.Offset = c
p.To.Type = obj.TYPE_REG
p.To.Reg = arm.REG_R0 + int16(yVal)
}
}
c := abi.BoundsEncode(code, signed, xIsReg, yIsReg, xVal, yVal)
p := s.Prog(obj.APCDATA)
p.From.SetConst(abi.PCDATA_PanicBounds)
p.To.SetConst(int64(c))
p = s.Prog(obj.ACALL)
p.To.Type = obj.TYPE_MEM
p.To.Name = obj.NAME_EXTERN
if extend {
p.To.Sym = ir.Syms.PanicExtend
} else {
p.To.Sym = ir.Syms.PanicBounds
}
case ssa.OpARMDUFFZERO:
p := s.Prog(obj.ADUFFZERO)
p.To.Type = obj.TYPE_MEM
p.To.Name = obj.NAME_EXTERN
[dev.regabi] cmd/compile: group known symbols, packages, names [generated] There are a handful of pre-computed magic symbols known by package gc, and we need a place to store them. If we keep them together, the need for type *ir.Name means that package ir is the lowest package in the import hierarchy that they can go in. And package ir needs gopkg for methodSymSuffix (in a later CL), so they can't go any higher either, at least not all together. So package ir it is. Rather than dump them all into the top-level package ir namespace, however, we introduce global structs, Syms, Pkgs, and Names, and make the known symbols, packages, and names fields of those. [git-generate] cd src/cmd/compile/internal/gc rf ' add go.go:$ \ // Names holds known names. \ var Names struct{} \ \ // Syms holds known symbols. \ var Syms struct {} \ \ // Pkgs holds known packages. \ var Pkgs struct {} \ mv staticuint64s Names.Staticuint64s mv zerobase Names.Zerobase mv assertE2I Syms.AssertE2I mv assertE2I2 Syms.AssertE2I2 mv assertI2I Syms.AssertI2I mv assertI2I2 Syms.AssertI2I2 mv deferproc Syms.Deferproc mv deferprocStack Syms.DeferprocStack mv Deferreturn Syms.Deferreturn mv Duffcopy Syms.Duffcopy mv Duffzero Syms.Duffzero mv gcWriteBarrier Syms.GCWriteBarrier mv goschedguarded Syms.Goschedguarded mv growslice Syms.Growslice mv msanread Syms.Msanread mv msanwrite Syms.Msanwrite mv msanmove Syms.Msanmove mv newobject Syms.Newobject mv newproc Syms.Newproc mv panicdivide Syms.Panicdivide mv panicshift Syms.Panicshift mv panicdottypeE Syms.PanicdottypeE mv panicdottypeI Syms.PanicdottypeI mv panicnildottype Syms.Panicnildottype mv panicoverflow Syms.Panicoverflow mv raceread Syms.Raceread mv racereadrange Syms.Racereadrange mv racewrite Syms.Racewrite mv racewriterange Syms.Racewriterange mv SigPanic Syms.SigPanic mv typedmemclr Syms.Typedmemclr mv typedmemmove Syms.Typedmemmove mv Udiv Syms.Udiv mv writeBarrier Syms.WriteBarrier mv zerobaseSym Syms.Zerobase mv arm64HasATOMICS Syms.ARM64HasATOMICS mv armHasVFPv4 Syms.ARMHasVFPv4 mv x86HasFMA Syms.X86HasFMA mv x86HasPOPCNT Syms.X86HasPOPCNT mv x86HasSSE41 Syms.X86HasSSE41 mv WasmDiv Syms.WasmDiv mv WasmMove Syms.WasmMove mv WasmZero Syms.WasmZero mv WasmTruncS Syms.WasmTruncS mv WasmTruncU Syms.WasmTruncU mv gopkg Pkgs.Go mv itabpkg Pkgs.Itab mv itablinkpkg Pkgs.Itablink mv mappkg Pkgs.Map mv msanpkg Pkgs.Msan mv racepkg Pkgs.Race mv Runtimepkg Pkgs.Runtime mv trackpkg Pkgs.Track mv unsafepkg Pkgs.Unsafe mv Names Syms Pkgs symtab.go mv symtab.go cmd/compile/internal/ir ' Change-Id: Ic143862148569a3bcde8e70b26d75421aa2d00f3 Reviewed-on: https://go-review.googlesource.com/c/go/+/279235 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-12-23 00:10:25 -05:00
p.To.Sym = ir.Syms.Duffzero
p.To.Offset = v.AuxInt
case ssa.OpARMDUFFCOPY:
p := s.Prog(obj.ADUFFCOPY)
p.To.Type = obj.TYPE_MEM
p.To.Name = obj.NAME_EXTERN
[dev.regabi] cmd/compile: group known symbols, packages, names [generated] There are a handful of pre-computed magic symbols known by package gc, and we need a place to store them. If we keep them together, the need for type *ir.Name means that package ir is the lowest package in the import hierarchy that they can go in. And package ir needs gopkg for methodSymSuffix (in a later CL), so they can't go any higher either, at least not all together. So package ir it is. Rather than dump them all into the top-level package ir namespace, however, we introduce global structs, Syms, Pkgs, and Names, and make the known symbols, packages, and names fields of those. [git-generate] cd src/cmd/compile/internal/gc rf ' add go.go:$ \ // Names holds known names. \ var Names struct{} \ \ // Syms holds known symbols. \ var Syms struct {} \ \ // Pkgs holds known packages. \ var Pkgs struct {} \ mv staticuint64s Names.Staticuint64s mv zerobase Names.Zerobase mv assertE2I Syms.AssertE2I mv assertE2I2 Syms.AssertE2I2 mv assertI2I Syms.AssertI2I mv assertI2I2 Syms.AssertI2I2 mv deferproc Syms.Deferproc mv deferprocStack Syms.DeferprocStack mv Deferreturn Syms.Deferreturn mv Duffcopy Syms.Duffcopy mv Duffzero Syms.Duffzero mv gcWriteBarrier Syms.GCWriteBarrier mv goschedguarded Syms.Goschedguarded mv growslice Syms.Growslice mv msanread Syms.Msanread mv msanwrite Syms.Msanwrite mv msanmove Syms.Msanmove mv newobject Syms.Newobject mv newproc Syms.Newproc mv panicdivide Syms.Panicdivide mv panicshift Syms.Panicshift mv panicdottypeE Syms.PanicdottypeE mv panicdottypeI Syms.PanicdottypeI mv panicnildottype Syms.Panicnildottype mv panicoverflow Syms.Panicoverflow mv raceread Syms.Raceread mv racereadrange Syms.Racereadrange mv racewrite Syms.Racewrite mv racewriterange Syms.Racewriterange mv SigPanic Syms.SigPanic mv typedmemclr Syms.Typedmemclr mv typedmemmove Syms.Typedmemmove mv Udiv Syms.Udiv mv writeBarrier Syms.WriteBarrier mv zerobaseSym Syms.Zerobase mv arm64HasATOMICS Syms.ARM64HasATOMICS mv armHasVFPv4 Syms.ARMHasVFPv4 mv x86HasFMA Syms.X86HasFMA mv x86HasPOPCNT Syms.X86HasPOPCNT mv x86HasSSE41 Syms.X86HasSSE41 mv WasmDiv Syms.WasmDiv mv WasmMove Syms.WasmMove mv WasmZero Syms.WasmZero mv WasmTruncS Syms.WasmTruncS mv WasmTruncU Syms.WasmTruncU mv gopkg Pkgs.Go mv itabpkg Pkgs.Itab mv itablinkpkg Pkgs.Itablink mv mappkg Pkgs.Map mv msanpkg Pkgs.Msan mv racepkg Pkgs.Race mv Runtimepkg Pkgs.Runtime mv trackpkg Pkgs.Track mv unsafepkg Pkgs.Unsafe mv Names Syms Pkgs symtab.go mv symtab.go cmd/compile/internal/ir ' Change-Id: Ic143862148569a3bcde8e70b26d75421aa2d00f3 Reviewed-on: https://go-review.googlesource.com/c/go/+/279235 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-12-23 00:10:25 -05:00
p.To.Sym = ir.Syms.Duffcopy
p.To.Offset = v.AuxInt
case ssa.OpARMLoweredNilCheck:
// Issue a load which will fault if arg is nil.
p := s.Prog(arm.AMOVB)
p.From.Type = obj.TYPE_MEM
p.From.Reg = v.Args[0].Reg()
[dev.regabi] cmd/compile: split out package ssagen [generated] [git-generate] cd src/cmd/compile/internal/gc rf ' # maxOpenDefers is declared in ssa.go but used only by walk. mv maxOpenDefers walk.go # gc.Arch -> ssagen.Arch # It is not as nice but will do for now. mv Arch ArchInfo mv thearch Arch mv Arch ArchInfo arch.go # Pull dwarf out of pgen.go. mv debuginfo declPos createDwarfVars preInliningDcls \ createSimpleVars createSimpleVar \ createComplexVars createComplexVar \ dwarf.go # Pull high-level compilation out of pgen.go, # leaving only the SSA code. mv compilequeue funccompile compile compilenow \ compileFunctions isInlinableButNotInlined \ initLSym \ compile.go mv BoundsCheckFunc GCWriteBarrierReg ssa.go mv largeStack largeStackFrames CheckLargeStacks pgen.go # All that is left in dcl.go is the nowritebarrierrecCheck mv dcl.go nowb.go # Export API and unexport non-API. mv initssaconfig InitConfig mv isIntrinsicCall IsIntrinsicCall mv ssaDumpInline DumpInline mv initSSATables InitTables mv initSSAEnv InitEnv mv compileSSA Compile mv stackOffset StackOffset mv canSSAType TypeOK mv SSAGenState State mv FwdRefAux fwdRefAux mv cgoSymABIs CgoSymABIs mv readSymABIs ReadSymABIs mv initLSym InitLSym mv useABIWrapGen symabiDefs CgoSymABIs ReadSymABIs InitLSym selectLSym makeABIWrapper setupTextLSym abi.go mv arch.go abi.go nowb.go phi.go pgen.go pgen_test.go ssa.go cmd/compile/internal/ssagen ' rm go.go gsubr.go Change-Id: I47fad6cbf1d1e583fd9139003a08401d7cd048a1 Reviewed-on: https://go-review.googlesource.com/c/go/+/279476 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-12-23 00:57:10 -05:00
ssagen.AddAux(&p.From, v)
p.To.Type = obj.TYPE_REG
p.To.Reg = arm.REGTMP
if logopt.Enabled() {
logopt.LogOpt(v.Pos, "nilcheck", "genssa", v.Block.Func.Name)
}
if base.Debug.Nil != 0 && v.Pos.Line() > 1 { // v.Pos.Line()==1 in generated wrappers
base.WarnfAt(v.Pos, "generated nil check")
}
case ssa.OpARMLoweredZero:
// MOVW.P Rarg2, 4(R1)
// CMP Rarg1, R1
// BLE -2(PC)
// arg1 is the address of the last element to zero
// arg2 is known to be zero
// auxint is alignment
var sz int64
var mov obj.As
switch {
case v.AuxInt%4 == 0:
sz = 4
mov = arm.AMOVW
case v.AuxInt%2 == 0:
sz = 2
mov = arm.AMOVH
default:
sz = 1
mov = arm.AMOVB
}
p := s.Prog(mov)
p.Scond = arm.C_PBIT
p.From.Type = obj.TYPE_REG
p.From.Reg = v.Args[2].Reg()
p.To.Type = obj.TYPE_MEM
p.To.Reg = arm.REG_R1
p.To.Offset = sz
p2 := s.Prog(arm.ACMP)
p2.From.Type = obj.TYPE_REG
p2.From.Reg = v.Args[1].Reg()
p2.Reg = arm.REG_R1
p3 := s.Prog(arm.ABLE)
p3.To.Type = obj.TYPE_BRANCH
[dev.regabi] cmd/compile: split out package objw [generated] Object file writing routines are used not just at the end of the compilation but also during static data layout in walk. Split them into their own package. [git-generate] cd src/cmd/compile/internal/gc rf ' # Move bit vector to new package bitvec mv bvec.n bvec.N mv bvec.b bvec.B mv bvec BitVec mv bvalloc New mv bvbulkalloc NewBulk mv bulkBvec.next bulkBvec.Next mv bulkBvec Bulk mv H0 h0 mv Hp hp # Leave bvecSet and bitmap hashes behind - not needed as broadly. mv bvecSet.extractUniqe bvecSet.extractUnique mv h0 bvecSet bvecSet.grow bvecSet.add \ bvecSet.extractUnique hashbitmap bvset.go mv bv.go cmd/compile/internal/bitvec ex . ../arm ../arm64 ../mips ../mips64 ../ppc64 ../s390x ../riscv64 { import "cmd/internal/obj" var a *obj.Addr var i int64 Addrconst(a, i) -> a.SetConst(i) var p, to *obj.Prog Patch(p, to) -> p.To.SetTarget(to) } rm Addrconst Patch # Move object-writing API to new package objw mv duint8 Objw_Uint8 mv duint16 Objw_Uint16 mv duint32 Objw_Uint32 mv duintptr Objw_Uintptr mv duintxx Objw_UintN mv dsymptr Objw_SymPtr mv dsymptrOff Objw_SymPtrOff mv dsymptrWeakOff Objw_SymPtrWeakOff mv ggloblsym Objw_Global mv dbvec Objw_BitVec mv newProgs NewProgs mv Progs.clearp Progs.Clear mv Progs.settext Progs.SetText mv Progs.next Progs.Next mv Progs.pc Progs.PC mv Progs.pos Progs.Pos mv Progs.curfn Progs.CurFunc mv Progs.progcache Progs.Cache mv Progs.cacheidx Progs.CacheIndex mv Progs.nextLive Progs.NextLive mv Progs.prevLive Progs.PrevLive mv Progs.Appendpp Progs.Append mv LivenessIndex.stackMapIndex LivenessIndex.StackMapIndex mv LivenessIndex.isUnsafePoint LivenessIndex.IsUnsafePoint mv Objw_Uint8 Objw_Uint16 Objw_Uint32 Objw_Uintptr Objw_UintN \ Objw_SymPtr Objw_SymPtrOff Objw_SymPtrWeakOff Objw_Global \ Objw_BitVec \ objw.go mv sharedProgArray NewProgs Progs \ LivenessIndex StackMapDontCare \ LivenessDontCare LivenessIndex.StackMapValid \ Progs.NewProg Progs.Flush Progs.Free Progs.Prog Progs.Clear Progs.Append Progs.SetText \ prog.go mv prog.go objw.go cmd/compile/internal/objw # Move ggloblnod to obj with the rest of the non-objw higher-level writing. mv ggloblnod obj.go ' cd ../objw rf ' mv Objw_Uint8 Uint8 mv Objw_Uint16 Uint16 mv Objw_Uint32 Uint32 mv Objw_Uintptr Uintptr mv Objw_UintN UintN mv Objw_SymPtr SymPtr mv Objw_SymPtrOff SymPtrOff mv Objw_SymPtrWeakOff SymPtrWeakOff mv Objw_Global Global mv Objw_BitVec BitVec ' Change-Id: I2b87085aa788564fb322e9c55bddd73347b4d5fd Reviewed-on: https://go-review.googlesource.com/c/go/+/279310 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-12-23 00:46:27 -05:00
p3.To.SetTarget(p)
case ssa.OpARMLoweredMove:
// MOVW.P 4(R1), Rtmp
// MOVW.P Rtmp, 4(R2)
// CMP Rarg2, R1
// BLE -3(PC)
// arg2 is the address of the last element of src
// auxint is alignment
var sz int64
var mov obj.As
switch {
case v.AuxInt%4 == 0:
sz = 4
mov = arm.AMOVW
case v.AuxInt%2 == 0:
sz = 2
mov = arm.AMOVH
default:
sz = 1
mov = arm.AMOVB
}
p := s.Prog(mov)
p.Scond = arm.C_PBIT
p.From.Type = obj.TYPE_MEM
p.From.Reg = arm.REG_R1
p.From.Offset = sz
p.To.Type = obj.TYPE_REG
p.To.Reg = arm.REGTMP
p2 := s.Prog(mov)
p2.Scond = arm.C_PBIT
p2.From.Type = obj.TYPE_REG
p2.From.Reg = arm.REGTMP
p2.To.Type = obj.TYPE_MEM
p2.To.Reg = arm.REG_R2
p2.To.Offset = sz
p3 := s.Prog(arm.ACMP)
p3.From.Type = obj.TYPE_REG
p3.From.Reg = v.Args[2].Reg()
p3.Reg = arm.REG_R1
p4 := s.Prog(arm.ABLE)
p4.To.Type = obj.TYPE_BRANCH
[dev.regabi] cmd/compile: split out package objw [generated] Object file writing routines are used not just at the end of the compilation but also during static data layout in walk. Split them into their own package. [git-generate] cd src/cmd/compile/internal/gc rf ' # Move bit vector to new package bitvec mv bvec.n bvec.N mv bvec.b bvec.B mv bvec BitVec mv bvalloc New mv bvbulkalloc NewBulk mv bulkBvec.next bulkBvec.Next mv bulkBvec Bulk mv H0 h0 mv Hp hp # Leave bvecSet and bitmap hashes behind - not needed as broadly. mv bvecSet.extractUniqe bvecSet.extractUnique mv h0 bvecSet bvecSet.grow bvecSet.add \ bvecSet.extractUnique hashbitmap bvset.go mv bv.go cmd/compile/internal/bitvec ex . ../arm ../arm64 ../mips ../mips64 ../ppc64 ../s390x ../riscv64 { import "cmd/internal/obj" var a *obj.Addr var i int64 Addrconst(a, i) -> a.SetConst(i) var p, to *obj.Prog Patch(p, to) -> p.To.SetTarget(to) } rm Addrconst Patch # Move object-writing API to new package objw mv duint8 Objw_Uint8 mv duint16 Objw_Uint16 mv duint32 Objw_Uint32 mv duintptr Objw_Uintptr mv duintxx Objw_UintN mv dsymptr Objw_SymPtr mv dsymptrOff Objw_SymPtrOff mv dsymptrWeakOff Objw_SymPtrWeakOff mv ggloblsym Objw_Global mv dbvec Objw_BitVec mv newProgs NewProgs mv Progs.clearp Progs.Clear mv Progs.settext Progs.SetText mv Progs.next Progs.Next mv Progs.pc Progs.PC mv Progs.pos Progs.Pos mv Progs.curfn Progs.CurFunc mv Progs.progcache Progs.Cache mv Progs.cacheidx Progs.CacheIndex mv Progs.nextLive Progs.NextLive mv Progs.prevLive Progs.PrevLive mv Progs.Appendpp Progs.Append mv LivenessIndex.stackMapIndex LivenessIndex.StackMapIndex mv LivenessIndex.isUnsafePoint LivenessIndex.IsUnsafePoint mv Objw_Uint8 Objw_Uint16 Objw_Uint32 Objw_Uintptr Objw_UintN \ Objw_SymPtr Objw_SymPtrOff Objw_SymPtrWeakOff Objw_Global \ Objw_BitVec \ objw.go mv sharedProgArray NewProgs Progs \ LivenessIndex StackMapDontCare \ LivenessDontCare LivenessIndex.StackMapValid \ Progs.NewProg Progs.Flush Progs.Free Progs.Prog Progs.Clear Progs.Append Progs.SetText \ prog.go mv prog.go objw.go cmd/compile/internal/objw # Move ggloblnod to obj with the rest of the non-objw higher-level writing. mv ggloblnod obj.go ' cd ../objw rf ' mv Objw_Uint8 Uint8 mv Objw_Uint16 Uint16 mv Objw_Uint32 Uint32 mv Objw_Uintptr Uintptr mv Objw_UintN UintN mv Objw_SymPtr SymPtr mv Objw_SymPtrOff SymPtrOff mv Objw_SymPtrWeakOff SymPtrWeakOff mv Objw_Global Global mv Objw_BitVec BitVec ' Change-Id: I2b87085aa788564fb322e9c55bddd73347b4d5fd Reviewed-on: https://go-review.googlesource.com/c/go/+/279310 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-12-23 00:46:27 -05:00
p4.To.SetTarget(p)
case ssa.OpARMEqual,
ssa.OpARMNotEqual,
ssa.OpARMLessThan,
ssa.OpARMLessEqual,
ssa.OpARMGreaterThan,
ssa.OpARMGreaterEqual,
ssa.OpARMLessThanU,
ssa.OpARMLessEqualU,
ssa.OpARMGreaterThanU,
ssa.OpARMGreaterEqualU:
// generate boolean values
// use conditional move
p := s.Prog(arm.AMOVW)
p.From.Type = obj.TYPE_CONST
p.From.Offset = 0
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
p = s.Prog(arm.AMOVW)
p.Scond = condBits[v.Op]
p.From.Type = obj.TYPE_CONST
p.From.Offset = 1
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
case ssa.OpARMLoweredGetClosurePtr:
// Closure pointer is R7 (arm.REGCTXT).
[dev.regabi] cmd/compile: split out package ssagen [generated] [git-generate] cd src/cmd/compile/internal/gc rf ' # maxOpenDefers is declared in ssa.go but used only by walk. mv maxOpenDefers walk.go # gc.Arch -> ssagen.Arch # It is not as nice but will do for now. mv Arch ArchInfo mv thearch Arch mv Arch ArchInfo arch.go # Pull dwarf out of pgen.go. mv debuginfo declPos createDwarfVars preInliningDcls \ createSimpleVars createSimpleVar \ createComplexVars createComplexVar \ dwarf.go # Pull high-level compilation out of pgen.go, # leaving only the SSA code. mv compilequeue funccompile compile compilenow \ compileFunctions isInlinableButNotInlined \ initLSym \ compile.go mv BoundsCheckFunc GCWriteBarrierReg ssa.go mv largeStack largeStackFrames CheckLargeStacks pgen.go # All that is left in dcl.go is the nowritebarrierrecCheck mv dcl.go nowb.go # Export API and unexport non-API. mv initssaconfig InitConfig mv isIntrinsicCall IsIntrinsicCall mv ssaDumpInline DumpInline mv initSSATables InitTables mv initSSAEnv InitEnv mv compileSSA Compile mv stackOffset StackOffset mv canSSAType TypeOK mv SSAGenState State mv FwdRefAux fwdRefAux mv cgoSymABIs CgoSymABIs mv readSymABIs ReadSymABIs mv initLSym InitLSym mv useABIWrapGen symabiDefs CgoSymABIs ReadSymABIs InitLSym selectLSym makeABIWrapper setupTextLSym abi.go mv arch.go abi.go nowb.go phi.go pgen.go pgen_test.go ssa.go cmd/compile/internal/ssagen ' rm go.go gsubr.go Change-Id: I47fad6cbf1d1e583fd9139003a08401d7cd048a1 Reviewed-on: https://go-review.googlesource.com/c/go/+/279476 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-12-23 00:57:10 -05:00
ssagen.CheckLoweredGetClosurePtr(v)
case ssa.OpARMLoweredGetCallerSP:
// caller's SP is FixedFrameSize below the address of the first arg
p := s.Prog(arm.AMOVW)
p.From.Type = obj.TYPE_ADDR
p.From.Offset = -base.Ctxt.Arch.FixedFrameSize
p.From.Name = obj.NAME_PARAM
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
case ssa.OpARMLoweredGetCallerPC:
p := s.Prog(obj.AGETCALLERPC)
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
case ssa.OpARMFlagConstant:
v.Fatalf("FlagConstant op should never make it to codegen %v", v.LongString())
case ssa.OpARMInvertFlags:
v.Fatalf("InvertFlags should never make it to codegen %v", v.LongString())
case ssa.OpClobber, ssa.OpClobberReg:
// TODO: implement for clobberdead experiment. Nop is ok for now.
default:
v.Fatalf("genValue not implemented: %s", v.LongString())
}
}
var condBits = map[ssa.Op]uint8{
ssa.OpARMEqual: arm.C_SCOND_EQ,
ssa.OpARMNotEqual: arm.C_SCOND_NE,
ssa.OpARMLessThan: arm.C_SCOND_LT,
ssa.OpARMLessThanU: arm.C_SCOND_LO,
ssa.OpARMLessEqual: arm.C_SCOND_LE,
ssa.OpARMLessEqualU: arm.C_SCOND_LS,
ssa.OpARMGreaterThan: arm.C_SCOND_GT,
ssa.OpARMGreaterThanU: arm.C_SCOND_HI,
ssa.OpARMGreaterEqual: arm.C_SCOND_GE,
ssa.OpARMGreaterEqualU: arm.C_SCOND_HS,
}
var blockJump = map[ssa.BlockKind]struct {
asm, invasm obj.As
}{
cmd/compile: ARM comparisons with 0 incorrect on overflow Some ARM rewriting rules convert 'comparing to zero' conditions of if statements to a simplified version utilizing CMN and CMP instructions to branch over condition flags, in order to save one Add or Sub caculation. Such optimizations lead to wrong branching in case an overflow/underflow occurs when executing CMN or CMP. Fix the issue by introducing new block opcodes that don't honor the overflow/underflow flag: Block-Op Meaning ARM condition codes 1. LTnoov less than MI 2. GEnoov greater than or equal PL 3. LEnoov less than or equal MI || EQ 4. GTnoov greater than NEQ & PL The patch also adds a few test cases to cover scenarios that are specific to ARM and fine-tunes the code generation tests for 'x-const'. For more details please refer to the previous fix on 64-bit ARM: https://go-review.googlesource.com/c/go/+/233097 Go1 perf, 'old' is the non-optimized version, that is removing all concerned rewriting rules. name old time/op new time/op delta BinaryTree17-8 7.73s ± 0% 7.81s ± 0% +0.97% (p=0.000 n=7+8) Fannkuch11-8 7.06s ± 0% 7.00s ± 0% -0.83% (p=0.000 n=8+8) FmtFprintfEmpty-8 181ns ± 1% 183ns ± 1% +1.31% (p=0.001 n=8+8) FmtFprintfString-8 319ns ± 1% 325ns ± 2% +1.71% (p=0.009 n=7+8) FmtFprintfInt-8 358ns ± 1% 359ns ± 1% ~ (p=0.293 n=7+7) FmtFprintfIntInt-8 459ns ± 3% 456ns ± 1% ~ (p=0.869 n=8+8) FmtFprintfPrefixedInt-8 535ns ± 4% 538ns ± 4% ~ (p=0.572 n=8+8) FmtFprintfFloat-8 1.01µs ± 2% 1.01µs ± 2% ~ (p=0.625 n=8+8) FmtManyArgs-8 1.93µs ± 2% 1.93µs ± 1% ~ (p=0.979 n=8+7) GobDecode-8 16.1ms ± 1% 16.5ms ± 1% +2.32% (p=0.000 n=8+8) GobEncode-8 15.9ms ± 0% 15.8ms ± 1% -1.00% (p=0.000 n=8+7) Gzip-8 690ms ± 1% 670ms ± 0% -2.90% (p=0.000 n=8+8) Gunzip-8 109ms ± 1% 109ms ± 1% ~ (p=0.694 n=7+8) HTTPClientServer-8 149µs ± 3% 146µs ± 2% -1.70% (p=0.028 n=8+8) JSONEncode-8 50.5ms ± 1% 49.2ms ± 0% -2.60% (p=0.001 n=7+7) JSONDecode-8 135ms ± 2% 137ms ± 1% ~ (p=0.054 n=8+7) Mandelbrot200-8 951ms ± 0% 952ms ± 0% ~ (p=0.852 n=6+8) GoParse-8 9.47ms ± 1% 9.66ms ± 1% +2.01% (p=0.000 n=8+8) RegexpMatchEasy0_32-8 288ns ± 2% 277ns ± 2% -3.61% (p=0.000 n=8+8) RegexpMatchEasy0_1K-8 1.66µs ± 1% 1.69µs ± 2% +2.21% (p=0.001 n=7+7) RegexpMatchEasy1_32-8 334ns ± 1% 305ns ± 2% -8.86% (p=0.000 n=8+8) RegexpMatchEasy1_1K-8 2.14µs ± 2% 2.15µs ± 0% ~ (p=0.099 n=8+8) RegexpMatchMedium_32-8 13.3ns ± 1% 13.3ns ± 0% ~ (p=1.000 n=7+7) RegexpMatchMedium_1K-8 81.1µs ± 3% 80.7µs ± 1% ~ (p=0.955 n=7+8) RegexpMatchHard_32-8 4.26µs ± 0% 4.26µs ± 0% ~ (p=0.933 n=7+8) RegexpMatchHard_1K-8 124µs ± 0% 124µs ± 0% +0.31% (p=0.000 n=8+8) Revcomp-8 14.7ms ± 2% 14.5ms ± 1% -1.66% (p=0.003 n=8+8) Template-8 197ms ± 2% 200ms ± 3% +1.62% (p=0.021 n=8+8) TimeParse-8 1.33µs ± 1% 1.30µs ± 1% -1.86% (p=0.002 n=8+8) TimeFormat-8 3.04µs ± 1% 3.02µs ± 0% -0.60% (p=0.000 n=8+8) name old speed new speed delta GobDecode-8 47.6MB/s ± 1% 46.5MB/s ± 1% -2.28% (p=0.000 n=8+8) GobEncode-8 48.1MB/s ± 0% 48.6MB/s ± 1% +1.02% (p=0.000 n=8+7) Gzip-8 28.1MB/s ± 1% 29.0MB/s ± 0% +2.97% (p=0.000 n=8+8) Gunzip-8 178MB/s ± 1% 179MB/s ± 2% ~ (p=0.694 n=7+8) JSONEncode-8 38.4MB/s ± 1% 39.4MB/s ± 0% +2.67% (p=0.001 n=7+7) JSONDecode-8 14.3MB/s ± 2% 14.2MB/s ± 1% -0.81% (p=0.043 n=8+7) GoParse-8 6.12MB/s ± 1% 5.99MB/s ± 1% -2.00% (p=0.000 n=8+8) RegexpMatchEasy0_32-8 111MB/s ± 2% 115MB/s ± 2% +3.77% (p=0.000 n=8+8) RegexpMatchEasy0_1K-8 618MB/s ± 1% 604MB/s ± 2% -2.16% (p=0.001 n=7+7) RegexpMatchEasy1_32-8 95.7MB/s ± 1% 105.1MB/s ± 2% +9.76% (p=0.000 n=8+8) RegexpMatchEasy1_1K-8 479MB/s ± 2% 477MB/s ± 0% ~ (p=0.105 n=8+8) RegexpMatchMedium_32-8 75.2MB/s ± 1% 75.2MB/s ± 0% ~ (p=0.247 n=7+7) RegexpMatchMedium_1K-8 12.6MB/s ± 3% 12.7MB/s ± 1% ~ (p=0.538 n=7+8) RegexpMatchHard_32-8 7.52MB/s ± 0% 7.52MB/s ± 0% ~ (p=0.968 n=7+8) RegexpMatchHard_1K-8 8.26MB/s ± 0% 8.24MB/s ± 0% -0.30% (p=0.001 n=8+8) Revcomp-8 173MB/s ± 2% 176MB/s ± 1% +1.68% (p=0.003 n=8+8) Template-8 9.85MB/s ± 2% 9.69MB/s ± 3% -1.59% (p=0.021 n=8+8) Fixes #39303 Updates #38740 Change-Id: I0a5f87bfda679f66414c0041ace2ca2e28363f36 Reviewed-on: https://go-review.googlesource.com/c/go/+/236637 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-06-01 11:01:14 +00:00
ssa.BlockARMEQ: {arm.ABEQ, arm.ABNE},
ssa.BlockARMNE: {arm.ABNE, arm.ABEQ},
ssa.BlockARMLT: {arm.ABLT, arm.ABGE},
ssa.BlockARMGE: {arm.ABGE, arm.ABLT},
ssa.BlockARMLE: {arm.ABLE, arm.ABGT},
ssa.BlockARMGT: {arm.ABGT, arm.ABLE},
ssa.BlockARMULT: {arm.ABLO, arm.ABHS},
ssa.BlockARMUGE: {arm.ABHS, arm.ABLO},
ssa.BlockARMUGT: {arm.ABHI, arm.ABLS},
ssa.BlockARMULE: {arm.ABLS, arm.ABHI},
ssa.BlockARMLTnoov: {arm.ABMI, arm.ABPL},
ssa.BlockARMGEnoov: {arm.ABPL, arm.ABMI},
}
// To model a 'LEnoov' ('<=' without overflow checking) branching.
[dev.regabi] cmd/compile: split out package ssagen [generated] [git-generate] cd src/cmd/compile/internal/gc rf ' # maxOpenDefers is declared in ssa.go but used only by walk. mv maxOpenDefers walk.go # gc.Arch -> ssagen.Arch # It is not as nice but will do for now. mv Arch ArchInfo mv thearch Arch mv Arch ArchInfo arch.go # Pull dwarf out of pgen.go. mv debuginfo declPos createDwarfVars preInliningDcls \ createSimpleVars createSimpleVar \ createComplexVars createComplexVar \ dwarf.go # Pull high-level compilation out of pgen.go, # leaving only the SSA code. mv compilequeue funccompile compile compilenow \ compileFunctions isInlinableButNotInlined \ initLSym \ compile.go mv BoundsCheckFunc GCWriteBarrierReg ssa.go mv largeStack largeStackFrames CheckLargeStacks pgen.go # All that is left in dcl.go is the nowritebarrierrecCheck mv dcl.go nowb.go # Export API and unexport non-API. mv initssaconfig InitConfig mv isIntrinsicCall IsIntrinsicCall mv ssaDumpInline DumpInline mv initSSATables InitTables mv initSSAEnv InitEnv mv compileSSA Compile mv stackOffset StackOffset mv canSSAType TypeOK mv SSAGenState State mv FwdRefAux fwdRefAux mv cgoSymABIs CgoSymABIs mv readSymABIs ReadSymABIs mv initLSym InitLSym mv useABIWrapGen symabiDefs CgoSymABIs ReadSymABIs InitLSym selectLSym makeABIWrapper setupTextLSym abi.go mv arch.go abi.go nowb.go phi.go pgen.go pgen_test.go ssa.go cmd/compile/internal/ssagen ' rm go.go gsubr.go Change-Id: I47fad6cbf1d1e583fd9139003a08401d7cd048a1 Reviewed-on: https://go-review.googlesource.com/c/go/+/279476 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-12-23 00:57:10 -05:00
var leJumps = [2][2]ssagen.IndexJump{
cmd/compile: ARM comparisons with 0 incorrect on overflow Some ARM rewriting rules convert 'comparing to zero' conditions of if statements to a simplified version utilizing CMN and CMP instructions to branch over condition flags, in order to save one Add or Sub caculation. Such optimizations lead to wrong branching in case an overflow/underflow occurs when executing CMN or CMP. Fix the issue by introducing new block opcodes that don't honor the overflow/underflow flag: Block-Op Meaning ARM condition codes 1. LTnoov less than MI 2. GEnoov greater than or equal PL 3. LEnoov less than or equal MI || EQ 4. GTnoov greater than NEQ & PL The patch also adds a few test cases to cover scenarios that are specific to ARM and fine-tunes the code generation tests for 'x-const'. For more details please refer to the previous fix on 64-bit ARM: https://go-review.googlesource.com/c/go/+/233097 Go1 perf, 'old' is the non-optimized version, that is removing all concerned rewriting rules. name old time/op new time/op delta BinaryTree17-8 7.73s ± 0% 7.81s ± 0% +0.97% (p=0.000 n=7+8) Fannkuch11-8 7.06s ± 0% 7.00s ± 0% -0.83% (p=0.000 n=8+8) FmtFprintfEmpty-8 181ns ± 1% 183ns ± 1% +1.31% (p=0.001 n=8+8) FmtFprintfString-8 319ns ± 1% 325ns ± 2% +1.71% (p=0.009 n=7+8) FmtFprintfInt-8 358ns ± 1% 359ns ± 1% ~ (p=0.293 n=7+7) FmtFprintfIntInt-8 459ns ± 3% 456ns ± 1% ~ (p=0.869 n=8+8) FmtFprintfPrefixedInt-8 535ns ± 4% 538ns ± 4% ~ (p=0.572 n=8+8) FmtFprintfFloat-8 1.01µs ± 2% 1.01µs ± 2% ~ (p=0.625 n=8+8) FmtManyArgs-8 1.93µs ± 2% 1.93µs ± 1% ~ (p=0.979 n=8+7) GobDecode-8 16.1ms ± 1% 16.5ms ± 1% +2.32% (p=0.000 n=8+8) GobEncode-8 15.9ms ± 0% 15.8ms ± 1% -1.00% (p=0.000 n=8+7) Gzip-8 690ms ± 1% 670ms ± 0% -2.90% (p=0.000 n=8+8) Gunzip-8 109ms ± 1% 109ms ± 1% ~ (p=0.694 n=7+8) HTTPClientServer-8 149µs ± 3% 146µs ± 2% -1.70% (p=0.028 n=8+8) JSONEncode-8 50.5ms ± 1% 49.2ms ± 0% -2.60% (p=0.001 n=7+7) JSONDecode-8 135ms ± 2% 137ms ± 1% ~ (p=0.054 n=8+7) Mandelbrot200-8 951ms ± 0% 952ms ± 0% ~ (p=0.852 n=6+8) GoParse-8 9.47ms ± 1% 9.66ms ± 1% +2.01% (p=0.000 n=8+8) RegexpMatchEasy0_32-8 288ns ± 2% 277ns ± 2% -3.61% (p=0.000 n=8+8) RegexpMatchEasy0_1K-8 1.66µs ± 1% 1.69µs ± 2% +2.21% (p=0.001 n=7+7) RegexpMatchEasy1_32-8 334ns ± 1% 305ns ± 2% -8.86% (p=0.000 n=8+8) RegexpMatchEasy1_1K-8 2.14µs ± 2% 2.15µs ± 0% ~ (p=0.099 n=8+8) RegexpMatchMedium_32-8 13.3ns ± 1% 13.3ns ± 0% ~ (p=1.000 n=7+7) RegexpMatchMedium_1K-8 81.1µs ± 3% 80.7µs ± 1% ~ (p=0.955 n=7+8) RegexpMatchHard_32-8 4.26µs ± 0% 4.26µs ± 0% ~ (p=0.933 n=7+8) RegexpMatchHard_1K-8 124µs ± 0% 124µs ± 0% +0.31% (p=0.000 n=8+8) Revcomp-8 14.7ms ± 2% 14.5ms ± 1% -1.66% (p=0.003 n=8+8) Template-8 197ms ± 2% 200ms ± 3% +1.62% (p=0.021 n=8+8) TimeParse-8 1.33µs ± 1% 1.30µs ± 1% -1.86% (p=0.002 n=8+8) TimeFormat-8 3.04µs ± 1% 3.02µs ± 0% -0.60% (p=0.000 n=8+8) name old speed new speed delta GobDecode-8 47.6MB/s ± 1% 46.5MB/s ± 1% -2.28% (p=0.000 n=8+8) GobEncode-8 48.1MB/s ± 0% 48.6MB/s ± 1% +1.02% (p=0.000 n=8+7) Gzip-8 28.1MB/s ± 1% 29.0MB/s ± 0% +2.97% (p=0.000 n=8+8) Gunzip-8 178MB/s ± 1% 179MB/s ± 2% ~ (p=0.694 n=7+8) JSONEncode-8 38.4MB/s ± 1% 39.4MB/s ± 0% +2.67% (p=0.001 n=7+7) JSONDecode-8 14.3MB/s ± 2% 14.2MB/s ± 1% -0.81% (p=0.043 n=8+7) GoParse-8 6.12MB/s ± 1% 5.99MB/s ± 1% -2.00% (p=0.000 n=8+8) RegexpMatchEasy0_32-8 111MB/s ± 2% 115MB/s ± 2% +3.77% (p=0.000 n=8+8) RegexpMatchEasy0_1K-8 618MB/s ± 1% 604MB/s ± 2% -2.16% (p=0.001 n=7+7) RegexpMatchEasy1_32-8 95.7MB/s ± 1% 105.1MB/s ± 2% +9.76% (p=0.000 n=8+8) RegexpMatchEasy1_1K-8 479MB/s ± 2% 477MB/s ± 0% ~ (p=0.105 n=8+8) RegexpMatchMedium_32-8 75.2MB/s ± 1% 75.2MB/s ± 0% ~ (p=0.247 n=7+7) RegexpMatchMedium_1K-8 12.6MB/s ± 3% 12.7MB/s ± 1% ~ (p=0.538 n=7+8) RegexpMatchHard_32-8 7.52MB/s ± 0% 7.52MB/s ± 0% ~ (p=0.968 n=7+8) RegexpMatchHard_1K-8 8.26MB/s ± 0% 8.24MB/s ± 0% -0.30% (p=0.001 n=8+8) Revcomp-8 173MB/s ± 2% 176MB/s ± 1% +1.68% (p=0.003 n=8+8) Template-8 9.85MB/s ± 2% 9.69MB/s ± 3% -1.59% (p=0.021 n=8+8) Fixes #39303 Updates #38740 Change-Id: I0a5f87bfda679f66414c0041ace2ca2e28363f36 Reviewed-on: https://go-review.googlesource.com/c/go/+/236637 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-06-01 11:01:14 +00:00
{{Jump: arm.ABEQ, Index: 0}, {Jump: arm.ABPL, Index: 1}}, // next == b.Succs[0]
{{Jump: arm.ABMI, Index: 0}, {Jump: arm.ABEQ, Index: 0}}, // next == b.Succs[1]
}
// To model a 'GTnoov' ('>' without overflow checking) branching.
[dev.regabi] cmd/compile: split out package ssagen [generated] [git-generate] cd src/cmd/compile/internal/gc rf ' # maxOpenDefers is declared in ssa.go but used only by walk. mv maxOpenDefers walk.go # gc.Arch -> ssagen.Arch # It is not as nice but will do for now. mv Arch ArchInfo mv thearch Arch mv Arch ArchInfo arch.go # Pull dwarf out of pgen.go. mv debuginfo declPos createDwarfVars preInliningDcls \ createSimpleVars createSimpleVar \ createComplexVars createComplexVar \ dwarf.go # Pull high-level compilation out of pgen.go, # leaving only the SSA code. mv compilequeue funccompile compile compilenow \ compileFunctions isInlinableButNotInlined \ initLSym \ compile.go mv BoundsCheckFunc GCWriteBarrierReg ssa.go mv largeStack largeStackFrames CheckLargeStacks pgen.go # All that is left in dcl.go is the nowritebarrierrecCheck mv dcl.go nowb.go # Export API and unexport non-API. mv initssaconfig InitConfig mv isIntrinsicCall IsIntrinsicCall mv ssaDumpInline DumpInline mv initSSATables InitTables mv initSSAEnv InitEnv mv compileSSA Compile mv stackOffset StackOffset mv canSSAType TypeOK mv SSAGenState State mv FwdRefAux fwdRefAux mv cgoSymABIs CgoSymABIs mv readSymABIs ReadSymABIs mv initLSym InitLSym mv useABIWrapGen symabiDefs CgoSymABIs ReadSymABIs InitLSym selectLSym makeABIWrapper setupTextLSym abi.go mv arch.go abi.go nowb.go phi.go pgen.go pgen_test.go ssa.go cmd/compile/internal/ssagen ' rm go.go gsubr.go Change-Id: I47fad6cbf1d1e583fd9139003a08401d7cd048a1 Reviewed-on: https://go-review.googlesource.com/c/go/+/279476 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-12-23 00:57:10 -05:00
var gtJumps = [2][2]ssagen.IndexJump{
cmd/compile: ARM comparisons with 0 incorrect on overflow Some ARM rewriting rules convert 'comparing to zero' conditions of if statements to a simplified version utilizing CMN and CMP instructions to branch over condition flags, in order to save one Add or Sub caculation. Such optimizations lead to wrong branching in case an overflow/underflow occurs when executing CMN or CMP. Fix the issue by introducing new block opcodes that don't honor the overflow/underflow flag: Block-Op Meaning ARM condition codes 1. LTnoov less than MI 2. GEnoov greater than or equal PL 3. LEnoov less than or equal MI || EQ 4. GTnoov greater than NEQ & PL The patch also adds a few test cases to cover scenarios that are specific to ARM and fine-tunes the code generation tests for 'x-const'. For more details please refer to the previous fix on 64-bit ARM: https://go-review.googlesource.com/c/go/+/233097 Go1 perf, 'old' is the non-optimized version, that is removing all concerned rewriting rules. name old time/op new time/op delta BinaryTree17-8 7.73s ± 0% 7.81s ± 0% +0.97% (p=0.000 n=7+8) Fannkuch11-8 7.06s ± 0% 7.00s ± 0% -0.83% (p=0.000 n=8+8) FmtFprintfEmpty-8 181ns ± 1% 183ns ± 1% +1.31% (p=0.001 n=8+8) FmtFprintfString-8 319ns ± 1% 325ns ± 2% +1.71% (p=0.009 n=7+8) FmtFprintfInt-8 358ns ± 1% 359ns ± 1% ~ (p=0.293 n=7+7) FmtFprintfIntInt-8 459ns ± 3% 456ns ± 1% ~ (p=0.869 n=8+8) FmtFprintfPrefixedInt-8 535ns ± 4% 538ns ± 4% ~ (p=0.572 n=8+8) FmtFprintfFloat-8 1.01µs ± 2% 1.01µs ± 2% ~ (p=0.625 n=8+8) FmtManyArgs-8 1.93µs ± 2% 1.93µs ± 1% ~ (p=0.979 n=8+7) GobDecode-8 16.1ms ± 1% 16.5ms ± 1% +2.32% (p=0.000 n=8+8) GobEncode-8 15.9ms ± 0% 15.8ms ± 1% -1.00% (p=0.000 n=8+7) Gzip-8 690ms ± 1% 670ms ± 0% -2.90% (p=0.000 n=8+8) Gunzip-8 109ms ± 1% 109ms ± 1% ~ (p=0.694 n=7+8) HTTPClientServer-8 149µs ± 3% 146µs ± 2% -1.70% (p=0.028 n=8+8) JSONEncode-8 50.5ms ± 1% 49.2ms ± 0% -2.60% (p=0.001 n=7+7) JSONDecode-8 135ms ± 2% 137ms ± 1% ~ (p=0.054 n=8+7) Mandelbrot200-8 951ms ± 0% 952ms ± 0% ~ (p=0.852 n=6+8) GoParse-8 9.47ms ± 1% 9.66ms ± 1% +2.01% (p=0.000 n=8+8) RegexpMatchEasy0_32-8 288ns ± 2% 277ns ± 2% -3.61% (p=0.000 n=8+8) RegexpMatchEasy0_1K-8 1.66µs ± 1% 1.69µs ± 2% +2.21% (p=0.001 n=7+7) RegexpMatchEasy1_32-8 334ns ± 1% 305ns ± 2% -8.86% (p=0.000 n=8+8) RegexpMatchEasy1_1K-8 2.14µs ± 2% 2.15µs ± 0% ~ (p=0.099 n=8+8) RegexpMatchMedium_32-8 13.3ns ± 1% 13.3ns ± 0% ~ (p=1.000 n=7+7) RegexpMatchMedium_1K-8 81.1µs ± 3% 80.7µs ± 1% ~ (p=0.955 n=7+8) RegexpMatchHard_32-8 4.26µs ± 0% 4.26µs ± 0% ~ (p=0.933 n=7+8) RegexpMatchHard_1K-8 124µs ± 0% 124µs ± 0% +0.31% (p=0.000 n=8+8) Revcomp-8 14.7ms ± 2% 14.5ms ± 1% -1.66% (p=0.003 n=8+8) Template-8 197ms ± 2% 200ms ± 3% +1.62% (p=0.021 n=8+8) TimeParse-8 1.33µs ± 1% 1.30µs ± 1% -1.86% (p=0.002 n=8+8) TimeFormat-8 3.04µs ± 1% 3.02µs ± 0% -0.60% (p=0.000 n=8+8) name old speed new speed delta GobDecode-8 47.6MB/s ± 1% 46.5MB/s ± 1% -2.28% (p=0.000 n=8+8) GobEncode-8 48.1MB/s ± 0% 48.6MB/s ± 1% +1.02% (p=0.000 n=8+7) Gzip-8 28.1MB/s ± 1% 29.0MB/s ± 0% +2.97% (p=0.000 n=8+8) Gunzip-8 178MB/s ± 1% 179MB/s ± 2% ~ (p=0.694 n=7+8) JSONEncode-8 38.4MB/s ± 1% 39.4MB/s ± 0% +2.67% (p=0.001 n=7+7) JSONDecode-8 14.3MB/s ± 2% 14.2MB/s ± 1% -0.81% (p=0.043 n=8+7) GoParse-8 6.12MB/s ± 1% 5.99MB/s ± 1% -2.00% (p=0.000 n=8+8) RegexpMatchEasy0_32-8 111MB/s ± 2% 115MB/s ± 2% +3.77% (p=0.000 n=8+8) RegexpMatchEasy0_1K-8 618MB/s ± 1% 604MB/s ± 2% -2.16% (p=0.001 n=7+7) RegexpMatchEasy1_32-8 95.7MB/s ± 1% 105.1MB/s ± 2% +9.76% (p=0.000 n=8+8) RegexpMatchEasy1_1K-8 479MB/s ± 2% 477MB/s ± 0% ~ (p=0.105 n=8+8) RegexpMatchMedium_32-8 75.2MB/s ± 1% 75.2MB/s ± 0% ~ (p=0.247 n=7+7) RegexpMatchMedium_1K-8 12.6MB/s ± 3% 12.7MB/s ± 1% ~ (p=0.538 n=7+8) RegexpMatchHard_32-8 7.52MB/s ± 0% 7.52MB/s ± 0% ~ (p=0.968 n=7+8) RegexpMatchHard_1K-8 8.26MB/s ± 0% 8.24MB/s ± 0% -0.30% (p=0.001 n=8+8) Revcomp-8 173MB/s ± 2% 176MB/s ± 1% +1.68% (p=0.003 n=8+8) Template-8 9.85MB/s ± 2% 9.69MB/s ± 3% -1.59% (p=0.021 n=8+8) Fixes #39303 Updates #38740 Change-Id: I0a5f87bfda679f66414c0041ace2ca2e28363f36 Reviewed-on: https://go-review.googlesource.com/c/go/+/236637 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-06-01 11:01:14 +00:00
{{Jump: arm.ABMI, Index: 1}, {Jump: arm.ABEQ, Index: 1}}, // next == b.Succs[0]
{{Jump: arm.ABEQ, Index: 1}, {Jump: arm.ABPL, Index: 0}}, // next == b.Succs[1]
}
[dev.regabi] cmd/compile: split out package ssagen [generated] [git-generate] cd src/cmd/compile/internal/gc rf ' # maxOpenDefers is declared in ssa.go but used only by walk. mv maxOpenDefers walk.go # gc.Arch -> ssagen.Arch # It is not as nice but will do for now. mv Arch ArchInfo mv thearch Arch mv Arch ArchInfo arch.go # Pull dwarf out of pgen.go. mv debuginfo declPos createDwarfVars preInliningDcls \ createSimpleVars createSimpleVar \ createComplexVars createComplexVar \ dwarf.go # Pull high-level compilation out of pgen.go, # leaving only the SSA code. mv compilequeue funccompile compile compilenow \ compileFunctions isInlinableButNotInlined \ initLSym \ compile.go mv BoundsCheckFunc GCWriteBarrierReg ssa.go mv largeStack largeStackFrames CheckLargeStacks pgen.go # All that is left in dcl.go is the nowritebarrierrecCheck mv dcl.go nowb.go # Export API and unexport non-API. mv initssaconfig InitConfig mv isIntrinsicCall IsIntrinsicCall mv ssaDumpInline DumpInline mv initSSATables InitTables mv initSSAEnv InitEnv mv compileSSA Compile mv stackOffset StackOffset mv canSSAType TypeOK mv SSAGenState State mv FwdRefAux fwdRefAux mv cgoSymABIs CgoSymABIs mv readSymABIs ReadSymABIs mv initLSym InitLSym mv useABIWrapGen symabiDefs CgoSymABIs ReadSymABIs InitLSym selectLSym makeABIWrapper setupTextLSym abi.go mv arch.go abi.go nowb.go phi.go pgen.go pgen_test.go ssa.go cmd/compile/internal/ssagen ' rm go.go gsubr.go Change-Id: I47fad6cbf1d1e583fd9139003a08401d7cd048a1 Reviewed-on: https://go-review.googlesource.com/c/go/+/279476 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-12-23 00:57:10 -05:00
func ssaGenBlock(s *ssagen.State, b, next *ssa.Block) {
switch b.Kind {
cmd/compile, runtime: use PC of deferreturn for panic transfer this removes the old conditional-on-register-value handshake from the deferproc/deferprocstack logic. The "line" for the recovery-exit frame itself (not the defers that it runs) is the closing brace of the function. Reduces code size slightly (e.g. go command is 0.2% smaller) Sample output showing effect of this change, also what sort of code it requires to observe the effect: ``` package main import "os" func main() { g(len(os.Args) - 1) // stack[0] } var gi int var pi *int = &gi //go:noinline func g(i int) { switch i { case 0: defer func() { println("g0", i) q() // stack[2] if i == 0 }() for j := *pi; j < 1; j++ { defer func() { println("recover0", recover().(string)) }() } default: for j := *pi; j < 1; j++ { defer func() { println("g1", i) q() // stack[2] if i == 1 }() } defer func() { println("recover1", recover().(string)) }() } p() } // stack[1] (deferreturn) //go:noinline func p() { panic("p()") } //go:noinline func q() { panic("q()") // stack[3] } /* Sample output for "./foo foo": recover1 p() g1 1 panic: q() goroutine 1 [running]: main.q() .../main.go:46 +0x2c main.g.func3() .../main.go:29 +0x48 main.g(0x1?) .../main.go:37 +0x68 main.main() .../main.go:6 +0x28 */ ``` Change-Id: Ie39ea62ecc244213500380ea06d44024cadc2317 Reviewed-on: https://go-review.googlesource.com/c/go/+/650795 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-02-19 16:47:31 -05:00
case ssa.BlockPlain, ssa.BlockDefer:
if b.Succs[0].Block() != next {
p := s.Prog(obj.AJMP)
p.To.Type = obj.TYPE_BRANCH
[dev.regabi] cmd/compile: split out package ssagen [generated] [git-generate] cd src/cmd/compile/internal/gc rf ' # maxOpenDefers is declared in ssa.go but used only by walk. mv maxOpenDefers walk.go # gc.Arch -> ssagen.Arch # It is not as nice but will do for now. mv Arch ArchInfo mv thearch Arch mv Arch ArchInfo arch.go # Pull dwarf out of pgen.go. mv debuginfo declPos createDwarfVars preInliningDcls \ createSimpleVars createSimpleVar \ createComplexVars createComplexVar \ dwarf.go # Pull high-level compilation out of pgen.go, # leaving only the SSA code. mv compilequeue funccompile compile compilenow \ compileFunctions isInlinableButNotInlined \ initLSym \ compile.go mv BoundsCheckFunc GCWriteBarrierReg ssa.go mv largeStack largeStackFrames CheckLargeStacks pgen.go # All that is left in dcl.go is the nowritebarrierrecCheck mv dcl.go nowb.go # Export API and unexport non-API. mv initssaconfig InitConfig mv isIntrinsicCall IsIntrinsicCall mv ssaDumpInline DumpInline mv initSSATables InitTables mv initSSAEnv InitEnv mv compileSSA Compile mv stackOffset StackOffset mv canSSAType TypeOK mv SSAGenState State mv FwdRefAux fwdRefAux mv cgoSymABIs CgoSymABIs mv readSymABIs ReadSymABIs mv initLSym InitLSym mv useABIWrapGen symabiDefs CgoSymABIs ReadSymABIs InitLSym selectLSym makeABIWrapper setupTextLSym abi.go mv arch.go abi.go nowb.go phi.go pgen.go pgen_test.go ssa.go cmd/compile/internal/ssagen ' rm go.go gsubr.go Change-Id: I47fad6cbf1d1e583fd9139003a08401d7cd048a1 Reviewed-on: https://go-review.googlesource.com/c/go/+/279476 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-12-23 00:57:10 -05:00
s.Branches = append(s.Branches, ssagen.Branch{P: p, B: b.Succs[0].Block()})
}
cmd/compile: restore tail call for method wrappers For certain type of method wrappers we used to generate a tail call. That was disabled in CL 307234 when register ABI is used, because with the current IR it was difficult to generate a tail call with the arguments in the right places. The problem was that the IR does not contain a CALL-like node with arguments; instead, it contains an OAS node that adjusts the receiver, than an OTAILCALL node that just contains the target, but no argument (with the assumption that the OAS node will put the adjusted receiver in the right place). With register ABI, putting arguments in registers are done in SSA. The assignment (OAS) doesn't put the receiver in register. This CL changes the IR of a tail call to take an actual OCALL node. Specifically, a tail call is represented as OTAILCALL (OCALL target args...) This way, the call target and args are connected through the OCALL node. So the call can be analyzed in SSA and the args can be passed in the right places. (Alternatively, we could have OTAILCALL node directly take the target and the args, without the OCALL node. Using an OCALL node is convenient as there are existing code that processes OCALL nodes which do not need to be changed. Also, a tail call is similar to ORETURN (OCALL target args...), except it doesn't preserve the frame. I did the former but I'm open to change.) The SSA representation is similar. Previously, the IR lowers to a Store the receiver then a BlockRetJmp which jumps to the target (without putting the arg in register). Now we use a TailCall op, which takes the target and the args. The call expansion pass and the register allocator handles TailCall pretty much like a StaticCall, and it will do the right ABI analysis and put the args in the right places. (Args other than the receiver are already in the right places. For register args it generates no code for them. For stack args currently it generates a self copy. I'll work on optimize that out.) BlockRetJmp is still used, signaling it is a tail call. The actual call is made in the TailCall op so BlockRetJmp generates no code (we could use BlockExit if we like). This slightly reduces binary size: old new cmd/go 14003088 13953936 cmd/link 6275552 6271456 Change-Id: I2d16d8d419fe1f17554916d317427383e17e27f0 Reviewed-on: https://go-review.googlesource.com/c/go/+/350145 Trust: Cherry Mui <cherryyz@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: David Chase <drchase@google.com>
2021-09-10 22:05:55 -04:00
case ssa.BlockExit, ssa.BlockRetJmp:
case ssa.BlockRet:
s.Prog(obj.ARET)
case ssa.BlockARMEQ, ssa.BlockARMNE,
ssa.BlockARMLT, ssa.BlockARMGE,
ssa.BlockARMLE, ssa.BlockARMGT,
ssa.BlockARMULT, ssa.BlockARMUGT,
cmd/compile: ARM comparisons with 0 incorrect on overflow Some ARM rewriting rules convert 'comparing to zero' conditions of if statements to a simplified version utilizing CMN and CMP instructions to branch over condition flags, in order to save one Add or Sub caculation. Such optimizations lead to wrong branching in case an overflow/underflow occurs when executing CMN or CMP. Fix the issue by introducing new block opcodes that don't honor the overflow/underflow flag: Block-Op Meaning ARM condition codes 1. LTnoov less than MI 2. GEnoov greater than or equal PL 3. LEnoov less than or equal MI || EQ 4. GTnoov greater than NEQ & PL The patch also adds a few test cases to cover scenarios that are specific to ARM and fine-tunes the code generation tests for 'x-const'. For more details please refer to the previous fix on 64-bit ARM: https://go-review.googlesource.com/c/go/+/233097 Go1 perf, 'old' is the non-optimized version, that is removing all concerned rewriting rules. name old time/op new time/op delta BinaryTree17-8 7.73s ± 0% 7.81s ± 0% +0.97% (p=0.000 n=7+8) Fannkuch11-8 7.06s ± 0% 7.00s ± 0% -0.83% (p=0.000 n=8+8) FmtFprintfEmpty-8 181ns ± 1% 183ns ± 1% +1.31% (p=0.001 n=8+8) FmtFprintfString-8 319ns ± 1% 325ns ± 2% +1.71% (p=0.009 n=7+8) FmtFprintfInt-8 358ns ± 1% 359ns ± 1% ~ (p=0.293 n=7+7) FmtFprintfIntInt-8 459ns ± 3% 456ns ± 1% ~ (p=0.869 n=8+8) FmtFprintfPrefixedInt-8 535ns ± 4% 538ns ± 4% ~ (p=0.572 n=8+8) FmtFprintfFloat-8 1.01µs ± 2% 1.01µs ± 2% ~ (p=0.625 n=8+8) FmtManyArgs-8 1.93µs ± 2% 1.93µs ± 1% ~ (p=0.979 n=8+7) GobDecode-8 16.1ms ± 1% 16.5ms ± 1% +2.32% (p=0.000 n=8+8) GobEncode-8 15.9ms ± 0% 15.8ms ± 1% -1.00% (p=0.000 n=8+7) Gzip-8 690ms ± 1% 670ms ± 0% -2.90% (p=0.000 n=8+8) Gunzip-8 109ms ± 1% 109ms ± 1% ~ (p=0.694 n=7+8) HTTPClientServer-8 149µs ± 3% 146µs ± 2% -1.70% (p=0.028 n=8+8) JSONEncode-8 50.5ms ± 1% 49.2ms ± 0% -2.60% (p=0.001 n=7+7) JSONDecode-8 135ms ± 2% 137ms ± 1% ~ (p=0.054 n=8+7) Mandelbrot200-8 951ms ± 0% 952ms ± 0% ~ (p=0.852 n=6+8) GoParse-8 9.47ms ± 1% 9.66ms ± 1% +2.01% (p=0.000 n=8+8) RegexpMatchEasy0_32-8 288ns ± 2% 277ns ± 2% -3.61% (p=0.000 n=8+8) RegexpMatchEasy0_1K-8 1.66µs ± 1% 1.69µs ± 2% +2.21% (p=0.001 n=7+7) RegexpMatchEasy1_32-8 334ns ± 1% 305ns ± 2% -8.86% (p=0.000 n=8+8) RegexpMatchEasy1_1K-8 2.14µs ± 2% 2.15µs ± 0% ~ (p=0.099 n=8+8) RegexpMatchMedium_32-8 13.3ns ± 1% 13.3ns ± 0% ~ (p=1.000 n=7+7) RegexpMatchMedium_1K-8 81.1µs ± 3% 80.7µs ± 1% ~ (p=0.955 n=7+8) RegexpMatchHard_32-8 4.26µs ± 0% 4.26µs ± 0% ~ (p=0.933 n=7+8) RegexpMatchHard_1K-8 124µs ± 0% 124µs ± 0% +0.31% (p=0.000 n=8+8) Revcomp-8 14.7ms ± 2% 14.5ms ± 1% -1.66% (p=0.003 n=8+8) Template-8 197ms ± 2% 200ms ± 3% +1.62% (p=0.021 n=8+8) TimeParse-8 1.33µs ± 1% 1.30µs ± 1% -1.86% (p=0.002 n=8+8) TimeFormat-8 3.04µs ± 1% 3.02µs ± 0% -0.60% (p=0.000 n=8+8) name old speed new speed delta GobDecode-8 47.6MB/s ± 1% 46.5MB/s ± 1% -2.28% (p=0.000 n=8+8) GobEncode-8 48.1MB/s ± 0% 48.6MB/s ± 1% +1.02% (p=0.000 n=8+7) Gzip-8 28.1MB/s ± 1% 29.0MB/s ± 0% +2.97% (p=0.000 n=8+8) Gunzip-8 178MB/s ± 1% 179MB/s ± 2% ~ (p=0.694 n=7+8) JSONEncode-8 38.4MB/s ± 1% 39.4MB/s ± 0% +2.67% (p=0.001 n=7+7) JSONDecode-8 14.3MB/s ± 2% 14.2MB/s ± 1% -0.81% (p=0.043 n=8+7) GoParse-8 6.12MB/s ± 1% 5.99MB/s ± 1% -2.00% (p=0.000 n=8+8) RegexpMatchEasy0_32-8 111MB/s ± 2% 115MB/s ± 2% +3.77% (p=0.000 n=8+8) RegexpMatchEasy0_1K-8 618MB/s ± 1% 604MB/s ± 2% -2.16% (p=0.001 n=7+7) RegexpMatchEasy1_32-8 95.7MB/s ± 1% 105.1MB/s ± 2% +9.76% (p=0.000 n=8+8) RegexpMatchEasy1_1K-8 479MB/s ± 2% 477MB/s ± 0% ~ (p=0.105 n=8+8) RegexpMatchMedium_32-8 75.2MB/s ± 1% 75.2MB/s ± 0% ~ (p=0.247 n=7+7) RegexpMatchMedium_1K-8 12.6MB/s ± 3% 12.7MB/s ± 1% ~ (p=0.538 n=7+8) RegexpMatchHard_32-8 7.52MB/s ± 0% 7.52MB/s ± 0% ~ (p=0.968 n=7+8) RegexpMatchHard_1K-8 8.26MB/s ± 0% 8.24MB/s ± 0% -0.30% (p=0.001 n=8+8) Revcomp-8 173MB/s ± 2% 176MB/s ± 1% +1.68% (p=0.003 n=8+8) Template-8 9.85MB/s ± 2% 9.69MB/s ± 3% -1.59% (p=0.021 n=8+8) Fixes #39303 Updates #38740 Change-Id: I0a5f87bfda679f66414c0041ace2ca2e28363f36 Reviewed-on: https://go-review.googlesource.com/c/go/+/236637 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-06-01 11:01:14 +00:00
ssa.BlockARMULE, ssa.BlockARMUGE,
ssa.BlockARMLTnoov, ssa.BlockARMGEnoov:
jmp := blockJump[b.Kind]
switch next {
case b.Succs[0].Block():
s.Br(jmp.invasm, b.Succs[1].Block())
case b.Succs[1].Block():
s.Br(jmp.asm, b.Succs[0].Block())
default:
if b.Likely != ssa.BranchUnlikely {
s.Br(jmp.asm, b.Succs[0].Block())
s.Br(obj.AJMP, b.Succs[1].Block())
} else {
s.Br(jmp.invasm, b.Succs[1].Block())
s.Br(obj.AJMP, b.Succs[0].Block())
}
}
cmd/compile: ARM comparisons with 0 incorrect on overflow Some ARM rewriting rules convert 'comparing to zero' conditions of if statements to a simplified version utilizing CMN and CMP instructions to branch over condition flags, in order to save one Add or Sub caculation. Such optimizations lead to wrong branching in case an overflow/underflow occurs when executing CMN or CMP. Fix the issue by introducing new block opcodes that don't honor the overflow/underflow flag: Block-Op Meaning ARM condition codes 1. LTnoov less than MI 2. GEnoov greater than or equal PL 3. LEnoov less than or equal MI || EQ 4. GTnoov greater than NEQ & PL The patch also adds a few test cases to cover scenarios that are specific to ARM and fine-tunes the code generation tests for 'x-const'. For more details please refer to the previous fix on 64-bit ARM: https://go-review.googlesource.com/c/go/+/233097 Go1 perf, 'old' is the non-optimized version, that is removing all concerned rewriting rules. name old time/op new time/op delta BinaryTree17-8 7.73s ± 0% 7.81s ± 0% +0.97% (p=0.000 n=7+8) Fannkuch11-8 7.06s ± 0% 7.00s ± 0% -0.83% (p=0.000 n=8+8) FmtFprintfEmpty-8 181ns ± 1% 183ns ± 1% +1.31% (p=0.001 n=8+8) FmtFprintfString-8 319ns ± 1% 325ns ± 2% +1.71% (p=0.009 n=7+8) FmtFprintfInt-8 358ns ± 1% 359ns ± 1% ~ (p=0.293 n=7+7) FmtFprintfIntInt-8 459ns ± 3% 456ns ± 1% ~ (p=0.869 n=8+8) FmtFprintfPrefixedInt-8 535ns ± 4% 538ns ± 4% ~ (p=0.572 n=8+8) FmtFprintfFloat-8 1.01µs ± 2% 1.01µs ± 2% ~ (p=0.625 n=8+8) FmtManyArgs-8 1.93µs ± 2% 1.93µs ± 1% ~ (p=0.979 n=8+7) GobDecode-8 16.1ms ± 1% 16.5ms ± 1% +2.32% (p=0.000 n=8+8) GobEncode-8 15.9ms ± 0% 15.8ms ± 1% -1.00% (p=0.000 n=8+7) Gzip-8 690ms ± 1% 670ms ± 0% -2.90% (p=0.000 n=8+8) Gunzip-8 109ms ± 1% 109ms ± 1% ~ (p=0.694 n=7+8) HTTPClientServer-8 149µs ± 3% 146µs ± 2% -1.70% (p=0.028 n=8+8) JSONEncode-8 50.5ms ± 1% 49.2ms ± 0% -2.60% (p=0.001 n=7+7) JSONDecode-8 135ms ± 2% 137ms ± 1% ~ (p=0.054 n=8+7) Mandelbrot200-8 951ms ± 0% 952ms ± 0% ~ (p=0.852 n=6+8) GoParse-8 9.47ms ± 1% 9.66ms ± 1% +2.01% (p=0.000 n=8+8) RegexpMatchEasy0_32-8 288ns ± 2% 277ns ± 2% -3.61% (p=0.000 n=8+8) RegexpMatchEasy0_1K-8 1.66µs ± 1% 1.69µs ± 2% +2.21% (p=0.001 n=7+7) RegexpMatchEasy1_32-8 334ns ± 1% 305ns ± 2% -8.86% (p=0.000 n=8+8) RegexpMatchEasy1_1K-8 2.14µs ± 2% 2.15µs ± 0% ~ (p=0.099 n=8+8) RegexpMatchMedium_32-8 13.3ns ± 1% 13.3ns ± 0% ~ (p=1.000 n=7+7) RegexpMatchMedium_1K-8 81.1µs ± 3% 80.7µs ± 1% ~ (p=0.955 n=7+8) RegexpMatchHard_32-8 4.26µs ± 0% 4.26µs ± 0% ~ (p=0.933 n=7+8) RegexpMatchHard_1K-8 124µs ± 0% 124µs ± 0% +0.31% (p=0.000 n=8+8) Revcomp-8 14.7ms ± 2% 14.5ms ± 1% -1.66% (p=0.003 n=8+8) Template-8 197ms ± 2% 200ms ± 3% +1.62% (p=0.021 n=8+8) TimeParse-8 1.33µs ± 1% 1.30µs ± 1% -1.86% (p=0.002 n=8+8) TimeFormat-8 3.04µs ± 1% 3.02µs ± 0% -0.60% (p=0.000 n=8+8) name old speed new speed delta GobDecode-8 47.6MB/s ± 1% 46.5MB/s ± 1% -2.28% (p=0.000 n=8+8) GobEncode-8 48.1MB/s ± 0% 48.6MB/s ± 1% +1.02% (p=0.000 n=8+7) Gzip-8 28.1MB/s ± 1% 29.0MB/s ± 0% +2.97% (p=0.000 n=8+8) Gunzip-8 178MB/s ± 1% 179MB/s ± 2% ~ (p=0.694 n=7+8) JSONEncode-8 38.4MB/s ± 1% 39.4MB/s ± 0% +2.67% (p=0.001 n=7+7) JSONDecode-8 14.3MB/s ± 2% 14.2MB/s ± 1% -0.81% (p=0.043 n=8+7) GoParse-8 6.12MB/s ± 1% 5.99MB/s ± 1% -2.00% (p=0.000 n=8+8) RegexpMatchEasy0_32-8 111MB/s ± 2% 115MB/s ± 2% +3.77% (p=0.000 n=8+8) RegexpMatchEasy0_1K-8 618MB/s ± 1% 604MB/s ± 2% -2.16% (p=0.001 n=7+7) RegexpMatchEasy1_32-8 95.7MB/s ± 1% 105.1MB/s ± 2% +9.76% (p=0.000 n=8+8) RegexpMatchEasy1_1K-8 479MB/s ± 2% 477MB/s ± 0% ~ (p=0.105 n=8+8) RegexpMatchMedium_32-8 75.2MB/s ± 1% 75.2MB/s ± 0% ~ (p=0.247 n=7+7) RegexpMatchMedium_1K-8 12.6MB/s ± 3% 12.7MB/s ± 1% ~ (p=0.538 n=7+8) RegexpMatchHard_32-8 7.52MB/s ± 0% 7.52MB/s ± 0% ~ (p=0.968 n=7+8) RegexpMatchHard_1K-8 8.26MB/s ± 0% 8.24MB/s ± 0% -0.30% (p=0.001 n=8+8) Revcomp-8 173MB/s ± 2% 176MB/s ± 1% +1.68% (p=0.003 n=8+8) Template-8 9.85MB/s ± 2% 9.69MB/s ± 3% -1.59% (p=0.021 n=8+8) Fixes #39303 Updates #38740 Change-Id: I0a5f87bfda679f66414c0041ace2ca2e28363f36 Reviewed-on: https://go-review.googlesource.com/c/go/+/236637 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-06-01 11:01:14 +00:00
case ssa.BlockARMLEnoov:
s.CombJump(b, next, &leJumps)
case ssa.BlockARMGTnoov:
s.CombJump(b, next, &gtJumps)
default:
cmd/compile: allow multiple SSA block control values Control values are used to choose which successor of a block is jumped to. Typically a control value takes the form of a 'flags' value that represents the result of a comparison. Some architectures however use a variable in a register as a control value. Up until now we have managed with a single control value per block. However some architectures (e.g. s390x and riscv64) have combined compare-and-branch instructions that take two variables in registers as parameters. To generate these instructions we need to support 2 control values per block. This CL allows up to 2 control values to be used in a block in order to support the addition of compare-and-branch instructions. I have implemented s390x compare-and-branch instructions in a different CL. Passes toolstash-check -all. Results of compilebench: name old time/op new time/op delta Template 208ms ± 1% 209ms ± 1% ~ (p=0.289 n=20+20) Unicode 83.7ms ± 1% 83.3ms ± 3% -0.49% (p=0.017 n=18+18) GoTypes 748ms ± 1% 748ms ± 0% ~ (p=0.460 n=20+18) Compiler 3.47s ± 1% 3.48s ± 1% ~ (p=0.070 n=19+18) SSA 11.5s ± 1% 11.7s ± 1% +1.64% (p=0.000 n=19+18) Flate 130ms ± 1% 130ms ± 1% ~ (p=0.588 n=19+20) GoParser 160ms ± 1% 161ms ± 1% ~ (p=0.211 n=20+20) Reflect 465ms ± 1% 467ms ± 1% +0.42% (p=0.007 n=20+20) Tar 184ms ± 1% 185ms ± 2% ~ (p=0.087 n=18+20) XML 253ms ± 1% 253ms ± 1% ~ (p=0.377 n=20+18) LinkCompiler 769ms ± 2% 774ms ± 2% ~ (p=0.070 n=19+19) ExternalLinkCompiler 3.59s ±11% 3.68s ± 6% ~ (p=0.072 n=20+20) LinkWithoutDebugCompiler 446ms ± 5% 454ms ± 3% +1.79% (p=0.002 n=19+20) StdCmd 26.0s ± 2% 26.0s ± 2% ~ (p=0.799 n=20+20) name old user-time/op new user-time/op delta Template 238ms ± 5% 240ms ± 5% ~ (p=0.142 n=20+20) Unicode 105ms ±11% 106ms ±10% ~ (p=0.512 n=20+20) GoTypes 876ms ± 2% 873ms ± 4% ~ (p=0.647 n=20+19) Compiler 4.17s ± 2% 4.19s ± 1% ~ (p=0.093 n=20+18) SSA 13.9s ± 1% 14.1s ± 1% +1.45% (p=0.000 n=18+18) Flate 145ms ±13% 146ms ± 5% ~ (p=0.851 n=20+18) GoParser 185ms ± 5% 188ms ± 7% ~ (p=0.174 n=20+20) Reflect 534ms ± 3% 538ms ± 2% ~ (p=0.105 n=20+18) Tar 215ms ± 4% 211ms ± 9% ~ (p=0.079 n=19+20) XML 295ms ± 6% 295ms ± 5% ~ (p=0.968 n=20+20) LinkCompiler 832ms ± 4% 837ms ± 7% ~ (p=0.707 n=17+20) ExternalLinkCompiler 1.58s ± 8% 1.60s ± 4% ~ (p=0.296 n=20+19) LinkWithoutDebugCompiler 478ms ±12% 489ms ±10% ~ (p=0.429 n=20+20) name old object-bytes new object-bytes delta Template 559kB ± 0% 559kB ± 0% ~ (all equal) Unicode 216kB ± 0% 216kB ± 0% ~ (all equal) GoTypes 2.03MB ± 0% 2.03MB ± 0% ~ (all equal) Compiler 8.07MB ± 0% 8.07MB ± 0% -0.06% (p=0.000 n=20+20) SSA 27.1MB ± 0% 27.3MB ± 0% +0.89% (p=0.000 n=20+20) Flate 343kB ± 0% 343kB ± 0% ~ (all equal) GoParser 441kB ± 0% 441kB ± 0% ~ (all equal) Reflect 1.36MB ± 0% 1.36MB ± 0% ~ (all equal) Tar 487kB ± 0% 487kB ± 0% ~ (all equal) XML 632kB ± 0% 632kB ± 0% ~ (all equal) name old export-bytes new export-bytes delta Template 18.5kB ± 0% 18.5kB ± 0% ~ (all equal) Unicode 7.92kB ± 0% 7.92kB ± 0% ~ (all equal) GoTypes 35.0kB ± 0% 35.0kB ± 0% ~ (all equal) Compiler 109kB ± 0% 110kB ± 0% +0.72% (p=0.000 n=20+20) SSA 137kB ± 0% 138kB ± 0% +0.58% (p=0.000 n=20+20) Flate 4.89kB ± 0% 4.89kB ± 0% ~ (all equal) GoParser 8.49kB ± 0% 8.49kB ± 0% ~ (all equal) Reflect 11.4kB ± 0% 11.4kB ± 0% ~ (all equal) Tar 10.5kB ± 0% 10.5kB ± 0% ~ (all equal) XML 16.7kB ± 0% 16.7kB ± 0% ~ (all equal) name old text-bytes new text-bytes delta HelloSize 761kB ± 0% 761kB ± 0% ~ (all equal) CmdGoSize 10.8MB ± 0% 10.8MB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 10.7kB ± 0% 10.7kB ± 0% ~ (all equal) CmdGoSize 312kB ± 0% 312kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 122kB ± 0% 122kB ± 0% ~ (all equal) CmdGoSize 146kB ± 0% 146kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.13MB ± 0% 1.13MB ± 0% ~ (all equal) CmdGoSize 15.1MB ± 0% 15.1MB ± 0% ~ (all equal) Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f Reviewed-on: https://go-review.googlesource.com/c/go/+/196557 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-12 20:19:58 +01:00
b.Fatalf("branch not implemented: %s", b.LongString())
}
}