go/src/cmd/compile/internal/x86/ssa.go

962 lines
27 KiB
Go
Raw Normal View History

// Copyright 2016 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
package x86
import (
"fmt"
"math"
"cmd/compile/internal/gc"
"cmd/compile/internal/logopt"
"cmd/compile/internal/ssa"
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
"cmd/compile/internal/types"
"cmd/internal/obj"
"cmd/internal/obj/x86"
)
// markMoves marks any MOVXconst ops that need to avoid clobbering flags.
func ssaMarkMoves(s *gc.SSAGenState, b *ssa.Block) {
flive := b.FlagsLiveAtEnd
cmd/compile: allow multiple SSA block control values Control values are used to choose which successor of a block is jumped to. Typically a control value takes the form of a 'flags' value that represents the result of a comparison. Some architectures however use a variable in a register as a control value. Up until now we have managed with a single control value per block. However some architectures (e.g. s390x and riscv64) have combined compare-and-branch instructions that take two variables in registers as parameters. To generate these instructions we need to support 2 control values per block. This CL allows up to 2 control values to be used in a block in order to support the addition of compare-and-branch instructions. I have implemented s390x compare-and-branch instructions in a different CL. Passes toolstash-check -all. Results of compilebench: name old time/op new time/op delta Template 208ms ± 1% 209ms ± 1% ~ (p=0.289 n=20+20) Unicode 83.7ms ± 1% 83.3ms ± 3% -0.49% (p=0.017 n=18+18) GoTypes 748ms ± 1% 748ms ± 0% ~ (p=0.460 n=20+18) Compiler 3.47s ± 1% 3.48s ± 1% ~ (p=0.070 n=19+18) SSA 11.5s ± 1% 11.7s ± 1% +1.64% (p=0.000 n=19+18) Flate 130ms ± 1% 130ms ± 1% ~ (p=0.588 n=19+20) GoParser 160ms ± 1% 161ms ± 1% ~ (p=0.211 n=20+20) Reflect 465ms ± 1% 467ms ± 1% +0.42% (p=0.007 n=20+20) Tar 184ms ± 1% 185ms ± 2% ~ (p=0.087 n=18+20) XML 253ms ± 1% 253ms ± 1% ~ (p=0.377 n=20+18) LinkCompiler 769ms ± 2% 774ms ± 2% ~ (p=0.070 n=19+19) ExternalLinkCompiler 3.59s ±11% 3.68s ± 6% ~ (p=0.072 n=20+20) LinkWithoutDebugCompiler 446ms ± 5% 454ms ± 3% +1.79% (p=0.002 n=19+20) StdCmd 26.0s ± 2% 26.0s ± 2% ~ (p=0.799 n=20+20) name old user-time/op new user-time/op delta Template 238ms ± 5% 240ms ± 5% ~ (p=0.142 n=20+20) Unicode 105ms ±11% 106ms ±10% ~ (p=0.512 n=20+20) GoTypes 876ms ± 2% 873ms ± 4% ~ (p=0.647 n=20+19) Compiler 4.17s ± 2% 4.19s ± 1% ~ (p=0.093 n=20+18) SSA 13.9s ± 1% 14.1s ± 1% +1.45% (p=0.000 n=18+18) Flate 145ms ±13% 146ms ± 5% ~ (p=0.851 n=20+18) GoParser 185ms ± 5% 188ms ± 7% ~ (p=0.174 n=20+20) Reflect 534ms ± 3% 538ms ± 2% ~ (p=0.105 n=20+18) Tar 215ms ± 4% 211ms ± 9% ~ (p=0.079 n=19+20) XML 295ms ± 6% 295ms ± 5% ~ (p=0.968 n=20+20) LinkCompiler 832ms ± 4% 837ms ± 7% ~ (p=0.707 n=17+20) ExternalLinkCompiler 1.58s ± 8% 1.60s ± 4% ~ (p=0.296 n=20+19) LinkWithoutDebugCompiler 478ms ±12% 489ms ±10% ~ (p=0.429 n=20+20) name old object-bytes new object-bytes delta Template 559kB ± 0% 559kB ± 0% ~ (all equal) Unicode 216kB ± 0% 216kB ± 0% ~ (all equal) GoTypes 2.03MB ± 0% 2.03MB ± 0% ~ (all equal) Compiler 8.07MB ± 0% 8.07MB ± 0% -0.06% (p=0.000 n=20+20) SSA 27.1MB ± 0% 27.3MB ± 0% +0.89% (p=0.000 n=20+20) Flate 343kB ± 0% 343kB ± 0% ~ (all equal) GoParser 441kB ± 0% 441kB ± 0% ~ (all equal) Reflect 1.36MB ± 0% 1.36MB ± 0% ~ (all equal) Tar 487kB ± 0% 487kB ± 0% ~ (all equal) XML 632kB ± 0% 632kB ± 0% ~ (all equal) name old export-bytes new export-bytes delta Template 18.5kB ± 0% 18.5kB ± 0% ~ (all equal) Unicode 7.92kB ± 0% 7.92kB ± 0% ~ (all equal) GoTypes 35.0kB ± 0% 35.0kB ± 0% ~ (all equal) Compiler 109kB ± 0% 110kB ± 0% +0.72% (p=0.000 n=20+20) SSA 137kB ± 0% 138kB ± 0% +0.58% (p=0.000 n=20+20) Flate 4.89kB ± 0% 4.89kB ± 0% ~ (all equal) GoParser 8.49kB ± 0% 8.49kB ± 0% ~ (all equal) Reflect 11.4kB ± 0% 11.4kB ± 0% ~ (all equal) Tar 10.5kB ± 0% 10.5kB ± 0% ~ (all equal) XML 16.7kB ± 0% 16.7kB ± 0% ~ (all equal) name old text-bytes new text-bytes delta HelloSize 761kB ± 0% 761kB ± 0% ~ (all equal) CmdGoSize 10.8MB ± 0% 10.8MB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 10.7kB ± 0% 10.7kB ± 0% ~ (all equal) CmdGoSize 312kB ± 0% 312kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 122kB ± 0% 122kB ± 0% ~ (all equal) CmdGoSize 146kB ± 0% 146kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.13MB ± 0% 1.13MB ± 0% ~ (all equal) CmdGoSize 15.1MB ± 0% 15.1MB ± 0% ~ (all equal) Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f Reviewed-on: https://go-review.googlesource.com/c/go/+/196557 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-12 20:19:58 +01:00
for _, c := range b.ControlValues() {
flive = c.Type.IsFlags() || flive
}
for i := len(b.Values) - 1; i >= 0; i-- {
v := b.Values[i]
if flive && v.Op == ssa.Op386MOVLconst {
// The "mark" is any non-nil Aux value.
v.Aux = v
}
if v.Type.IsFlags() {
flive = false
}
for _, a := range v.Args {
if a.Type.IsFlags() {
flive = true
}
}
}
}
// loadByType returns the load instruction of the given type.
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
func loadByType(t *types.Type) obj.As {
// Avoid partial register write
if !t.IsFloat() && t.Size() <= 2 {
if t.Size() == 1 {
return x86.AMOVBLZX
} else {
return x86.AMOVWLZX
}
}
// Otherwise, there's no difference between load and store opcodes.
return storeByType(t)
}
// storeByType returns the store instruction of the given type.
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
func storeByType(t *types.Type) obj.As {
width := t.Size()
if t.IsFloat() {
switch width {
case 4:
return x86.AMOVSS
case 8:
return x86.AMOVSD
}
} else {
switch width {
case 1:
return x86.AMOVB
case 2:
return x86.AMOVW
case 4:
return x86.AMOVL
}
}
panic("bad store type")
}
// moveByType returns the reg->reg move instruction of the given type.
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
func moveByType(t *types.Type) obj.As {
if t.IsFloat() {
switch t.Size() {
case 4:
return x86.AMOVSS
case 8:
return x86.AMOVSD
default:
panic(fmt.Sprintf("bad float register width %d:%s", t.Size(), t))
}
} else {
switch t.Size() {
case 1:
// Avoids partial register write
return x86.AMOVL
case 2:
return x86.AMOVL
case 4:
return x86.AMOVL
default:
panic(fmt.Sprintf("bad int register width %d:%s", t.Size(), t))
}
}
}
// opregreg emits instructions for
// dest := dest(To) op src(From)
// and also returns the created obj.Prog so it
// may be further adjusted (offset, scale, etc).
func opregreg(s *gc.SSAGenState, op obj.As, dest, src int16) *obj.Prog {
p := s.Prog(op)
p.From.Type = obj.TYPE_REG
p.To.Type = obj.TYPE_REG
p.To.Reg = dest
p.From.Reg = src
return p
}
func ssaGenValue(s *gc.SSAGenState, v *ssa.Value) {
switch v.Op {
case ssa.Op386ADDL:
r := v.Reg()
r1 := v.Args[0].Reg()
r2 := v.Args[1].Reg()
switch {
case r == r1:
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = r2
p.To.Type = obj.TYPE_REG
p.To.Reg = r
case r == r2:
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = r1
p.To.Type = obj.TYPE_REG
p.To.Reg = r
default:
p := s.Prog(x86.ALEAL)
p.From.Type = obj.TYPE_MEM
p.From.Reg = r1
p.From.Scale = 1
p.From.Index = r2
p.To.Type = obj.TYPE_REG
p.To.Reg = r
}
// 2-address opcode arithmetic
case ssa.Op386SUBL,
ssa.Op386MULL,
ssa.Op386ANDL,
ssa.Op386ORL,
ssa.Op386XORL,
ssa.Op386SHLL,
ssa.Op386SHRL, ssa.Op386SHRW, ssa.Op386SHRB,
ssa.Op386SARL, ssa.Op386SARW, ssa.Op386SARB,
ssa.Op386ADDSS, ssa.Op386ADDSD, ssa.Op386SUBSS, ssa.Op386SUBSD,
ssa.Op386MULSS, ssa.Op386MULSD, ssa.Op386DIVSS, ssa.Op386DIVSD,
ssa.Op386PXOR,
ssa.Op386ADCL,
ssa.Op386SBBL:
r := v.Reg()
if r != v.Args[0].Reg() {
v.Fatalf("input[0] and output not in same register %s", v.LongString())
}
opregreg(s, v.Op.Asm(), r, v.Args[1].Reg())
case ssa.Op386ADDLcarry, ssa.Op386SUBLcarry:
// output 0 is carry/borrow, output 1 is the low 32 bits.
r := v.Reg0()
if r != v.Args[0].Reg() {
v.Fatalf("input[0] and output[0] not in same register %s", v.LongString())
}
opregreg(s, v.Op.Asm(), r, v.Args[1].Reg())
case ssa.Op386ADDLconstcarry, ssa.Op386SUBLconstcarry:
// output 0 is carry/borrow, output 1 is the low 32 bits.
r := v.Reg0()
if r != v.Args[0].Reg() {
v.Fatalf("input[0] and output[0] not in same register %s", v.LongString())
}
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_CONST
p.From.Offset = v.AuxInt
p.To.Type = obj.TYPE_REG
p.To.Reg = r
case ssa.Op386DIVL, ssa.Op386DIVW,
ssa.Op386DIVLU, ssa.Op386DIVWU,
ssa.Op386MODL, ssa.Op386MODW,
ssa.Op386MODLU, ssa.Op386MODWU:
// Arg[0] is already in AX as it's the only register we allow
// and AX is the only output
x := v.Args[1].Reg()
// CPU faults upon signed overflow, which occurs when most
// negative int is divided by -1.
var j *obj.Prog
if v.Op == ssa.Op386DIVL || v.Op == ssa.Op386DIVW ||
v.Op == ssa.Op386MODL || v.Op == ssa.Op386MODW {
if ssa.NeedsFixUp(v) {
var c *obj.Prog
switch v.Op {
case ssa.Op386DIVL, ssa.Op386MODL:
c = s.Prog(x86.ACMPL)
j = s.Prog(x86.AJEQ)
case ssa.Op386DIVW, ssa.Op386MODW:
c = s.Prog(x86.ACMPW)
j = s.Prog(x86.AJEQ)
}
c.From.Type = obj.TYPE_REG
c.From.Reg = x
c.To.Type = obj.TYPE_CONST
c.To.Offset = -1
j.To.Type = obj.TYPE_BRANCH
}
// sign extend the dividend
switch v.Op {
case ssa.Op386DIVL, ssa.Op386MODL:
s.Prog(x86.ACDQ)
case ssa.Op386DIVW, ssa.Op386MODW:
s.Prog(x86.ACWD)
}
}
// for unsigned ints, we sign extend by setting DX = 0
// signed ints were sign extended above
if v.Op == ssa.Op386DIVLU || v.Op == ssa.Op386MODLU ||
v.Op == ssa.Op386DIVWU || v.Op == ssa.Op386MODWU {
c := s.Prog(x86.AXORL)
c.From.Type = obj.TYPE_REG
c.From.Reg = x86.REG_DX
c.To.Type = obj.TYPE_REG
c.To.Reg = x86.REG_DX
}
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = x
// signed division, rest of the check for -1 case
if j != nil {
j2 := s.Prog(obj.AJMP)
j2.To.Type = obj.TYPE_BRANCH
var n *obj.Prog
if v.Op == ssa.Op386DIVL || v.Op == ssa.Op386DIVW {
// n * -1 = -n
n = s.Prog(x86.ANEGL)
n.To.Type = obj.TYPE_REG
n.To.Reg = x86.REG_AX
} else {
// n % -1 == 0
n = s.Prog(x86.AXORL)
n.From.Type = obj.TYPE_REG
n.From.Reg = x86.REG_DX
n.To.Type = obj.TYPE_REG
n.To.Reg = x86.REG_DX
}
j.To.Val = n
j2.To.Val = s.Pc()
}
case ssa.Op386HMULL, ssa.Op386HMULLU:
// the frontend rewrites constant division by 8/16/32 bit integers into
// HMUL by a constant
// SSA rewrites generate the 64 bit versions
// Arg[0] is already in AX as it's the only register we allow
// and DX is the only output we care about (the high bits)
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = v.Args[1].Reg()
// IMULB puts the high portion in AH instead of DL,
// so move it to DL for consistency
if v.Type.Size() == 1 {
m := s.Prog(x86.AMOVB)
m.From.Type = obj.TYPE_REG
m.From.Reg = x86.REG_AH
m.To.Type = obj.TYPE_REG
m.To.Reg = x86.REG_DX
}
case ssa.Op386MULLU:
// Arg[0] is already in AX as it's the only register we allow
// results lo in AX
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = v.Args[1].Reg()
case ssa.Op386MULLQU:
// AX * args[1], high 32 bits in DX (result[0]), low 32 bits in AX (result[1]).
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = v.Args[1].Reg()
case ssa.Op386AVGLU:
// compute (x+y)/2 unsigned.
// Do a 32-bit add, the overflow goes into the carry.
// Shift right once and pull the carry back into the 31st bit.
r := v.Reg()
if r != v.Args[0].Reg() {
v.Fatalf("input[0] and output not in same register %s", v.LongString())
}
p := s.Prog(x86.AADDL)
p.From.Type = obj.TYPE_REG
p.To.Type = obj.TYPE_REG
p.To.Reg = r
p.From.Reg = v.Args[1].Reg()
p = s.Prog(x86.ARCRL)
p.From.Type = obj.TYPE_CONST
p.From.Offset = 1
p.To.Type = obj.TYPE_REG
p.To.Reg = r
case ssa.Op386ADDLconst:
r := v.Reg()
a := v.Args[0].Reg()
if r == a {
if v.AuxInt == 1 {
p := s.Prog(x86.AINCL)
p.To.Type = obj.TYPE_REG
p.To.Reg = r
return
}
if v.AuxInt == -1 {
p := s.Prog(x86.ADECL)
p.To.Type = obj.TYPE_REG
p.To.Reg = r
return
}
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_CONST
p.From.Offset = v.AuxInt
p.To.Type = obj.TYPE_REG
p.To.Reg = r
return
}
p := s.Prog(x86.ALEAL)
p.From.Type = obj.TYPE_MEM
p.From.Reg = a
p.From.Offset = v.AuxInt
p.To.Type = obj.TYPE_REG
p.To.Reg = r
case ssa.Op386MULLconst:
r := v.Reg()
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_CONST
p.From.Offset = v.AuxInt
p.To.Type = obj.TYPE_REG
p.To.Reg = r
p.SetFrom3(obj.Addr{Type: obj.TYPE_REG, Reg: v.Args[0].Reg()})
case ssa.Op386SUBLconst,
ssa.Op386ADCLconst,
ssa.Op386SBBLconst,
ssa.Op386ANDLconst,
ssa.Op386ORLconst,
ssa.Op386XORLconst,
ssa.Op386SHLLconst,
ssa.Op386SHRLconst, ssa.Op386SHRWconst, ssa.Op386SHRBconst,
ssa.Op386SARLconst, ssa.Op386SARWconst, ssa.Op386SARBconst,
ssa.Op386ROLLconst, ssa.Op386ROLWconst, ssa.Op386ROLBconst:
r := v.Reg()
if r != v.Args[0].Reg() {
v.Fatalf("input[0] and output not in same register %s", v.LongString())
}
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_CONST
p.From.Offset = v.AuxInt
p.To.Type = obj.TYPE_REG
p.To.Reg = r
case ssa.Op386SBBLcarrymask:
r := v.Reg()
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = r
p.To.Type = obj.TYPE_REG
p.To.Reg = r
case ssa.Op386LEAL1, ssa.Op386LEAL2, ssa.Op386LEAL4, ssa.Op386LEAL8:
r := v.Args[0].Reg()
i := v.Args[1].Reg()
p := s.Prog(x86.ALEAL)
switch v.Op {
case ssa.Op386LEAL1:
p.From.Scale = 1
if i == x86.REG_SP {
r, i = i, r
}
case ssa.Op386LEAL2:
p.From.Scale = 2
case ssa.Op386LEAL4:
p.From.Scale = 4
case ssa.Op386LEAL8:
p.From.Scale = 8
}
p.From.Type = obj.TYPE_MEM
p.From.Reg = r
p.From.Index = i
gc.AddAux(&p.From, v)
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
case ssa.Op386LEAL:
p := s.Prog(x86.ALEAL)
p.From.Type = obj.TYPE_MEM
p.From.Reg = v.Args[0].Reg()
gc.AddAux(&p.From, v)
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
case ssa.Op386CMPL, ssa.Op386CMPW, ssa.Op386CMPB,
ssa.Op386TESTL, ssa.Op386TESTW, ssa.Op386TESTB:
opregreg(s, v.Op.Asm(), v.Args[1].Reg(), v.Args[0].Reg())
case ssa.Op386UCOMISS, ssa.Op386UCOMISD:
// Go assembler has swapped operands for UCOMISx relative to CMP,
// must account for that right here.
opregreg(s, v.Op.Asm(), v.Args[0].Reg(), v.Args[1].Reg())
case ssa.Op386CMPLconst, ssa.Op386CMPWconst, ssa.Op386CMPBconst:
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = v.Args[0].Reg()
p.To.Type = obj.TYPE_CONST
p.To.Offset = v.AuxInt
case ssa.Op386TESTLconst, ssa.Op386TESTWconst, ssa.Op386TESTBconst:
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_CONST
p.From.Offset = v.AuxInt
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Args[0].Reg()
cmd/compile: optimize 386's comparison CMPL/CMPW/CMPB can take a memory operand on 386, and this CL implements that optimization. 1. The total size of pkg/linux_386 decreases about 45KB, excluding cmd/compile. 2. The go1 benchmark shows a little improvement. name old time/op new time/op delta BinaryTree17-4 3.36s ± 2% 3.37s ± 3% ~ (p=0.537 n=40+40) Fannkuch11-4 3.59s ± 1% 3.53s ± 2% -1.58% (p=0.000 n=40+40) FmtFprintfEmpty-4 46.0ns ± 3% 45.8ns ± 3% ~ (p=0.249 n=40+40) FmtFprintfString-4 80.0ns ± 4% 78.8ns ± 3% -1.49% (p=0.001 n=40+40) FmtFprintfInt-4 89.7ns ± 2% 90.3ns ± 2% +0.74% (p=0.003 n=40+40) FmtFprintfIntInt-4 144ns ± 3% 143ns ± 3% -0.95% (p=0.003 n=40+40) FmtFprintfPrefixedInt-4 181ns ± 4% 180ns ± 2% ~ (p=0.103 n=40+40) FmtFprintfFloat-4 412ns ± 3% 408ns ± 4% -0.97% (p=0.018 n=40+40) FmtManyArgs-4 607ns ± 4% 605ns ± 4% ~ (p=0.148 n=40+40) GobDecode-4 7.19ms ± 4% 7.24ms ± 5% ~ (p=0.340 n=40+40) GobEncode-4 7.04ms ± 9% 6.99ms ± 9% ~ (p=0.289 n=40+40) Gzip-4 400ms ± 6% 398ms ± 5% ~ (p=0.168 n=40+40) Gunzip-4 41.2ms ± 3% 41.7ms ± 3% +1.40% (p=0.001 n=40+40) HTTPClientServer-4 62.5µs ± 1% 62.1µs ± 2% -0.61% (p=0.000 n=37+37) JSONEncode-4 20.7ms ± 4% 20.4ms ± 3% -1.60% (p=0.000 n=40+40) JSONDecode-4 69.4ms ± 4% 69.2ms ± 6% ~ (p=0.177 n=40+40) Mandelbrot200-4 5.22ms ± 6% 5.21ms ± 3% ~ (p=0.531 n=40+40) GoParse-4 3.29ms ± 3% 3.28ms ± 3% ~ (p=0.321 n=40+39) RegexpMatchEasy0_32-4 104ns ± 4% 103ns ± 7% -0.89% (p=0.040 n=40+40) RegexpMatchEasy0_1K-4 852ns ± 3% 853ns ± 2% ~ (p=0.357 n=40+40) RegexpMatchEasy1_32-4 113ns ± 8% 113ns ± 3% ~ (p=0.906 n=40+40) RegexpMatchEasy1_1K-4 1.03µs ± 4% 1.03µs ± 5% ~ (p=0.326 n=40+40) RegexpMatchMedium_32-4 136ns ± 3% 133ns ± 3% -2.31% (p=0.000 n=40+40) RegexpMatchMedium_1K-4 44.0µs ± 3% 43.7µs ± 3% ~ (p=0.053 n=40+40) RegexpMatchHard_32-4 2.27µs ± 3% 2.26µs ± 4% ~ (p=0.391 n=40+40) RegexpMatchHard_1K-4 68.0µs ± 3% 68.9µs ± 3% +1.28% (p=0.000 n=40+40) Revcomp-4 1.86s ± 5% 1.86s ± 2% ~ (p=0.950 n=40+40) Template-4 73.4ms ± 4% 69.9ms ± 7% -4.78% (p=0.000 n=40+40) TimeParse-4 449ns ± 4% 441ns ± 5% -1.76% (p=0.000 n=40+40) TimeFormat-4 416ns ± 3% 417ns ± 4% ~ (p=0.304 n=40+40) [Geo mean] 67.7µs 67.3µs -0.55% name old speed new speed delta GobDecode-4 107MB/s ± 4% 106MB/s ± 5% ~ (p=0.336 n=40+40) GobEncode-4 109MB/s ± 5% 110MB/s ± 9% ~ (p=0.142 n=38+40) Gzip-4 48.5MB/s ± 5% 48.8MB/s ± 5% ~ (p=0.172 n=40+40) Gunzip-4 472MB/s ± 3% 465MB/s ± 3% -1.39% (p=0.001 n=40+40) JSONEncode-4 93.6MB/s ± 4% 95.1MB/s ± 3% +1.61% (p=0.000 n=40+40) JSONDecode-4 28.0MB/s ± 3% 28.1MB/s ± 6% ~ (p=0.181 n=40+40) GoParse-4 17.6MB/s ± 3% 17.7MB/s ± 3% ~ (p=0.350 n=40+39) RegexpMatchEasy0_32-4 308MB/s ± 4% 311MB/s ± 6% +0.96% (p=0.025 n=40+40) RegexpMatchEasy0_1K-4 1.20GB/s ± 3% 1.20GB/s ± 2% ~ (p=0.317 n=40+40) RegexpMatchEasy1_32-4 282MB/s ± 7% 282MB/s ± 3% ~ (p=0.516 n=40+40) RegexpMatchEasy1_1K-4 994MB/s ± 4% 991MB/s ± 5% ~ (p=0.319 n=40+40) RegexpMatchMedium_32-4 7.31MB/s ± 3% 7.49MB/s ± 3% +2.46% (p=0.000 n=40+40) RegexpMatchMedium_1K-4 23.3MB/s ± 3% 23.4MB/s ± 3% ~ (p=0.052 n=40+40) RegexpMatchHard_32-4 14.1MB/s ± 3% 14.1MB/s ± 4% ~ (p=0.391 n=40+40) RegexpMatchHard_1K-4 15.1MB/s ± 3% 14.9MB/s ± 3% -1.27% (p=0.000 n=40+40) Revcomp-4 137MB/s ± 5% 137MB/s ± 2% ~ (p=0.942 n=40+40) Template-4 26.5MB/s ± 4% 27.8MB/s ± 7% +5.03% (p=0.000 n=40+40) [Geo mean] 78.6MB/s 79.0MB/s +0.57% Change-Id: Idcacc6881ef57cd7dc33aa87b711282842b72a53 Reviewed-on: https://go-review.googlesource.com/126618 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2018-07-29 12:50:50 +00:00
case ssa.Op386CMPLload, ssa.Op386CMPWload, ssa.Op386CMPBload:
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_MEM
p.From.Reg = v.Args[0].Reg()
gc.AddAux(&p.From, v)
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Args[1].Reg()
case ssa.Op386CMPLconstload, ssa.Op386CMPWconstload, ssa.Op386CMPBconstload:
sc := v.AuxValAndOff()
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_MEM
p.From.Reg = v.Args[0].Reg()
gc.AddAux2(&p.From, v, sc.Off())
p.To.Type = obj.TYPE_CONST
p.To.Offset = sc.Val()
case ssa.Op386MOVLconst:
x := v.Reg()
// If flags aren't live (indicated by v.Aux == nil),
// then we can rewrite MOV $0, AX into XOR AX, AX.
if v.AuxInt == 0 && v.Aux == nil {
p := s.Prog(x86.AXORL)
p.From.Type = obj.TYPE_REG
p.From.Reg = x
p.To.Type = obj.TYPE_REG
p.To.Reg = x
break
}
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_CONST
p.From.Offset = v.AuxInt
p.To.Type = obj.TYPE_REG
p.To.Reg = x
case ssa.Op386MOVSSconst, ssa.Op386MOVSDconst:
x := v.Reg()
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_FCONST
p.From.Val = math.Float64frombits(uint64(v.AuxInt))
p.To.Type = obj.TYPE_REG
p.To.Reg = x
case ssa.Op386MOVSSconst1, ssa.Op386MOVSDconst1:
p := s.Prog(x86.ALEAL)
p.From.Type = obj.TYPE_MEM
p.From.Name = obj.NAME_EXTERN
f := math.Float64frombits(uint64(v.AuxInt))
if v.Op == ssa.Op386MOVSDconst1 {
p.From.Sym = gc.Ctxt.Float64Sym(f)
} else {
p.From.Sym = gc.Ctxt.Float32Sym(float32(f))
}
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
case ssa.Op386MOVSSconst2, ssa.Op386MOVSDconst2:
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_MEM
p.From.Reg = v.Args[0].Reg()
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
case ssa.Op386MOVSSload, ssa.Op386MOVSDload, ssa.Op386MOVLload, ssa.Op386MOVWload, ssa.Op386MOVBload, ssa.Op386MOVBLSXload, ssa.Op386MOVWLSXload:
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_MEM
p.From.Reg = v.Args[0].Reg()
gc.AddAux(&p.From, v)
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
case ssa.Op386MOVBloadidx1, ssa.Op386MOVWloadidx1, ssa.Op386MOVLloadidx1, ssa.Op386MOVSSloadidx1, ssa.Op386MOVSDloadidx1,
ssa.Op386MOVSDloadidx8, ssa.Op386MOVLloadidx4, ssa.Op386MOVSSloadidx4, ssa.Op386MOVWloadidx2:
r := v.Args[0].Reg()
i := v.Args[1].Reg()
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_MEM
switch v.Op {
case ssa.Op386MOVBloadidx1, ssa.Op386MOVWloadidx1, ssa.Op386MOVLloadidx1, ssa.Op386MOVSSloadidx1, ssa.Op386MOVSDloadidx1:
if i == x86.REG_SP {
r, i = i, r
}
p.From.Scale = 1
case ssa.Op386MOVSDloadidx8:
p.From.Scale = 8
case ssa.Op386MOVLloadidx4, ssa.Op386MOVSSloadidx4:
p.From.Scale = 4
case ssa.Op386MOVWloadidx2:
p.From.Scale = 2
}
p.From.Reg = r
p.From.Index = i
gc.AddAux(&p.From, v)
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
cmd/compile: add indexed form for several 386 instructions This CL implements indexed memory operands for the following instructions. (ADD|SUB|MUL|AND|OR|XOR)Lload -> (ADD|SUB|MUL|AND|OR|XOR)Lloadidx4 (ADD|SUB|AND|OR|XOR)Lmodify -> (ADD|SUB|AND|OR|XOR)Lmodifyidx4 (ADD|AND|OR|XOR)Lconstmodify -> (ADD|AND|OR|XOR)Lconstmodifyidx4 1. The total size of pkg/linux_386/ decreases about 2.5KB, excluding cmd/compile/ . 2. There is little regression in the go1 benchmark test, excluding noise. name old time/op new time/op delta BinaryTree17-4 3.25s ± 3% 3.25s ± 3% ~ (p=0.218 n=40+40) Fannkuch11-4 3.53s ± 1% 3.53s ± 1% ~ (p=0.303 n=40+40) FmtFprintfEmpty-4 44.9ns ± 3% 45.6ns ± 3% +1.48% (p=0.030 n=40+36) FmtFprintfString-4 78.7ns ± 5% 80.1ns ± 7% ~ (p=0.217 n=36+40) FmtFprintfInt-4 90.2ns ± 6% 89.8ns ± 5% ~ (p=0.659 n=40+38) FmtFprintfIntInt-4 140ns ± 5% 141ns ± 5% +1.00% (p=0.027 n=40+40) FmtFprintfPrefixedInt-4 185ns ± 3% 183ns ± 3% ~ (p=0.104 n=40+40) FmtFprintfFloat-4 411ns ± 4% 406ns ± 3% -1.37% (p=0.005 n=40+40) FmtManyArgs-4 590ns ± 4% 598ns ± 4% +1.35% (p=0.008 n=40+40) GobDecode-4 7.16ms ± 5% 7.10ms ± 5% ~ (p=0.335 n=40+40) GobEncode-4 6.85ms ± 7% 6.74ms ± 9% ~ (p=0.058 n=38+40) Gzip-4 400ms ± 4% 399ms ± 2% -0.34% (p=0.003 n=40+33) Gunzip-4 41.4ms ± 3% 41.4ms ± 4% -0.12% (p=0.020 n=40+40) HTTPClientServer-4 64.1µs ± 4% 63.5µs ± 2% -1.07% (p=0.000 n=39+37) JSONEncode-4 15.9ms ± 2% 15.9ms ± 3% ~ (p=0.103 n=40+40) JSONDecode-4 62.2ms ± 4% 61.6ms ± 3% -0.98% (p=0.006 n=39+40) Mandelbrot200-4 5.18ms ± 3% 5.14ms ± 4% ~ (p=0.125 n=40+40) GoParse-4 3.29ms ± 2% 3.27ms ± 2% -0.66% (p=0.006 n=40+40) RegexpMatchEasy0_32-4 103ns ± 4% 103ns ± 4% ~ (p=0.632 n=40+40) RegexpMatchEasy0_1K-4 830ns ± 3% 828ns ± 3% ~ (p=0.563 n=40+40) RegexpMatchEasy1_32-4 113ns ± 4% 113ns ± 4% ~ (p=0.494 n=40+40) RegexpMatchEasy1_1K-4 1.03µs ± 4% 1.03µs ± 4% ~ (p=0.665 n=40+40) RegexpMatchMedium_32-4 130ns ± 4% 129ns ± 3% ~ (p=0.458 n=40+40) RegexpMatchMedium_1K-4 39.4µs ± 3% 39.7µs ± 3% ~ (p=0.825 n=40+40) RegexpMatchHard_32-4 2.16µs ± 4% 2.15µs ± 4% ~ (p=0.137 n=40+40) RegexpMatchHard_1K-4 65.2µs ± 3% 65.4µs ± 4% ~ (p=0.160 n=40+40) Revcomp-4 1.87s ± 2% 1.87s ± 1% +0.17% (p=0.019 n=33+33) Template-4 69.4ms ± 3% 69.8ms ± 3% +0.60% (p=0.009 n=40+40) TimeParse-4 437ns ± 4% 438ns ± 4% ~ (p=0.234 n=40+40) TimeFormat-4 408ns ± 3% 408ns ± 3% ~ (p=0.904 n=40+40) [Geo mean] 65.7µs 65.6µs -0.08% name old speed new speed delta GobDecode-4 107MB/s ± 5% 108MB/s ± 5% ~ (p=0.336 n=40+40) GobEncode-4 112MB/s ± 6% 114MB/s ± 9% +1.95% (p=0.036 n=37+40) Gzip-4 48.5MB/s ± 4% 48.6MB/s ± 2% +0.28% (p=0.003 n=40+33) Gunzip-4 469MB/s ± 4% 469MB/s ± 4% +0.11% (p=0.021 n=40+40) JSONEncode-4 122MB/s ± 2% 122MB/s ± 3% ~ (p=0.105 n=40+40) JSONDecode-4 31.2MB/s ± 4% 31.5MB/s ± 4% +0.99% (p=0.007 n=39+40) GoParse-4 17.6MB/s ± 2% 17.7MB/s ± 2% +0.66% (p=0.007 n=40+40) RegexpMatchEasy0_32-4 310MB/s ± 4% 310MB/s ± 4% ~ (p=0.384 n=40+40) RegexpMatchEasy0_1K-4 1.23GB/s ± 3% 1.24GB/s ± 3% ~ (p=0.186 n=40+40) RegexpMatchEasy1_32-4 283MB/s ± 3% 281MB/s ± 4% ~ (p=0.855 n=40+40) RegexpMatchEasy1_1K-4 1.00GB/s ± 4% 1.00GB/s ± 4% ~ (p=0.665 n=40+40) RegexpMatchMedium_32-4 7.68MB/s ± 4% 7.73MB/s ± 3% ~ (p=0.359 n=40+40) RegexpMatchMedium_1K-4 26.0MB/s ± 3% 25.8MB/s ± 3% ~ (p=0.825 n=40+40) RegexpMatchHard_32-4 14.8MB/s ± 3% 14.9MB/s ± 4% ~ (p=0.136 n=40+40) RegexpMatchHard_1K-4 15.7MB/s ± 3% 15.7MB/s ± 4% ~ (p=0.150 n=40+40) Revcomp-4 136MB/s ± 1% 136MB/s ± 1% -0.09% (p=0.028 n=32+33) Template-4 28.0MB/s ± 3% 27.8MB/s ± 3% -0.59% (p=0.010 n=40+40) [Geo mean] 82.1MB/s 82.3MB/s +0.25% Change-Id: Ifa387a251056678326d3508aa02753b70bf7e5d0 Reviewed-on: https://go-review.googlesource.com/c/140303 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2018-10-06 13:13:48 +00:00
case ssa.Op386ADDLloadidx4, ssa.Op386SUBLloadidx4, ssa.Op386MULLloadidx4,
ssa.Op386ANDLloadidx4, ssa.Op386ORLloadidx4, ssa.Op386XORLloadidx4:
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_MEM
p.From.Reg = v.Args[1].Reg()
p.From.Index = v.Args[2].Reg()
p.From.Scale = 4
gc.AddAux(&p.From, v)
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
if v.Reg() != v.Args[0].Reg() {
v.Fatalf("input[0] and output not in same register %s", v.LongString())
}
cmd/compile: optimize 386 code with MULLload/DIVSSload/DIVSDload IMULL/DIVSS/DIVSD all can take the source operand from memory directly. And this CL implement that optimization. 1. The total size of pkg/linux_386 decreases about 84KB (excluding cmd/compile). 2. The go1 benchmark shows little regression in total (excluding noise). name old time/op new time/op delta BinaryTree17-4 3.29s ± 2% 3.27s ± 4% ~ (p=0.192 n=30+30) Fannkuch11-4 3.49s ± 2% 3.54s ± 1% +1.48% (p=0.000 n=30+30) FmtFprintfEmpty-4 45.9ns ± 3% 46.3ns ± 4% +0.89% (p=0.037 n=30+30) FmtFprintfString-4 78.8ns ± 3% 78.7ns ± 4% ~ (p=0.209 n=30+27) FmtFprintfInt-4 91.0ns ± 2% 90.3ns ± 2% -0.82% (p=0.031 n=30+27) FmtFprintfIntInt-4 142ns ± 4% 143ns ± 4% ~ (p=0.136 n=30+30) FmtFprintfPrefixedInt-4 181ns ± 3% 183ns ± 4% +1.40% (p=0.005 n=30+30) FmtFprintfFloat-4 404ns ± 4% 408ns ± 3% ~ (p=0.397 n=30+30) FmtManyArgs-4 601ns ± 3% 609ns ± 5% ~ (p=0.059 n=30+30) GobDecode-4 7.21ms ± 5% 7.24ms ± 5% ~ (p=0.612 n=30+30) GobEncode-4 6.91ms ± 6% 6.91ms ± 6% ~ (p=0.797 n=30+30) Gzip-4 398ms ± 6% 399ms ± 4% ~ (p=0.173 n=30+30) Gunzip-4 41.7ms ± 3% 41.8ms ± 3% ~ (p=0.423 n=30+30) HTTPClientServer-4 62.3µs ± 2% 62.7µs ± 3% ~ (p=0.085 n=29+30) JSONEncode-4 21.0ms ± 4% 20.7ms ± 5% -1.39% (p=0.014 n=30+30) JSONDecode-4 66.3ms ± 3% 67.4ms ± 1% +1.71% (p=0.003 n=30+24) Mandelbrot200-4 5.15ms ± 3% 5.16ms ± 3% ~ (p=0.697 n=30+30) GoParse-4 3.24ms ± 3% 3.27ms ± 4% +0.91% (p=0.032 n=30+30) RegexpMatchEasy0_32-4 101ns ± 5% 99ns ± 4% -1.82% (p=0.008 n=29+30) RegexpMatchEasy0_1K-4 848ns ± 4% 841ns ± 2% -0.77% (p=0.043 n=30+30) RegexpMatchEasy1_32-4 106ns ± 6% 106ns ± 3% ~ (p=0.939 n=29+30) RegexpMatchEasy1_1K-4 1.02µs ± 3% 1.03µs ± 4% ~ (p=0.297 n=28+30) RegexpMatchMedium_32-4 129ns ± 4% 127ns ± 4% ~ (p=0.073 n=30+30) RegexpMatchMedium_1K-4 43.9µs ± 3% 43.8µs ± 3% ~ (p=0.186 n=30+30) RegexpMatchHard_32-4 2.24µs ± 4% 2.22µs ± 4% ~ (p=0.332 n=30+29) RegexpMatchHard_1K-4 68.0µs ± 4% 67.5µs ± 3% ~ (p=0.290 n=30+30) Revcomp-4 1.85s ± 3% 1.85s ± 3% ~ (p=0.358 n=30+30) Template-4 69.6ms ± 3% 70.0ms ± 4% ~ (p=0.273 n=30+30) TimeParse-4 445ns ± 3% 441ns ± 3% ~ (p=0.494 n=30+30) TimeFormat-4 412ns ± 3% 412ns ± 6% ~ (p=0.841 n=30+30) [Geo mean] 66.7µs 66.8µs +0.13% name old speed new speed delta GobDecode-4 107MB/s ± 5% 106MB/s ± 5% ~ (p=0.615 n=30+30) GobEncode-4 111MB/s ± 6% 111MB/s ± 6% ~ (p=0.790 n=30+30) Gzip-4 48.8MB/s ± 6% 48.7MB/s ± 4% ~ (p=0.167 n=30+30) Gunzip-4 465MB/s ± 3% 465MB/s ± 3% ~ (p=0.420 n=30+30) JSONEncode-4 92.4MB/s ± 4% 93.7MB/s ± 5% +1.42% (p=0.015 n=30+30) JSONDecode-4 29.3MB/s ± 3% 28.8MB/s ± 1% -1.72% (p=0.003 n=30+24) GoParse-4 17.9MB/s ± 3% 17.7MB/s ± 4% -0.89% (p=0.037 n=30+30) RegexpMatchEasy0_32-4 317MB/s ± 8% 324MB/s ± 4% +2.14% (p=0.006 n=30+30) RegexpMatchEasy0_1K-4 1.21GB/s ± 4% 1.22GB/s ± 2% +0.77% (p=0.036 n=30+30) RegexpMatchEasy1_32-4 298MB/s ± 7% 299MB/s ± 4% ~ (p=0.511 n=30+30) RegexpMatchEasy1_1K-4 1.00GB/s ± 3% 1.00GB/s ± 4% ~ (p=0.304 n=28+30) RegexpMatchMedium_32-4 7.75MB/s ± 4% 7.82MB/s ± 4% ~ (p=0.089 n=30+30) RegexpMatchMedium_1K-4 23.3MB/s ± 3% 23.4MB/s ± 3% ~ (p=0.181 n=30+30) RegexpMatchHard_32-4 14.3MB/s ± 4% 14.4MB/s ± 4% ~ (p=0.320 n=30+29) RegexpMatchHard_1K-4 15.1MB/s ± 4% 15.2MB/s ± 3% ~ (p=0.273 n=30+30) Revcomp-4 137MB/s ± 3% 137MB/s ± 3% ~ (p=0.352 n=30+30) Template-4 27.9MB/s ± 3% 27.7MB/s ± 4% ~ (p=0.277 n=30+30) [Geo mean] 79.9MB/s 80.1MB/s +0.15% Change-Id: I97333cd8ddabb3c7c88ca5aa9e14a005b74d306d Reviewed-on: https://go-review.googlesource.com/120695 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2018-06-24 07:04:21 +00:00
case ssa.Op386ADDLload, ssa.Op386SUBLload, ssa.Op386MULLload,
ssa.Op386ANDLload, ssa.Op386ORLload, ssa.Op386XORLload,
ssa.Op386ADDSDload, ssa.Op386ADDSSload, ssa.Op386SUBSDload, ssa.Op386SUBSSload,
ssa.Op386MULSDload, ssa.Op386MULSSload, ssa.Op386DIVSSload, ssa.Op386DIVSDload:
cmd/compile: optimize 386 binary operations with a memory operand Some integer/float binary operations of 386 can take a direct memory operand, which is more efficient than loading to a register. These CL does this optimization by copying the similar solution of amd64. And the go1 benchmark shows some inprovements, especially the test case Template. (excluding noise) name old time/op new time/op delta BinaryTree17-4 3.42s ± 2% 3.40s ± 2% ~ (p=0.069 n=38+39) Fannkuch11-4 3.48s ± 1% 3.53s ± 1% +1.59% (p=0.000 n=40+40) FmtFprintfEmpty-4 46.7ns ± 4% 46.3ns ± 3% -1.03% (p=0.001 n=40+40) FmtFprintfString-4 80.1ns ± 3% 80.6ns ± 3% +0.56% (p=0.029 n=40+40) FmtFprintfInt-4 92.4ns ± 2% 92.3ns ± 3% ~ (p=0.847 n=40+40) FmtFprintfIntInt-4 147ns ± 3% 144ns ± 3% -1.87% (p=0.000 n=40+40) FmtFprintfPrefixedInt-4 182ns ± 2% 184ns ± 3% +0.99% (p=0.002 n=40+40) FmtFprintfFloat-4 387ns ± 3% 384ns ± 3% ~ (p=0.069 n=40+40) FmtManyArgs-4 619ns ± 3% 616ns ± 3% ~ (p=0.320 n=40+40) GobDecode-4 7.28ms ± 6% 7.27ms ± 5% ~ (p=0.897 n=40+40) GobEncode-4 7.33ms ± 6% 7.21ms ± 6% -1.56% (p=0.022 n=38+40) Gzip-4 357ms ± 4% 357ms ± 4% ~ (p=0.071 n=40+40) Gunzip-4 45.3ms ± 3% 45.4ms ± 3% ~ (p=0.452 n=40+40) HTTPClientServer-4 63.0µs ± 2% 62.9µs ± 3% ~ (p=0.760 n=38+39) JSONEncode-4 22.0ms ± 4% 21.7ms ± 4% -1.49% (p=0.000 n=40+40) JSONDecode-4 67.7ms ± 4% 68.3ms ± 3% +0.86% (p=0.039 n=40+40) Mandelbrot200-4 5.16ms ± 3% 5.17ms ± 3% ~ (p=0.418 n=40+40) GoParse-4 3.30ms ± 2% 3.32ms ± 3% +0.55% (p=0.017 n=40+40) RegexpMatchEasy0_32-4 104ns ± 3% 104ns ± 4% ~ (p=0.992 n=40+40) RegexpMatchEasy0_1K-4 852ns ± 3% 851ns ± 2% ~ (p=0.344 n=40+40) RegexpMatchEasy1_32-4 113ns ± 4% 113ns ± 5% ~ (p=0.937 n=40+40) RegexpMatchEasy1_1K-4 1.03µs ± 5% 1.04µs ± 4% ~ (p=0.430 n=40+40) RegexpMatchMedium_32-4 132ns ± 4% 131ns ± 3% -1.06% (p=0.027 n=40+40) RegexpMatchMedium_1K-4 43.0µs ± 3% 43.2µs ± 3% ~ (p=0.122 n=40+40) RegexpMatchHard_32-4 2.21µs ± 4% 2.20µs ± 4% ~ (p=0.146 n=40+40) RegexpMatchHard_1K-4 67.1µs ± 4% 67.2µs ± 3% ~ (p=0.859 n=40+40) Revcomp-4 1.85s ± 2% 1.85s ± 3% ~ (p=0.184 n=40+40) Template-4 70.1ms ± 4% 67.5ms ± 3% -3.65% (p=0.000 n=40+40) TimeParse-4 457ns ±16% 439ns ± 4% ~ (p=0.683 n=40+34) TimeFormat-4 413ns ± 3% 414ns ± 3% ~ (p=0.850 n=40+40) [Geo mean] 67.5µs 67.3µs -0.38% name old speed new speed delta GobDecode-4 105MB/s ± 6% 106MB/s ± 5% ~ (p=0.893 n=40+40) GobEncode-4 105MB/s ± 6% 107MB/s ± 7% +1.60% (p=0.023 n=38+40) Gzip-4 54.4MB/s ± 4% 54.5MB/s ± 4% ~ (p=0.073 n=40+40) Gunzip-4 429MB/s ± 3% 428MB/s ± 3% ~ (p=0.453 n=40+40) JSONEncode-4 88.3MB/s ± 5% 89.6MB/s ± 4% +1.51% (p=0.000 n=40+40) JSONDecode-4 28.7MB/s ± 4% 28.4MB/s ± 3% -0.87% (p=0.039 n=40+40) GoParse-4 17.6MB/s ± 3% 17.5MB/s ± 3% -0.55% (p=0.020 n=40+40) RegexpMatchEasy0_32-4 308MB/s ± 4% 308MB/s ± 5% ~ (p=0.988 n=40+40) RegexpMatchEasy0_1K-4 1.20GB/s ± 3% 1.20GB/s ± 2% ~ (p=0.329 n=40+40) RegexpMatchEasy1_32-4 283MB/s ± 4% 283MB/s ± 4% ~ (p=0.507 n=40+40) RegexpMatchEasy1_1K-4 991MB/s ± 5% 987MB/s ± 4% ~ (p=0.446 n=40+40) RegexpMatchMedium_32-4 7.54MB/s ± 4% 7.63MB/s ± 3% +1.26% (p=0.004 n=40+40) RegexpMatchMedium_1K-4 23.8MB/s ± 3% 23.7MB/s ± 4% ~ (p=0.121 n=40+40) RegexpMatchHard_32-4 14.5MB/s ± 4% 14.6MB/s ± 4% ~ (p=0.145 n=40+40) RegexpMatchHard_1K-4 15.3MB/s ± 4% 15.2MB/s ± 3% ~ (p=0.874 n=40+40) Revcomp-4 137MB/s ± 2% 137MB/s ± 3% ~ (p=0.179 n=40+40) Template-4 27.7MB/s ± 4% 28.7MB/s ± 3% +3.78% (p=0.000 n=40+40) [Geo mean] 78.9MB/s 79.2MB/s +0.38% Change-Id: I3ba688c253b665485c1ebdf5a75f4ce82cc3def3 Reviewed-on: https://go-review.googlesource.com/102036 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ilya Tocar <ilya.tocar@intel.com> Reviewed-by: Keith Randall <khr@golang.org>
2018-03-22 02:18:50 +00:00
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_MEM
p.From.Reg = v.Args[1].Reg()
gc.AddAux(&p.From, v)
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
if v.Reg() != v.Args[0].Reg() {
v.Fatalf("input[0] and output not in same register %s", v.LongString())
}
cmd/compile: emit more compact 386 instructions ADDL/SUBL/ANDL/ORL/XORL can have a memory operand as destination, and this CL optimize the compiler to emit such instructions on 386 for more compact binary. Here is test report: 1. The total size of pkg/linux_386/ and pkg/tool/linux_386/ decreases about 14KB. (pkg/linux_386/cmd/compile/ and pkg/tool/linux_386/compile are excluded) 2. The go1 benchmark shows little change, excluding ±2% noise. name old time/op new time/op delta BinaryTree17-4 3.34s ± 2% 3.38s ± 2% +1.27% (p=0.000 n=40+39) Fannkuch11-4 3.55s ± 1% 3.51s ± 1% -1.33% (p=0.000 n=40+40) FmtFprintfEmpty-4 46.3ns ± 3% 46.9ns ± 4% +1.41% (p=0.002 n=40+40) FmtFprintfString-4 80.8ns ± 3% 80.4ns ± 6% -0.54% (p=0.044 n=40+40) FmtFprintfInt-4 93.0ns ± 3% 92.2ns ± 4% -0.88% (p=0.007 n=39+40) FmtFprintfIntInt-4 144ns ± 5% 145ns ± 2% +0.78% (p=0.015 n=40+40) FmtFprintfPrefixedInt-4 184ns ± 2% 182ns ± 2% -1.06% (p=0.004 n=40+40) FmtFprintfFloat-4 415ns ± 4% 419ns ± 4% ~ (p=0.434 n=40+40) FmtManyArgs-4 615ns ± 3% 619ns ± 3% ~ (p=0.100 n=40+40) GobDecode-4 7.30ms ± 6% 7.36ms ± 6% ~ (p=0.074 n=40+40) GobEncode-4 7.10ms ± 6% 7.21ms ± 5% ~ (p=0.082 n=40+39) Gzip-4 364ms ± 3% 362ms ± 6% -0.71% (p=0.020 n=40+40) Gunzip-4 42.4ms ± 3% 42.2ms ± 3% ~ (p=0.303 n=40+40) HTTPClientServer-4 62.9µs ± 1% 62.9µs ± 1% ~ (p=0.768 n=38+39) JSONEncode-4 21.4ms ± 4% 21.5ms ± 5% ~ (p=0.210 n=40+40) JSONDecode-4 67.7ms ± 3% 67.9ms ± 4% ~ (p=0.713 n=40+40) Mandelbrot200-4 5.18ms ± 3% 5.21ms ± 3% +0.59% (p=0.021 n=40+40) GoParse-4 3.35ms ± 3% 3.34ms ± 2% ~ (p=0.996 n=40+40) RegexpMatchEasy0_32-4 98.5ns ± 5% 96.3ns ± 4% -2.15% (p=0.001 n=40+40) RegexpMatchEasy0_1K-4 851ns ± 4% 850ns ± 5% ~ (p=0.700 n=40+40) RegexpMatchEasy1_32-4 105ns ± 7% 107ns ± 4% +1.50% (p=0.017 n=40+40) RegexpMatchEasy1_1K-4 1.03µs ± 5% 1.03µs ± 4% ~ (p=0.992 n=40+40) RegexpMatchMedium_32-4 130ns ± 6% 128ns ± 4% -1.66% (p=0.012 n=40+40) RegexpMatchMedium_1K-4 44.0µs ± 5% 43.6µs ± 3% ~ (p=0.704 n=40+40) RegexpMatchHard_32-4 2.29µs ± 3% 2.23µs ± 4% -2.38% (p=0.000 n=40+40) RegexpMatchHard_1K-4 69.0µs ± 3% 68.1µs ± 3% -1.28% (p=0.003 n=40+40) Revcomp-4 1.85s ± 2% 1.87s ± 3% +1.11% (p=0.000 n=40+40) Template-4 69.8ms ± 3% 69.6ms ± 3% ~ (p=0.125 n=40+40) TimeParse-4 442ns ± 5% 440ns ± 3% ~ (p=0.585 n=40+40) TimeFormat-4 419ns ± 3% 420ns ± 3% ~ (p=0.824 n=40+40) [Geo mean] 67.3µs 67.2µs -0.11% name old speed new speed delta GobDecode-4 105MB/s ± 6% 104MB/s ± 6% ~ (p=0.074 n=40+40) GobEncode-4 108MB/s ± 7% 107MB/s ± 5% ~ (p=0.080 n=40+39) Gzip-4 53.3MB/s ± 3% 53.7MB/s ± 6% +0.73% (p=0.021 n=40+40) Gunzip-4 458MB/s ± 3% 460MB/s ± 3% ~ (p=0.301 n=40+40) JSONEncode-4 90.8MB/s ± 4% 90.3MB/s ± 4% ~ (p=0.213 n=40+40) JSONDecode-4 28.7MB/s ± 3% 28.6MB/s ± 4% ~ (p=0.679 n=40+40) GoParse-4 17.3MB/s ± 3% 17.3MB/s ± 2% ~ (p=1.000 n=40+40) RegexpMatchEasy0_32-4 325MB/s ± 5% 333MB/s ± 4% +2.44% (p=0.000 n=40+38) RegexpMatchEasy0_1K-4 1.20GB/s ± 4% 1.21GB/s ± 5% ~ (p=0.684 n=40+40) RegexpMatchEasy1_32-4 303MB/s ± 7% 298MB/s ± 4% -1.52% (p=0.022 n=40+40) RegexpMatchEasy1_1K-4 995MB/s ± 5% 996MB/s ± 4% ~ (p=0.996 n=40+40) RegexpMatchMedium_32-4 7.67MB/s ± 6% 7.80MB/s ± 4% +1.68% (p=0.011 n=40+40) RegexpMatchMedium_1K-4 23.3MB/s ± 5% 23.5MB/s ± 3% ~ (p=0.697 n=40+40) RegexpMatchHard_32-4 14.0MB/s ± 3% 14.3MB/s ± 4% +2.43% (p=0.000 n=40+40) RegexpMatchHard_1K-4 14.8MB/s ± 3% 15.0MB/s ± 3% +1.30% (p=0.003 n=40+40) Revcomp-4 137MB/s ± 2% 136MB/s ± 3% -1.10% (p=0.000 n=40+40) Template-4 27.8MB/s ± 3% 27.9MB/s ± 3% ~ (p=0.128 n=40+40) [Geo mean] 79.6MB/s 79.9MB/s +0.28% Change-Id: I02a3efc125dc81e18fc8495eb2bf1bba59ab8733 Reviewed-on: https://go-review.googlesource.com/110157 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2018-04-29 10:42:14 +00:00
case ssa.Op386MOVSSstore, ssa.Op386MOVSDstore, ssa.Op386MOVLstore, ssa.Op386MOVWstore, ssa.Op386MOVBstore,
ssa.Op386ADDLmodify, ssa.Op386SUBLmodify, ssa.Op386ANDLmodify, ssa.Op386ORLmodify, ssa.Op386XORLmodify:
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = v.Args[1].Reg()
p.To.Type = obj.TYPE_MEM
p.To.Reg = v.Args[0].Reg()
gc.AddAux(&p.To, v)
cmd/compile: implement "OPC $imm, (mem)" for 386 New read-modify-write operations are introduced in this CL for 386. 1. The total size of pkg/linux_386 decreases about 10KB (excluding cmd/compile). 2. The go1 benchmark shows little regression. name old time/op new time/op delta BinaryTree17-4 3.32s ± 4% 3.29s ± 2% ~ (p=0.059 n=30+30) Fannkuch11-4 3.49s ± 1% 3.46s ± 1% -0.92% (p=0.001 n=30+30) FmtFprintfEmpty-4 47.7ns ± 2% 46.8ns ± 5% -1.93% (p=0.011 n=25+30) FmtFprintfString-4 79.5ns ± 7% 80.2ns ± 3% +0.89% (p=0.001 n=28+29) FmtFprintfInt-4 90.5ns ± 2% 92.1ns ± 2% +1.82% (p=0.014 n=22+30) FmtFprintfIntInt-4 141ns ± 1% 144ns ± 3% +2.23% (p=0.013 n=22+30) FmtFprintfPrefixedInt-4 183ns ± 2% 184ns ± 3% ~ (p=0.080 n=21+30) FmtFprintfFloat-4 409ns ± 3% 412ns ± 3% +0.83% (p=0.040 n=30+30) FmtManyArgs-4 597ns ± 6% 607ns ± 4% +1.71% (p=0.006 n=30+30) GobDecode-4 7.21ms ± 5% 7.18ms ± 6% ~ (p=0.665 n=30+30) GobEncode-4 7.17ms ± 6% 7.09ms ± 7% ~ (p=0.117 n=29+30) Gzip-4 413ms ± 4% 399ms ± 4% -3.48% (p=0.000 n=30+30) Gunzip-4 41.3ms ± 4% 41.7ms ± 3% +1.05% (p=0.011 n=30+30) HTTPClientServer-4 63.5µs ± 3% 62.9µs ± 2% -0.97% (p=0.017 n=30+27) JSONEncode-4 20.3ms ± 5% 20.1ms ± 5% -1.16% (p=0.004 n=30+30) JSONDecode-4 66.2ms ± 4% 67.7ms ± 4% +2.21% (p=0.000 n=30+30) Mandelbrot200-4 5.16ms ± 3% 5.18ms ± 3% ~ (p=0.123 n=30+30) GoParse-4 3.23ms ± 2% 3.27ms ± 2% +1.08% (p=0.006 n=30+30) RegexpMatchEasy0_32-4 98.9ns ± 5% 97.1ns ± 4% -1.83% (p=0.006 n=30+30) RegexpMatchEasy0_1K-4 842ns ± 3% 842ns ± 3% ~ (p=0.550 n=30+30) RegexpMatchEasy1_32-4 107ns ± 4% 105ns ± 4% -1.93% (p=0.012 n=30+30) RegexpMatchEasy1_1K-4 1.03µs ± 4% 1.04µs ± 4% ~ (p=0.304 n=30+30) RegexpMatchMedium_32-4 132ns ± 2% 129ns ± 4% -2.02% (p=0.000 n=21+30) RegexpMatchMedium_1K-4 44.1µs ± 4% 43.8µs ± 3% ~ (p=0.641 n=30+30) RegexpMatchHard_32-4 2.26µs ± 4% 2.23µs ± 4% -1.28% (p=0.023 n=30+30) RegexpMatchHard_1K-4 68.1µs ± 3% 68.6µs ± 4% ~ (p=0.089 n=30+30) Revcomp-4 1.85s ± 2% 1.84s ± 2% ~ (p=0.072 n=30+30) Template-4 69.2ms ± 3% 68.5ms ± 3% -1.04% (p=0.012 n=30+30) TimeParse-4 441ns ± 3% 446ns ± 4% +1.21% (p=0.001 n=30+30) TimeFormat-4 415ns ± 3% 415ns ± 3% ~ (p=0.436 n=30+30) [Geo mean] 67.0µs 66.9µs -0.17% name old speed new speed delta GobDecode-4 107MB/s ± 5% 107MB/s ± 6% ~ (p=0.663 n=30+30) GobEncode-4 107MB/s ± 6% 108MB/s ± 7% ~ (p=0.117 n=29+30) Gzip-4 47.0MB/s ± 4% 48.7MB/s ± 4% +3.61% (p=0.000 n=30+30) Gunzip-4 470MB/s ± 4% 466MB/s ± 4% -1.05% (p=0.011 n=30+30) JSONEncode-4 95.6MB/s ± 5% 96.7MB/s ± 5% +1.16% (p=0.005 n=30+30) JSONDecode-4 29.3MB/s ± 4% 28.7MB/s ± 4% -2.17% (p=0.000 n=30+30) GoParse-4 17.9MB/s ± 2% 17.7MB/s ± 2% -1.06% (p=0.007 n=30+30) RegexpMatchEasy0_32-4 323MB/s ± 5% 329MB/s ± 4% +1.93% (p=0.006 n=30+30) RegexpMatchEasy0_1K-4 1.22GB/s ± 3% 1.22GB/s ± 3% ~ (p=0.496 n=30+30) RegexpMatchEasy1_32-4 298MB/s ± 4% 303MB/s ± 4% +1.84% (p=0.017 n=30+30) RegexpMatchEasy1_1K-4 995MB/s ± 4% 989MB/s ± 4% ~ (p=0.307 n=30+30) RegexpMatchMedium_32-4 7.56MB/s ± 4% 7.74MB/s ± 4% +2.46% (p=0.000 n=22+30) RegexpMatchMedium_1K-4 23.2MB/s ± 4% 23.4MB/s ± 3% ~ (p=0.651 n=30+30) RegexpMatchHard_32-4 14.2MB/s ± 4% 14.3MB/s ± 4% +1.29% (p=0.021 n=30+30) RegexpMatchHard_1K-4 15.0MB/s ± 3% 14.9MB/s ± 4% ~ (p=0.069 n=30+29) Revcomp-4 138MB/s ± 2% 138MB/s ± 2% ~ (p=0.072 n=30+30) Template-4 28.1MB/s ± 3% 28.4MB/s ± 3% +1.05% (p=0.012 n=30+30) [Geo mean] 79.7MB/s 80.2MB/s +0.60% Change-Id: I44a1dfc942c9a385904553c4fe1fa8e509c8aa31 Reviewed-on: https://go-review.googlesource.com/120916 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2018-06-26 02:58:54 +00:00
case ssa.Op386ADDLconstmodify:
sc := v.AuxValAndOff()
val := sc.Val()
if val == 1 || val == -1 {
var p *obj.Prog
if val == 1 {
p = s.Prog(x86.AINCL)
} else {
p = s.Prog(x86.ADECL)
}
off := sc.Off()
p.To.Type = obj.TYPE_MEM
p.To.Reg = v.Args[0].Reg()
gc.AddAux2(&p.To, v, off)
break
cmd/compile: implement "OPC $imm, (mem)" for 386 New read-modify-write operations are introduced in this CL for 386. 1. The total size of pkg/linux_386 decreases about 10KB (excluding cmd/compile). 2. The go1 benchmark shows little regression. name old time/op new time/op delta BinaryTree17-4 3.32s ± 4% 3.29s ± 2% ~ (p=0.059 n=30+30) Fannkuch11-4 3.49s ± 1% 3.46s ± 1% -0.92% (p=0.001 n=30+30) FmtFprintfEmpty-4 47.7ns ± 2% 46.8ns ± 5% -1.93% (p=0.011 n=25+30) FmtFprintfString-4 79.5ns ± 7% 80.2ns ± 3% +0.89% (p=0.001 n=28+29) FmtFprintfInt-4 90.5ns ± 2% 92.1ns ± 2% +1.82% (p=0.014 n=22+30) FmtFprintfIntInt-4 141ns ± 1% 144ns ± 3% +2.23% (p=0.013 n=22+30) FmtFprintfPrefixedInt-4 183ns ± 2% 184ns ± 3% ~ (p=0.080 n=21+30) FmtFprintfFloat-4 409ns ± 3% 412ns ± 3% +0.83% (p=0.040 n=30+30) FmtManyArgs-4 597ns ± 6% 607ns ± 4% +1.71% (p=0.006 n=30+30) GobDecode-4 7.21ms ± 5% 7.18ms ± 6% ~ (p=0.665 n=30+30) GobEncode-4 7.17ms ± 6% 7.09ms ± 7% ~ (p=0.117 n=29+30) Gzip-4 413ms ± 4% 399ms ± 4% -3.48% (p=0.000 n=30+30) Gunzip-4 41.3ms ± 4% 41.7ms ± 3% +1.05% (p=0.011 n=30+30) HTTPClientServer-4 63.5µs ± 3% 62.9µs ± 2% -0.97% (p=0.017 n=30+27) JSONEncode-4 20.3ms ± 5% 20.1ms ± 5% -1.16% (p=0.004 n=30+30) JSONDecode-4 66.2ms ± 4% 67.7ms ± 4% +2.21% (p=0.000 n=30+30) Mandelbrot200-4 5.16ms ± 3% 5.18ms ± 3% ~ (p=0.123 n=30+30) GoParse-4 3.23ms ± 2% 3.27ms ± 2% +1.08% (p=0.006 n=30+30) RegexpMatchEasy0_32-4 98.9ns ± 5% 97.1ns ± 4% -1.83% (p=0.006 n=30+30) RegexpMatchEasy0_1K-4 842ns ± 3% 842ns ± 3% ~ (p=0.550 n=30+30) RegexpMatchEasy1_32-4 107ns ± 4% 105ns ± 4% -1.93% (p=0.012 n=30+30) RegexpMatchEasy1_1K-4 1.03µs ± 4% 1.04µs ± 4% ~ (p=0.304 n=30+30) RegexpMatchMedium_32-4 132ns ± 2% 129ns ± 4% -2.02% (p=0.000 n=21+30) RegexpMatchMedium_1K-4 44.1µs ± 4% 43.8µs ± 3% ~ (p=0.641 n=30+30) RegexpMatchHard_32-4 2.26µs ± 4% 2.23µs ± 4% -1.28% (p=0.023 n=30+30) RegexpMatchHard_1K-4 68.1µs ± 3% 68.6µs ± 4% ~ (p=0.089 n=30+30) Revcomp-4 1.85s ± 2% 1.84s ± 2% ~ (p=0.072 n=30+30) Template-4 69.2ms ± 3% 68.5ms ± 3% -1.04% (p=0.012 n=30+30) TimeParse-4 441ns ± 3% 446ns ± 4% +1.21% (p=0.001 n=30+30) TimeFormat-4 415ns ± 3% 415ns ± 3% ~ (p=0.436 n=30+30) [Geo mean] 67.0µs 66.9µs -0.17% name old speed new speed delta GobDecode-4 107MB/s ± 5% 107MB/s ± 6% ~ (p=0.663 n=30+30) GobEncode-4 107MB/s ± 6% 108MB/s ± 7% ~ (p=0.117 n=29+30) Gzip-4 47.0MB/s ± 4% 48.7MB/s ± 4% +3.61% (p=0.000 n=30+30) Gunzip-4 470MB/s ± 4% 466MB/s ± 4% -1.05% (p=0.011 n=30+30) JSONEncode-4 95.6MB/s ± 5% 96.7MB/s ± 5% +1.16% (p=0.005 n=30+30) JSONDecode-4 29.3MB/s ± 4% 28.7MB/s ± 4% -2.17% (p=0.000 n=30+30) GoParse-4 17.9MB/s ± 2% 17.7MB/s ± 2% -1.06% (p=0.007 n=30+30) RegexpMatchEasy0_32-4 323MB/s ± 5% 329MB/s ± 4% +1.93% (p=0.006 n=30+30) RegexpMatchEasy0_1K-4 1.22GB/s ± 3% 1.22GB/s ± 3% ~ (p=0.496 n=30+30) RegexpMatchEasy1_32-4 298MB/s ± 4% 303MB/s ± 4% +1.84% (p=0.017 n=30+30) RegexpMatchEasy1_1K-4 995MB/s ± 4% 989MB/s ± 4% ~ (p=0.307 n=30+30) RegexpMatchMedium_32-4 7.56MB/s ± 4% 7.74MB/s ± 4% +2.46% (p=0.000 n=22+30) RegexpMatchMedium_1K-4 23.2MB/s ± 4% 23.4MB/s ± 3% ~ (p=0.651 n=30+30) RegexpMatchHard_32-4 14.2MB/s ± 4% 14.3MB/s ± 4% +1.29% (p=0.021 n=30+30) RegexpMatchHard_1K-4 15.0MB/s ± 3% 14.9MB/s ± 4% ~ (p=0.069 n=30+29) Revcomp-4 138MB/s ± 2% 138MB/s ± 2% ~ (p=0.072 n=30+30) Template-4 28.1MB/s ± 3% 28.4MB/s ± 3% +1.05% (p=0.012 n=30+30) [Geo mean] 79.7MB/s 80.2MB/s +0.60% Change-Id: I44a1dfc942c9a385904553c4fe1fa8e509c8aa31 Reviewed-on: https://go-review.googlesource.com/120916 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2018-06-26 02:58:54 +00:00
}
fallthrough
cmd/compile: implement "OPC $imm, (mem)" for 386 New read-modify-write operations are introduced in this CL for 386. 1. The total size of pkg/linux_386 decreases about 10KB (excluding cmd/compile). 2. The go1 benchmark shows little regression. name old time/op new time/op delta BinaryTree17-4 3.32s ± 4% 3.29s ± 2% ~ (p=0.059 n=30+30) Fannkuch11-4 3.49s ± 1% 3.46s ± 1% -0.92% (p=0.001 n=30+30) FmtFprintfEmpty-4 47.7ns ± 2% 46.8ns ± 5% -1.93% (p=0.011 n=25+30) FmtFprintfString-4 79.5ns ± 7% 80.2ns ± 3% +0.89% (p=0.001 n=28+29) FmtFprintfInt-4 90.5ns ± 2% 92.1ns ± 2% +1.82% (p=0.014 n=22+30) FmtFprintfIntInt-4 141ns ± 1% 144ns ± 3% +2.23% (p=0.013 n=22+30) FmtFprintfPrefixedInt-4 183ns ± 2% 184ns ± 3% ~ (p=0.080 n=21+30) FmtFprintfFloat-4 409ns ± 3% 412ns ± 3% +0.83% (p=0.040 n=30+30) FmtManyArgs-4 597ns ± 6% 607ns ± 4% +1.71% (p=0.006 n=30+30) GobDecode-4 7.21ms ± 5% 7.18ms ± 6% ~ (p=0.665 n=30+30) GobEncode-4 7.17ms ± 6% 7.09ms ± 7% ~ (p=0.117 n=29+30) Gzip-4 413ms ± 4% 399ms ± 4% -3.48% (p=0.000 n=30+30) Gunzip-4 41.3ms ± 4% 41.7ms ± 3% +1.05% (p=0.011 n=30+30) HTTPClientServer-4 63.5µs ± 3% 62.9µs ± 2% -0.97% (p=0.017 n=30+27) JSONEncode-4 20.3ms ± 5% 20.1ms ± 5% -1.16% (p=0.004 n=30+30) JSONDecode-4 66.2ms ± 4% 67.7ms ± 4% +2.21% (p=0.000 n=30+30) Mandelbrot200-4 5.16ms ± 3% 5.18ms ± 3% ~ (p=0.123 n=30+30) GoParse-4 3.23ms ± 2% 3.27ms ± 2% +1.08% (p=0.006 n=30+30) RegexpMatchEasy0_32-4 98.9ns ± 5% 97.1ns ± 4% -1.83% (p=0.006 n=30+30) RegexpMatchEasy0_1K-4 842ns ± 3% 842ns ± 3% ~ (p=0.550 n=30+30) RegexpMatchEasy1_32-4 107ns ± 4% 105ns ± 4% -1.93% (p=0.012 n=30+30) RegexpMatchEasy1_1K-4 1.03µs ± 4% 1.04µs ± 4% ~ (p=0.304 n=30+30) RegexpMatchMedium_32-4 132ns ± 2% 129ns ± 4% -2.02% (p=0.000 n=21+30) RegexpMatchMedium_1K-4 44.1µs ± 4% 43.8µs ± 3% ~ (p=0.641 n=30+30) RegexpMatchHard_32-4 2.26µs ± 4% 2.23µs ± 4% -1.28% (p=0.023 n=30+30) RegexpMatchHard_1K-4 68.1µs ± 3% 68.6µs ± 4% ~ (p=0.089 n=30+30) Revcomp-4 1.85s ± 2% 1.84s ± 2% ~ (p=0.072 n=30+30) Template-4 69.2ms ± 3% 68.5ms ± 3% -1.04% (p=0.012 n=30+30) TimeParse-4 441ns ± 3% 446ns ± 4% +1.21% (p=0.001 n=30+30) TimeFormat-4 415ns ± 3% 415ns ± 3% ~ (p=0.436 n=30+30) [Geo mean] 67.0µs 66.9µs -0.17% name old speed new speed delta GobDecode-4 107MB/s ± 5% 107MB/s ± 6% ~ (p=0.663 n=30+30) GobEncode-4 107MB/s ± 6% 108MB/s ± 7% ~ (p=0.117 n=29+30) Gzip-4 47.0MB/s ± 4% 48.7MB/s ± 4% +3.61% (p=0.000 n=30+30) Gunzip-4 470MB/s ± 4% 466MB/s ± 4% -1.05% (p=0.011 n=30+30) JSONEncode-4 95.6MB/s ± 5% 96.7MB/s ± 5% +1.16% (p=0.005 n=30+30) JSONDecode-4 29.3MB/s ± 4% 28.7MB/s ± 4% -2.17% (p=0.000 n=30+30) GoParse-4 17.9MB/s ± 2% 17.7MB/s ± 2% -1.06% (p=0.007 n=30+30) RegexpMatchEasy0_32-4 323MB/s ± 5% 329MB/s ± 4% +1.93% (p=0.006 n=30+30) RegexpMatchEasy0_1K-4 1.22GB/s ± 3% 1.22GB/s ± 3% ~ (p=0.496 n=30+30) RegexpMatchEasy1_32-4 298MB/s ± 4% 303MB/s ± 4% +1.84% (p=0.017 n=30+30) RegexpMatchEasy1_1K-4 995MB/s ± 4% 989MB/s ± 4% ~ (p=0.307 n=30+30) RegexpMatchMedium_32-4 7.56MB/s ± 4% 7.74MB/s ± 4% +2.46% (p=0.000 n=22+30) RegexpMatchMedium_1K-4 23.2MB/s ± 4% 23.4MB/s ± 3% ~ (p=0.651 n=30+30) RegexpMatchHard_32-4 14.2MB/s ± 4% 14.3MB/s ± 4% +1.29% (p=0.021 n=30+30) RegexpMatchHard_1K-4 15.0MB/s ± 3% 14.9MB/s ± 4% ~ (p=0.069 n=30+29) Revcomp-4 138MB/s ± 2% 138MB/s ± 2% ~ (p=0.072 n=30+30) Template-4 28.1MB/s ± 3% 28.4MB/s ± 3% +1.05% (p=0.012 n=30+30) [Geo mean] 79.7MB/s 80.2MB/s +0.60% Change-Id: I44a1dfc942c9a385904553c4fe1fa8e509c8aa31 Reviewed-on: https://go-review.googlesource.com/120916 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2018-06-26 02:58:54 +00:00
case ssa.Op386ANDLconstmodify, ssa.Op386ORLconstmodify, ssa.Op386XORLconstmodify:
sc := v.AuxValAndOff()
off := sc.Off()
val := sc.Val()
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_CONST
p.From.Offset = val
p.To.Type = obj.TYPE_MEM
p.To.Reg = v.Args[0].Reg()
gc.AddAux2(&p.To, v, off)
case ssa.Op386MOVBstoreidx1, ssa.Op386MOVWstoreidx1, ssa.Op386MOVLstoreidx1, ssa.Op386MOVSSstoreidx1, ssa.Op386MOVSDstoreidx1,
cmd/compile: add indexed form for several 386 instructions This CL implements indexed memory operands for the following instructions. (ADD|SUB|MUL|AND|OR|XOR)Lload -> (ADD|SUB|MUL|AND|OR|XOR)Lloadidx4 (ADD|SUB|AND|OR|XOR)Lmodify -> (ADD|SUB|AND|OR|XOR)Lmodifyidx4 (ADD|AND|OR|XOR)Lconstmodify -> (ADD|AND|OR|XOR)Lconstmodifyidx4 1. The total size of pkg/linux_386/ decreases about 2.5KB, excluding cmd/compile/ . 2. There is little regression in the go1 benchmark test, excluding noise. name old time/op new time/op delta BinaryTree17-4 3.25s ± 3% 3.25s ± 3% ~ (p=0.218 n=40+40) Fannkuch11-4 3.53s ± 1% 3.53s ± 1% ~ (p=0.303 n=40+40) FmtFprintfEmpty-4 44.9ns ± 3% 45.6ns ± 3% +1.48% (p=0.030 n=40+36) FmtFprintfString-4 78.7ns ± 5% 80.1ns ± 7% ~ (p=0.217 n=36+40) FmtFprintfInt-4 90.2ns ± 6% 89.8ns ± 5% ~ (p=0.659 n=40+38) FmtFprintfIntInt-4 140ns ± 5% 141ns ± 5% +1.00% (p=0.027 n=40+40) FmtFprintfPrefixedInt-4 185ns ± 3% 183ns ± 3% ~ (p=0.104 n=40+40) FmtFprintfFloat-4 411ns ± 4% 406ns ± 3% -1.37% (p=0.005 n=40+40) FmtManyArgs-4 590ns ± 4% 598ns ± 4% +1.35% (p=0.008 n=40+40) GobDecode-4 7.16ms ± 5% 7.10ms ± 5% ~ (p=0.335 n=40+40) GobEncode-4 6.85ms ± 7% 6.74ms ± 9% ~ (p=0.058 n=38+40) Gzip-4 400ms ± 4% 399ms ± 2% -0.34% (p=0.003 n=40+33) Gunzip-4 41.4ms ± 3% 41.4ms ± 4% -0.12% (p=0.020 n=40+40) HTTPClientServer-4 64.1µs ± 4% 63.5µs ± 2% -1.07% (p=0.000 n=39+37) JSONEncode-4 15.9ms ± 2% 15.9ms ± 3% ~ (p=0.103 n=40+40) JSONDecode-4 62.2ms ± 4% 61.6ms ± 3% -0.98% (p=0.006 n=39+40) Mandelbrot200-4 5.18ms ± 3% 5.14ms ± 4% ~ (p=0.125 n=40+40) GoParse-4 3.29ms ± 2% 3.27ms ± 2% -0.66% (p=0.006 n=40+40) RegexpMatchEasy0_32-4 103ns ± 4% 103ns ± 4% ~ (p=0.632 n=40+40) RegexpMatchEasy0_1K-4 830ns ± 3% 828ns ± 3% ~ (p=0.563 n=40+40) RegexpMatchEasy1_32-4 113ns ± 4% 113ns ± 4% ~ (p=0.494 n=40+40) RegexpMatchEasy1_1K-4 1.03µs ± 4% 1.03µs ± 4% ~ (p=0.665 n=40+40) RegexpMatchMedium_32-4 130ns ± 4% 129ns ± 3% ~ (p=0.458 n=40+40) RegexpMatchMedium_1K-4 39.4µs ± 3% 39.7µs ± 3% ~ (p=0.825 n=40+40) RegexpMatchHard_32-4 2.16µs ± 4% 2.15µs ± 4% ~ (p=0.137 n=40+40) RegexpMatchHard_1K-4 65.2µs ± 3% 65.4µs ± 4% ~ (p=0.160 n=40+40) Revcomp-4 1.87s ± 2% 1.87s ± 1% +0.17% (p=0.019 n=33+33) Template-4 69.4ms ± 3% 69.8ms ± 3% +0.60% (p=0.009 n=40+40) TimeParse-4 437ns ± 4% 438ns ± 4% ~ (p=0.234 n=40+40) TimeFormat-4 408ns ± 3% 408ns ± 3% ~ (p=0.904 n=40+40) [Geo mean] 65.7µs 65.6µs -0.08% name old speed new speed delta GobDecode-4 107MB/s ± 5% 108MB/s ± 5% ~ (p=0.336 n=40+40) GobEncode-4 112MB/s ± 6% 114MB/s ± 9% +1.95% (p=0.036 n=37+40) Gzip-4 48.5MB/s ± 4% 48.6MB/s ± 2% +0.28% (p=0.003 n=40+33) Gunzip-4 469MB/s ± 4% 469MB/s ± 4% +0.11% (p=0.021 n=40+40) JSONEncode-4 122MB/s ± 2% 122MB/s ± 3% ~ (p=0.105 n=40+40) JSONDecode-4 31.2MB/s ± 4% 31.5MB/s ± 4% +0.99% (p=0.007 n=39+40) GoParse-4 17.6MB/s ± 2% 17.7MB/s ± 2% +0.66% (p=0.007 n=40+40) RegexpMatchEasy0_32-4 310MB/s ± 4% 310MB/s ± 4% ~ (p=0.384 n=40+40) RegexpMatchEasy0_1K-4 1.23GB/s ± 3% 1.24GB/s ± 3% ~ (p=0.186 n=40+40) RegexpMatchEasy1_32-4 283MB/s ± 3% 281MB/s ± 4% ~ (p=0.855 n=40+40) RegexpMatchEasy1_1K-4 1.00GB/s ± 4% 1.00GB/s ± 4% ~ (p=0.665 n=40+40) RegexpMatchMedium_32-4 7.68MB/s ± 4% 7.73MB/s ± 3% ~ (p=0.359 n=40+40) RegexpMatchMedium_1K-4 26.0MB/s ± 3% 25.8MB/s ± 3% ~ (p=0.825 n=40+40) RegexpMatchHard_32-4 14.8MB/s ± 3% 14.9MB/s ± 4% ~ (p=0.136 n=40+40) RegexpMatchHard_1K-4 15.7MB/s ± 3% 15.7MB/s ± 4% ~ (p=0.150 n=40+40) Revcomp-4 136MB/s ± 1% 136MB/s ± 1% -0.09% (p=0.028 n=32+33) Template-4 28.0MB/s ± 3% 27.8MB/s ± 3% -0.59% (p=0.010 n=40+40) [Geo mean] 82.1MB/s 82.3MB/s +0.25% Change-Id: Ifa387a251056678326d3508aa02753b70bf7e5d0 Reviewed-on: https://go-review.googlesource.com/c/140303 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2018-10-06 13:13:48 +00:00
ssa.Op386MOVSDstoreidx8, ssa.Op386MOVSSstoreidx4, ssa.Op386MOVLstoreidx4, ssa.Op386MOVWstoreidx2,
ssa.Op386ADDLmodifyidx4, ssa.Op386SUBLmodifyidx4, ssa.Op386ANDLmodifyidx4, ssa.Op386ORLmodifyidx4, ssa.Op386XORLmodifyidx4:
r := v.Args[0].Reg()
i := v.Args[1].Reg()
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = v.Args[2].Reg()
p.To.Type = obj.TYPE_MEM
switch v.Op {
case ssa.Op386MOVBstoreidx1, ssa.Op386MOVWstoreidx1, ssa.Op386MOVLstoreidx1, ssa.Op386MOVSSstoreidx1, ssa.Op386MOVSDstoreidx1:
if i == x86.REG_SP {
r, i = i, r
}
p.To.Scale = 1
case ssa.Op386MOVSDstoreidx8:
p.To.Scale = 8
cmd/compile: add indexed form for several 386 instructions This CL implements indexed memory operands for the following instructions. (ADD|SUB|MUL|AND|OR|XOR)Lload -> (ADD|SUB|MUL|AND|OR|XOR)Lloadidx4 (ADD|SUB|AND|OR|XOR)Lmodify -> (ADD|SUB|AND|OR|XOR)Lmodifyidx4 (ADD|AND|OR|XOR)Lconstmodify -> (ADD|AND|OR|XOR)Lconstmodifyidx4 1. The total size of pkg/linux_386/ decreases about 2.5KB, excluding cmd/compile/ . 2. There is little regression in the go1 benchmark test, excluding noise. name old time/op new time/op delta BinaryTree17-4 3.25s ± 3% 3.25s ± 3% ~ (p=0.218 n=40+40) Fannkuch11-4 3.53s ± 1% 3.53s ± 1% ~ (p=0.303 n=40+40) FmtFprintfEmpty-4 44.9ns ± 3% 45.6ns ± 3% +1.48% (p=0.030 n=40+36) FmtFprintfString-4 78.7ns ± 5% 80.1ns ± 7% ~ (p=0.217 n=36+40) FmtFprintfInt-4 90.2ns ± 6% 89.8ns ± 5% ~ (p=0.659 n=40+38) FmtFprintfIntInt-4 140ns ± 5% 141ns ± 5% +1.00% (p=0.027 n=40+40) FmtFprintfPrefixedInt-4 185ns ± 3% 183ns ± 3% ~ (p=0.104 n=40+40) FmtFprintfFloat-4 411ns ± 4% 406ns ± 3% -1.37% (p=0.005 n=40+40) FmtManyArgs-4 590ns ± 4% 598ns ± 4% +1.35% (p=0.008 n=40+40) GobDecode-4 7.16ms ± 5% 7.10ms ± 5% ~ (p=0.335 n=40+40) GobEncode-4 6.85ms ± 7% 6.74ms ± 9% ~ (p=0.058 n=38+40) Gzip-4 400ms ± 4% 399ms ± 2% -0.34% (p=0.003 n=40+33) Gunzip-4 41.4ms ± 3% 41.4ms ± 4% -0.12% (p=0.020 n=40+40) HTTPClientServer-4 64.1µs ± 4% 63.5µs ± 2% -1.07% (p=0.000 n=39+37) JSONEncode-4 15.9ms ± 2% 15.9ms ± 3% ~ (p=0.103 n=40+40) JSONDecode-4 62.2ms ± 4% 61.6ms ± 3% -0.98% (p=0.006 n=39+40) Mandelbrot200-4 5.18ms ± 3% 5.14ms ± 4% ~ (p=0.125 n=40+40) GoParse-4 3.29ms ± 2% 3.27ms ± 2% -0.66% (p=0.006 n=40+40) RegexpMatchEasy0_32-4 103ns ± 4% 103ns ± 4% ~ (p=0.632 n=40+40) RegexpMatchEasy0_1K-4 830ns ± 3% 828ns ± 3% ~ (p=0.563 n=40+40) RegexpMatchEasy1_32-4 113ns ± 4% 113ns ± 4% ~ (p=0.494 n=40+40) RegexpMatchEasy1_1K-4 1.03µs ± 4% 1.03µs ± 4% ~ (p=0.665 n=40+40) RegexpMatchMedium_32-4 130ns ± 4% 129ns ± 3% ~ (p=0.458 n=40+40) RegexpMatchMedium_1K-4 39.4µs ± 3% 39.7µs ± 3% ~ (p=0.825 n=40+40) RegexpMatchHard_32-4 2.16µs ± 4% 2.15µs ± 4% ~ (p=0.137 n=40+40) RegexpMatchHard_1K-4 65.2µs ± 3% 65.4µs ± 4% ~ (p=0.160 n=40+40) Revcomp-4 1.87s ± 2% 1.87s ± 1% +0.17% (p=0.019 n=33+33) Template-4 69.4ms ± 3% 69.8ms ± 3% +0.60% (p=0.009 n=40+40) TimeParse-4 437ns ± 4% 438ns ± 4% ~ (p=0.234 n=40+40) TimeFormat-4 408ns ± 3% 408ns ± 3% ~ (p=0.904 n=40+40) [Geo mean] 65.7µs 65.6µs -0.08% name old speed new speed delta GobDecode-4 107MB/s ± 5% 108MB/s ± 5% ~ (p=0.336 n=40+40) GobEncode-4 112MB/s ± 6% 114MB/s ± 9% +1.95% (p=0.036 n=37+40) Gzip-4 48.5MB/s ± 4% 48.6MB/s ± 2% +0.28% (p=0.003 n=40+33) Gunzip-4 469MB/s ± 4% 469MB/s ± 4% +0.11% (p=0.021 n=40+40) JSONEncode-4 122MB/s ± 2% 122MB/s ± 3% ~ (p=0.105 n=40+40) JSONDecode-4 31.2MB/s ± 4% 31.5MB/s ± 4% +0.99% (p=0.007 n=39+40) GoParse-4 17.6MB/s ± 2% 17.7MB/s ± 2% +0.66% (p=0.007 n=40+40) RegexpMatchEasy0_32-4 310MB/s ± 4% 310MB/s ± 4% ~ (p=0.384 n=40+40) RegexpMatchEasy0_1K-4 1.23GB/s ± 3% 1.24GB/s ± 3% ~ (p=0.186 n=40+40) RegexpMatchEasy1_32-4 283MB/s ± 3% 281MB/s ± 4% ~ (p=0.855 n=40+40) RegexpMatchEasy1_1K-4 1.00GB/s ± 4% 1.00GB/s ± 4% ~ (p=0.665 n=40+40) RegexpMatchMedium_32-4 7.68MB/s ± 4% 7.73MB/s ± 3% ~ (p=0.359 n=40+40) RegexpMatchMedium_1K-4 26.0MB/s ± 3% 25.8MB/s ± 3% ~ (p=0.825 n=40+40) RegexpMatchHard_32-4 14.8MB/s ± 3% 14.9MB/s ± 4% ~ (p=0.136 n=40+40) RegexpMatchHard_1K-4 15.7MB/s ± 3% 15.7MB/s ± 4% ~ (p=0.150 n=40+40) Revcomp-4 136MB/s ± 1% 136MB/s ± 1% -0.09% (p=0.028 n=32+33) Template-4 28.0MB/s ± 3% 27.8MB/s ± 3% -0.59% (p=0.010 n=40+40) [Geo mean] 82.1MB/s 82.3MB/s +0.25% Change-Id: Ifa387a251056678326d3508aa02753b70bf7e5d0 Reviewed-on: https://go-review.googlesource.com/c/140303 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2018-10-06 13:13:48 +00:00
case ssa.Op386MOVSSstoreidx4, ssa.Op386MOVLstoreidx4,
ssa.Op386ADDLmodifyidx4, ssa.Op386SUBLmodifyidx4, ssa.Op386ANDLmodifyidx4, ssa.Op386ORLmodifyidx4, ssa.Op386XORLmodifyidx4:
p.To.Scale = 4
case ssa.Op386MOVWstoreidx2:
p.To.Scale = 2
}
p.To.Reg = r
p.To.Index = i
gc.AddAux(&p.To, v)
case ssa.Op386MOVLstoreconst, ssa.Op386MOVWstoreconst, ssa.Op386MOVBstoreconst:
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_CONST
sc := v.AuxValAndOff()
p.From.Offset = sc.Val()
p.To.Type = obj.TYPE_MEM
p.To.Reg = v.Args[0].Reg()
gc.AddAux2(&p.To, v, sc.Off())
cmd/compile: optimize 386's ADDLconstmodifyidx4 This CL optimize ADDLconstmodifyidx4 to INCL/DECL, when the constant is +1/-1. 1. The total size of pkg/linux_386/ decreases 28 bytes, excluding cmd/compile. 2. There is no regression in the go1 benchmark test, excluding noise. name old time/op new time/op delta BinaryTree17-4 3.25s ± 2% 3.23s ± 3% -0.70% (p=0.040 n=30+30) Fannkuch11-4 3.50s ± 1% 3.47s ± 1% -0.68% (p=0.000 n=30+30) FmtFprintfEmpty-4 44.6ns ± 3% 44.8ns ± 3% +0.46% (p=0.029 n=30+30) FmtFprintfString-4 79.0ns ± 3% 78.7ns ± 3% ~ (p=0.053 n=30+30) FmtFprintfInt-4 89.2ns ± 2% 89.4ns ± 3% ~ (p=0.665 n=30+29) FmtFprintfIntInt-4 142ns ± 3% 142ns ± 3% ~ (p=0.435 n=30+30) FmtFprintfPrefixedInt-4 182ns ± 2% 182ns ± 2% ~ (p=0.964 n=30+30) FmtFprintfFloat-4 407ns ± 3% 411ns ± 4% ~ (p=0.080 n=30+30) FmtManyArgs-4 597ns ± 3% 593ns ± 4% ~ (p=0.222 n=30+30) GobDecode-4 7.09ms ± 6% 7.07ms ± 7% ~ (p=0.633 n=30+30) GobEncode-4 6.81ms ± 9% 6.81ms ± 8% ~ (p=0.982 n=30+30) Gzip-4 398ms ± 4% 400ms ± 6% ~ (p=0.177 n=30+30) Gunzip-4 41.3ms ± 3% 40.6ms ± 4% -1.71% (p=0.005 n=30+30) HTTPClientServer-4 63.4µs ± 3% 63.4µs ± 4% ~ (p=0.646 n=30+28) JSONEncode-4 16.0ms ± 3% 16.1ms ± 3% ~ (p=0.057 n=30+30) JSONDecode-4 63.3ms ± 8% 63.1ms ± 7% ~ (p=0.786 n=30+30) Mandelbrot200-4 5.17ms ± 3% 5.15ms ± 8% ~ (p=0.654 n=30+30) GoParse-4 3.24ms ± 3% 3.23ms ± 2% ~ (p=0.091 n=30+30) RegexpMatchEasy0_32-4 103ns ± 4% 103ns ± 4% ~ (p=0.575 n=30+30) RegexpMatchEasy0_1K-4 823ns ± 2% 821ns ± 3% ~ (p=0.827 n=30+30) RegexpMatchEasy1_32-4 113ns ± 3% 112ns ± 3% ~ (p=0.076 n=30+30) RegexpMatchEasy1_1K-4 1.02µs ± 4% 1.01µs ± 5% ~ (p=0.087 n=30+30) RegexpMatchMedium_32-4 129ns ± 3% 127ns ± 4% -1.55% (p=0.009 n=30+30) RegexpMatchMedium_1K-4 39.3µs ± 4% 39.7µs ± 3% ~ (p=0.054 n=30+30) RegexpMatchHard_32-4 2.15µs ± 4% 2.15µs ± 4% ~ (p=0.712 n=30+30) RegexpMatchHard_1K-4 66.0µs ± 3% 65.1µs ± 3% -1.32% (p=0.002 n=30+30) Revcomp-4 1.85s ± 2% 1.85s ± 3% ~ (p=0.168 n=30+30) Template-4 69.5ms ± 7% 68.9ms ± 6% ~ (p=0.250 n=28+28) TimeParse-4 434ns ± 3% 432ns ± 4% ~ (p=0.629 n=30+30) TimeFormat-4 403ns ± 4% 408ns ± 3% +1.23% (p=0.019 n=30+29) [Geo mean] 65.5µs 65.3µs -0.20% name old speed new speed delta GobDecode-4 108MB/s ± 6% 109MB/s ± 6% ~ (p=0.636 n=30+30) GobEncode-4 113MB/s ±10% 113MB/s ± 9% ~ (p=0.982 n=30+30) Gzip-4 48.8MB/s ± 4% 48.6MB/s ± 5% ~ (p=0.178 n=30+30) Gunzip-4 470MB/s ± 3% 479MB/s ± 4% +1.72% (p=0.006 n=30+30) JSONEncode-4 121MB/s ± 3% 120MB/s ± 3% ~ (p=0.057 n=30+30) JSONDecode-4 30.7MB/s ± 8% 30.8MB/s ± 8% ~ (p=0.784 n=30+30) GoParse-4 17.9MB/s ± 3% 17.9MB/s ± 2% ~ (p=0.090 n=30+30) RegexpMatchEasy0_32-4 309MB/s ± 4% 309MB/s ± 3% ~ (p=0.530 n=30+30) RegexpMatchEasy0_1K-4 1.24GB/s ± 2% 1.25GB/s ± 3% ~ (p=0.976 n=30+30) RegexpMatchEasy1_32-4 282MB/s ± 3% 284MB/s ± 3% +0.81% (p=0.041 n=30+30) RegexpMatchEasy1_1K-4 1.00GB/s ± 3% 1.01GB/s ± 4% ~ (p=0.091 n=30+30) RegexpMatchMedium_32-4 7.71MB/s ± 3% 7.84MB/s ± 4% +1.71% (p=0.000 n=30+30) RegexpMatchMedium_1K-4 26.1MB/s ± 4% 25.8MB/s ± 3% ~ (p=0.051 n=30+30) RegexpMatchHard_32-4 14.9MB/s ± 4% 14.9MB/s ± 4% ~ (p=0.712 n=30+30) RegexpMatchHard_1K-4 15.5MB/s ± 3% 15.7MB/s ± 3% +1.34% (p=0.003 n=30+30) Revcomp-4 138MB/s ± 2% 137MB/s ± 3% ~ (p=0.174 n=30+30) Template-4 28.0MB/s ± 6% 28.2MB/s ± 6% ~ (p=0.251 n=28+28) [Geo mean] 82.3MB/s 82.6MB/s +0.36% Change-Id: I389829699ffe9500a013fcf31be58a97e98043e1 Reviewed-on: https://go-review.googlesource.com/c/140701 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2018-10-09 11:01:34 +00:00
case ssa.Op386ADDLconstmodifyidx4:
sc := v.AuxValAndOff()
val := sc.Val()
if val == 1 || val == -1 {
var p *obj.Prog
if val == 1 {
p = s.Prog(x86.AINCL)
} else {
p = s.Prog(x86.ADECL)
}
off := sc.Off()
p.To.Type = obj.TYPE_MEM
p.To.Reg = v.Args[0].Reg()
p.To.Scale = 4
p.To.Index = v.Args[1].Reg()
gc.AddAux2(&p.To, v, off)
break
}
fallthrough
cmd/compile: add indexed form for several 386 instructions This CL implements indexed memory operands for the following instructions. (ADD|SUB|MUL|AND|OR|XOR)Lload -> (ADD|SUB|MUL|AND|OR|XOR)Lloadidx4 (ADD|SUB|AND|OR|XOR)Lmodify -> (ADD|SUB|AND|OR|XOR)Lmodifyidx4 (ADD|AND|OR|XOR)Lconstmodify -> (ADD|AND|OR|XOR)Lconstmodifyidx4 1. The total size of pkg/linux_386/ decreases about 2.5KB, excluding cmd/compile/ . 2. There is little regression in the go1 benchmark test, excluding noise. name old time/op new time/op delta BinaryTree17-4 3.25s ± 3% 3.25s ± 3% ~ (p=0.218 n=40+40) Fannkuch11-4 3.53s ± 1% 3.53s ± 1% ~ (p=0.303 n=40+40) FmtFprintfEmpty-4 44.9ns ± 3% 45.6ns ± 3% +1.48% (p=0.030 n=40+36) FmtFprintfString-4 78.7ns ± 5% 80.1ns ± 7% ~ (p=0.217 n=36+40) FmtFprintfInt-4 90.2ns ± 6% 89.8ns ± 5% ~ (p=0.659 n=40+38) FmtFprintfIntInt-4 140ns ± 5% 141ns ± 5% +1.00% (p=0.027 n=40+40) FmtFprintfPrefixedInt-4 185ns ± 3% 183ns ± 3% ~ (p=0.104 n=40+40) FmtFprintfFloat-4 411ns ± 4% 406ns ± 3% -1.37% (p=0.005 n=40+40) FmtManyArgs-4 590ns ± 4% 598ns ± 4% +1.35% (p=0.008 n=40+40) GobDecode-4 7.16ms ± 5% 7.10ms ± 5% ~ (p=0.335 n=40+40) GobEncode-4 6.85ms ± 7% 6.74ms ± 9% ~ (p=0.058 n=38+40) Gzip-4 400ms ± 4% 399ms ± 2% -0.34% (p=0.003 n=40+33) Gunzip-4 41.4ms ± 3% 41.4ms ± 4% -0.12% (p=0.020 n=40+40) HTTPClientServer-4 64.1µs ± 4% 63.5µs ± 2% -1.07% (p=0.000 n=39+37) JSONEncode-4 15.9ms ± 2% 15.9ms ± 3% ~ (p=0.103 n=40+40) JSONDecode-4 62.2ms ± 4% 61.6ms ± 3% -0.98% (p=0.006 n=39+40) Mandelbrot200-4 5.18ms ± 3% 5.14ms ± 4% ~ (p=0.125 n=40+40) GoParse-4 3.29ms ± 2% 3.27ms ± 2% -0.66% (p=0.006 n=40+40) RegexpMatchEasy0_32-4 103ns ± 4% 103ns ± 4% ~ (p=0.632 n=40+40) RegexpMatchEasy0_1K-4 830ns ± 3% 828ns ± 3% ~ (p=0.563 n=40+40) RegexpMatchEasy1_32-4 113ns ± 4% 113ns ± 4% ~ (p=0.494 n=40+40) RegexpMatchEasy1_1K-4 1.03µs ± 4% 1.03µs ± 4% ~ (p=0.665 n=40+40) RegexpMatchMedium_32-4 130ns ± 4% 129ns ± 3% ~ (p=0.458 n=40+40) RegexpMatchMedium_1K-4 39.4µs ± 3% 39.7µs ± 3% ~ (p=0.825 n=40+40) RegexpMatchHard_32-4 2.16µs ± 4% 2.15µs ± 4% ~ (p=0.137 n=40+40) RegexpMatchHard_1K-4 65.2µs ± 3% 65.4µs ± 4% ~ (p=0.160 n=40+40) Revcomp-4 1.87s ± 2% 1.87s ± 1% +0.17% (p=0.019 n=33+33) Template-4 69.4ms ± 3% 69.8ms ± 3% +0.60% (p=0.009 n=40+40) TimeParse-4 437ns ± 4% 438ns ± 4% ~ (p=0.234 n=40+40) TimeFormat-4 408ns ± 3% 408ns ± 3% ~ (p=0.904 n=40+40) [Geo mean] 65.7µs 65.6µs -0.08% name old speed new speed delta GobDecode-4 107MB/s ± 5% 108MB/s ± 5% ~ (p=0.336 n=40+40) GobEncode-4 112MB/s ± 6% 114MB/s ± 9% +1.95% (p=0.036 n=37+40) Gzip-4 48.5MB/s ± 4% 48.6MB/s ± 2% +0.28% (p=0.003 n=40+33) Gunzip-4 469MB/s ± 4% 469MB/s ± 4% +0.11% (p=0.021 n=40+40) JSONEncode-4 122MB/s ± 2% 122MB/s ± 3% ~ (p=0.105 n=40+40) JSONDecode-4 31.2MB/s ± 4% 31.5MB/s ± 4% +0.99% (p=0.007 n=39+40) GoParse-4 17.6MB/s ± 2% 17.7MB/s ± 2% +0.66% (p=0.007 n=40+40) RegexpMatchEasy0_32-4 310MB/s ± 4% 310MB/s ± 4% ~ (p=0.384 n=40+40) RegexpMatchEasy0_1K-4 1.23GB/s ± 3% 1.24GB/s ± 3% ~ (p=0.186 n=40+40) RegexpMatchEasy1_32-4 283MB/s ± 3% 281MB/s ± 4% ~ (p=0.855 n=40+40) RegexpMatchEasy1_1K-4 1.00GB/s ± 4% 1.00GB/s ± 4% ~ (p=0.665 n=40+40) RegexpMatchMedium_32-4 7.68MB/s ± 4% 7.73MB/s ± 3% ~ (p=0.359 n=40+40) RegexpMatchMedium_1K-4 26.0MB/s ± 3% 25.8MB/s ± 3% ~ (p=0.825 n=40+40) RegexpMatchHard_32-4 14.8MB/s ± 3% 14.9MB/s ± 4% ~ (p=0.136 n=40+40) RegexpMatchHard_1K-4 15.7MB/s ± 3% 15.7MB/s ± 4% ~ (p=0.150 n=40+40) Revcomp-4 136MB/s ± 1% 136MB/s ± 1% -0.09% (p=0.028 n=32+33) Template-4 28.0MB/s ± 3% 27.8MB/s ± 3% -0.59% (p=0.010 n=40+40) [Geo mean] 82.1MB/s 82.3MB/s +0.25% Change-Id: Ifa387a251056678326d3508aa02753b70bf7e5d0 Reviewed-on: https://go-review.googlesource.com/c/140303 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2018-10-06 13:13:48 +00:00
case ssa.Op386MOVLstoreconstidx1, ssa.Op386MOVLstoreconstidx4, ssa.Op386MOVWstoreconstidx1, ssa.Op386MOVWstoreconstidx2, ssa.Op386MOVBstoreconstidx1,
cmd/compile: optimize 386's ADDLconstmodifyidx4 This CL optimize ADDLconstmodifyidx4 to INCL/DECL, when the constant is +1/-1. 1. The total size of pkg/linux_386/ decreases 28 bytes, excluding cmd/compile. 2. There is no regression in the go1 benchmark test, excluding noise. name old time/op new time/op delta BinaryTree17-4 3.25s ± 2% 3.23s ± 3% -0.70% (p=0.040 n=30+30) Fannkuch11-4 3.50s ± 1% 3.47s ± 1% -0.68% (p=0.000 n=30+30) FmtFprintfEmpty-4 44.6ns ± 3% 44.8ns ± 3% +0.46% (p=0.029 n=30+30) FmtFprintfString-4 79.0ns ± 3% 78.7ns ± 3% ~ (p=0.053 n=30+30) FmtFprintfInt-4 89.2ns ± 2% 89.4ns ± 3% ~ (p=0.665 n=30+29) FmtFprintfIntInt-4 142ns ± 3% 142ns ± 3% ~ (p=0.435 n=30+30) FmtFprintfPrefixedInt-4 182ns ± 2% 182ns ± 2% ~ (p=0.964 n=30+30) FmtFprintfFloat-4 407ns ± 3% 411ns ± 4% ~ (p=0.080 n=30+30) FmtManyArgs-4 597ns ± 3% 593ns ± 4% ~ (p=0.222 n=30+30) GobDecode-4 7.09ms ± 6% 7.07ms ± 7% ~ (p=0.633 n=30+30) GobEncode-4 6.81ms ± 9% 6.81ms ± 8% ~ (p=0.982 n=30+30) Gzip-4 398ms ± 4% 400ms ± 6% ~ (p=0.177 n=30+30) Gunzip-4 41.3ms ± 3% 40.6ms ± 4% -1.71% (p=0.005 n=30+30) HTTPClientServer-4 63.4µs ± 3% 63.4µs ± 4% ~ (p=0.646 n=30+28) JSONEncode-4 16.0ms ± 3% 16.1ms ± 3% ~ (p=0.057 n=30+30) JSONDecode-4 63.3ms ± 8% 63.1ms ± 7% ~ (p=0.786 n=30+30) Mandelbrot200-4 5.17ms ± 3% 5.15ms ± 8% ~ (p=0.654 n=30+30) GoParse-4 3.24ms ± 3% 3.23ms ± 2% ~ (p=0.091 n=30+30) RegexpMatchEasy0_32-4 103ns ± 4% 103ns ± 4% ~ (p=0.575 n=30+30) RegexpMatchEasy0_1K-4 823ns ± 2% 821ns ± 3% ~ (p=0.827 n=30+30) RegexpMatchEasy1_32-4 113ns ± 3% 112ns ± 3% ~ (p=0.076 n=30+30) RegexpMatchEasy1_1K-4 1.02µs ± 4% 1.01µs ± 5% ~ (p=0.087 n=30+30) RegexpMatchMedium_32-4 129ns ± 3% 127ns ± 4% -1.55% (p=0.009 n=30+30) RegexpMatchMedium_1K-4 39.3µs ± 4% 39.7µs ± 3% ~ (p=0.054 n=30+30) RegexpMatchHard_32-4 2.15µs ± 4% 2.15µs ± 4% ~ (p=0.712 n=30+30) RegexpMatchHard_1K-4 66.0µs ± 3% 65.1µs ± 3% -1.32% (p=0.002 n=30+30) Revcomp-4 1.85s ± 2% 1.85s ± 3% ~ (p=0.168 n=30+30) Template-4 69.5ms ± 7% 68.9ms ± 6% ~ (p=0.250 n=28+28) TimeParse-4 434ns ± 3% 432ns ± 4% ~ (p=0.629 n=30+30) TimeFormat-4 403ns ± 4% 408ns ± 3% +1.23% (p=0.019 n=30+29) [Geo mean] 65.5µs 65.3µs -0.20% name old speed new speed delta GobDecode-4 108MB/s ± 6% 109MB/s ± 6% ~ (p=0.636 n=30+30) GobEncode-4 113MB/s ±10% 113MB/s ± 9% ~ (p=0.982 n=30+30) Gzip-4 48.8MB/s ± 4% 48.6MB/s ± 5% ~ (p=0.178 n=30+30) Gunzip-4 470MB/s ± 3% 479MB/s ± 4% +1.72% (p=0.006 n=30+30) JSONEncode-4 121MB/s ± 3% 120MB/s ± 3% ~ (p=0.057 n=30+30) JSONDecode-4 30.7MB/s ± 8% 30.8MB/s ± 8% ~ (p=0.784 n=30+30) GoParse-4 17.9MB/s ± 3% 17.9MB/s ± 2% ~ (p=0.090 n=30+30) RegexpMatchEasy0_32-4 309MB/s ± 4% 309MB/s ± 3% ~ (p=0.530 n=30+30) RegexpMatchEasy0_1K-4 1.24GB/s ± 2% 1.25GB/s ± 3% ~ (p=0.976 n=30+30) RegexpMatchEasy1_32-4 282MB/s ± 3% 284MB/s ± 3% +0.81% (p=0.041 n=30+30) RegexpMatchEasy1_1K-4 1.00GB/s ± 3% 1.01GB/s ± 4% ~ (p=0.091 n=30+30) RegexpMatchMedium_32-4 7.71MB/s ± 3% 7.84MB/s ± 4% +1.71% (p=0.000 n=30+30) RegexpMatchMedium_1K-4 26.1MB/s ± 4% 25.8MB/s ± 3% ~ (p=0.051 n=30+30) RegexpMatchHard_32-4 14.9MB/s ± 4% 14.9MB/s ± 4% ~ (p=0.712 n=30+30) RegexpMatchHard_1K-4 15.5MB/s ± 3% 15.7MB/s ± 3% +1.34% (p=0.003 n=30+30) Revcomp-4 138MB/s ± 2% 137MB/s ± 3% ~ (p=0.174 n=30+30) Template-4 28.0MB/s ± 6% 28.2MB/s ± 6% ~ (p=0.251 n=28+28) [Geo mean] 82.3MB/s 82.6MB/s +0.36% Change-Id: I389829699ffe9500a013fcf31be58a97e98043e1 Reviewed-on: https://go-review.googlesource.com/c/140701 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2018-10-09 11:01:34 +00:00
ssa.Op386ANDLconstmodifyidx4, ssa.Op386ORLconstmodifyidx4, ssa.Op386XORLconstmodifyidx4:
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_CONST
sc := v.AuxValAndOff()
p.From.Offset = sc.Val()
r := v.Args[0].Reg()
i := v.Args[1].Reg()
switch v.Op {
case ssa.Op386MOVBstoreconstidx1, ssa.Op386MOVWstoreconstidx1, ssa.Op386MOVLstoreconstidx1:
p.To.Scale = 1
if i == x86.REG_SP {
r, i = i, r
}
case ssa.Op386MOVWstoreconstidx2:
p.To.Scale = 2
cmd/compile: add indexed form for several 386 instructions This CL implements indexed memory operands for the following instructions. (ADD|SUB|MUL|AND|OR|XOR)Lload -> (ADD|SUB|MUL|AND|OR|XOR)Lloadidx4 (ADD|SUB|AND|OR|XOR)Lmodify -> (ADD|SUB|AND|OR|XOR)Lmodifyidx4 (ADD|AND|OR|XOR)Lconstmodify -> (ADD|AND|OR|XOR)Lconstmodifyidx4 1. The total size of pkg/linux_386/ decreases about 2.5KB, excluding cmd/compile/ . 2. There is little regression in the go1 benchmark test, excluding noise. name old time/op new time/op delta BinaryTree17-4 3.25s ± 3% 3.25s ± 3% ~ (p=0.218 n=40+40) Fannkuch11-4 3.53s ± 1% 3.53s ± 1% ~ (p=0.303 n=40+40) FmtFprintfEmpty-4 44.9ns ± 3% 45.6ns ± 3% +1.48% (p=0.030 n=40+36) FmtFprintfString-4 78.7ns ± 5% 80.1ns ± 7% ~ (p=0.217 n=36+40) FmtFprintfInt-4 90.2ns ± 6% 89.8ns ± 5% ~ (p=0.659 n=40+38) FmtFprintfIntInt-4 140ns ± 5% 141ns ± 5% +1.00% (p=0.027 n=40+40) FmtFprintfPrefixedInt-4 185ns ± 3% 183ns ± 3% ~ (p=0.104 n=40+40) FmtFprintfFloat-4 411ns ± 4% 406ns ± 3% -1.37% (p=0.005 n=40+40) FmtManyArgs-4 590ns ± 4% 598ns ± 4% +1.35% (p=0.008 n=40+40) GobDecode-4 7.16ms ± 5% 7.10ms ± 5% ~ (p=0.335 n=40+40) GobEncode-4 6.85ms ± 7% 6.74ms ± 9% ~ (p=0.058 n=38+40) Gzip-4 400ms ± 4% 399ms ± 2% -0.34% (p=0.003 n=40+33) Gunzip-4 41.4ms ± 3% 41.4ms ± 4% -0.12% (p=0.020 n=40+40) HTTPClientServer-4 64.1µs ± 4% 63.5µs ± 2% -1.07% (p=0.000 n=39+37) JSONEncode-4 15.9ms ± 2% 15.9ms ± 3% ~ (p=0.103 n=40+40) JSONDecode-4 62.2ms ± 4% 61.6ms ± 3% -0.98% (p=0.006 n=39+40) Mandelbrot200-4 5.18ms ± 3% 5.14ms ± 4% ~ (p=0.125 n=40+40) GoParse-4 3.29ms ± 2% 3.27ms ± 2% -0.66% (p=0.006 n=40+40) RegexpMatchEasy0_32-4 103ns ± 4% 103ns ± 4% ~ (p=0.632 n=40+40) RegexpMatchEasy0_1K-4 830ns ± 3% 828ns ± 3% ~ (p=0.563 n=40+40) RegexpMatchEasy1_32-4 113ns ± 4% 113ns ± 4% ~ (p=0.494 n=40+40) RegexpMatchEasy1_1K-4 1.03µs ± 4% 1.03µs ± 4% ~ (p=0.665 n=40+40) RegexpMatchMedium_32-4 130ns ± 4% 129ns ± 3% ~ (p=0.458 n=40+40) RegexpMatchMedium_1K-4 39.4µs ± 3% 39.7µs ± 3% ~ (p=0.825 n=40+40) RegexpMatchHard_32-4 2.16µs ± 4% 2.15µs ± 4% ~ (p=0.137 n=40+40) RegexpMatchHard_1K-4 65.2µs ± 3% 65.4µs ± 4% ~ (p=0.160 n=40+40) Revcomp-4 1.87s ± 2% 1.87s ± 1% +0.17% (p=0.019 n=33+33) Template-4 69.4ms ± 3% 69.8ms ± 3% +0.60% (p=0.009 n=40+40) TimeParse-4 437ns ± 4% 438ns ± 4% ~ (p=0.234 n=40+40) TimeFormat-4 408ns ± 3% 408ns ± 3% ~ (p=0.904 n=40+40) [Geo mean] 65.7µs 65.6µs -0.08% name old speed new speed delta GobDecode-4 107MB/s ± 5% 108MB/s ± 5% ~ (p=0.336 n=40+40) GobEncode-4 112MB/s ± 6% 114MB/s ± 9% +1.95% (p=0.036 n=37+40) Gzip-4 48.5MB/s ± 4% 48.6MB/s ± 2% +0.28% (p=0.003 n=40+33) Gunzip-4 469MB/s ± 4% 469MB/s ± 4% +0.11% (p=0.021 n=40+40) JSONEncode-4 122MB/s ± 2% 122MB/s ± 3% ~ (p=0.105 n=40+40) JSONDecode-4 31.2MB/s ± 4% 31.5MB/s ± 4% +0.99% (p=0.007 n=39+40) GoParse-4 17.6MB/s ± 2% 17.7MB/s ± 2% +0.66% (p=0.007 n=40+40) RegexpMatchEasy0_32-4 310MB/s ± 4% 310MB/s ± 4% ~ (p=0.384 n=40+40) RegexpMatchEasy0_1K-4 1.23GB/s ± 3% 1.24GB/s ± 3% ~ (p=0.186 n=40+40) RegexpMatchEasy1_32-4 283MB/s ± 3% 281MB/s ± 4% ~ (p=0.855 n=40+40) RegexpMatchEasy1_1K-4 1.00GB/s ± 4% 1.00GB/s ± 4% ~ (p=0.665 n=40+40) RegexpMatchMedium_32-4 7.68MB/s ± 4% 7.73MB/s ± 3% ~ (p=0.359 n=40+40) RegexpMatchMedium_1K-4 26.0MB/s ± 3% 25.8MB/s ± 3% ~ (p=0.825 n=40+40) RegexpMatchHard_32-4 14.8MB/s ± 3% 14.9MB/s ± 4% ~ (p=0.136 n=40+40) RegexpMatchHard_1K-4 15.7MB/s ± 3% 15.7MB/s ± 4% ~ (p=0.150 n=40+40) Revcomp-4 136MB/s ± 1% 136MB/s ± 1% -0.09% (p=0.028 n=32+33) Template-4 28.0MB/s ± 3% 27.8MB/s ± 3% -0.59% (p=0.010 n=40+40) [Geo mean] 82.1MB/s 82.3MB/s +0.25% Change-Id: Ifa387a251056678326d3508aa02753b70bf7e5d0 Reviewed-on: https://go-review.googlesource.com/c/140303 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2018-10-06 13:13:48 +00:00
case ssa.Op386MOVLstoreconstidx4,
ssa.Op386ADDLconstmodifyidx4, ssa.Op386ANDLconstmodifyidx4, ssa.Op386ORLconstmodifyidx4, ssa.Op386XORLconstmodifyidx4:
p.To.Scale = 4
}
p.To.Type = obj.TYPE_MEM
p.To.Reg = r
p.To.Index = i
gc.AddAux2(&p.To, v, sc.Off())
case ssa.Op386MOVWLSX, ssa.Op386MOVBLSX, ssa.Op386MOVWLZX, ssa.Op386MOVBLZX,
ssa.Op386CVTSL2SS, ssa.Op386CVTSL2SD,
ssa.Op386CVTTSS2SL, ssa.Op386CVTTSD2SL,
ssa.Op386CVTSS2SD, ssa.Op386CVTSD2SS:
opregreg(s, v.Op.Asm(), v.Reg(), v.Args[0].Reg())
case ssa.Op386DUFFZERO:
p := s.Prog(obj.ADUFFZERO)
p.To.Type = obj.TYPE_ADDR
p.To.Sym = gc.Duffzero
p.To.Offset = v.AuxInt
case ssa.Op386DUFFCOPY:
p := s.Prog(obj.ADUFFCOPY)
p.To.Type = obj.TYPE_ADDR
p.To.Sym = gc.Duffcopy
p.To.Offset = v.AuxInt
case ssa.OpCopy: // TODO: use MOVLreg for reg->reg copies instead of OpCopy?
if v.Type.IsMemory() {
return
}
x := v.Args[0].Reg()
y := v.Reg()
if x != y {
opregreg(s, moveByType(v.Type), y, x)
}
case ssa.OpLoadReg:
if v.Type.IsFlags() {
v.Fatalf("load flags not implemented: %v", v.LongString())
return
}
p := s.Prog(loadByType(v.Type))
gc.AddrAuto(&p.From, v.Args[0])
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
case ssa.OpStoreReg:
if v.Type.IsFlags() {
v.Fatalf("store flags not implemented: %v", v.LongString())
return
}
p := s.Prog(storeByType(v.Type))
p.From.Type = obj.TYPE_REG
p.From.Reg = v.Args[0].Reg()
gc.AddrAuto(&p.To, v)
case ssa.Op386LoweredGetClosurePtr:
// Closure pointer is DX.
gc.CheckLoweredGetClosurePtr(v)
case ssa.Op386LoweredGetG:
r := v.Reg()
// See the comments in cmd/internal/obj/x86/obj6.go
// near CanUse1InsnTLS for a detailed explanation of these instructions.
if x86.CanUse1InsnTLS(gc.Ctxt) {
[dev.ssa] cmd/compile: fix PIC for SSA-generated code Access to globals requires a 2-instruction sequence on PIC 386. MOVL foo(SB), AX is translated by the obj package into: CALL getPCofNextInstructionInTempRegister(SB) MOVL (&foo-&thisInstruction)(tmpReg), AX The call returns the PC of the next instruction in a register. The next instruction then offsets from that register to get the address required. The tricky part is the allocation of the temp register. The legacy compiler always used CX, and forbid the register allocator from allocating CX when in PIC mode. We can't easily do that in SSA because CX is actually a required register for shift instructions. (I think the old backend got away with this because the register allocator never uses CX, only codegen knows that shifts must use CX.) Instead, we allow the temp register to be anything. When the destination of the MOV (or LEA) is an integer register, we can use that register. Otherwise, we make sure to compile the operation using an LEA to reference the global. So MOVL AX, foo(SB) is never generated directly. Instead, SSA generates: LEAL foo(SB), DX MOVL AX, (DX) which is then rewritten by the obj package to: CALL getPcInDX(SB) LEAL (&foo-&thisInstruction)(DX), AX MOVL AX, (DX) So this CL modifies the obj package to use different thunks to materialize the pc into different registers. We use the registers that regalloc chose so that SSA can still allocate the full set of registers. Change-Id: Ie095644f7164a026c62e95baf9d18a8bcaed0bba Reviewed-on: https://go-review.googlesource.com/25442 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-08-03 13:00:49 -07:00
// MOVL (TLS), r
p := s.Prog(x86.AMOVL)
p.From.Type = obj.TYPE_MEM
p.From.Reg = x86.REG_TLS
p.To.Type = obj.TYPE_REG
p.To.Reg = r
} else {
[dev.ssa] cmd/compile: fix PIC for SSA-generated code Access to globals requires a 2-instruction sequence on PIC 386. MOVL foo(SB), AX is translated by the obj package into: CALL getPCofNextInstructionInTempRegister(SB) MOVL (&foo-&thisInstruction)(tmpReg), AX The call returns the PC of the next instruction in a register. The next instruction then offsets from that register to get the address required. The tricky part is the allocation of the temp register. The legacy compiler always used CX, and forbid the register allocator from allocating CX when in PIC mode. We can't easily do that in SSA because CX is actually a required register for shift instructions. (I think the old backend got away with this because the register allocator never uses CX, only codegen knows that shifts must use CX.) Instead, we allow the temp register to be anything. When the destination of the MOV (or LEA) is an integer register, we can use that register. Otherwise, we make sure to compile the operation using an LEA to reference the global. So MOVL AX, foo(SB) is never generated directly. Instead, SSA generates: LEAL foo(SB), DX MOVL AX, (DX) which is then rewritten by the obj package to: CALL getPcInDX(SB) LEAL (&foo-&thisInstruction)(DX), AX MOVL AX, (DX) So this CL modifies the obj package to use different thunks to materialize the pc into different registers. We use the registers that regalloc chose so that SSA can still allocate the full set of registers. Change-Id: Ie095644f7164a026c62e95baf9d18a8bcaed0bba Reviewed-on: https://go-review.googlesource.com/25442 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-08-03 13:00:49 -07:00
// MOVL TLS, r
// MOVL (r)(TLS*1), r
p := s.Prog(x86.AMOVL)
p.From.Type = obj.TYPE_REG
p.From.Reg = x86.REG_TLS
p.To.Type = obj.TYPE_REG
p.To.Reg = r
q := s.Prog(x86.AMOVL)
q.From.Type = obj.TYPE_MEM
q.From.Reg = r
q.From.Index = x86.REG_TLS
q.From.Scale = 1
q.To.Type = obj.TYPE_REG
q.To.Reg = r
}
case ssa.Op386LoweredGetCallerPC:
p := s.Prog(x86.AMOVL)
p.From.Type = obj.TYPE_MEM
p.From.Offset = -4 // PC is stored 4 bytes below first parameter.
p.From.Name = obj.NAME_PARAM
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
case ssa.Op386LoweredGetCallerSP:
// caller's SP is the address of the first arg
p := s.Prog(x86.AMOVL)
p.From.Type = obj.TYPE_ADDR
p.From.Offset = -gc.Ctxt.FixedFrameSize() // 0 on 386, just to be consistent with other architectures
p.From.Name = obj.NAME_PARAM
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
case ssa.Op386LoweredWB:
p := s.Prog(obj.ACALL)
p.To.Type = obj.TYPE_MEM
p.To.Name = obj.NAME_EXTERN
p.To.Sym = v.Aux.(*obj.LSym)
case ssa.Op386LoweredPanicBoundsA, ssa.Op386LoweredPanicBoundsB, ssa.Op386LoweredPanicBoundsC:
p := s.Prog(obj.ACALL)
p.To.Type = obj.TYPE_MEM
p.To.Name = obj.NAME_EXTERN
p.To.Sym = gc.BoundsCheckFunc[v.AuxInt]
s.UseArgs(8) // space used in callee args area by assembly stubs
case ssa.Op386LoweredPanicExtendA, ssa.Op386LoweredPanicExtendB, ssa.Op386LoweredPanicExtendC:
p := s.Prog(obj.ACALL)
p.To.Type = obj.TYPE_MEM
p.To.Name = obj.NAME_EXTERN
p.To.Sym = gc.ExtendCheckFunc[v.AuxInt]
s.UseArgs(12) // space used in callee args area by assembly stubs
case ssa.Op386CALLstatic, ssa.Op386CALLclosure, ssa.Op386CALLinter:
s.Call(v)
case ssa.Op386NEGL,
ssa.Op386BSWAPL,
ssa.Op386NOTL:
r := v.Reg()
if r != v.Args[0].Reg() {
v.Fatalf("input[0] and output not in same register %s", v.LongString())
}
p := s.Prog(v.Op.Asm())
p.To.Type = obj.TYPE_REG
p.To.Reg = r
case ssa.Op386BSFL, ssa.Op386BSFW,
ssa.Op386BSRL, ssa.Op386BSRW,
ssa.Op386SQRTSD:
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = v.Args[0].Reg()
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
case ssa.Op386SETEQ, ssa.Op386SETNE,
ssa.Op386SETL, ssa.Op386SETLE,
ssa.Op386SETG, ssa.Op386SETGE,
ssa.Op386SETGF, ssa.Op386SETGEF,
ssa.Op386SETB, ssa.Op386SETBE,
ssa.Op386SETORD, ssa.Op386SETNAN,
ssa.Op386SETA, ssa.Op386SETAE,
ssa.Op386SETO:
p := s.Prog(v.Op.Asm())
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
case ssa.Op386SETNEF:
p := s.Prog(v.Op.Asm())
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
q := s.Prog(x86.ASETPS)
q.To.Type = obj.TYPE_REG
q.To.Reg = x86.REG_AX
opregreg(s, x86.AORL, v.Reg(), x86.REG_AX)
case ssa.Op386SETEQF:
p := s.Prog(v.Op.Asm())
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
q := s.Prog(x86.ASETPC)
q.To.Type = obj.TYPE_REG
q.To.Reg = x86.REG_AX
opregreg(s, x86.AANDL, v.Reg(), x86.REG_AX)
case ssa.Op386InvertFlags:
v.Fatalf("InvertFlags should never make it to codegen %v", v.LongString())
case ssa.Op386FlagEQ, ssa.Op386FlagLT_ULT, ssa.Op386FlagLT_UGT, ssa.Op386FlagGT_ULT, ssa.Op386FlagGT_UGT:
v.Fatalf("Flag* ops should never make it to codegen %v", v.LongString())
case ssa.Op386REPSTOSL:
s.Prog(x86.AREP)
s.Prog(x86.ASTOSL)
case ssa.Op386REPMOVSL:
s.Prog(x86.AREP)
s.Prog(x86.AMOVSL)
case ssa.Op386LoweredNilCheck:
// Issue a load which will fault if the input is nil.
// TODO: We currently use the 2-byte instruction TESTB AX, (reg).
// Should we use the 3-byte TESTB $0, (reg) instead? It is larger
// but it doesn't have false dependency on AX.
// Or maybe allocate an output register and use MOVL (reg),reg2 ?
// That trades clobbering flags for clobbering a register.
p := s.Prog(x86.ATESTB)
p.From.Type = obj.TYPE_REG
p.From.Reg = x86.REG_AX
p.To.Type = obj.TYPE_MEM
p.To.Reg = v.Args[0].Reg()
gc.AddAux(&p.To, v)
if logopt.Enabled() {
logopt.LogOpt(v.Pos, "nilcheck", "genssa", v.Block.Func.Name)
}
if gc.Debug_checknil != 0 && v.Pos.Line() > 1 { // v.Pos.Line()==1 in generated wrappers
gc.Warnl(v.Pos, "generated nil check")
}
case ssa.Op386FCHS:
v.Fatalf("FCHS in non-387 mode")
case ssa.OpClobber:
p := s.Prog(x86.AMOVL)
p.From.Type = obj.TYPE_CONST
p.From.Offset = 0xdeaddead
p.To.Type = obj.TYPE_MEM
p.To.Reg = x86.REG_SP
gc.AddAux(&p.To, v)
default:
v.Fatalf("genValue not implemented: %s", v.LongString())
}
}
var blockJump = [...]struct {
asm, invasm obj.As
}{
ssa.Block386EQ: {x86.AJEQ, x86.AJNE},
ssa.Block386NE: {x86.AJNE, x86.AJEQ},
ssa.Block386LT: {x86.AJLT, x86.AJGE},
ssa.Block386GE: {x86.AJGE, x86.AJLT},
ssa.Block386LE: {x86.AJLE, x86.AJGT},
ssa.Block386GT: {x86.AJGT, x86.AJLE},
ssa.Block386OS: {x86.AJOS, x86.AJOC},
ssa.Block386OC: {x86.AJOC, x86.AJOS},
ssa.Block386ULT: {x86.AJCS, x86.AJCC},
ssa.Block386UGE: {x86.AJCC, x86.AJCS},
ssa.Block386UGT: {x86.AJHI, x86.AJLS},
ssa.Block386ULE: {x86.AJLS, x86.AJHI},
ssa.Block386ORD: {x86.AJPC, x86.AJPS},
ssa.Block386NAN: {x86.AJPS, x86.AJPC},
}
var eqfJumps = [2][2]gc.FloatingEQNEJump{
{{Jump: x86.AJNE, Index: 1}, {Jump: x86.AJPS, Index: 1}}, // next == b.Succs[0]
{{Jump: x86.AJNE, Index: 1}, {Jump: x86.AJPC, Index: 0}}, // next == b.Succs[1]
}
var nefJumps = [2][2]gc.FloatingEQNEJump{
{{Jump: x86.AJNE, Index: 0}, {Jump: x86.AJPC, Index: 1}}, // next == b.Succs[0]
{{Jump: x86.AJNE, Index: 0}, {Jump: x86.AJPS, Index: 0}}, // next == b.Succs[1]
}
func ssaGenBlock(s *gc.SSAGenState, b, next *ssa.Block) {
switch b.Kind {
case ssa.BlockPlain:
if b.Succs[0].Block() != next {
p := s.Prog(obj.AJMP)
p.To.Type = obj.TYPE_BRANCH
s.Branches = append(s.Branches, gc.Branch{P: p, B: b.Succs[0].Block()})
}
case ssa.BlockDefer:
// defer returns in rax:
// 0 if we should continue executing
// 1 if we should jump to deferreturn call
p := s.Prog(x86.ATESTL)
p.From.Type = obj.TYPE_REG
p.From.Reg = x86.REG_AX
p.To.Type = obj.TYPE_REG
p.To.Reg = x86.REG_AX
p = s.Prog(x86.AJNE)
p.To.Type = obj.TYPE_BRANCH
s.Branches = append(s.Branches, gc.Branch{P: p, B: b.Succs[1].Block()})
if b.Succs[0].Block() != next {
p := s.Prog(obj.AJMP)
p.To.Type = obj.TYPE_BRANCH
s.Branches = append(s.Branches, gc.Branch{P: p, B: b.Succs[0].Block()})
}
case ssa.BlockExit:
case ssa.BlockRet:
s.Prog(obj.ARET)
case ssa.BlockRetJmp:
p := s.Prog(obj.AJMP)
p.To.Type = obj.TYPE_MEM
p.To.Name = obj.NAME_EXTERN
p.To.Sym = b.Aux.(*obj.LSym)
case ssa.Block386EQF:
s.FPJump(b, next, &eqfJumps)
case ssa.Block386NEF:
s.FPJump(b, next, &nefJumps)
case ssa.Block386EQ, ssa.Block386NE,
ssa.Block386LT, ssa.Block386GE,
ssa.Block386LE, ssa.Block386GT,
ssa.Block386OS, ssa.Block386OC,
ssa.Block386ULT, ssa.Block386UGT,
ssa.Block386ULE, ssa.Block386UGE:
jmp := blockJump[b.Kind]
switch next {
case b.Succs[0].Block():
s.Br(jmp.invasm, b.Succs[1].Block())
case b.Succs[1].Block():
s.Br(jmp.asm, b.Succs[0].Block())
default:
if b.Likely != ssa.BranchUnlikely {
s.Br(jmp.asm, b.Succs[0].Block())
s.Br(obj.AJMP, b.Succs[1].Block())
} else {
s.Br(jmp.invasm, b.Succs[1].Block())
s.Br(obj.AJMP, b.Succs[0].Block())
}
}
default:
cmd/compile: allow multiple SSA block control values Control values are used to choose which successor of a block is jumped to. Typically a control value takes the form of a 'flags' value that represents the result of a comparison. Some architectures however use a variable in a register as a control value. Up until now we have managed with a single control value per block. However some architectures (e.g. s390x and riscv64) have combined compare-and-branch instructions that take two variables in registers as parameters. To generate these instructions we need to support 2 control values per block. This CL allows up to 2 control values to be used in a block in order to support the addition of compare-and-branch instructions. I have implemented s390x compare-and-branch instructions in a different CL. Passes toolstash-check -all. Results of compilebench: name old time/op new time/op delta Template 208ms ± 1% 209ms ± 1% ~ (p=0.289 n=20+20) Unicode 83.7ms ± 1% 83.3ms ± 3% -0.49% (p=0.017 n=18+18) GoTypes 748ms ± 1% 748ms ± 0% ~ (p=0.460 n=20+18) Compiler 3.47s ± 1% 3.48s ± 1% ~ (p=0.070 n=19+18) SSA 11.5s ± 1% 11.7s ± 1% +1.64% (p=0.000 n=19+18) Flate 130ms ± 1% 130ms ± 1% ~ (p=0.588 n=19+20) GoParser 160ms ± 1% 161ms ± 1% ~ (p=0.211 n=20+20) Reflect 465ms ± 1% 467ms ± 1% +0.42% (p=0.007 n=20+20) Tar 184ms ± 1% 185ms ± 2% ~ (p=0.087 n=18+20) XML 253ms ± 1% 253ms ± 1% ~ (p=0.377 n=20+18) LinkCompiler 769ms ± 2% 774ms ± 2% ~ (p=0.070 n=19+19) ExternalLinkCompiler 3.59s ±11% 3.68s ± 6% ~ (p=0.072 n=20+20) LinkWithoutDebugCompiler 446ms ± 5% 454ms ± 3% +1.79% (p=0.002 n=19+20) StdCmd 26.0s ± 2% 26.0s ± 2% ~ (p=0.799 n=20+20) name old user-time/op new user-time/op delta Template 238ms ± 5% 240ms ± 5% ~ (p=0.142 n=20+20) Unicode 105ms ±11% 106ms ±10% ~ (p=0.512 n=20+20) GoTypes 876ms ± 2% 873ms ± 4% ~ (p=0.647 n=20+19) Compiler 4.17s ± 2% 4.19s ± 1% ~ (p=0.093 n=20+18) SSA 13.9s ± 1% 14.1s ± 1% +1.45% (p=0.000 n=18+18) Flate 145ms ±13% 146ms ± 5% ~ (p=0.851 n=20+18) GoParser 185ms ± 5% 188ms ± 7% ~ (p=0.174 n=20+20) Reflect 534ms ± 3% 538ms ± 2% ~ (p=0.105 n=20+18) Tar 215ms ± 4% 211ms ± 9% ~ (p=0.079 n=19+20) XML 295ms ± 6% 295ms ± 5% ~ (p=0.968 n=20+20) LinkCompiler 832ms ± 4% 837ms ± 7% ~ (p=0.707 n=17+20) ExternalLinkCompiler 1.58s ± 8% 1.60s ± 4% ~ (p=0.296 n=20+19) LinkWithoutDebugCompiler 478ms ±12% 489ms ±10% ~ (p=0.429 n=20+20) name old object-bytes new object-bytes delta Template 559kB ± 0% 559kB ± 0% ~ (all equal) Unicode 216kB ± 0% 216kB ± 0% ~ (all equal) GoTypes 2.03MB ± 0% 2.03MB ± 0% ~ (all equal) Compiler 8.07MB ± 0% 8.07MB ± 0% -0.06% (p=0.000 n=20+20) SSA 27.1MB ± 0% 27.3MB ± 0% +0.89% (p=0.000 n=20+20) Flate 343kB ± 0% 343kB ± 0% ~ (all equal) GoParser 441kB ± 0% 441kB ± 0% ~ (all equal) Reflect 1.36MB ± 0% 1.36MB ± 0% ~ (all equal) Tar 487kB ± 0% 487kB ± 0% ~ (all equal) XML 632kB ± 0% 632kB ± 0% ~ (all equal) name old export-bytes new export-bytes delta Template 18.5kB ± 0% 18.5kB ± 0% ~ (all equal) Unicode 7.92kB ± 0% 7.92kB ± 0% ~ (all equal) GoTypes 35.0kB ± 0% 35.0kB ± 0% ~ (all equal) Compiler 109kB ± 0% 110kB ± 0% +0.72% (p=0.000 n=20+20) SSA 137kB ± 0% 138kB ± 0% +0.58% (p=0.000 n=20+20) Flate 4.89kB ± 0% 4.89kB ± 0% ~ (all equal) GoParser 8.49kB ± 0% 8.49kB ± 0% ~ (all equal) Reflect 11.4kB ± 0% 11.4kB ± 0% ~ (all equal) Tar 10.5kB ± 0% 10.5kB ± 0% ~ (all equal) XML 16.7kB ± 0% 16.7kB ± 0% ~ (all equal) name old text-bytes new text-bytes delta HelloSize 761kB ± 0% 761kB ± 0% ~ (all equal) CmdGoSize 10.8MB ± 0% 10.8MB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 10.7kB ± 0% 10.7kB ± 0% ~ (all equal) CmdGoSize 312kB ± 0% 312kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 122kB ± 0% 122kB ± 0% ~ (all equal) CmdGoSize 146kB ± 0% 146kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.13MB ± 0% 1.13MB ± 0% ~ (all equal) CmdGoSize 15.1MB ± 0% 15.1MB ± 0% ~ (all equal) Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f Reviewed-on: https://go-review.googlesource.com/c/go/+/196557 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-12 20:19:58 +01:00
b.Fatalf("branch not implemented: %s", b.LongString())
}
}