go/src/cmd/internal/obj/x86/ytab.go

45 lines
1.2 KiB
Go
Raw Normal View History

cmd/internal/obj: change Prog.From3 to RestArgs ([]Addr) This change makes it easier to express instructions with arbitrary number of operands. Rationale: previous approach with operand "hiding" does not scale well, AVX and especially AVX512 have many instructions with 3+ operands. x86 asm backend is updated to handle up to 6 explicit operands. It also fixes issue with 4-th immediate operand type checks. All `ytab` tables are updated accordingly. Changes to non-x86 backends only include these patterns: `p.From3 = X` => `p.SetFrom3(X)` `p.From3.X = Y` => `p.GetFrom3().X = Y` Over time, other backends can adapt Prog.RestArgs and reduce the amount of workarounds. -- Performance -- x/benchmark/build: $ benchstat upstream.bench patched.bench name old time/op new time/op delta Build-48 21.7s ± 2% 21.8s ± 2% ~ (p=0.218 n=10+10) name old binary-size new binary-size delta Build-48 10.3M ± 0% 10.3M ± 0% ~ (all equal) name old build-time/op new build-time/op delta Build-48 21.7s ± 2% 21.8s ± 2% ~ (p=0.218 n=10+10) name old build-peak-RSS-bytes new build-peak-RSS-bytes delta Build-48 145MB ± 5% 148MB ± 5% ~ (p=0.218 n=10+10) name old build-user+sys-time/op new build-user+sys-time/op delta Build-48 21.0s ± 2% 21.2s ± 2% ~ (p=0.075 n=10+10) Microbenchmark shows a slight slowdown. name old time/op new time/op delta AMD64asm-4 49.5ms ± 1% 49.9ms ± 1% +0.67% (p=0.001 n=23+15) func BenchmarkAMD64asm(b *testing.B) { for i := 0; i < b.N; i++ { TestAMD64EndToEnd(nil) TestAMD64Encoder(nil) } } Change-Id: I4f1d37b5c2c966da3f2127705ccac9bff0038183 Reviewed-on: https://go-review.googlesource.com/63490 Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-09-13 14:32:08 +03:00
// Copyright 2017 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
package x86
// argListMax specifies upper arg count limit expected to be carried by obj.Prog.
// Max len(obj.Prog.RestArgs) can be inferred from this to be 4.
const argListMax int = 6
type argList [argListMax]uint8
cmd/internal/obj: change Prog.From3 to RestArgs ([]Addr) This change makes it easier to express instructions with arbitrary number of operands. Rationale: previous approach with operand "hiding" does not scale well, AVX and especially AVX512 have many instructions with 3+ operands. x86 asm backend is updated to handle up to 6 explicit operands. It also fixes issue with 4-th immediate operand type checks. All `ytab` tables are updated accordingly. Changes to non-x86 backends only include these patterns: `p.From3 = X` => `p.SetFrom3(X)` `p.From3.X = Y` => `p.GetFrom3().X = Y` Over time, other backends can adapt Prog.RestArgs and reduce the amount of workarounds. -- Performance -- x/benchmark/build: $ benchstat upstream.bench patched.bench name old time/op new time/op delta Build-48 21.7s ± 2% 21.8s ± 2% ~ (p=0.218 n=10+10) name old binary-size new binary-size delta Build-48 10.3M ± 0% 10.3M ± 0% ~ (all equal) name old build-time/op new build-time/op delta Build-48 21.7s ± 2% 21.8s ± 2% ~ (p=0.218 n=10+10) name old build-peak-RSS-bytes new build-peak-RSS-bytes delta Build-48 145MB ± 5% 148MB ± 5% ~ (p=0.218 n=10+10) name old build-user+sys-time/op new build-user+sys-time/op delta Build-48 21.0s ± 2% 21.2s ± 2% ~ (p=0.075 n=10+10) Microbenchmark shows a slight slowdown. name old time/op new time/op delta AMD64asm-4 49.5ms ± 1% 49.9ms ± 1% +0.67% (p=0.001 n=23+15) func BenchmarkAMD64asm(b *testing.B) { for i := 0; i < b.N; i++ { TestAMD64EndToEnd(nil) TestAMD64Encoder(nil) } } Change-Id: I4f1d37b5c2c966da3f2127705ccac9bff0038183 Reviewed-on: https://go-review.googlesource.com/63490 Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-09-13 14:32:08 +03:00
type ytab struct {
zcase uint8
zoffset uint8
// Last arg is usually destination.
// For unary instructions unaryDst is used to determine
// if single argument is a source or destination.
args argList
}
// Returns true if yt is compatible with args.
//
// Elements from args and yt.args are used to
// to index ycover table like `ycover[args[i]+yt.args[i]]`.
// This means that args should contain values that already
// multiplied by Ymax.
func (yt *ytab) match(args []int) bool {
// Trailing Yxxx check is required to avoid a case
// where shorter arg list is matched.
// If we had exact yt.args length, it could be `yt.argc != len(args)`.
if len(args) < len(yt.args) && yt.args[len(args)] != Yxxx {
return false
}
for i := range args {
if ycover[args[i]+int(yt.args[i])] == 0 {
return false
}
}
return true
}