go/src/cmd/compile/internal/ssa/gen/genericOps.go

601 lines
34 KiB
Go
Raw Normal View History

// Copyright 2015 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
// +build ignore
package main
// Generic opcodes typically specify a width. The inputs and outputs
// of that op are the given number of bits wide. There is no notion of
// "sign", so Add32 can be used both for signed and unsigned 32-bit
// addition.
// Signed/unsigned is explicit with the extension ops
// (SignExt*/ZeroExt*) and implicit as the arg to some opcodes
// (e.g. the second argument to shifts is unsigned). If not mentioned,
// all args take signed inputs, or don't care whether their inputs
// are signed or unsigned.
var genericOps = []opData{
// 2-input arithmetic
// Types must be consistent with Go typing. Add, for example, must take two values
// of the same type and produces that same type.
{name: "Add8", argLength: 2, commutative: true}, // arg0 + arg1
{name: "Add16", argLength: 2, commutative: true},
{name: "Add32", argLength: 2, commutative: true},
{name: "Add64", argLength: 2, commutative: true},
{name: "AddPtr", argLength: 2}, // For address calculations. arg0 is a pointer and arg1 is an int.
cmd/compile: automatically handle commuting ops in rewrite rules Note that this is a redo of an undo of the original buggy CL 38666. We have lots of rewrite rules that vary only in the fact that we have 2 versions for the 2 different orderings of various commuting ops. For example: (ADDL x (MOVLconst [c])) -> (ADDLconst [c] x) (ADDL (MOVLconst [c]) x) -> (ADDLconst [c] x) It can get unwieldly quickly, especially when there is more than one commuting op in a rule. Our existing "fix" for this problem is to have rules that canonicalize the operations first. For example: (Eq64 x (Const64 <t> [c])) && x.Op != OpConst64 -> (Eq64 (Const64 <t> [c]) x) Subsequent rules can then assume if there is a constant arg to Eq64, it will be the first one. This fix kinda works, but it is fragile and only works when we remember to include the required extra rules. The fundamental problem is that the rule matcher doesn't know anything about commuting ops. This CL fixes that fact. We already have information about which ops commute. (The register allocator takes advantage of commutivity.) The rule generator now automatically generates multiple rules for a single source rule when there are commutative ops in the rule. We can now drop all of our almost-duplicate source-level rules and the canonicalization rules. I have some CLs in progress that will be a lot less verbose when the rule generator handles commutivity for me. I had to reorganize the load-combining rules a bit. The 8-way OR rules generated 128 different reorderings, which was causing the generator to put too much code in the rewrite*.go files (the big ones were going from 25K lines to 132K lines). Instead I reorganized the rules to combine pairs of loads at a time. The generated rule files are now actually a bit (5%) smaller. Make.bash times are ~unchanged. Compiler benchmarks are not observably different. Probably because we don't spend much compiler time in rule matching anyway. I've also done a pass over all of our ops adding commutative markings for ops which hadn't had them previously. Fixes #18292 Change-Id: Ic1c0e43fbf579539f459971625f69690c9ab8805 Reviewed-on: https://go-review.googlesource.com/38801 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-03-30 03:30:22 +00:00
{name: "Add32F", argLength: 2, commutative: true},
{name: "Add64F", argLength: 2, commutative: true},
{name: "Sub8", argLength: 2}, // arg0 - arg1
{name: "Sub16", argLength: 2},
{name: "Sub32", argLength: 2},
{name: "Sub64", argLength: 2},
{name: "SubPtr", argLength: 2},
{name: "Sub32F", argLength: 2},
{name: "Sub64F", argLength: 2},
{name: "Mul8", argLength: 2, commutative: true}, // arg0 * arg1
{name: "Mul16", argLength: 2, commutative: true},
{name: "Mul32", argLength: 2, commutative: true},
{name: "Mul64", argLength: 2, commutative: true},
cmd/compile: automatically handle commuting ops in rewrite rules Note that this is a redo of an undo of the original buggy CL 38666. We have lots of rewrite rules that vary only in the fact that we have 2 versions for the 2 different orderings of various commuting ops. For example: (ADDL x (MOVLconst [c])) -> (ADDLconst [c] x) (ADDL (MOVLconst [c]) x) -> (ADDLconst [c] x) It can get unwieldly quickly, especially when there is more than one commuting op in a rule. Our existing "fix" for this problem is to have rules that canonicalize the operations first. For example: (Eq64 x (Const64 <t> [c])) && x.Op != OpConst64 -> (Eq64 (Const64 <t> [c]) x) Subsequent rules can then assume if there is a constant arg to Eq64, it will be the first one. This fix kinda works, but it is fragile and only works when we remember to include the required extra rules. The fundamental problem is that the rule matcher doesn't know anything about commuting ops. This CL fixes that fact. We already have information about which ops commute. (The register allocator takes advantage of commutivity.) The rule generator now automatically generates multiple rules for a single source rule when there are commutative ops in the rule. We can now drop all of our almost-duplicate source-level rules and the canonicalization rules. I have some CLs in progress that will be a lot less verbose when the rule generator handles commutivity for me. I had to reorganize the load-combining rules a bit. The 8-way OR rules generated 128 different reorderings, which was causing the generator to put too much code in the rewrite*.go files (the big ones were going from 25K lines to 132K lines). Instead I reorganized the rules to combine pairs of loads at a time. The generated rule files are now actually a bit (5%) smaller. Make.bash times are ~unchanged. Compiler benchmarks are not observably different. Probably because we don't spend much compiler time in rule matching anyway. I've also done a pass over all of our ops adding commutative markings for ops which hadn't had them previously. Fixes #18292 Change-Id: Ic1c0e43fbf579539f459971625f69690c9ab8805 Reviewed-on: https://go-review.googlesource.com/38801 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-03-30 03:30:22 +00:00
{name: "Mul32F", argLength: 2, commutative: true},
{name: "Mul64F", argLength: 2, commutative: true},
{name: "Div32F", argLength: 2}, // arg0 / arg1
{name: "Div64F", argLength: 2},
cmd/compile: automatically handle commuting ops in rewrite rules Note that this is a redo of an undo of the original buggy CL 38666. We have lots of rewrite rules that vary only in the fact that we have 2 versions for the 2 different orderings of various commuting ops. For example: (ADDL x (MOVLconst [c])) -> (ADDLconst [c] x) (ADDL (MOVLconst [c]) x) -> (ADDLconst [c] x) It can get unwieldly quickly, especially when there is more than one commuting op in a rule. Our existing "fix" for this problem is to have rules that canonicalize the operations first. For example: (Eq64 x (Const64 <t> [c])) && x.Op != OpConst64 -> (Eq64 (Const64 <t> [c]) x) Subsequent rules can then assume if there is a constant arg to Eq64, it will be the first one. This fix kinda works, but it is fragile and only works when we remember to include the required extra rules. The fundamental problem is that the rule matcher doesn't know anything about commuting ops. This CL fixes that fact. We already have information about which ops commute. (The register allocator takes advantage of commutivity.) The rule generator now automatically generates multiple rules for a single source rule when there are commutative ops in the rule. We can now drop all of our almost-duplicate source-level rules and the canonicalization rules. I have some CLs in progress that will be a lot less verbose when the rule generator handles commutivity for me. I had to reorganize the load-combining rules a bit. The 8-way OR rules generated 128 different reorderings, which was causing the generator to put too much code in the rewrite*.go files (the big ones were going from 25K lines to 132K lines). Instead I reorganized the rules to combine pairs of loads at a time. The generated rule files are now actually a bit (5%) smaller. Make.bash times are ~unchanged. Compiler benchmarks are not observably different. Probably because we don't spend much compiler time in rule matching anyway. I've also done a pass over all of our ops adding commutative markings for ops which hadn't had them previously. Fixes #18292 Change-Id: Ic1c0e43fbf579539f459971625f69690c9ab8805 Reviewed-on: https://go-review.googlesource.com/38801 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-03-30 03:30:22 +00:00
{name: "Hmul32", argLength: 2, commutative: true},
{name: "Hmul32u", argLength: 2, commutative: true},
{name: "Hmul64", argLength: 2, commutative: true},
{name: "Hmul64u", argLength: 2, commutative: true},
cmd/compile: automatically handle commuting ops in rewrite rules Note that this is a redo of an undo of the original buggy CL 38666. We have lots of rewrite rules that vary only in the fact that we have 2 versions for the 2 different orderings of various commuting ops. For example: (ADDL x (MOVLconst [c])) -> (ADDLconst [c] x) (ADDL (MOVLconst [c]) x) -> (ADDLconst [c] x) It can get unwieldly quickly, especially when there is more than one commuting op in a rule. Our existing "fix" for this problem is to have rules that canonicalize the operations first. For example: (Eq64 x (Const64 <t> [c])) && x.Op != OpConst64 -> (Eq64 (Const64 <t> [c]) x) Subsequent rules can then assume if there is a constant arg to Eq64, it will be the first one. This fix kinda works, but it is fragile and only works when we remember to include the required extra rules. The fundamental problem is that the rule matcher doesn't know anything about commuting ops. This CL fixes that fact. We already have information about which ops commute. (The register allocator takes advantage of commutivity.) The rule generator now automatically generates multiple rules for a single source rule when there are commutative ops in the rule. We can now drop all of our almost-duplicate source-level rules and the canonicalization rules. I have some CLs in progress that will be a lot less verbose when the rule generator handles commutivity for me. I had to reorganize the load-combining rules a bit. The 8-way OR rules generated 128 different reorderings, which was causing the generator to put too much code in the rewrite*.go files (the big ones were going from 25K lines to 132K lines). Instead I reorganized the rules to combine pairs of loads at a time. The generated rule files are now actually a bit (5%) smaller. Make.bash times are ~unchanged. Compiler benchmarks are not observably different. Probably because we don't spend much compiler time in rule matching anyway. I've also done a pass over all of our ops adding commutative markings for ops which hadn't had them previously. Fixes #18292 Change-Id: Ic1c0e43fbf579539f459971625f69690c9ab8805 Reviewed-on: https://go-review.googlesource.com/38801 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-03-30 03:30:22 +00:00
{name: "Mul32uhilo", argLength: 2, typ: "(UInt32,UInt32)", commutative: true}, // arg0 * arg1, returns (hi, lo)
{name: "Mul64uhilo", argLength: 2, typ: "(UInt64,UInt64)", commutative: true}, // arg0 * arg1, returns (hi, lo)
{name: "Mul32uover", argLength: 2, typ: "(UInt32,Bool)", commutative: true}, // Let x = arg0*arg1 (full 32x32-> 64 unsigned multiply), returns (uint32(x), (uint32(x) != x))
{name: "Mul64uover", argLength: 2, typ: "(UInt64,Bool)", commutative: true}, // Let x = arg0*arg1 (full 64x64->128 unsigned multiply), returns (uint64(x), (uint64(x) != x))
// Weird special instructions for use in the strength reduction of divides.
// These ops compute unsigned (arg0 + arg1) / 2, correct to all
// 32/64 bits, even when the intermediate result of the add has 33/65 bits.
// These ops can assume arg0 >= arg1.
cmd/compile: automatically handle commuting ops in rewrite rules Note that this is a redo of an undo of the original buggy CL 38666. We have lots of rewrite rules that vary only in the fact that we have 2 versions for the 2 different orderings of various commuting ops. For example: (ADDL x (MOVLconst [c])) -> (ADDLconst [c] x) (ADDL (MOVLconst [c]) x) -> (ADDLconst [c] x) It can get unwieldly quickly, especially when there is more than one commuting op in a rule. Our existing "fix" for this problem is to have rules that canonicalize the operations first. For example: (Eq64 x (Const64 <t> [c])) && x.Op != OpConst64 -> (Eq64 (Const64 <t> [c]) x) Subsequent rules can then assume if there is a constant arg to Eq64, it will be the first one. This fix kinda works, but it is fragile and only works when we remember to include the required extra rules. The fundamental problem is that the rule matcher doesn't know anything about commuting ops. This CL fixes that fact. We already have information about which ops commute. (The register allocator takes advantage of commutivity.) The rule generator now automatically generates multiple rules for a single source rule when there are commutative ops in the rule. We can now drop all of our almost-duplicate source-level rules and the canonicalization rules. I have some CLs in progress that will be a lot less verbose when the rule generator handles commutivity for me. I had to reorganize the load-combining rules a bit. The 8-way OR rules generated 128 different reorderings, which was causing the generator to put too much code in the rewrite*.go files (the big ones were going from 25K lines to 132K lines). Instead I reorganized the rules to combine pairs of loads at a time. The generated rule files are now actually a bit (5%) smaller. Make.bash times are ~unchanged. Compiler benchmarks are not observably different. Probably because we don't spend much compiler time in rule matching anyway. I've also done a pass over all of our ops adding commutative markings for ops which hadn't had them previously. Fixes #18292 Change-Id: Ic1c0e43fbf579539f459971625f69690c9ab8805 Reviewed-on: https://go-review.googlesource.com/38801 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-03-30 03:30:22 +00:00
// Note: these ops aren't commutative!
{name: "Avg32u", argLength: 2, typ: "UInt32"}, // 32-bit platforms only
{name: "Avg64u", argLength: 2, typ: "UInt64"}, // 64-bit platforms only
// For Div16, Div32 and Div64, AuxInt non-zero means that the divisor has been proved to be not -1
// or that the dividend is not the most negative value.
{name: "Div8", argLength: 2}, // arg0 / arg1, signed
{name: "Div8u", argLength: 2}, // arg0 / arg1, unsigned
{name: "Div16", argLength: 2, aux: "Bool"},
{name: "Div16u", argLength: 2},
{name: "Div32", argLength: 2, aux: "Bool"},
{name: "Div32u", argLength: 2},
{name: "Div64", argLength: 2, aux: "Bool"},
{name: "Div64u", argLength: 2},
{name: "Div128u", argLength: 3}, // arg0:arg1 / arg2 (128-bit divided by 64-bit), returns (q, r)
// For Mod16, Mod32 and Mod64, AuxInt non-zero means that the divisor has been proved to be not -1.
{name: "Mod8", argLength: 2}, // arg0 % arg1, signed
{name: "Mod8u", argLength: 2}, // arg0 % arg1, unsigned
{name: "Mod16", argLength: 2, aux: "Bool"},
{name: "Mod16u", argLength: 2},
{name: "Mod32", argLength: 2, aux: "Bool"},
{name: "Mod32u", argLength: 2},
{name: "Mod64", argLength: 2, aux: "Bool"},
{name: "Mod64u", argLength: 2},
{name: "And8", argLength: 2, commutative: true}, // arg0 & arg1
{name: "And16", argLength: 2, commutative: true},
{name: "And32", argLength: 2, commutative: true},
{name: "And64", argLength: 2, commutative: true},
{name: "Or8", argLength: 2, commutative: true}, // arg0 | arg1
{name: "Or16", argLength: 2, commutative: true},
{name: "Or32", argLength: 2, commutative: true},
{name: "Or64", argLength: 2, commutative: true},
{name: "Xor8", argLength: 2, commutative: true}, // arg0 ^ arg1
{name: "Xor16", argLength: 2, commutative: true},
{name: "Xor32", argLength: 2, commutative: true},
{name: "Xor64", argLength: 2, commutative: true},
// For shifts, AxB means the shifted value has A bits and the shift amount has B bits.
// Shift amounts are considered unsigned.
cmd/compile: simplify shifts using bounds from prove pass The prove pass sometimes has bounds information that later rewrite passes do not. Use this information to mark shifts as bounded, and then use that information to generate better code on amd64. It may prove to be helpful on other architectures, too. While here, coalesce the existing shift lowering rules. This triggers 35 times building std+cmd. The full list is below. Here's an example from runtime.heapBitsSetType: if nb < 8 { b |= uintptr(*p) << nb p = add1(p) } else { nb -= 8 } We now generate better code on amd64 for that left shift. Updates #25087 vendor/golang_org/x/crypto/curve25519/mont25519_amd64.go:48:20: Proved Rsh8Ux64 bounded runtime/mbitmap.go:1252:22: Proved Lsh64x64 bounded runtime/mbitmap.go:1265:16: Proved Lsh64x64 bounded runtime/mbitmap.go:1275:28: Proved Lsh64x64 bounded runtime/mbitmap.go:1645:25: Proved Lsh64x64 bounded runtime/mbitmap.go:1663:25: Proved Lsh64x64 bounded runtime/mbitmap.go:1808:41: Proved Lsh64x64 bounded runtime/mbitmap.go:1831:49: Proved Lsh64x64 bounded syscall/route_bsd.go:227:23: Proved Lsh32x64 bounded syscall/route_bsd.go:295:23: Proved Lsh32x64 bounded syscall/route_darwin.go:40:23: Proved Lsh32x64 bounded compress/bzip2/bzip2.go:384:26: Proved Lsh64x16 bounded vendor/golang_org/x/net/route/address.go:370:14: Proved Lsh64x64 bounded compress/flate/inflate.go:201:54: Proved Lsh64x64 bounded math/big/prime.go:50:25: Proved Lsh64x64 bounded vendor/golang_org/x/crypto/cryptobyte/asn1.go:464:43: Proved Lsh8x8 bounded net/ip.go:87:21: Proved Rsh8Ux64 bounded cmd/internal/goobj/read.go:267:23: Proved Lsh64x64 bounded cmd/vendor/golang.org/x/arch/arm64/arm64asm/decode.go:534:27: Proved Lsh32x32 bounded cmd/vendor/golang.org/x/arch/arm64/arm64asm/decode.go:544:27: Proved Lsh32x32 bounded cmd/internal/obj/arm/asm5.go:1044:16: Proved Lsh32x64 bounded cmd/internal/obj/arm/asm5.go:1065:10: Proved Lsh32x32 bounded cmd/internal/obj/mips/obj0.go:1311:21: Proved Lsh32x64 bounded cmd/compile/internal/syntax/scanner.go:352:23: Proved Lsh64x64 bounded go/types/expr.go:222:36: Proved Lsh64x64 bounded crypto/x509/x509.go:1626:9: Proved Rsh8Ux64 bounded cmd/link/internal/loadelf/ldelf.go:823:22: Proved Lsh8x64 bounded net/http/h2_bundle.go:1470:17: Proved Lsh8x8 bounded net/http/h2_bundle.go:1477:46: Proved Lsh8x8 bounded net/http/h2_bundle.go:1481:31: Proved Lsh64x8 bounded cmd/compile/internal/ssa/rewriteARM64.go:18759:17: Proved Lsh64x64 bounded cmd/compile/internal/ssa/sparsemap.go:70:23: Proved Lsh32x64 bounded cmd/compile/internal/ssa/sparsemap.go:73:45: Proved Lsh32x64 bounded Change-Id: I58bb72f3e6f12f6ac69be633ea7222c245438142 Reviewed-on: https://go-review.googlesource.com/109776 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-04-26 20:56:03 -07:00
// If arg1 is known to be less than the number of bits in arg0,
// then auxInt may be set to 1.
cmd/compile: simplify shifts using bounds from prove pass The prove pass sometimes has bounds information that later rewrite passes do not. Use this information to mark shifts as bounded, and then use that information to generate better code on amd64. It may prove to be helpful on other architectures, too. While here, coalesce the existing shift lowering rules. This triggers 35 times building std+cmd. The full list is below. Here's an example from runtime.heapBitsSetType: if nb < 8 { b |= uintptr(*p) << nb p = add1(p) } else { nb -= 8 } We now generate better code on amd64 for that left shift. Updates #25087 vendor/golang_org/x/crypto/curve25519/mont25519_amd64.go:48:20: Proved Rsh8Ux64 bounded runtime/mbitmap.go:1252:22: Proved Lsh64x64 bounded runtime/mbitmap.go:1265:16: Proved Lsh64x64 bounded runtime/mbitmap.go:1275:28: Proved Lsh64x64 bounded runtime/mbitmap.go:1645:25: Proved Lsh64x64 bounded runtime/mbitmap.go:1663:25: Proved Lsh64x64 bounded runtime/mbitmap.go:1808:41: Proved Lsh64x64 bounded runtime/mbitmap.go:1831:49: Proved Lsh64x64 bounded syscall/route_bsd.go:227:23: Proved Lsh32x64 bounded syscall/route_bsd.go:295:23: Proved Lsh32x64 bounded syscall/route_darwin.go:40:23: Proved Lsh32x64 bounded compress/bzip2/bzip2.go:384:26: Proved Lsh64x16 bounded vendor/golang_org/x/net/route/address.go:370:14: Proved Lsh64x64 bounded compress/flate/inflate.go:201:54: Proved Lsh64x64 bounded math/big/prime.go:50:25: Proved Lsh64x64 bounded vendor/golang_org/x/crypto/cryptobyte/asn1.go:464:43: Proved Lsh8x8 bounded net/ip.go:87:21: Proved Rsh8Ux64 bounded cmd/internal/goobj/read.go:267:23: Proved Lsh64x64 bounded cmd/vendor/golang.org/x/arch/arm64/arm64asm/decode.go:534:27: Proved Lsh32x32 bounded cmd/vendor/golang.org/x/arch/arm64/arm64asm/decode.go:544:27: Proved Lsh32x32 bounded cmd/internal/obj/arm/asm5.go:1044:16: Proved Lsh32x64 bounded cmd/internal/obj/arm/asm5.go:1065:10: Proved Lsh32x32 bounded cmd/internal/obj/mips/obj0.go:1311:21: Proved Lsh32x64 bounded cmd/compile/internal/syntax/scanner.go:352:23: Proved Lsh64x64 bounded go/types/expr.go:222:36: Proved Lsh64x64 bounded crypto/x509/x509.go:1626:9: Proved Rsh8Ux64 bounded cmd/link/internal/loadelf/ldelf.go:823:22: Proved Lsh8x64 bounded net/http/h2_bundle.go:1470:17: Proved Lsh8x8 bounded net/http/h2_bundle.go:1477:46: Proved Lsh8x8 bounded net/http/h2_bundle.go:1481:31: Proved Lsh64x8 bounded cmd/compile/internal/ssa/rewriteARM64.go:18759:17: Proved Lsh64x64 bounded cmd/compile/internal/ssa/sparsemap.go:70:23: Proved Lsh32x64 bounded cmd/compile/internal/ssa/sparsemap.go:73:45: Proved Lsh32x64 bounded Change-Id: I58bb72f3e6f12f6ac69be633ea7222c245438142 Reviewed-on: https://go-review.googlesource.com/109776 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-04-26 20:56:03 -07:00
// This enables better code generation on some platforms.
{name: "Lsh8x8", argLength: 2, aux: "Bool"}, // arg0 << arg1
{name: "Lsh8x16", argLength: 2, aux: "Bool"},
{name: "Lsh8x32", argLength: 2, aux: "Bool"},
{name: "Lsh8x64", argLength: 2, aux: "Bool"},
{name: "Lsh16x8", argLength: 2, aux: "Bool"},
{name: "Lsh16x16", argLength: 2, aux: "Bool"},
{name: "Lsh16x32", argLength: 2, aux: "Bool"},
{name: "Lsh16x64", argLength: 2, aux: "Bool"},
{name: "Lsh32x8", argLength: 2, aux: "Bool"},
{name: "Lsh32x16", argLength: 2, aux: "Bool"},
{name: "Lsh32x32", argLength: 2, aux: "Bool"},
{name: "Lsh32x64", argLength: 2, aux: "Bool"},
{name: "Lsh64x8", argLength: 2, aux: "Bool"},
{name: "Lsh64x16", argLength: 2, aux: "Bool"},
{name: "Lsh64x32", argLength: 2, aux: "Bool"},
{name: "Lsh64x64", argLength: 2, aux: "Bool"},
{name: "Rsh8x8", argLength: 2, aux: "Bool"}, // arg0 >> arg1, signed
{name: "Rsh8x16", argLength: 2, aux: "Bool"},
{name: "Rsh8x32", argLength: 2, aux: "Bool"},
{name: "Rsh8x64", argLength: 2, aux: "Bool"},
{name: "Rsh16x8", argLength: 2, aux: "Bool"},
{name: "Rsh16x16", argLength: 2, aux: "Bool"},
{name: "Rsh16x32", argLength: 2, aux: "Bool"},
{name: "Rsh16x64", argLength: 2, aux: "Bool"},
{name: "Rsh32x8", argLength: 2, aux: "Bool"},
{name: "Rsh32x16", argLength: 2, aux: "Bool"},
{name: "Rsh32x32", argLength: 2, aux: "Bool"},
{name: "Rsh32x64", argLength: 2, aux: "Bool"},
{name: "Rsh64x8", argLength: 2, aux: "Bool"},
{name: "Rsh64x16", argLength: 2, aux: "Bool"},
{name: "Rsh64x32", argLength: 2, aux: "Bool"},
{name: "Rsh64x64", argLength: 2, aux: "Bool"},
{name: "Rsh8Ux8", argLength: 2, aux: "Bool"}, // arg0 >> arg1, unsigned
{name: "Rsh8Ux16", argLength: 2, aux: "Bool"},
{name: "Rsh8Ux32", argLength: 2, aux: "Bool"},
{name: "Rsh8Ux64", argLength: 2, aux: "Bool"},
{name: "Rsh16Ux8", argLength: 2, aux: "Bool"},
{name: "Rsh16Ux16", argLength: 2, aux: "Bool"},
{name: "Rsh16Ux32", argLength: 2, aux: "Bool"},
{name: "Rsh16Ux64", argLength: 2, aux: "Bool"},
{name: "Rsh32Ux8", argLength: 2, aux: "Bool"},
{name: "Rsh32Ux16", argLength: 2, aux: "Bool"},
{name: "Rsh32Ux32", argLength: 2, aux: "Bool"},
{name: "Rsh32Ux64", argLength: 2, aux: "Bool"},
{name: "Rsh64Ux8", argLength: 2, aux: "Bool"},
{name: "Rsh64Ux16", argLength: 2, aux: "Bool"},
{name: "Rsh64Ux32", argLength: 2, aux: "Bool"},
{name: "Rsh64Ux64", argLength: 2, aux: "Bool"},
// 2-input comparisons
{name: "Eq8", argLength: 2, commutative: true, typ: "Bool"}, // arg0 == arg1
{name: "Eq16", argLength: 2, commutative: true, typ: "Bool"},
{name: "Eq32", argLength: 2, commutative: true, typ: "Bool"},
{name: "Eq64", argLength: 2, commutative: true, typ: "Bool"},
{name: "EqPtr", argLength: 2, commutative: true, typ: "Bool"},
{name: "EqInter", argLength: 2, typ: "Bool"}, // arg0 or arg1 is nil; other cases handled by frontend
{name: "EqSlice", argLength: 2, typ: "Bool"}, // arg0 or arg1 is nil; other cases handled by frontend
cmd/compile: automatically handle commuting ops in rewrite rules Note that this is a redo of an undo of the original buggy CL 38666. We have lots of rewrite rules that vary only in the fact that we have 2 versions for the 2 different orderings of various commuting ops. For example: (ADDL x (MOVLconst [c])) -> (ADDLconst [c] x) (ADDL (MOVLconst [c]) x) -> (ADDLconst [c] x) It can get unwieldly quickly, especially when there is more than one commuting op in a rule. Our existing "fix" for this problem is to have rules that canonicalize the operations first. For example: (Eq64 x (Const64 <t> [c])) && x.Op != OpConst64 -> (Eq64 (Const64 <t> [c]) x) Subsequent rules can then assume if there is a constant arg to Eq64, it will be the first one. This fix kinda works, but it is fragile and only works when we remember to include the required extra rules. The fundamental problem is that the rule matcher doesn't know anything about commuting ops. This CL fixes that fact. We already have information about which ops commute. (The register allocator takes advantage of commutivity.) The rule generator now automatically generates multiple rules for a single source rule when there are commutative ops in the rule. We can now drop all of our almost-duplicate source-level rules and the canonicalization rules. I have some CLs in progress that will be a lot less verbose when the rule generator handles commutivity for me. I had to reorganize the load-combining rules a bit. The 8-way OR rules generated 128 different reorderings, which was causing the generator to put too much code in the rewrite*.go files (the big ones were going from 25K lines to 132K lines). Instead I reorganized the rules to combine pairs of loads at a time. The generated rule files are now actually a bit (5%) smaller. Make.bash times are ~unchanged. Compiler benchmarks are not observably different. Probably because we don't spend much compiler time in rule matching anyway. I've also done a pass over all of our ops adding commutative markings for ops which hadn't had them previously. Fixes #18292 Change-Id: Ic1c0e43fbf579539f459971625f69690c9ab8805 Reviewed-on: https://go-review.googlesource.com/38801 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-03-30 03:30:22 +00:00
{name: "Eq32F", argLength: 2, commutative: true, typ: "Bool"},
{name: "Eq64F", argLength: 2, commutative: true, typ: "Bool"},
{name: "Neq8", argLength: 2, commutative: true, typ: "Bool"}, // arg0 != arg1
{name: "Neq16", argLength: 2, commutative: true, typ: "Bool"},
{name: "Neq32", argLength: 2, commutative: true, typ: "Bool"},
{name: "Neq64", argLength: 2, commutative: true, typ: "Bool"},
{name: "NeqPtr", argLength: 2, commutative: true, typ: "Bool"},
{name: "NeqInter", argLength: 2, typ: "Bool"}, // arg0 or arg1 is nil; other cases handled by frontend
{name: "NeqSlice", argLength: 2, typ: "Bool"}, // arg0 or arg1 is nil; other cases handled by frontend
cmd/compile: automatically handle commuting ops in rewrite rules Note that this is a redo of an undo of the original buggy CL 38666. We have lots of rewrite rules that vary only in the fact that we have 2 versions for the 2 different orderings of various commuting ops. For example: (ADDL x (MOVLconst [c])) -> (ADDLconst [c] x) (ADDL (MOVLconst [c]) x) -> (ADDLconst [c] x) It can get unwieldly quickly, especially when there is more than one commuting op in a rule. Our existing "fix" for this problem is to have rules that canonicalize the operations first. For example: (Eq64 x (Const64 <t> [c])) && x.Op != OpConst64 -> (Eq64 (Const64 <t> [c]) x) Subsequent rules can then assume if there is a constant arg to Eq64, it will be the first one. This fix kinda works, but it is fragile and only works when we remember to include the required extra rules. The fundamental problem is that the rule matcher doesn't know anything about commuting ops. This CL fixes that fact. We already have information about which ops commute. (The register allocator takes advantage of commutivity.) The rule generator now automatically generates multiple rules for a single source rule when there are commutative ops in the rule. We can now drop all of our almost-duplicate source-level rules and the canonicalization rules. I have some CLs in progress that will be a lot less verbose when the rule generator handles commutivity for me. I had to reorganize the load-combining rules a bit. The 8-way OR rules generated 128 different reorderings, which was causing the generator to put too much code in the rewrite*.go files (the big ones were going from 25K lines to 132K lines). Instead I reorganized the rules to combine pairs of loads at a time. The generated rule files are now actually a bit (5%) smaller. Make.bash times are ~unchanged. Compiler benchmarks are not observably different. Probably because we don't spend much compiler time in rule matching anyway. I've also done a pass over all of our ops adding commutative markings for ops which hadn't had them previously. Fixes #18292 Change-Id: Ic1c0e43fbf579539f459971625f69690c9ab8805 Reviewed-on: https://go-review.googlesource.com/38801 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-03-30 03:30:22 +00:00
{name: "Neq32F", argLength: 2, commutative: true, typ: "Bool"},
{name: "Neq64F", argLength: 2, commutative: true, typ: "Bool"},
{name: "Less8", argLength: 2, typ: "Bool"}, // arg0 < arg1, signed
{name: "Less8U", argLength: 2, typ: "Bool"}, // arg0 < arg1, unsigned
{name: "Less16", argLength: 2, typ: "Bool"},
{name: "Less16U", argLength: 2, typ: "Bool"},
{name: "Less32", argLength: 2, typ: "Bool"},
{name: "Less32U", argLength: 2, typ: "Bool"},
{name: "Less64", argLength: 2, typ: "Bool"},
{name: "Less64U", argLength: 2, typ: "Bool"},
{name: "Less32F", argLength: 2, typ: "Bool"},
{name: "Less64F", argLength: 2, typ: "Bool"},
{name: "Leq8", argLength: 2, typ: "Bool"}, // arg0 <= arg1, signed
{name: "Leq8U", argLength: 2, typ: "Bool"}, // arg0 <= arg1, unsigned
{name: "Leq16", argLength: 2, typ: "Bool"},
{name: "Leq16U", argLength: 2, typ: "Bool"},
{name: "Leq32", argLength: 2, typ: "Bool"},
{name: "Leq32U", argLength: 2, typ: "Bool"},
{name: "Leq64", argLength: 2, typ: "Bool"},
{name: "Leq64U", argLength: 2, typ: "Bool"},
{name: "Leq32F", argLength: 2, typ: "Bool"},
{name: "Leq64F", argLength: 2, typ: "Bool"},
cmd/compile/internal/ssa: emit csel on arm64 Introduce a new SSA pass to generate CondSelect intstrutions, and add CondSelect lowering rules for arm64. In order to make the CSEL instruction easier to optimize, and to simplify the introduction of CSNEG, CSINC, and CSINV in the future, modify the CSEL instruction to accept a condition code in the aux field. Notably, this change makes the go1 Gzip benchmark more than 10% faster. Benchmarks on a Cavium ThunderX: name old time/op new time/op delta BinaryTree17-96 15.9s ± 6% 16.0s ± 4% ~ (p=0.968 n=10+9) Fannkuch11-96 7.17s ± 0% 7.00s ± 0% -2.43% (p=0.000 n=8+9) FmtFprintfEmpty-96 208ns ± 1% 207ns ± 0% ~ (p=0.152 n=10+8) FmtFprintfString-96 379ns ± 0% 375ns ± 0% -0.95% (p=0.000 n=10+9) FmtFprintfInt-96 385ns ± 0% 383ns ± 0% -0.52% (p=0.000 n=9+10) FmtFprintfIntInt-96 591ns ± 0% 586ns ± 0% -0.85% (p=0.006 n=7+9) FmtFprintfPrefixedInt-96 656ns ± 0% 667ns ± 0% +1.71% (p=0.000 n=10+10) FmtFprintfFloat-96 967ns ± 0% 984ns ± 0% +1.78% (p=0.000 n=10+10) FmtManyArgs-96 2.35µs ± 0% 2.25µs ± 0% -4.63% (p=0.000 n=9+8) GobDecode-96 31.0ms ± 0% 30.8ms ± 0% -0.36% (p=0.006 n=9+9) GobEncode-96 24.4ms ± 0% 24.5ms ± 0% +0.30% (p=0.000 n=9+9) Gzip-96 1.60s ± 0% 1.43s ± 0% -10.58% (p=0.000 n=9+10) Gunzip-96 167ms ± 0% 169ms ± 0% +0.83% (p=0.000 n=8+9) HTTPClientServer-96 311µs ± 1% 308µs ± 0% -0.75% (p=0.000 n=10+10) JSONEncode-96 65.0ms ± 0% 64.8ms ± 0% -0.25% (p=0.000 n=9+8) JSONDecode-96 262ms ± 1% 261ms ± 1% ~ (p=0.579 n=10+10) Mandelbrot200-96 18.0ms ± 0% 18.1ms ± 0% +0.17% (p=0.000 n=8+10) GoParse-96 14.0ms ± 0% 14.1ms ± 1% +0.42% (p=0.003 n=9+10) RegexpMatchEasy0_32-96 644ns ± 2% 645ns ± 2% ~ (p=0.836 n=10+10) RegexpMatchEasy0_1K-96 3.70µs ± 0% 3.49µs ± 0% -5.58% (p=0.000 n=10+10) RegexpMatchEasy1_32-96 662ns ± 2% 657ns ± 2% ~ (p=0.137 n=10+10) RegexpMatchEasy1_1K-96 4.47µs ± 0% 4.31µs ± 0% -3.48% (p=0.000 n=10+10) RegexpMatchMedium_32-96 844ns ± 2% 849ns ± 1% ~ (p=0.208 n=10+10) RegexpMatchMedium_1K-96 179µs ± 0% 182µs ± 0% +1.20% (p=0.000 n=10+10) RegexpMatchHard_32-96 10.0µs ± 0% 10.1µs ± 0% +0.48% (p=0.000 n=10+9) RegexpMatchHard_1K-96 297µs ± 0% 297µs ± 0% -0.14% (p=0.000 n=10+10) Revcomp-96 3.08s ± 0% 3.13s ± 0% +1.56% (p=0.000 n=9+9) Template-96 276ms ± 2% 275ms ± 1% ~ (p=0.393 n=10+10) TimeParse-96 1.37µs ± 0% 1.36µs ± 0% -0.53% (p=0.000 n=10+7) TimeFormat-96 1.40µs ± 0% 1.42µs ± 0% +0.97% (p=0.000 n=10+10) [Geo mean] 264µs 262µs -0.77% Change-Id: Ie54eee4b3092af53e6da3baa6d1755098f57f3a2 Reviewed-on: https://go-review.googlesource.com/55670 Run-TryBot: Philip Hofer <phofer@umich.edu> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2017-08-13 22:36:47 +00:00
// the type of a CondSelect is the same as the type of its first
// two arguments, which should be register-width scalars; the third
// argument should be a boolean
{name: "CondSelect", argLength: 3}, // arg2 ? arg0 : arg1
// boolean ops
{name: "AndB", argLength: 2, commutative: true, typ: "Bool"}, // arg0 && arg1 (not shortcircuited)
{name: "OrB", argLength: 2, commutative: true, typ: "Bool"}, // arg0 || arg1 (not shortcircuited)
{name: "EqB", argLength: 2, commutative: true, typ: "Bool"}, // arg0 == arg1
{name: "NeqB", argLength: 2, commutative: true, typ: "Bool"}, // arg0 != arg1
{name: "Not", argLength: 1, typ: "Bool"}, // !arg0, boolean
// 1-input ops
{name: "Neg8", argLength: 1}, // -arg0
{name: "Neg16", argLength: 1},
{name: "Neg32", argLength: 1},
{name: "Neg64", argLength: 1},
{name: "Neg32F", argLength: 1},
{name: "Neg64F", argLength: 1},
{name: "Com8", argLength: 1}, // ^arg0
{name: "Com16", argLength: 1},
{name: "Com32", argLength: 1},
{name: "Com64", argLength: 1},
{name: "Ctz8", argLength: 1}, // Count trailing (low order) zeroes (returns 0-8)
{name: "Ctz16", argLength: 1}, // Count trailing (low order) zeroes (returns 0-16)
{name: "Ctz32", argLength: 1}, // Count trailing (low order) zeroes (returns 0-32)
{name: "Ctz64", argLength: 1}, // Count trailing (low order) zeroes (returns 0-64)
{name: "Ctz8NonZero", argLength: 1}, // same as above, but arg[0] known to be non-zero, returns 0-7
{name: "Ctz16NonZero", argLength: 1}, // same as above, but arg[0] known to be non-zero, returns 0-15
{name: "Ctz32NonZero", argLength: 1}, // same as above, but arg[0] known to be non-zero, returns 0-31
{name: "Ctz64NonZero", argLength: 1}, // same as above, but arg[0] known to be non-zero, returns 0-63
{name: "BitLen8", argLength: 1}, // Number of bits in arg[0] (returns 0-8)
{name: "BitLen16", argLength: 1}, // Number of bits in arg[0] (returns 0-16)
{name: "BitLen32", argLength: 1}, // Number of bits in arg[0] (returns 0-32)
{name: "BitLen64", argLength: 1}, // Number of bits in arg[0] (returns 0-64)
{name: "Bswap32", argLength: 1}, // Swap bytes
{name: "Bswap64", argLength: 1}, // Swap bytes
{name: "BitRev8", argLength: 1}, // Reverse the bits in arg[0]
{name: "BitRev16", argLength: 1}, // Reverse the bits in arg[0]
{name: "BitRev32", argLength: 1}, // Reverse the bits in arg[0]
{name: "BitRev64", argLength: 1}, // Reverse the bits in arg[0]
{name: "PopCount8", argLength: 1}, // Count bits in arg[0]
{name: "PopCount16", argLength: 1}, // Count bits in arg[0]
{name: "PopCount32", argLength: 1}, // Count bits in arg[0]
{name: "PopCount64", argLength: 1}, // Count bits in arg[0]
{name: "RotateLeft8", argLength: 2}, // Rotate bits in arg[0] left by arg[1]
{name: "RotateLeft16", argLength: 2}, // Rotate bits in arg[0] left by arg[1]
{name: "RotateLeft32", argLength: 2}, // Rotate bits in arg[0] left by arg[1]
{name: "RotateLeft64", argLength: 2}, // Rotate bits in arg[0] left by arg[1]
// Square root, float64 only.
// Special cases:
// +∞ → +∞
// ±0 → ±0 (sign preserved)
// x<0 → NaN
// NaN → NaN
{name: "Sqrt", argLength: 1}, // √arg0
// Round to integer, float64 only.
// Special cases:
// ±∞ → ±∞ (sign preserved)
// ±0 → ±0 (sign preserved)
// NaN → NaN
{name: "Floor", argLength: 1}, // round arg0 toward -∞
{name: "Ceil", argLength: 1}, // round arg0 toward +∞
{name: "Trunc", argLength: 1}, // round arg0 toward 0
{name: "Round", argLength: 1}, // round arg0 to nearest, ties away from 0
{name: "RoundToEven", argLength: 1}, // round arg0 to nearest, ties to even
// Modify the sign bit
{name: "Abs", argLength: 1}, // absolute value arg0
{name: "Copysign", argLength: 2}, // copy sign from arg0 to arg1
// 3-input opcode.
// Fused-multiply-add, float64 only.
// When a*b+c is exactly zero (before rounding), then the result is +0 or -0.
// The 0's sign is determined according to the standard rules for the
// addition (-0 if both a*b and c are -0, +0 otherwise).
//
// Otherwise, when a*b+c rounds to zero, then the resulting 0's sign is
// determined by the sign of the exact result a*b+c.
// See section 6.3 in ieee754.
//
// When the multiply is an infinity times a zero, the result is NaN.
// See section 7.2 in ieee754.
math, cmd/compile: rename Fma to FMA This API was added for #25819, where it was discussed as math.FMA. The commit adding it used math.Fma, presumably for consistency with the rest of the unusual names in package math (Sincos, Acosh, Erfcinv, Float32bits, etc). I believe that using an idiomatic Go name is more important here than consistency with these other names, most of which are historical baggage from C's standard library. Early additions like Float32frombits happened before "uppercase for export" (so they were originally like "float32frombits") and they were not properly reconsidered when we uppercased the symbols to export them. That's a mistake we live with. The names of functions we have added since then, and even a few that were legacy, are more properly Go-cased, such as IsNaN, IsInf, and RoundToEven, rather than Isnan, Isinf, and Roundtoeven. And also constants like MaxFloat32. For new API, we should keep using proper Go-cased symbols instead of minimally-upper-cased-C symbols. So math.FMA, not math.Fma. This API has not yet been released, so this change does not break the compatibility promise. This CL also modifies cmd/compile, since the compiler knows the name of the function. I could have stopped at changing the string constants, but it seemed to make more sense to use a consistent casing everywhere. Change-Id: I0f6f3407f41e99bfa8239467345c33945088896e Reviewed-on: https://go-review.googlesource.com/c/go/+/205317 Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-11-04 19:43:45 -05:00
{name: "FMA", argLength: 3}, // compute (a*b)+c without intermediate rounding
// Data movement. Max argument length for Phi is indefinite.
{name: "Phi", argLength: -1, zeroWidth: true}, // select an argument based on which predecessor block we came from
{name: "Copy", argLength: 1}, // output = arg0
// Convert converts between pointers and integers.
// We have a special op for this so as to not confuse GC
// (particularly stack maps). It takes a memory arg so it
// gets correctly ordered with respect to GC safepoints.
// It gets compiled to nothing, so its result must in the same
// register as its argument. regalloc knows it can use any
// allocatable integer register for OpConvert.
// arg0=ptr/int arg1=mem, output=int/ptr
{name: "Convert", argLength: 2, zeroWidth: true, resultInArg0: true},
// constants. Constant values are stored in the aux or
// auxint fields.
{name: "ConstBool", aux: "Bool"}, // auxint is 0 for false and 1 for true
{name: "ConstString", aux: "String"}, // value is aux.(string)
{name: "ConstNil", typ: "BytePtr"}, // nil pointer
{name: "Const8", aux: "Int8"}, // auxint is sign-extended 8 bits
{name: "Const16", aux: "Int16"}, // auxint is sign-extended 16 bits
{name: "Const32", aux: "Int32"}, // auxint is sign-extended 32 bits
// Note: ConstX are sign-extended even when the type of the value is unsigned.
// For instance, uint8(0xaa) is stored as auxint=0xffffffffffffffaa.
{name: "Const64", aux: "Int64"}, // value is auxint
// Note: for both Const32F and Const64F, we disallow encoding NaNs.
// Signaling NaNs are tricky because if you do anything with them, they become quiet.
// Particularly, converting a 32 bit sNaN to 64 bit and back converts it to a qNaN.
// See issue 36399 and 36400.
// Encodings of +inf, -inf, and -0 are fine.
{name: "Const32F", aux: "Float32"}, // value is math.Float64frombits(uint64(auxint)) and is exactly representable as float 32
{name: "Const64F", aux: "Float64"}, // value is math.Float64frombits(uint64(auxint))
{name: "ConstInterface"}, // nil interface
{name: "ConstSlice"}, // nil slice
// Constant-like things
{name: "InitMem", zeroWidth: true}, // memory input to the function.
{name: "Arg", aux: "SymOff", symEffect: "Read", zeroWidth: true}, // argument to the function. aux=GCNode of arg, off = offset in that arg.
// The address of a variable. arg0 is the base pointer.
// If the variable is a global, the base pointer will be SB and
// the Aux field will be a *obj.LSym.
// If the variable is a local, the base pointer will be SP and
// the Aux field will be a *gc.Node.
{name: "Addr", argLength: 1, aux: "Sym", symEffect: "Addr"}, // Address of a variable. Arg0=SB. Aux identifies the variable.
{name: "LocalAddr", argLength: 2, aux: "Sym", symEffect: "Addr"}, // Address of a variable. Arg0=SP. Arg1=mem. Aux identifies the variable.
{name: "SP", zeroWidth: true}, // stack pointer
{name: "SB", typ: "Uintptr", zeroWidth: true}, // static base pointer (a.k.a. globals pointer)
{name: "Invalid"}, // unused value
// Memory operations
{name: "Load", argLength: 2}, // Load from arg0. arg1=memory
{name: "Store", argLength: 3, typ: "Mem", aux: "Typ"}, // Store arg1 to arg0. arg2=memory, aux=type. Returns memory.
// The source and destination of Move may overlap in some cases. See e.g.
// memmove inlining in generic.rules. When inlineablememmovesize (in ../rewrite.go)
// returns true, we must do all loads before all stores, when lowering Move.
// The type of Move is used for the write barrier pass to insert write barriers
// and for alignment on some architectures.
// For pointerless types, it is possible for the type to be inaccurate.
// For type alignment and pointer information, use the type in Aux;
// for type size, use the size in AuxInt.
// The "inline runtime.memmove" rewrite rule generates Moves with inaccurate types,
// such as type byte instead of the more accurate type [8]byte.
{name: "Move", argLength: 3, typ: "Mem", aux: "TypSize"}, // arg0=destptr, arg1=srcptr, arg2=mem, auxint=size, aux=type. Returns memory.
{name: "Zero", argLength: 2, typ: "Mem", aux: "TypSize"}, // arg0=destptr, arg1=mem, auxint=size, aux=type. Returns memory.
// Memory operations with write barriers.
// Expand to runtime calls. Write barrier will be removed if write on stack.
{name: "StoreWB", argLength: 3, typ: "Mem", aux: "Typ"}, // Store arg1 to arg0. arg2=memory, aux=type. Returns memory.
{name: "MoveWB", argLength: 3, typ: "Mem", aux: "TypSize"}, // arg0=destptr, arg1=srcptr, arg2=mem, auxint=size, aux=type. Returns memory.
{name: "ZeroWB", argLength: 2, typ: "Mem", aux: "TypSize"}, // arg0=destptr, arg1=mem, auxint=size, aux=type. Returns memory.
cmd/compile: compiler support for buffered write barrier This CL implements the compiler support for calling the buffered write barrier added by the previous CL. Since the buffered write barrier is only implemented on amd64 right now, this still supports the old, eager write barrier as well. There's little overhead to supporting both and this way a few tests in test/fixedbugs that expect to have liveness maps at write barrier calls can easily opt-in to the old, eager barrier. This significantly improves the performance of the write barrier: name old time/op new time/op delta WriteBarrier-12 73.5ns ±20% 19.2ns ±27% -73.90% (p=0.000 n=19+18) It also reduces the size of binaries because the write barrier call is more compact: name old object-bytes new object-bytes delta Template 398k ± 0% 393k ± 0% -1.14% (p=0.008 n=5+5) Unicode 208k ± 0% 206k ± 0% -1.00% (p=0.008 n=5+5) GoTypes 1.18M ± 0% 1.15M ± 0% -2.00% (p=0.008 n=5+5) Compiler 4.05M ± 0% 3.88M ± 0% -4.26% (p=0.008 n=5+5) SSA 8.25M ± 0% 8.11M ± 0% -1.59% (p=0.008 n=5+5) Flate 228k ± 0% 224k ± 0% -1.83% (p=0.008 n=5+5) GoParser 295k ± 0% 284k ± 0% -3.62% (p=0.008 n=5+5) Reflect 1.00M ± 0% 0.99M ± 0% -0.70% (p=0.008 n=5+5) Tar 339k ± 0% 333k ± 0% -1.67% (p=0.008 n=5+5) XML 404k ± 0% 395k ± 0% -2.10% (p=0.008 n=5+5) [Geo mean] 704k 690k -2.00% name old exe-bytes new exe-bytes delta HelloSize 1.05M ± 0% 1.04M ± 0% -1.55% (p=0.008 n=5+5) https://perf.golang.org/search?q=upload:20171027.1 (Amusingly, this also reduces compiler allocations by 0.75%, which, combined with the better write barrier, speeds up the compiler overall by 2.10%. See the perf link.) It slightly improves the performance of most of the go1 benchmarks and improves the performance of the x/benchmarks: name old time/op new time/op delta BinaryTree17-12 2.40s ± 1% 2.47s ± 1% +2.69% (p=0.000 n=19+19) Fannkuch11-12 2.95s ± 0% 2.95s ± 0% +0.21% (p=0.000 n=20+19) FmtFprintfEmpty-12 41.8ns ± 4% 41.4ns ± 2% -1.03% (p=0.014 n=20+20) FmtFprintfString-12 68.7ns ± 2% 67.5ns ± 1% -1.75% (p=0.000 n=20+17) FmtFprintfInt-12 79.0ns ± 3% 77.1ns ± 1% -2.40% (p=0.000 n=19+17) FmtFprintfIntInt-12 127ns ± 1% 123ns ± 3% -3.42% (p=0.000 n=20+20) FmtFprintfPrefixedInt-12 152ns ± 1% 150ns ± 1% -1.02% (p=0.000 n=18+17) FmtFprintfFloat-12 211ns ± 1% 209ns ± 0% -0.99% (p=0.000 n=20+16) FmtManyArgs-12 500ns ± 0% 496ns ± 0% -0.73% (p=0.000 n=17+20) GobDecode-12 6.44ms ± 1% 6.53ms ± 0% +1.28% (p=0.000 n=20+19) GobEncode-12 5.46ms ± 0% 5.46ms ± 1% ~ (p=0.550 n=19+20) Gzip-12 220ms ± 1% 216ms ± 0% -1.75% (p=0.000 n=19+19) Gunzip-12 38.8ms ± 0% 38.6ms ± 0% -0.30% (p=0.000 n=18+19) HTTPClientServer-12 79.0µs ± 1% 78.2µs ± 1% -1.01% (p=0.000 n=20+20) JSONEncode-12 11.9ms ± 0% 11.9ms ± 0% -0.29% (p=0.000 n=20+19) JSONDecode-12 52.6ms ± 0% 52.2ms ± 0% -0.68% (p=0.000 n=19+20) Mandelbrot200-12 3.69ms ± 0% 3.68ms ± 0% -0.36% (p=0.000 n=20+20) GoParse-12 3.13ms ± 1% 3.18ms ± 1% +1.67% (p=0.000 n=19+20) RegexpMatchEasy0_32-12 73.2ns ± 1% 72.3ns ± 1% -1.19% (p=0.000 n=19+18) RegexpMatchEasy0_1K-12 241ns ± 0% 239ns ± 0% -0.83% (p=0.000 n=17+16) RegexpMatchEasy1_32-12 68.6ns ± 1% 69.0ns ± 1% +0.47% (p=0.015 n=18+16) RegexpMatchEasy1_1K-12 364ns ± 0% 361ns ± 0% -0.67% (p=0.000 n=16+17) RegexpMatchMedium_32-12 104ns ± 1% 103ns ± 1% -0.79% (p=0.001 n=20+15) RegexpMatchMedium_1K-12 33.8µs ± 3% 34.0µs ± 2% ~ (p=0.267 n=20+19) RegexpMatchHard_32-12 1.64µs ± 1% 1.62µs ± 2% -1.25% (p=0.000 n=19+18) RegexpMatchHard_1K-12 49.2µs ± 0% 48.7µs ± 1% -0.93% (p=0.000 n=19+18) Revcomp-12 391ms ± 5% 396ms ± 7% ~ (p=0.154 n=19+19) Template-12 63.1ms ± 0% 59.5ms ± 0% -5.76% (p=0.000 n=18+19) TimeParse-12 307ns ± 0% 306ns ± 0% -0.39% (p=0.000 n=19+17) TimeFormat-12 325ns ± 0% 323ns ± 0% -0.50% (p=0.000 n=19+19) [Geo mean] 47.3µs 46.9µs -0.67% https://perf.golang.org/search?q=upload:20171026.1 name old time/op new time/op delta Garbage/benchmem-MB=64-12 2.25ms ± 1% 2.20ms ± 1% -2.31% (p=0.000 n=18+18) HTTP-12 12.6µs ± 0% 12.6µs ± 0% -0.72% (p=0.000 n=18+17) JSON-12 11.0ms ± 0% 11.0ms ± 1% -0.68% (p=0.000 n=17+19) https://perf.golang.org/search?q=upload:20171026.2 Updates #14951. Updates #22460. Change-Id: Id4c0932890a1d41020071bec73b8522b1367d3e7 Reviewed-on: https://go-review.googlesource.com/73712 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-10-26 12:33:04 -04:00
// WB invokes runtime.gcWriteBarrier. This is not a normal
// call: it takes arguments in registers, doesn't clobber
// general-purpose registers (the exact clobber set is
// arch-dependent), and is not a safe-point.
{name: "WB", argLength: 3, typ: "Mem", aux: "Sym", symEffect: "None"}, // arg0=destptr, arg1=srcptr, arg2=mem, aux=runtime.gcWriteBarrier
{name: "HasCPUFeature", argLength: 0, typ: "bool", aux: "Sym", symEffect: "None"}, // aux=place that this feature flag can be loaded from
// PanicBounds and PanicExtend generate a runtime panic.
// Their arguments provide index values to use in panic messages.
// Both PanicBounds and PanicExtend have an AuxInt value from the BoundsKind type (in ../op.go).
// PanicBounds' index is int sized.
// PanicExtend's index is int64 sized. (PanicExtend is only used on 32-bit archs.)
{name: "PanicBounds", argLength: 3, aux: "Int64", typ: "Mem"}, // arg0=idx, arg1=len, arg2=mem, returns memory.
{name: "PanicExtend", argLength: 4, aux: "Int64", typ: "Mem"}, // arg0=idxHi, arg1=idxLo, arg2=len, arg3=mem, returns memory.
// Function calls. Arguments to the call have already been written to the stack.
// Return values appear on the stack. The method receiver, if any, is treated
// as a phantom first argument.
// TODO(josharian): ClosureCall and InterCall should have Int32 aux
// to match StaticCall's 32 bit arg size limit.
{name: "ClosureCall", argLength: 3, aux: "Int64", call: true}, // arg0=code pointer, arg1=context ptr, arg2=memory. auxint=arg size. Returns memory.
{name: "StaticCall", argLength: 1, aux: "SymOff", call: true, symEffect: "None"}, // call function aux.(*obj.LSym), arg0=memory. auxint=arg size. Returns memory.
{name: "InterCall", argLength: 2, aux: "Int64", call: true}, // interface call. arg0=code pointer, arg1=memory, auxint=arg size. Returns memory.
// Conversions: signed extensions, zero (unsigned) extensions, truncations
{name: "SignExt8to16", argLength: 1, typ: "Int16"},
{name: "SignExt8to32", argLength: 1, typ: "Int32"},
{name: "SignExt8to64", argLength: 1, typ: "Int64"},
{name: "SignExt16to32", argLength: 1, typ: "Int32"},
{name: "SignExt16to64", argLength: 1, typ: "Int64"},
{name: "SignExt32to64", argLength: 1, typ: "Int64"},
{name: "ZeroExt8to16", argLength: 1, typ: "UInt16"},
{name: "ZeroExt8to32", argLength: 1, typ: "UInt32"},
{name: "ZeroExt8to64", argLength: 1, typ: "UInt64"},
{name: "ZeroExt16to32", argLength: 1, typ: "UInt32"},
{name: "ZeroExt16to64", argLength: 1, typ: "UInt64"},
{name: "ZeroExt32to64", argLength: 1, typ: "UInt64"},
{name: "Trunc16to8", argLength: 1},
{name: "Trunc32to8", argLength: 1},
{name: "Trunc32to16", argLength: 1},
{name: "Trunc64to8", argLength: 1},
{name: "Trunc64to16", argLength: 1},
{name: "Trunc64to32", argLength: 1},
{name: "Cvt32to32F", argLength: 1},
{name: "Cvt32to64F", argLength: 1},
{name: "Cvt64to32F", argLength: 1},
{name: "Cvt64to64F", argLength: 1},
{name: "Cvt32Fto32", argLength: 1},
{name: "Cvt32Fto64", argLength: 1},
{name: "Cvt64Fto32", argLength: 1},
{name: "Cvt64Fto64", argLength: 1},
{name: "Cvt32Fto64F", argLength: 1},
{name: "Cvt64Fto32F", argLength: 1},
{name: "CvtBoolToUint8", argLength: 1},
// Force rounding to precision of type.
{name: "Round32F", argLength: 1},
{name: "Round64F", argLength: 1},
// Automatically inserted safety checks
{name: "IsNonNil", argLength: 1, typ: "Bool"}, // arg0 != nil
{name: "IsInBounds", argLength: 2, typ: "Bool"}, // 0 <= arg0 < arg1. arg1 is guaranteed >= 0.
{name: "IsSliceInBounds", argLength: 2, typ: "Bool"}, // 0 <= arg0 <= arg1. arg1 is guaranteed >= 0.
{name: "NilCheck", argLength: 2, typ: "Void"}, // arg0=ptr, arg1=mem. Panics if arg0 is nil. Returns void.
// Pseudo-ops
{name: "GetG", argLength: 1, zeroWidth: true}, // runtime.getg() (read g pointer). arg0=mem
{name: "GetClosurePtr"}, // get closure pointer from dedicated register
{name: "GetCallerPC"}, // for getcallerpc intrinsic
{name: "GetCallerSP"}, // for getcallersp intrinsic
// Indexing operations
{name: "PtrIndex", argLength: 2}, // arg0=ptr, arg1=index. Computes ptr+sizeof(*v.type)*index, where index is extended to ptrwidth type
{name: "OffPtr", argLength: 1, aux: "Int64"}, // arg0 + auxint (arg0 and result are pointers)
// Slices
{name: "SliceMake", argLength: 3}, // arg0=ptr, arg1=len, arg2=cap
{name: "SlicePtr", argLength: 1, typ: "BytePtr"}, // ptr(arg0)
{name: "SliceLen", argLength: 1}, // len(arg0)
{name: "SliceCap", argLength: 1}, // cap(arg0)
// Complex (part/whole)
{name: "ComplexMake", argLength: 2}, // arg0=real, arg1=imag
{name: "ComplexReal", argLength: 1}, // real(arg0)
{name: "ComplexImag", argLength: 1}, // imag(arg0)
// Strings
{name: "StringMake", argLength: 2}, // arg0=ptr, arg1=len
{name: "StringPtr", argLength: 1, typ: "BytePtr"}, // ptr(arg0)
{name: "StringLen", argLength: 1, typ: "Int"}, // len(arg0)
// Interfaces
{name: "IMake", argLength: 2}, // arg0=itab, arg1=data
{name: "ITab", argLength: 1, typ: "Uintptr"}, // arg0=interface, returns itable field
{name: "IData", argLength: 1}, // arg0=interface, returns data field
// Structs
{name: "StructMake0"}, // Returns struct with 0 fields.
{name: "StructMake1", argLength: 1}, // arg0=field0. Returns struct.
{name: "StructMake2", argLength: 2}, // arg0,arg1=field0,field1. Returns struct.
{name: "StructMake3", argLength: 3}, // arg0..2=field0..2. Returns struct.
{name: "StructMake4", argLength: 4}, // arg0..3=field0..3. Returns struct.
{name: "StructSelect", argLength: 1, aux: "Int64"}, // arg0=struct, auxint=field index. Returns the auxint'th field.
// Arrays
{name: "ArrayMake0"}, // Returns array with 0 elements
{name: "ArrayMake1", argLength: 1}, // Returns array with 1 element
{name: "ArraySelect", argLength: 1, aux: "Int64"}, // arg0=array, auxint=index. Returns a[i].
// Spill&restore ops for the register allocator. These are
// semantically identical to OpCopy; they do not take/return
// stores like regular memory ops do. We can get away without memory
// args because we know there is no aliasing of spill slots on the stack.
{name: "StoreReg", argLength: 1},
{name: "LoadReg", argLength: 1},
// Used during ssa construction. Like Copy, but the arg has not been specified yet.
{name: "FwdRef", aux: "Sym", symEffect: "None"},
// Unknown value. Used for Values whose values don't matter because they are dead code.
{name: "Unknown"},
{name: "VarDef", argLength: 1, aux: "Sym", typ: "Mem", symEffect: "None", zeroWidth: true}, // aux is a *gc.Node of a variable that is about to be initialized. arg0=mem, returns mem
{name: "VarKill", argLength: 1, aux: "Sym", symEffect: "None"}, // aux is a *gc.Node of a variable that is known to be dead. arg0=mem, returns mem
// TODO: what's the difference between VarLive and KeepAlive?
{name: "VarLive", argLength: 1, aux: "Sym", symEffect: "Read", zeroWidth: true}, // aux is a *gc.Node of a variable that must be kept live. arg0=mem, returns mem
{name: "KeepAlive", argLength: 2, typ: "Mem", zeroWidth: true}, // arg[0] is a value that must be kept alive until this mark. arg[1]=mem, returns mem
// InlMark marks the start of an inlined function body. Its AuxInt field
// distinguishes which entry in the local inline tree it is marking.
{name: "InlMark", argLength: 1, aux: "Int32", typ: "Void"}, // arg[0]=mem, returns void.
// Ops for breaking 64-bit operations on 32-bit architectures
{name: "Int64Make", argLength: 2, typ: "UInt64"}, // arg0=hi, arg1=lo
{name: "Int64Hi", argLength: 1, typ: "UInt32"}, // high 32-bit of arg0
{name: "Int64Lo", argLength: 1, typ: "UInt32"}, // low 32-bit of arg0
{name: "Add32carry", argLength: 2, commutative: true, typ: "(UInt32,Flags)"}, // arg0 + arg1, returns (value, carry)
{name: "Add32withcarry", argLength: 3, commutative: true}, // arg0 + arg1 + arg2, arg2=carry (0 or 1)
{name: "Sub32carry", argLength: 2, typ: "(UInt32,Flags)"}, // arg0 - arg1, returns (value, carry)
{name: "Sub32withcarry", argLength: 3}, // arg0 - arg1 - arg2, arg2=carry (0 or 1)
{name: "Add64carry", argLength: 3, commutative: true, typ: "(UInt64,UInt64)"}, // arg0 + arg1 + arg2, arg2 must be 0 or 1. returns (value, value>>64)
{name: "Sub64borrow", argLength: 3, typ: "(UInt64,UInt64)"}, // arg0 - (arg1 + arg2), arg2 must be 0 or 1. returns (value, value>>64&1)
{name: "Signmask", argLength: 1, typ: "Int32"}, // 0 if arg0 >= 0, -1 if arg0 < 0
{name: "Zeromask", argLength: 1, typ: "UInt32"}, // 0 if arg0 == 0, 0xffffffff if arg0 != 0
{name: "Slicemask", argLength: 1}, // 0 if arg0 == 0, -1 if arg0 > 0, undef if arg0<0. Type is native int size.
{name: "SpectreIndex", argLength: 2}, // arg0 if 0 <= arg0 < arg1, 0 otherwise. Type is native int size.
{name: "SpectreSliceIndex", argLength: 2}, // arg0 if 0 <= arg0 <= arg1, 0 otherwise. Type is native int size.
{name: "Cvt32Uto32F", argLength: 1}, // uint32 -> float32, only used on 32-bit arch
{name: "Cvt32Uto64F", argLength: 1}, // uint32 -> float64, only used on 32-bit arch
{name: "Cvt32Fto32U", argLength: 1}, // float32 -> uint32, only used on 32-bit arch
{name: "Cvt64Fto32U", argLength: 1}, // float64 -> uint32, only used on 32-bit arch
{name: "Cvt64Uto32F", argLength: 1}, // uint64 -> float32, only used on archs that has the instruction
{name: "Cvt64Uto64F", argLength: 1}, // uint64 -> float64, only used on archs that has the instruction
{name: "Cvt32Fto64U", argLength: 1}, // float32 -> uint64, only used on archs that has the instruction
{name: "Cvt64Fto64U", argLength: 1}, // float64 -> uint64, only used on archs that has the instruction
// pseudo-ops for breaking Tuple
{name: "Select0", argLength: 1, zeroWidth: true}, // the first component of a tuple
{name: "Select1", argLength: 1, zeroWidth: true}, // the second component of a tuple
// Atomic operations used for semantically inlining runtime/internal/atomic.
// Atomic loads return a new memory so that the loads are properly ordered
// with respect to other loads and stores.
// TODO: use for sync/atomic at some point.
{name: "AtomicLoad8", argLength: 2, typ: "(UInt8,Mem)"}, // Load from arg0. arg1=memory. Returns loaded value and new memory.
{name: "AtomicLoad32", argLength: 2, typ: "(UInt32,Mem)"}, // Load from arg0. arg1=memory. Returns loaded value and new memory.
{name: "AtomicLoad64", argLength: 2, typ: "(UInt64,Mem)"}, // Load from arg0. arg1=memory. Returns loaded value and new memory.
{name: "AtomicLoadPtr", argLength: 2, typ: "(BytePtr,Mem)"}, // Load from arg0. arg1=memory. Returns loaded value and new memory.
{name: "AtomicLoadAcq32", argLength: 2, typ: "(UInt32,Mem)"}, // Load from arg0. arg1=memory. Lock acquisition, returns loaded value and new memory.
{name: "AtomicStore8", argLength: 3, typ: "Mem", hasSideEffects: true}, // Store arg1 to *arg0. arg2=memory. Returns memory.
{name: "AtomicStore32", argLength: 3, typ: "Mem", hasSideEffects: true}, // Store arg1 to *arg0. arg2=memory. Returns memory.
{name: "AtomicStore64", argLength: 3, typ: "Mem", hasSideEffects: true}, // Store arg1 to *arg0. arg2=memory. Returns memory.
{name: "AtomicStorePtrNoWB", argLength: 3, typ: "Mem", hasSideEffects: true}, // Store arg1 to *arg0. arg2=memory. Returns memory.
{name: "AtomicStoreRel32", argLength: 3, typ: "Mem", hasSideEffects: true}, // Store arg1 to *arg0. arg2=memory. Lock release, returns memory.
{name: "AtomicExchange32", argLength: 3, typ: "(UInt32,Mem)", hasSideEffects: true}, // Store arg1 to *arg0. arg2=memory. Returns old contents of *arg0 and new memory.
{name: "AtomicExchange64", argLength: 3, typ: "(UInt64,Mem)", hasSideEffects: true}, // Store arg1 to *arg0. arg2=memory. Returns old contents of *arg0 and new memory.
{name: "AtomicAdd32", argLength: 3, typ: "(UInt32,Mem)", hasSideEffects: true}, // Do *arg0 += arg1. arg2=memory. Returns sum and new memory.
{name: "AtomicAdd64", argLength: 3, typ: "(UInt64,Mem)", hasSideEffects: true}, // Do *arg0 += arg1. arg2=memory. Returns sum and new memory.
{name: "AtomicCompareAndSwap32", argLength: 4, typ: "(Bool,Mem)", hasSideEffects: true}, // if *arg0==arg1, then set *arg0=arg2. Returns true if store happens and new memory.
{name: "AtomicCompareAndSwap64", argLength: 4, typ: "(Bool,Mem)", hasSideEffects: true}, // if *arg0==arg1, then set *arg0=arg2. Returns true if store happens and new memory.
{name: "AtomicCompareAndSwapRel32", argLength: 4, typ: "(Bool,Mem)", hasSideEffects: true}, // if *arg0==arg1, then set *arg0=arg2. Lock release, reports whether store happens and new memory.
{name: "AtomicAnd8", argLength: 3, typ: "Mem", hasSideEffects: true}, // *arg0 &= arg1. arg2=memory. Returns memory.
{name: "AtomicOr8", argLength: 3, typ: "Mem", hasSideEffects: true}, // *arg0 |= arg1. arg2=memory. Returns memory.
// Atomic operation variants
// These variants have the same semantics as above atomic operations.
// But they are used for generating more efficient code on certain modern machines, with run-time CPU feature detection.
// Currently, they are used on ARM64 only.
{name: "AtomicAdd32Variant", argLength: 3, typ: "(UInt32,Mem)", hasSideEffects: true}, // Do *arg0 += arg1. arg2=memory. Returns sum and new memory.
{name: "AtomicAdd64Variant", argLength: 3, typ: "(UInt64,Mem)", hasSideEffects: true}, // Do *arg0 += arg1. arg2=memory. Returns sum and new memory.
// Clobber experiment op
{name: "Clobber", argLength: 0, typ: "Void", aux: "SymOff", symEffect: "None"}, // write an invalid pointer value to the given pointer slot of a stack variable
}
cmd/compile: allow multiple SSA block control values Control values are used to choose which successor of a block is jumped to. Typically a control value takes the form of a 'flags' value that represents the result of a comparison. Some architectures however use a variable in a register as a control value. Up until now we have managed with a single control value per block. However some architectures (e.g. s390x and riscv64) have combined compare-and-branch instructions that take two variables in registers as parameters. To generate these instructions we need to support 2 control values per block. This CL allows up to 2 control values to be used in a block in order to support the addition of compare-and-branch instructions. I have implemented s390x compare-and-branch instructions in a different CL. Passes toolstash-check -all. Results of compilebench: name old time/op new time/op delta Template 208ms ± 1% 209ms ± 1% ~ (p=0.289 n=20+20) Unicode 83.7ms ± 1% 83.3ms ± 3% -0.49% (p=0.017 n=18+18) GoTypes 748ms ± 1% 748ms ± 0% ~ (p=0.460 n=20+18) Compiler 3.47s ± 1% 3.48s ± 1% ~ (p=0.070 n=19+18) SSA 11.5s ± 1% 11.7s ± 1% +1.64% (p=0.000 n=19+18) Flate 130ms ± 1% 130ms ± 1% ~ (p=0.588 n=19+20) GoParser 160ms ± 1% 161ms ± 1% ~ (p=0.211 n=20+20) Reflect 465ms ± 1% 467ms ± 1% +0.42% (p=0.007 n=20+20) Tar 184ms ± 1% 185ms ± 2% ~ (p=0.087 n=18+20) XML 253ms ± 1% 253ms ± 1% ~ (p=0.377 n=20+18) LinkCompiler 769ms ± 2% 774ms ± 2% ~ (p=0.070 n=19+19) ExternalLinkCompiler 3.59s ±11% 3.68s ± 6% ~ (p=0.072 n=20+20) LinkWithoutDebugCompiler 446ms ± 5% 454ms ± 3% +1.79% (p=0.002 n=19+20) StdCmd 26.0s ± 2% 26.0s ± 2% ~ (p=0.799 n=20+20) name old user-time/op new user-time/op delta Template 238ms ± 5% 240ms ± 5% ~ (p=0.142 n=20+20) Unicode 105ms ±11% 106ms ±10% ~ (p=0.512 n=20+20) GoTypes 876ms ± 2% 873ms ± 4% ~ (p=0.647 n=20+19) Compiler 4.17s ± 2% 4.19s ± 1% ~ (p=0.093 n=20+18) SSA 13.9s ± 1% 14.1s ± 1% +1.45% (p=0.000 n=18+18) Flate 145ms ±13% 146ms ± 5% ~ (p=0.851 n=20+18) GoParser 185ms ± 5% 188ms ± 7% ~ (p=0.174 n=20+20) Reflect 534ms ± 3% 538ms ± 2% ~ (p=0.105 n=20+18) Tar 215ms ± 4% 211ms ± 9% ~ (p=0.079 n=19+20) XML 295ms ± 6% 295ms ± 5% ~ (p=0.968 n=20+20) LinkCompiler 832ms ± 4% 837ms ± 7% ~ (p=0.707 n=17+20) ExternalLinkCompiler 1.58s ± 8% 1.60s ± 4% ~ (p=0.296 n=20+19) LinkWithoutDebugCompiler 478ms ±12% 489ms ±10% ~ (p=0.429 n=20+20) name old object-bytes new object-bytes delta Template 559kB ± 0% 559kB ± 0% ~ (all equal) Unicode 216kB ± 0% 216kB ± 0% ~ (all equal) GoTypes 2.03MB ± 0% 2.03MB ± 0% ~ (all equal) Compiler 8.07MB ± 0% 8.07MB ± 0% -0.06% (p=0.000 n=20+20) SSA 27.1MB ± 0% 27.3MB ± 0% +0.89% (p=0.000 n=20+20) Flate 343kB ± 0% 343kB ± 0% ~ (all equal) GoParser 441kB ± 0% 441kB ± 0% ~ (all equal) Reflect 1.36MB ± 0% 1.36MB ± 0% ~ (all equal) Tar 487kB ± 0% 487kB ± 0% ~ (all equal) XML 632kB ± 0% 632kB ± 0% ~ (all equal) name old export-bytes new export-bytes delta Template 18.5kB ± 0% 18.5kB ± 0% ~ (all equal) Unicode 7.92kB ± 0% 7.92kB ± 0% ~ (all equal) GoTypes 35.0kB ± 0% 35.0kB ± 0% ~ (all equal) Compiler 109kB ± 0% 110kB ± 0% +0.72% (p=0.000 n=20+20) SSA 137kB ± 0% 138kB ± 0% +0.58% (p=0.000 n=20+20) Flate 4.89kB ± 0% 4.89kB ± 0% ~ (all equal) GoParser 8.49kB ± 0% 8.49kB ± 0% ~ (all equal) Reflect 11.4kB ± 0% 11.4kB ± 0% ~ (all equal) Tar 10.5kB ± 0% 10.5kB ± 0% ~ (all equal) XML 16.7kB ± 0% 16.7kB ± 0% ~ (all equal) name old text-bytes new text-bytes delta HelloSize 761kB ± 0% 761kB ± 0% ~ (all equal) CmdGoSize 10.8MB ± 0% 10.8MB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 10.7kB ± 0% 10.7kB ± 0% ~ (all equal) CmdGoSize 312kB ± 0% 312kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 122kB ± 0% 122kB ± 0% ~ (all equal) CmdGoSize 146kB ± 0% 146kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.13MB ± 0% 1.13MB ± 0% ~ (all equal) CmdGoSize 15.1MB ± 0% 15.1MB ± 0% ~ (all equal) Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f Reviewed-on: https://go-review.googlesource.com/c/go/+/196557 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-12 20:19:58 +01:00
// kind controls successors implicit exit
// ----------------------------------------------------------
cmd/compile: allow multiple SSA block control values Control values are used to choose which successor of a block is jumped to. Typically a control value takes the form of a 'flags' value that represents the result of a comparison. Some architectures however use a variable in a register as a control value. Up until now we have managed with a single control value per block. However some architectures (e.g. s390x and riscv64) have combined compare-and-branch instructions that take two variables in registers as parameters. To generate these instructions we need to support 2 control values per block. This CL allows up to 2 control values to be used in a block in order to support the addition of compare-and-branch instructions. I have implemented s390x compare-and-branch instructions in a different CL. Passes toolstash-check -all. Results of compilebench: name old time/op new time/op delta Template 208ms ± 1% 209ms ± 1% ~ (p=0.289 n=20+20) Unicode 83.7ms ± 1% 83.3ms ± 3% -0.49% (p=0.017 n=18+18) GoTypes 748ms ± 1% 748ms ± 0% ~ (p=0.460 n=20+18) Compiler 3.47s ± 1% 3.48s ± 1% ~ (p=0.070 n=19+18) SSA 11.5s ± 1% 11.7s ± 1% +1.64% (p=0.000 n=19+18) Flate 130ms ± 1% 130ms ± 1% ~ (p=0.588 n=19+20) GoParser 160ms ± 1% 161ms ± 1% ~ (p=0.211 n=20+20) Reflect 465ms ± 1% 467ms ± 1% +0.42% (p=0.007 n=20+20) Tar 184ms ± 1% 185ms ± 2% ~ (p=0.087 n=18+20) XML 253ms ± 1% 253ms ± 1% ~ (p=0.377 n=20+18) LinkCompiler 769ms ± 2% 774ms ± 2% ~ (p=0.070 n=19+19) ExternalLinkCompiler 3.59s ±11% 3.68s ± 6% ~ (p=0.072 n=20+20) LinkWithoutDebugCompiler 446ms ± 5% 454ms ± 3% +1.79% (p=0.002 n=19+20) StdCmd 26.0s ± 2% 26.0s ± 2% ~ (p=0.799 n=20+20) name old user-time/op new user-time/op delta Template 238ms ± 5% 240ms ± 5% ~ (p=0.142 n=20+20) Unicode 105ms ±11% 106ms ±10% ~ (p=0.512 n=20+20) GoTypes 876ms ± 2% 873ms ± 4% ~ (p=0.647 n=20+19) Compiler 4.17s ± 2% 4.19s ± 1% ~ (p=0.093 n=20+18) SSA 13.9s ± 1% 14.1s ± 1% +1.45% (p=0.000 n=18+18) Flate 145ms ±13% 146ms ± 5% ~ (p=0.851 n=20+18) GoParser 185ms ± 5% 188ms ± 7% ~ (p=0.174 n=20+20) Reflect 534ms ± 3% 538ms ± 2% ~ (p=0.105 n=20+18) Tar 215ms ± 4% 211ms ± 9% ~ (p=0.079 n=19+20) XML 295ms ± 6% 295ms ± 5% ~ (p=0.968 n=20+20) LinkCompiler 832ms ± 4% 837ms ± 7% ~ (p=0.707 n=17+20) ExternalLinkCompiler 1.58s ± 8% 1.60s ± 4% ~ (p=0.296 n=20+19) LinkWithoutDebugCompiler 478ms ±12% 489ms ±10% ~ (p=0.429 n=20+20) name old object-bytes new object-bytes delta Template 559kB ± 0% 559kB ± 0% ~ (all equal) Unicode 216kB ± 0% 216kB ± 0% ~ (all equal) GoTypes 2.03MB ± 0% 2.03MB ± 0% ~ (all equal) Compiler 8.07MB ± 0% 8.07MB ± 0% -0.06% (p=0.000 n=20+20) SSA 27.1MB ± 0% 27.3MB ± 0% +0.89% (p=0.000 n=20+20) Flate 343kB ± 0% 343kB ± 0% ~ (all equal) GoParser 441kB ± 0% 441kB ± 0% ~ (all equal) Reflect 1.36MB ± 0% 1.36MB ± 0% ~ (all equal) Tar 487kB ± 0% 487kB ± 0% ~ (all equal) XML 632kB ± 0% 632kB ± 0% ~ (all equal) name old export-bytes new export-bytes delta Template 18.5kB ± 0% 18.5kB ± 0% ~ (all equal) Unicode 7.92kB ± 0% 7.92kB ± 0% ~ (all equal) GoTypes 35.0kB ± 0% 35.0kB ± 0% ~ (all equal) Compiler 109kB ± 0% 110kB ± 0% +0.72% (p=0.000 n=20+20) SSA 137kB ± 0% 138kB ± 0% +0.58% (p=0.000 n=20+20) Flate 4.89kB ± 0% 4.89kB ± 0% ~ (all equal) GoParser 8.49kB ± 0% 8.49kB ± 0% ~ (all equal) Reflect 11.4kB ± 0% 11.4kB ± 0% ~ (all equal) Tar 10.5kB ± 0% 10.5kB ± 0% ~ (all equal) XML 16.7kB ± 0% 16.7kB ± 0% ~ (all equal) name old text-bytes new text-bytes delta HelloSize 761kB ± 0% 761kB ± 0% ~ (all equal) CmdGoSize 10.8MB ± 0% 10.8MB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 10.7kB ± 0% 10.7kB ± 0% ~ (all equal) CmdGoSize 312kB ± 0% 312kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 122kB ± 0% 122kB ± 0% ~ (all equal) CmdGoSize 146kB ± 0% 146kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.13MB ± 0% 1.13MB ± 0% ~ (all equal) CmdGoSize 15.1MB ± 0% 15.1MB ± 0% ~ (all equal) Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f Reviewed-on: https://go-review.googlesource.com/c/go/+/196557 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-12 20:19:58 +01:00
// Exit [return mem] [] yes
// Ret [return mem] [] yes
// RetJmp [return mem] [] yes
// Plain [] [next]
// If [boolean Value] [then, else]
// First [] [always, never]
var genericBlocks = []blockData{
cmd/compile: allow multiple SSA block control values Control values are used to choose which successor of a block is jumped to. Typically a control value takes the form of a 'flags' value that represents the result of a comparison. Some architectures however use a variable in a register as a control value. Up until now we have managed with a single control value per block. However some architectures (e.g. s390x and riscv64) have combined compare-and-branch instructions that take two variables in registers as parameters. To generate these instructions we need to support 2 control values per block. This CL allows up to 2 control values to be used in a block in order to support the addition of compare-and-branch instructions. I have implemented s390x compare-and-branch instructions in a different CL. Passes toolstash-check -all. Results of compilebench: name old time/op new time/op delta Template 208ms ± 1% 209ms ± 1% ~ (p=0.289 n=20+20) Unicode 83.7ms ± 1% 83.3ms ± 3% -0.49% (p=0.017 n=18+18) GoTypes 748ms ± 1% 748ms ± 0% ~ (p=0.460 n=20+18) Compiler 3.47s ± 1% 3.48s ± 1% ~ (p=0.070 n=19+18) SSA 11.5s ± 1% 11.7s ± 1% +1.64% (p=0.000 n=19+18) Flate 130ms ± 1% 130ms ± 1% ~ (p=0.588 n=19+20) GoParser 160ms ± 1% 161ms ± 1% ~ (p=0.211 n=20+20) Reflect 465ms ± 1% 467ms ± 1% +0.42% (p=0.007 n=20+20) Tar 184ms ± 1% 185ms ± 2% ~ (p=0.087 n=18+20) XML 253ms ± 1% 253ms ± 1% ~ (p=0.377 n=20+18) LinkCompiler 769ms ± 2% 774ms ± 2% ~ (p=0.070 n=19+19) ExternalLinkCompiler 3.59s ±11% 3.68s ± 6% ~ (p=0.072 n=20+20) LinkWithoutDebugCompiler 446ms ± 5% 454ms ± 3% +1.79% (p=0.002 n=19+20) StdCmd 26.0s ± 2% 26.0s ± 2% ~ (p=0.799 n=20+20) name old user-time/op new user-time/op delta Template 238ms ± 5% 240ms ± 5% ~ (p=0.142 n=20+20) Unicode 105ms ±11% 106ms ±10% ~ (p=0.512 n=20+20) GoTypes 876ms ± 2% 873ms ± 4% ~ (p=0.647 n=20+19) Compiler 4.17s ± 2% 4.19s ± 1% ~ (p=0.093 n=20+18) SSA 13.9s ± 1% 14.1s ± 1% +1.45% (p=0.000 n=18+18) Flate 145ms ±13% 146ms ± 5% ~ (p=0.851 n=20+18) GoParser 185ms ± 5% 188ms ± 7% ~ (p=0.174 n=20+20) Reflect 534ms ± 3% 538ms ± 2% ~ (p=0.105 n=20+18) Tar 215ms ± 4% 211ms ± 9% ~ (p=0.079 n=19+20) XML 295ms ± 6% 295ms ± 5% ~ (p=0.968 n=20+20) LinkCompiler 832ms ± 4% 837ms ± 7% ~ (p=0.707 n=17+20) ExternalLinkCompiler 1.58s ± 8% 1.60s ± 4% ~ (p=0.296 n=20+19) LinkWithoutDebugCompiler 478ms ±12% 489ms ±10% ~ (p=0.429 n=20+20) name old object-bytes new object-bytes delta Template 559kB ± 0% 559kB ± 0% ~ (all equal) Unicode 216kB ± 0% 216kB ± 0% ~ (all equal) GoTypes 2.03MB ± 0% 2.03MB ± 0% ~ (all equal) Compiler 8.07MB ± 0% 8.07MB ± 0% -0.06% (p=0.000 n=20+20) SSA 27.1MB ± 0% 27.3MB ± 0% +0.89% (p=0.000 n=20+20) Flate 343kB ± 0% 343kB ± 0% ~ (all equal) GoParser 441kB ± 0% 441kB ± 0% ~ (all equal) Reflect 1.36MB ± 0% 1.36MB ± 0% ~ (all equal) Tar 487kB ± 0% 487kB ± 0% ~ (all equal) XML 632kB ± 0% 632kB ± 0% ~ (all equal) name old export-bytes new export-bytes delta Template 18.5kB ± 0% 18.5kB ± 0% ~ (all equal) Unicode 7.92kB ± 0% 7.92kB ± 0% ~ (all equal) GoTypes 35.0kB ± 0% 35.0kB ± 0% ~ (all equal) Compiler 109kB ± 0% 110kB ± 0% +0.72% (p=0.000 n=20+20) SSA 137kB ± 0% 138kB ± 0% +0.58% (p=0.000 n=20+20) Flate 4.89kB ± 0% 4.89kB ± 0% ~ (all equal) GoParser 8.49kB ± 0% 8.49kB ± 0% ~ (all equal) Reflect 11.4kB ± 0% 11.4kB ± 0% ~ (all equal) Tar 10.5kB ± 0% 10.5kB ± 0% ~ (all equal) XML 16.7kB ± 0% 16.7kB ± 0% ~ (all equal) name old text-bytes new text-bytes delta HelloSize 761kB ± 0% 761kB ± 0% ~ (all equal) CmdGoSize 10.8MB ± 0% 10.8MB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 10.7kB ± 0% 10.7kB ± 0% ~ (all equal) CmdGoSize 312kB ± 0% 312kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 122kB ± 0% 122kB ± 0% ~ (all equal) CmdGoSize 146kB ± 0% 146kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.13MB ± 0% 1.13MB ± 0% ~ (all equal) CmdGoSize 15.1MB ± 0% 15.1MB ± 0% ~ (all equal) Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f Reviewed-on: https://go-review.googlesource.com/c/go/+/196557 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-12 20:19:58 +01:00
{name: "Plain"}, // a single successor
{name: "If", controls: 1}, // if Controls[0] goto Succs[0] else goto Succs[1]
{name: "Defer", controls: 1}, // Succs[0]=defer queued, Succs[1]=defer recovered. Controls[0] is call op (of memory type)
{name: "Ret", controls: 1}, // no successors, Controls[0] value is memory result
{name: "RetJmp", controls: 1}, // no successors, Controls[0] value is memory result, jumps to b.Aux.(*gc.Sym)
{name: "Exit", controls: 1}, // no successors, Controls[0] value generates a panic
// transient block state used for dead code removal
{name: "First"}, // 2 successors, always takes the first one (second is dead)
}
func init() {
archs = append(archs, arch{
name: "generic",
ops: genericOps,
blocks: genericBlocks,
generic: true,
})
}