go/src/cmd/compile/internal/ssa/gen/rulegen.go

1888 lines
49 KiB
Go
Raw Normal View History

// Copyright 2015 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
//go:build gen
// +build gen
// This program generates Go code that applies rewrite rules to a Value.
// The generated code implements a function of type func (v *Value) bool
// which reports whether if did something.
// Ideas stolen from Swift: http://www.hpl.hp.com/techreports/Compaq-DEC/WRL-2000-2.html
package main
import (
"bufio"
"bytes"
"flag"
"fmt"
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
"go/ast"
"go/format"
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
"go/parser"
"go/printer"
"go/token"
"io"
"log"
"os"
cmd/compile: stop using go/types in rulegen Using go/types to get rid of all unused variables in CL 189798 was a neat idea, but it was pretty expensive. go/types is a full typechecker, which does a lot more work than we actually need. Moreover, we had to run it multiple times, to catch variables that became unused after removing existing unused variables. Instead, write our own little detector for unused imports and variables. It doesn't use ast.Walk, as we need to know what fields we're inspecting. For example, in "foo := bar", "foo" is declared, and "bar" is used, yet they both appear as simple *ast.Ident cases under ast.Walk. The code is documented to explain how unused variables are detected in a single syntax tree pass. Since this happens after we've generated a complete go/ast.File, we don't need to worry about our own simplified node types. The generated code is the same, but rulegen is much faster and uses less memory at its peak, so it should scale better with time. With 'benchcmd Rulegen go run *.go' on perflock, we get: name old time/op new time/op delta Rulegen 4.00s ± 0% 3.41s ± 1% -14.70% (p=0.008 n=5+5) name old user-time/op new user-time/op delta Rulegen 14.1s ± 1% 10.6s ± 1% -24.62% (p=0.008 n=5+5) name old sys-time/op new sys-time/op delta Rulegen 318ms ±26% 263ms ± 9% ~ (p=0.056 n=5+5) name old peak-RSS-bytes new peak-RSS-bytes delta Rulegen 231MB ± 4% 181MB ± 3% -21.69% (p=0.008 n=5+5) Change-Id: I8387d52818f6131357868ad348dac8c96d926191 Reviewed-on: https://go-review.googlesource.com/c/go/+/191782 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-27 22:47:37 +02:00
"path"
"regexp"
"sort"
cmd/compile: stop using go/types in rulegen Using go/types to get rid of all unused variables in CL 189798 was a neat idea, but it was pretty expensive. go/types is a full typechecker, which does a lot more work than we actually need. Moreover, we had to run it multiple times, to catch variables that became unused after removing existing unused variables. Instead, write our own little detector for unused imports and variables. It doesn't use ast.Walk, as we need to know what fields we're inspecting. For example, in "foo := bar", "foo" is declared, and "bar" is used, yet they both appear as simple *ast.Ident cases under ast.Walk. The code is documented to explain how unused variables are detected in a single syntax tree pass. Since this happens after we've generated a complete go/ast.File, we don't need to worry about our own simplified node types. The generated code is the same, but rulegen is much faster and uses less memory at its peak, so it should scale better with time. With 'benchcmd Rulegen go run *.go' on perflock, we get: name old time/op new time/op delta Rulegen 4.00s ± 0% 3.41s ± 1% -14.70% (p=0.008 n=5+5) name old user-time/op new user-time/op delta Rulegen 14.1s ± 1% 10.6s ± 1% -24.62% (p=0.008 n=5+5) name old sys-time/op new sys-time/op delta Rulegen 318ms ±26% 263ms ± 9% ~ (p=0.056 n=5+5) name old peak-RSS-bytes new peak-RSS-bytes delta Rulegen 231MB ± 4% 181MB ± 3% -21.69% (p=0.008 n=5+5) Change-Id: I8387d52818f6131357868ad348dac8c96d926191 Reviewed-on: https://go-review.googlesource.com/c/go/+/191782 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-27 22:47:37 +02:00
"strconv"
"strings"
cmd/compile: teach rulegen to remove unused decls First, add cpu and memory profiling flags, as these are useful to see where rulegen is spending its time. It now takes many seconds to run on a recent laptop, so we have to keep an eye on what it's doing. Second, stop writing '_ = var' lines to keep imports and variables used at all times. Now that rulegen removes all such unused names, they're unnecessary. To perform the removal, lean on go/types to first detect what names are unused. We can configure it to give us all the type-checking errors in a file, so we can collect all "declared but not used" errors in a single pass. We then use astutil.Apply to remove the relevant nodes based on the line information from each unused error. This allows us to apply the changes without having to do extra parser+printer roundtrips to plaintext, which are far too expensive. We need to do multiple such passes, as removing an unused variable declaration might then make another declaration unused. Two passes are enough to clean every file at the moment, so add a limit of three passes for now to avoid eating cpu uncontrollably by accident. The resulting performance of the changes above is a ~30% loss across the table, since go/types is fairly expensive. The numbers were obtained with 'benchcmd Rulegen go run *.go', which involves compiling rulegen itself, but that seems reflective of how the program is used. name old time/op new time/op delta Rulegen 5.61s ± 0% 7.36s ± 0% +31.17% (p=0.016 n=5+4) name old user-time/op new user-time/op delta Rulegen 7.20s ± 1% 9.92s ± 1% +37.76% (p=0.016 n=5+4) name old sys-time/op new sys-time/op delta Rulegen 135ms ±19% 169ms ±17% +25.66% (p=0.032 n=5+5) name old peak-RSS-bytes new peak-RSS-bytes delta Rulegen 71.0MB ± 2% 85.6MB ± 2% +20.56% (p=0.008 n=5+5) We can live with a bit more resource usage, but the time/op getting close to 10s isn't good. To win that back, introduce concurrency in main.go. This further increases resource usage a bit, but the real time on this quad-core laptop is greatly reduced. The final benchstat is as follows: name old time/op new time/op delta Rulegen 5.61s ± 0% 3.97s ± 1% -29.26% (p=0.008 n=5+5) name old user-time/op new user-time/op delta Rulegen 7.20s ± 1% 13.91s ± 1% +93.09% (p=0.008 n=5+5) name old sys-time/op new sys-time/op delta Rulegen 135ms ±19% 269ms ± 9% +99.17% (p=0.008 n=5+5) name old peak-RSS-bytes new peak-RSS-bytes delta Rulegen 71.0MB ± 2% 226.3MB ± 1% +218.72% (p=0.008 n=5+5) It might be possible to reduce the cpu or memory usage in the future, such as configuring go/types to do less work, or taking shortcuts to avoid having to run it many times. For now, ~2x cpu and ~4x memory usage seems like a fair trade for a faster and better rulegen. Finally, we can remove the old code that tried to remove some unused variables in a hacky and unmaintainable way. Change-Id: Iff9e83e3f253babf5a1bd48cc993033b8550cee6 Reviewed-on: https://go-review.googlesource.com/c/go/+/189798 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-10 19:27:45 +02:00
"golang.org/x/tools/go/ast/astutil"
)
// rule syntax:
// sexpr [&& extra conditions] => [@block] sexpr
//
// sexpr are s-expressions (lisp-like parenthesized groupings)
cmd/compile: automatically handle commuting ops in rewrite rules Note that this is a redo of an undo of the original buggy CL 38666. We have lots of rewrite rules that vary only in the fact that we have 2 versions for the 2 different orderings of various commuting ops. For example: (ADDL x (MOVLconst [c])) -> (ADDLconst [c] x) (ADDL (MOVLconst [c]) x) -> (ADDLconst [c] x) It can get unwieldly quickly, especially when there is more than one commuting op in a rule. Our existing "fix" for this problem is to have rules that canonicalize the operations first. For example: (Eq64 x (Const64 <t> [c])) && x.Op != OpConst64 -> (Eq64 (Const64 <t> [c]) x) Subsequent rules can then assume if there is a constant arg to Eq64, it will be the first one. This fix kinda works, but it is fragile and only works when we remember to include the required extra rules. The fundamental problem is that the rule matcher doesn't know anything about commuting ops. This CL fixes that fact. We already have information about which ops commute. (The register allocator takes advantage of commutivity.) The rule generator now automatically generates multiple rules for a single source rule when there are commutative ops in the rule. We can now drop all of our almost-duplicate source-level rules and the canonicalization rules. I have some CLs in progress that will be a lot less verbose when the rule generator handles commutivity for me. I had to reorganize the load-combining rules a bit. The 8-way OR rules generated 128 different reorderings, which was causing the generator to put too much code in the rewrite*.go files (the big ones were going from 25K lines to 132K lines). Instead I reorganized the rules to combine pairs of loads at a time. The generated rule files are now actually a bit (5%) smaller. Make.bash times are ~unchanged. Compiler benchmarks are not observably different. Probably because we don't spend much compiler time in rule matching anyway. I've also done a pass over all of our ops adding commutative markings for ops which hadn't had them previously. Fixes #18292 Change-Id: Ic1c0e43fbf579539f459971625f69690c9ab8805 Reviewed-on: https://go-review.googlesource.com/38801 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-03-30 03:30:22 +00:00
// sexpr ::= [variable:](opcode sexpr*)
// | variable
// | <type>
// | [auxint]
// | {aux}
//
// aux ::= variable | {code}
// type ::= variable | {code}
// variable ::= some token
cmd/compile: automatically handle commuting ops in rewrite rules Note that this is a redo of an undo of the original buggy CL 38666. We have lots of rewrite rules that vary only in the fact that we have 2 versions for the 2 different orderings of various commuting ops. For example: (ADDL x (MOVLconst [c])) -> (ADDLconst [c] x) (ADDL (MOVLconst [c]) x) -> (ADDLconst [c] x) It can get unwieldly quickly, especially when there is more than one commuting op in a rule. Our existing "fix" for this problem is to have rules that canonicalize the operations first. For example: (Eq64 x (Const64 <t> [c])) && x.Op != OpConst64 -> (Eq64 (Const64 <t> [c]) x) Subsequent rules can then assume if there is a constant arg to Eq64, it will be the first one. This fix kinda works, but it is fragile and only works when we remember to include the required extra rules. The fundamental problem is that the rule matcher doesn't know anything about commuting ops. This CL fixes that fact. We already have information about which ops commute. (The register allocator takes advantage of commutivity.) The rule generator now automatically generates multiple rules for a single source rule when there are commutative ops in the rule. We can now drop all of our almost-duplicate source-level rules and the canonicalization rules. I have some CLs in progress that will be a lot less verbose when the rule generator handles commutivity for me. I had to reorganize the load-combining rules a bit. The 8-way OR rules generated 128 different reorderings, which was causing the generator to put too much code in the rewrite*.go files (the big ones were going from 25K lines to 132K lines). Instead I reorganized the rules to combine pairs of loads at a time. The generated rule files are now actually a bit (5%) smaller. Make.bash times are ~unchanged. Compiler benchmarks are not observably different. Probably because we don't spend much compiler time in rule matching anyway. I've also done a pass over all of our ops adding commutative markings for ops which hadn't had them previously. Fixes #18292 Change-Id: Ic1c0e43fbf579539f459971625f69690c9ab8805 Reviewed-on: https://go-review.googlesource.com/38801 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-03-30 03:30:22 +00:00
// opcode ::= one of the opcodes from the *Ops.go files
// special rules: trailing ellipsis "..." (in the outermost sexpr?) must match on both sides of a rule.
// trailing three underscore "___" in the outermost match sexpr indicate the presence of
// extra ignored args that need not appear in the replacement
// extra conditions is just a chunk of Go that evaluates to a boolean. It may use
// variables declared in the matching tsexpr. The variable "v" is predefined to be
// the value matched by the entire rule.
// If multiple rules match, the first one in file order is selected.
var (
genLog = flag.Bool("log", false, "generate code that logs; for debugging only")
addLine = flag.Bool("line", false, "add line number comment to generated rules; for debugging only")
)
type Rule struct {
Rule string
Loc string // file name & line number
}
func (r Rule) String() string {
return fmt.Sprintf("rule %q at %s", r.Rule, r.Loc)
}
func normalizeSpaces(s string) string {
return strings.Join(strings.Fields(strings.TrimSpace(s)), " ")
}
// parse returns the matching part of the rule, additional conditions, and the result.
func (r Rule) parse() (match, cond, result string) {
s := strings.Split(r.Rule, "=>")
match = normalizeSpaces(s[0])
result = normalizeSpaces(s[1])
cond = ""
if i := strings.Index(match, "&&"); i >= 0 {
cond = normalizeSpaces(match[i+2:])
match = normalizeSpaces(match[:i])
}
return match, cond, result
}
func genRules(arch arch) { genRulesSuffix(arch, "") }
func genSplitLoadRules(arch arch) { genRulesSuffix(arch, "splitload") }
func genRulesSuffix(arch arch, suff string) {
// Open input file.
text, err := os.Open(arch.name + suff + ".rules")
if err != nil {
if suff == "" {
// All architectures must have a plain rules file.
log.Fatalf("can't read rule file: %v", err)
}
// Some architectures have bonus rules files that others don't share. That's fine.
return
}
// oprules contains a list of rules for each block and opcode
blockrules := map[string][]Rule{}
oprules := map[string][]Rule{}
// read rule file
scanner := bufio.NewScanner(text)
rule := ""
var lineno int
var ruleLineno int // line number of "=>"
for scanner.Scan() {
lineno++
line := scanner.Text()
if i := strings.Index(line, "//"); i >= 0 {
// Remove comments. Note that this isn't string safe, so
// it will truncate lines with // inside strings. Oh well.
line = line[:i]
}
rule += " " + line
rule = strings.TrimSpace(rule)
if rule == "" {
continue
}
if !strings.Contains(rule, "=>") {
continue
}
if ruleLineno == 0 {
ruleLineno = lineno
}
if strings.HasSuffix(rule, "=>") {
continue // continue on the next line
}
if n := balance(rule); n > 0 {
continue // open parentheses remain, continue on the next line
} else if n < 0 {
break // continuing the line can't help, and it will only make errors worse
}
[dev.ssa] cmd/compile: refactor out rulegen value parsing Previously, genMatch0 and genResult0 contained lots of duplication: locating the op, parsing the value, validation, etc. Parsing and validation was mixed in with code gen. Extract a helper, parseValue. It is responsible for parsing the value, locating the op, and doing shared validation. As a bonus (and possibly as my original motivation), make op selection pay attention to the number of args present. This allows arch-specific ops to share a name with generic ops as long as there is no ambiguity. It also detects and reports unresolved ambiguity, unlike before, where it would simply always pick the generic op, with no warning. Also use parseValue when generating the top-level op dispatch, to ensure its opinion about ops matches genMatch0 and genResult0. The order of statements in the generated code used to depend on the exact rule. It is now somewhat independent of the rule. That is the source of some of the generated code changes in this CL. See rewritedec64 and rewritegeneric for examples. It is a one-time change. The op dispatch switch and functions used to be sorted by opname without architecture. The sort now includes the architecture, leading to further generated code changes. See rewriteARM and rewriteAMD64 for examples. Again, it is a one-time change. There are no functional changes. Change-Id: I22c989183ad5651741ebdc0566349c5fd6c6b23c Reviewed-on: https://go-review.googlesource.com/24649 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2016-07-01 11:05:29 -07:00
loc := fmt.Sprintf("%s%s.rules:%d", arch.name, suff, ruleLineno)
for _, rule2 := range expandOr(rule) {
r := Rule{Rule: rule2, Loc: loc}
cmd/compile: use loops to handle commutative ops in rules Prior to this change, we generated additional rules at rulegen time for all possible combinations of args to commutative ops. This is simple and works well, but leads to lots of generated rules. This in turn has increased the size of the compiler, made it hard to compile package ssa on small machines, and provided a disincentive to mark some ops as commutative. This change reworks how we handle commutative ops. Instead of generating a rule per argument permutation, we generate a series of nested loops, one for each commutative op. Each loop tries both possible argument orderings. I also considered attempting to canonicalize the inputs to the rewrite rules. However, because either or both arguments might be nothing more than an identifier, and because there can be arbitrary conditions to evaluate during matching, I did not see how to proceed. The duplicate rule detection now sorts arguments to commutative ops, so that it can detect commutative-only duplicates. There may be further optimizations to the new generated code. In particular, we may not be removing as many bounds checks as before; I have not investigated deeply. If more work here is needed, we could do it with more hints or with improvements to the prove pass. This change has almost no impact on the generated code. It does not pass toolstash-check, however. In a handful of functions, for reasons I do not understand, there are minor position changes. For the entire series ending at this change, there is negligible compiler performance impact. The compiler binary shrinks by about 15%, and package ssa shrinks by about 25%. Package ssa also compiles ~25% faster with ~25% less memory. Change-Id: Ia2ee9ceae7be08a17342319d4e31b0bb238a2ee4 Reviewed-on: https://go-review.googlesource.com/c/go/+/213703 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-06 22:24:02 -08:00
if rawop := strings.Split(rule2, " ")[0][1:]; isBlock(rawop, arch) {
blockrules[rawop] = append(blockrules[rawop], r)
continue
cmd/compile: automatically handle commuting ops in rewrite rules Note that this is a redo of an undo of the original buggy CL 38666. We have lots of rewrite rules that vary only in the fact that we have 2 versions for the 2 different orderings of various commuting ops. For example: (ADDL x (MOVLconst [c])) -> (ADDLconst [c] x) (ADDL (MOVLconst [c]) x) -> (ADDLconst [c] x) It can get unwieldly quickly, especially when there is more than one commuting op in a rule. Our existing "fix" for this problem is to have rules that canonicalize the operations first. For example: (Eq64 x (Const64 <t> [c])) && x.Op != OpConst64 -> (Eq64 (Const64 <t> [c]) x) Subsequent rules can then assume if there is a constant arg to Eq64, it will be the first one. This fix kinda works, but it is fragile and only works when we remember to include the required extra rules. The fundamental problem is that the rule matcher doesn't know anything about commuting ops. This CL fixes that fact. We already have information about which ops commute. (The register allocator takes advantage of commutivity.) The rule generator now automatically generates multiple rules for a single source rule when there are commutative ops in the rule. We can now drop all of our almost-duplicate source-level rules and the canonicalization rules. I have some CLs in progress that will be a lot less verbose when the rule generator handles commutivity for me. I had to reorganize the load-combining rules a bit. The 8-way OR rules generated 128 different reorderings, which was causing the generator to put too much code in the rewrite*.go files (the big ones were going from 25K lines to 132K lines). Instead I reorganized the rules to combine pairs of loads at a time. The generated rule files are now actually a bit (5%) smaller. Make.bash times are ~unchanged. Compiler benchmarks are not observably different. Probably because we don't spend much compiler time in rule matching anyway. I've also done a pass over all of our ops adding commutative markings for ops which hadn't had them previously. Fixes #18292 Change-Id: Ic1c0e43fbf579539f459971625f69690c9ab8805 Reviewed-on: https://go-review.googlesource.com/38801 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-03-30 03:30:22 +00:00
}
cmd/compile: use loops to handle commutative ops in rules Prior to this change, we generated additional rules at rulegen time for all possible combinations of args to commutative ops. This is simple and works well, but leads to lots of generated rules. This in turn has increased the size of the compiler, made it hard to compile package ssa on small machines, and provided a disincentive to mark some ops as commutative. This change reworks how we handle commutative ops. Instead of generating a rule per argument permutation, we generate a series of nested loops, one for each commutative op. Each loop tries both possible argument orderings. I also considered attempting to canonicalize the inputs to the rewrite rules. However, because either or both arguments might be nothing more than an identifier, and because there can be arbitrary conditions to evaluate during matching, I did not see how to proceed. The duplicate rule detection now sorts arguments to commutative ops, so that it can detect commutative-only duplicates. There may be further optimizations to the new generated code. In particular, we may not be removing as many bounds checks as before; I have not investigated deeply. If more work here is needed, we could do it with more hints or with improvements to the prove pass. This change has almost no impact on the generated code. It does not pass toolstash-check, however. In a handful of functions, for reasons I do not understand, there are minor position changes. For the entire series ending at this change, there is negligible compiler performance impact. The compiler binary shrinks by about 15%, and package ssa shrinks by about 25%. Package ssa also compiles ~25% faster with ~25% less memory. Change-Id: Ia2ee9ceae7be08a17342319d4e31b0bb238a2ee4 Reviewed-on: https://go-review.googlesource.com/c/go/+/213703 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-06 22:24:02 -08:00
// Do fancier value op matching.
match, _, _ := r.parse()
cmd/compile: use loops to handle commutative ops in rules Prior to this change, we generated additional rules at rulegen time for all possible combinations of args to commutative ops. This is simple and works well, but leads to lots of generated rules. This in turn has increased the size of the compiler, made it hard to compile package ssa on small machines, and provided a disincentive to mark some ops as commutative. This change reworks how we handle commutative ops. Instead of generating a rule per argument permutation, we generate a series of nested loops, one for each commutative op. Each loop tries both possible argument orderings. I also considered attempting to canonicalize the inputs to the rewrite rules. However, because either or both arguments might be nothing more than an identifier, and because there can be arbitrary conditions to evaluate during matching, I did not see how to proceed. The duplicate rule detection now sorts arguments to commutative ops, so that it can detect commutative-only duplicates. There may be further optimizations to the new generated code. In particular, we may not be removing as many bounds checks as before; I have not investigated deeply. If more work here is needed, we could do it with more hints or with improvements to the prove pass. This change has almost no impact on the generated code. It does not pass toolstash-check, however. In a handful of functions, for reasons I do not understand, there are minor position changes. For the entire series ending at this change, there is negligible compiler performance impact. The compiler binary shrinks by about 15%, and package ssa shrinks by about 25%. Package ssa also compiles ~25% faster with ~25% less memory. Change-Id: Ia2ee9ceae7be08a17342319d4e31b0bb238a2ee4 Reviewed-on: https://go-review.googlesource.com/c/go/+/213703 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-06 22:24:02 -08:00
op, oparch, _, _, _, _ := parseValue(match, arch, loc)
opname := fmt.Sprintf("Op%s%s", oparch, op.name)
oprules[opname] = append(oprules[opname], r)
}
rule = ""
ruleLineno = 0
}
if err := scanner.Err(); err != nil {
log.Fatalf("scanner failed: %v\n", err)
}
if balance(rule) != 0 {
log.Fatalf("%s.rules:%d: unbalanced rule: %v\n", arch.name, lineno, rule)
}
// Order all the ops.
var ops []string
for op := range oprules {
ops = append(ops, op)
}
sort.Strings(ops)
genFile := &File{Arch: arch, Suffix: suff}
// Main rewrite routine is a switch on v.Op.
fn := &Func{Kind: "Value", ArgLen: -1}
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
sw := &Switch{Expr: exprf("v.Op")}
for _, op := range ops {
eop, ok := parseEllipsisRules(oprules[op], arch)
if ok {
if strings.Contains(oprules[op][0].Rule, "=>") && opByName(arch, op).aux != opByName(arch, eop).aux {
panic(fmt.Sprintf("can't use ... for ops that have different aux types: %s and %s", op, eop))
}
swc := &Case{Expr: exprf("%s", op)}
swc.add(stmtf("v.Op = %s", eop))
swc.add(stmtf("return true"))
sw.add(swc)
continue
}
swc := &Case{Expr: exprf("%s", op)}
cmd/compile: remove chunking of rewrite rules We added chunking of rewrite rules to speed up compiling package SSA. This series of changes has significantly shrunk the number of rewrite rules, and they are no longer being added nearly as fast. Now that we are sharing v.Args across multiple rewrite rules, there is additional benefit to having more rules in a single function. Removing chunking now has an incidental impact on compiling package SSA, marginally speeds up other compilation, shrinks the cmd/compile binary, and simplifies the code. name old time/op new time/op delta Template 211ms ± 2% 210ms ± 2% -0.50% (p=0.000 n=91+97) Unicode 81.9ms ± 3% 81.8ms ± 3% ~ (p=0.179 n=96+91) GoTypes 731ms ± 2% 731ms ± 1% ~ (p=0.442 n=94+96) Compiler 3.43s ± 2% 3.41s ± 2% -0.36% (p=0.001 n=98+94) SSA 8.30s ± 2% 8.32s ± 2% +0.19% (p=0.034 n=94+95) Flate 135ms ± 2% 134ms ± 1% -0.30% (p=0.006 n=98+94) GoParser 167ms ± 1% 167ms ± 1% -0.22% (p=0.001 n=92+94) Reflect 453ms ± 2% 453ms ± 3% ~ (p=0.306 n=98+97) Tar 184ms ± 2% 183ms ± 2% -0.31% (p=0.012 n=94+94) XML 249ms ± 2% 248ms ± 1% -0.26% (p=0.002 n=96+92) [Geo mean] 419ms 418ms -0.21% name old user-time/op new user-time/op delta Template 273ms ± 2% 272ms ± 2% -0.46% (p=0.000 n=93+96) Unicode 116ms ± 4% 117ms ± 4% ~ (p=0.433 n=98+98) GoTypes 977ms ± 2% 977ms ± 1% ~ (p=0.971 n=92+99) Compiler 4.56s ± 6% 4.53s ± 6% ~ (p=0.081 n=100+100) SSA 11.1s ± 2% 11.1s ± 2% ~ (p=0.064 n=99+96) Flate 167ms ± 2% 167ms ± 1% -0.24% (p=0.004 n=95+96) GoParser 203ms ± 1% 203ms ± 2% -0.14% (p=0.049 n=96+97) Reflect 595ms ± 2% 595ms ± 2% ~ (p=0.544 n=95+92) Tar 225ms ± 2% 224ms ± 2% ~ (p=0.562 n=99+99) XML 312ms ± 2% 311ms ± 1% ~ (p=0.050 n=97+93) [Geo mean] 543ms 542ms -0.13% Change-Id: I8d34ab59f154b28f20c6f9e416b976bfce339baa Reviewed-on: https://go-review.googlesource.com/c/go/+/216220 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-21 20:53:30 -08:00
swc.add(stmtf("return rewriteValue%s%s_%s(v)", arch.name, suff, op))
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
sw.add(swc)
}
if len(sw.List) > 0 { // skip if empty
fn.add(sw)
}
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
fn.add(stmtf("return false"))
cmd/compile: teach rulegen to remove unused decls First, add cpu and memory profiling flags, as these are useful to see where rulegen is spending its time. It now takes many seconds to run on a recent laptop, so we have to keep an eye on what it's doing. Second, stop writing '_ = var' lines to keep imports and variables used at all times. Now that rulegen removes all such unused names, they're unnecessary. To perform the removal, lean on go/types to first detect what names are unused. We can configure it to give us all the type-checking errors in a file, so we can collect all "declared but not used" errors in a single pass. We then use astutil.Apply to remove the relevant nodes based on the line information from each unused error. This allows us to apply the changes without having to do extra parser+printer roundtrips to plaintext, which are far too expensive. We need to do multiple such passes, as removing an unused variable declaration might then make another declaration unused. Two passes are enough to clean every file at the moment, so add a limit of three passes for now to avoid eating cpu uncontrollably by accident. The resulting performance of the changes above is a ~30% loss across the table, since go/types is fairly expensive. The numbers were obtained with 'benchcmd Rulegen go run *.go', which involves compiling rulegen itself, but that seems reflective of how the program is used. name old time/op new time/op delta Rulegen 5.61s ± 0% 7.36s ± 0% +31.17% (p=0.016 n=5+4) name old user-time/op new user-time/op delta Rulegen 7.20s ± 1% 9.92s ± 1% +37.76% (p=0.016 n=5+4) name old sys-time/op new sys-time/op delta Rulegen 135ms ±19% 169ms ±17% +25.66% (p=0.032 n=5+5) name old peak-RSS-bytes new peak-RSS-bytes delta Rulegen 71.0MB ± 2% 85.6MB ± 2% +20.56% (p=0.008 n=5+5) We can live with a bit more resource usage, but the time/op getting close to 10s isn't good. To win that back, introduce concurrency in main.go. This further increases resource usage a bit, but the real time on this quad-core laptop is greatly reduced. The final benchstat is as follows: name old time/op new time/op delta Rulegen 5.61s ± 0% 3.97s ± 1% -29.26% (p=0.008 n=5+5) name old user-time/op new user-time/op delta Rulegen 7.20s ± 1% 13.91s ± 1% +93.09% (p=0.008 n=5+5) name old sys-time/op new sys-time/op delta Rulegen 135ms ±19% 269ms ± 9% +99.17% (p=0.008 n=5+5) name old peak-RSS-bytes new peak-RSS-bytes delta Rulegen 71.0MB ± 2% 226.3MB ± 1% +218.72% (p=0.008 n=5+5) It might be possible to reduce the cpu or memory usage in the future, such as configuring go/types to do less work, or taking shortcuts to avoid having to run it many times. For now, ~2x cpu and ~4x memory usage seems like a fair trade for a faster and better rulegen. Finally, we can remove the old code that tried to remove some unused variables in a hacky and unmaintainable way. Change-Id: Iff9e83e3f253babf5a1bd48cc993033b8550cee6 Reviewed-on: https://go-review.googlesource.com/c/go/+/189798 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-10 19:27:45 +02:00
genFile.add(fn)
// Generate a routine per op. Note that we don't make one giant routine
// because it is too big for some compilers.
for _, op := range ops {
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
rules := oprules[op]
_, ok := parseEllipsisRules(oprules[op], arch)
if ok {
continue
}
cmd/compile: remove chunking of rewrite rules We added chunking of rewrite rules to speed up compiling package SSA. This series of changes has significantly shrunk the number of rewrite rules, and they are no longer being added nearly as fast. Now that we are sharing v.Args across multiple rewrite rules, there is additional benefit to having more rules in a single function. Removing chunking now has an incidental impact on compiling package SSA, marginally speeds up other compilation, shrinks the cmd/compile binary, and simplifies the code. name old time/op new time/op delta Template 211ms ± 2% 210ms ± 2% -0.50% (p=0.000 n=91+97) Unicode 81.9ms ± 3% 81.8ms ± 3% ~ (p=0.179 n=96+91) GoTypes 731ms ± 2% 731ms ± 1% ~ (p=0.442 n=94+96) Compiler 3.43s ± 2% 3.41s ± 2% -0.36% (p=0.001 n=98+94) SSA 8.30s ± 2% 8.32s ± 2% +0.19% (p=0.034 n=94+95) Flate 135ms ± 2% 134ms ± 1% -0.30% (p=0.006 n=98+94) GoParser 167ms ± 1% 167ms ± 1% -0.22% (p=0.001 n=92+94) Reflect 453ms ± 2% 453ms ± 3% ~ (p=0.306 n=98+97) Tar 184ms ± 2% 183ms ± 2% -0.31% (p=0.012 n=94+94) XML 249ms ± 2% 248ms ± 1% -0.26% (p=0.002 n=96+92) [Geo mean] 419ms 418ms -0.21% name old user-time/op new user-time/op delta Template 273ms ± 2% 272ms ± 2% -0.46% (p=0.000 n=93+96) Unicode 116ms ± 4% 117ms ± 4% ~ (p=0.433 n=98+98) GoTypes 977ms ± 2% 977ms ± 1% ~ (p=0.971 n=92+99) Compiler 4.56s ± 6% 4.53s ± 6% ~ (p=0.081 n=100+100) SSA 11.1s ± 2% 11.1s ± 2% ~ (p=0.064 n=99+96) Flate 167ms ± 2% 167ms ± 1% -0.24% (p=0.004 n=95+96) GoParser 203ms ± 1% 203ms ± 2% -0.14% (p=0.049 n=96+97) Reflect 595ms ± 2% 595ms ± 2% ~ (p=0.544 n=95+92) Tar 225ms ± 2% 224ms ± 2% ~ (p=0.562 n=99+99) XML 312ms ± 2% 311ms ± 1% ~ (p=0.050 n=97+93) [Geo mean] 543ms 542ms -0.13% Change-Id: I8d34ab59f154b28f20c6f9e416b976bfce339baa Reviewed-on: https://go-review.googlesource.com/c/go/+/216220 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-21 20:53:30 -08:00
// rr is kept between iterations, so that each rule can check
// that the previous rule wasn't unconditional.
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
var rr *RuleRewrite
cmd/compile: remove chunking of rewrite rules We added chunking of rewrite rules to speed up compiling package SSA. This series of changes has significantly shrunk the number of rewrite rules, and they are no longer being added nearly as fast. Now that we are sharing v.Args across multiple rewrite rules, there is additional benefit to having more rules in a single function. Removing chunking now has an incidental impact on compiling package SSA, marginally speeds up other compilation, shrinks the cmd/compile binary, and simplifies the code. name old time/op new time/op delta Template 211ms ± 2% 210ms ± 2% -0.50% (p=0.000 n=91+97) Unicode 81.9ms ± 3% 81.8ms ± 3% ~ (p=0.179 n=96+91) GoTypes 731ms ± 2% 731ms ± 1% ~ (p=0.442 n=94+96) Compiler 3.43s ± 2% 3.41s ± 2% -0.36% (p=0.001 n=98+94) SSA 8.30s ± 2% 8.32s ± 2% +0.19% (p=0.034 n=94+95) Flate 135ms ± 2% 134ms ± 1% -0.30% (p=0.006 n=98+94) GoParser 167ms ± 1% 167ms ± 1% -0.22% (p=0.001 n=92+94) Reflect 453ms ± 2% 453ms ± 3% ~ (p=0.306 n=98+97) Tar 184ms ± 2% 183ms ± 2% -0.31% (p=0.012 n=94+94) XML 249ms ± 2% 248ms ± 1% -0.26% (p=0.002 n=96+92) [Geo mean] 419ms 418ms -0.21% name old user-time/op new user-time/op delta Template 273ms ± 2% 272ms ± 2% -0.46% (p=0.000 n=93+96) Unicode 116ms ± 4% 117ms ± 4% ~ (p=0.433 n=98+98) GoTypes 977ms ± 2% 977ms ± 1% ~ (p=0.971 n=92+99) Compiler 4.56s ± 6% 4.53s ± 6% ~ (p=0.081 n=100+100) SSA 11.1s ± 2% 11.1s ± 2% ~ (p=0.064 n=99+96) Flate 167ms ± 2% 167ms ± 1% -0.24% (p=0.004 n=95+96) GoParser 203ms ± 1% 203ms ± 2% -0.14% (p=0.049 n=96+97) Reflect 595ms ± 2% 595ms ± 2% ~ (p=0.544 n=95+92) Tar 225ms ± 2% 224ms ± 2% ~ (p=0.562 n=99+99) XML 312ms ± 2% 311ms ± 1% ~ (p=0.050 n=97+93) [Geo mean] 543ms 542ms -0.13% Change-Id: I8d34ab59f154b28f20c6f9e416b976bfce339baa Reviewed-on: https://go-review.googlesource.com/c/go/+/216220 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-21 20:53:30 -08:00
fn := &Func{
Kind: "Value",
Suffix: fmt.Sprintf("_%s", op),
ArgLen: opByName(arch, op).argLength,
cmd/compile: remove chunking of rewrite rules We added chunking of rewrite rules to speed up compiling package SSA. This series of changes has significantly shrunk the number of rewrite rules, and they are no longer being added nearly as fast. Now that we are sharing v.Args across multiple rewrite rules, there is additional benefit to having more rules in a single function. Removing chunking now has an incidental impact on compiling package SSA, marginally speeds up other compilation, shrinks the cmd/compile binary, and simplifies the code. name old time/op new time/op delta Template 211ms ± 2% 210ms ± 2% -0.50% (p=0.000 n=91+97) Unicode 81.9ms ± 3% 81.8ms ± 3% ~ (p=0.179 n=96+91) GoTypes 731ms ± 2% 731ms ± 1% ~ (p=0.442 n=94+96) Compiler 3.43s ± 2% 3.41s ± 2% -0.36% (p=0.001 n=98+94) SSA 8.30s ± 2% 8.32s ± 2% +0.19% (p=0.034 n=94+95) Flate 135ms ± 2% 134ms ± 1% -0.30% (p=0.006 n=98+94) GoParser 167ms ± 1% 167ms ± 1% -0.22% (p=0.001 n=92+94) Reflect 453ms ± 2% 453ms ± 3% ~ (p=0.306 n=98+97) Tar 184ms ± 2% 183ms ± 2% -0.31% (p=0.012 n=94+94) XML 249ms ± 2% 248ms ± 1% -0.26% (p=0.002 n=96+92) [Geo mean] 419ms 418ms -0.21% name old user-time/op new user-time/op delta Template 273ms ± 2% 272ms ± 2% -0.46% (p=0.000 n=93+96) Unicode 116ms ± 4% 117ms ± 4% ~ (p=0.433 n=98+98) GoTypes 977ms ± 2% 977ms ± 1% ~ (p=0.971 n=92+99) Compiler 4.56s ± 6% 4.53s ± 6% ~ (p=0.081 n=100+100) SSA 11.1s ± 2% 11.1s ± 2% ~ (p=0.064 n=99+96) Flate 167ms ± 2% 167ms ± 1% -0.24% (p=0.004 n=95+96) GoParser 203ms ± 1% 203ms ± 2% -0.14% (p=0.049 n=96+97) Reflect 595ms ± 2% 595ms ± 2% ~ (p=0.544 n=95+92) Tar 225ms ± 2% 224ms ± 2% ~ (p=0.562 n=99+99) XML 312ms ± 2% 311ms ± 1% ~ (p=0.050 n=97+93) [Geo mean] 543ms 542ms -0.13% Change-Id: I8d34ab59f154b28f20c6f9e416b976bfce339baa Reviewed-on: https://go-review.googlesource.com/c/go/+/216220 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-21 20:53:30 -08:00
}
cmd/compile: disallow rewrite rules from declaring reserved names If I change a rule in ARM64.rules to use the variable name "b" in a conflicting way, rulegen would previously not complain, and the compiler would later give a confusing error: $ go run *.go && go build cmd/compile/internal/ssa # cmd/compile/internal/ssa ../rewriteARM64.go:24236:10: b.NewValue0 undefined (type int64 has no field or method NewValue0) Make rulegen complain early about those cases. Sometimes they might happen to be harmless, but in general they can easily cause confusion or unintended effect due to shadowing. After the change, with the same conflicting rule: $ go run *.go && go build cmd/compile/internal/ssa 2021/03/22 11:31:49 rule ARM64.rules:495 uses the reserved name b exit status 1 Note that 24 existing rules were using reserved names. It seems like the shadowing was harmless, as it wasn't causing typechecking issues nor did it seem to cause unintended behavior when the rule rewrite code ran. The bool values "b" were renamed "t", since that seems to have a precedent in other rules and in the fmt package. Sequential values like "a b c" were renamed to "x y z", since "b" is reserved. Finally, "typ" was renamed to "_typ", since there doesn't seem to be an obviously better answer. Passes all three of: $ GOARCH=amd64 go build -toolexec 'toolstash -cmp' -a std $ GOARCH=arm64 go build -toolexec 'toolstash -cmp' -a std $ GOARCH=mips64 go build -toolexec 'toolstash -cmp' -a std Fixes #45154. Change-Id: I1cce194dc7b477886a9c218c17973e996bcedccf Reviewed-on: https://go-review.googlesource.com/c/go/+/303549 Trust: Daniel Martí <mvdan@mvdan.cc> Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2021-03-22 11:35:02 +01:00
fn.add(declReserved("b", "v.Block"))
fn.add(declReserved("config", "b.Func.Config"))
fn.add(declReserved("fe", "b.Func.fe"))
fn.add(declReserved("typ", "&b.Func.Config.Types"))
cmd/compile: remove chunking of rewrite rules We added chunking of rewrite rules to speed up compiling package SSA. This series of changes has significantly shrunk the number of rewrite rules, and they are no longer being added nearly as fast. Now that we are sharing v.Args across multiple rewrite rules, there is additional benefit to having more rules in a single function. Removing chunking now has an incidental impact on compiling package SSA, marginally speeds up other compilation, shrinks the cmd/compile binary, and simplifies the code. name old time/op new time/op delta Template 211ms ± 2% 210ms ± 2% -0.50% (p=0.000 n=91+97) Unicode 81.9ms ± 3% 81.8ms ± 3% ~ (p=0.179 n=96+91) GoTypes 731ms ± 2% 731ms ± 1% ~ (p=0.442 n=94+96) Compiler 3.43s ± 2% 3.41s ± 2% -0.36% (p=0.001 n=98+94) SSA 8.30s ± 2% 8.32s ± 2% +0.19% (p=0.034 n=94+95) Flate 135ms ± 2% 134ms ± 1% -0.30% (p=0.006 n=98+94) GoParser 167ms ± 1% 167ms ± 1% -0.22% (p=0.001 n=92+94) Reflect 453ms ± 2% 453ms ± 3% ~ (p=0.306 n=98+97) Tar 184ms ± 2% 183ms ± 2% -0.31% (p=0.012 n=94+94) XML 249ms ± 2% 248ms ± 1% -0.26% (p=0.002 n=96+92) [Geo mean] 419ms 418ms -0.21% name old user-time/op new user-time/op delta Template 273ms ± 2% 272ms ± 2% -0.46% (p=0.000 n=93+96) Unicode 116ms ± 4% 117ms ± 4% ~ (p=0.433 n=98+98) GoTypes 977ms ± 2% 977ms ± 1% ~ (p=0.971 n=92+99) Compiler 4.56s ± 6% 4.53s ± 6% ~ (p=0.081 n=100+100) SSA 11.1s ± 2% 11.1s ± 2% ~ (p=0.064 n=99+96) Flate 167ms ± 2% 167ms ± 1% -0.24% (p=0.004 n=95+96) GoParser 203ms ± 1% 203ms ± 2% -0.14% (p=0.049 n=96+97) Reflect 595ms ± 2% 595ms ± 2% ~ (p=0.544 n=95+92) Tar 225ms ± 2% 224ms ± 2% ~ (p=0.562 n=99+99) XML 312ms ± 2% 311ms ± 1% ~ (p=0.050 n=97+93) [Geo mean] 543ms 542ms -0.13% Change-Id: I8d34ab59f154b28f20c6f9e416b976bfce339baa Reviewed-on: https://go-review.googlesource.com/c/go/+/216220 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-21 20:53:30 -08:00
for _, rule := range rules {
if rr != nil && !rr.CanFail {
log.Fatalf("unconditional rule %s is followed by other rules", rr.Match)
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
}
rr = &RuleRewrite{Loc: rule.Loc}
rr.Match, rr.Cond, rr.Result = rule.parse()
pos, _ := genMatch(rr, arch, rr.Match, fn.ArgLen >= 0)
cmd/compile: remove chunking of rewrite rules We added chunking of rewrite rules to speed up compiling package SSA. This series of changes has significantly shrunk the number of rewrite rules, and they are no longer being added nearly as fast. Now that we are sharing v.Args across multiple rewrite rules, there is additional benefit to having more rules in a single function. Removing chunking now has an incidental impact on compiling package SSA, marginally speeds up other compilation, shrinks the cmd/compile binary, and simplifies the code. name old time/op new time/op delta Template 211ms ± 2% 210ms ± 2% -0.50% (p=0.000 n=91+97) Unicode 81.9ms ± 3% 81.8ms ± 3% ~ (p=0.179 n=96+91) GoTypes 731ms ± 2% 731ms ± 1% ~ (p=0.442 n=94+96) Compiler 3.43s ± 2% 3.41s ± 2% -0.36% (p=0.001 n=98+94) SSA 8.30s ± 2% 8.32s ± 2% +0.19% (p=0.034 n=94+95) Flate 135ms ± 2% 134ms ± 1% -0.30% (p=0.006 n=98+94) GoParser 167ms ± 1% 167ms ± 1% -0.22% (p=0.001 n=92+94) Reflect 453ms ± 2% 453ms ± 3% ~ (p=0.306 n=98+97) Tar 184ms ± 2% 183ms ± 2% -0.31% (p=0.012 n=94+94) XML 249ms ± 2% 248ms ± 1% -0.26% (p=0.002 n=96+92) [Geo mean] 419ms 418ms -0.21% name old user-time/op new user-time/op delta Template 273ms ± 2% 272ms ± 2% -0.46% (p=0.000 n=93+96) Unicode 116ms ± 4% 117ms ± 4% ~ (p=0.433 n=98+98) GoTypes 977ms ± 2% 977ms ± 1% ~ (p=0.971 n=92+99) Compiler 4.56s ± 6% 4.53s ± 6% ~ (p=0.081 n=100+100) SSA 11.1s ± 2% 11.1s ± 2% ~ (p=0.064 n=99+96) Flate 167ms ± 2% 167ms ± 1% -0.24% (p=0.004 n=95+96) GoParser 203ms ± 1% 203ms ± 2% -0.14% (p=0.049 n=96+97) Reflect 595ms ± 2% 595ms ± 2% ~ (p=0.544 n=95+92) Tar 225ms ± 2% 224ms ± 2% ~ (p=0.562 n=99+99) XML 312ms ± 2% 311ms ± 1% ~ (p=0.050 n=97+93) [Geo mean] 543ms 542ms -0.13% Change-Id: I8d34ab59f154b28f20c6f9e416b976bfce339baa Reviewed-on: https://go-review.googlesource.com/c/go/+/216220 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-21 20:53:30 -08:00
if pos == "" {
pos = "v.Pos"
}
if rr.Cond != "" {
rr.add(breakf("!(%s)", rr.Cond))
}
genResult(rr, arch, rr.Result, pos)
cmd/compile: remove chunking of rewrite rules We added chunking of rewrite rules to speed up compiling package SSA. This series of changes has significantly shrunk the number of rewrite rules, and they are no longer being added nearly as fast. Now that we are sharing v.Args across multiple rewrite rules, there is additional benefit to having more rules in a single function. Removing chunking now has an incidental impact on compiling package SSA, marginally speeds up other compilation, shrinks the cmd/compile binary, and simplifies the code. name old time/op new time/op delta Template 211ms ± 2% 210ms ± 2% -0.50% (p=0.000 n=91+97) Unicode 81.9ms ± 3% 81.8ms ± 3% ~ (p=0.179 n=96+91) GoTypes 731ms ± 2% 731ms ± 1% ~ (p=0.442 n=94+96) Compiler 3.43s ± 2% 3.41s ± 2% -0.36% (p=0.001 n=98+94) SSA 8.30s ± 2% 8.32s ± 2% +0.19% (p=0.034 n=94+95) Flate 135ms ± 2% 134ms ± 1% -0.30% (p=0.006 n=98+94) GoParser 167ms ± 1% 167ms ± 1% -0.22% (p=0.001 n=92+94) Reflect 453ms ± 2% 453ms ± 3% ~ (p=0.306 n=98+97) Tar 184ms ± 2% 183ms ± 2% -0.31% (p=0.012 n=94+94) XML 249ms ± 2% 248ms ± 1% -0.26% (p=0.002 n=96+92) [Geo mean] 419ms 418ms -0.21% name old user-time/op new user-time/op delta Template 273ms ± 2% 272ms ± 2% -0.46% (p=0.000 n=93+96) Unicode 116ms ± 4% 117ms ± 4% ~ (p=0.433 n=98+98) GoTypes 977ms ± 2% 977ms ± 1% ~ (p=0.971 n=92+99) Compiler 4.56s ± 6% 4.53s ± 6% ~ (p=0.081 n=100+100) SSA 11.1s ± 2% 11.1s ± 2% ~ (p=0.064 n=99+96) Flate 167ms ± 2% 167ms ± 1% -0.24% (p=0.004 n=95+96) GoParser 203ms ± 1% 203ms ± 2% -0.14% (p=0.049 n=96+97) Reflect 595ms ± 2% 595ms ± 2% ~ (p=0.544 n=95+92) Tar 225ms ± 2% 224ms ± 2% ~ (p=0.562 n=99+99) XML 312ms ± 2% 311ms ± 1% ~ (p=0.050 n=97+93) [Geo mean] 543ms 542ms -0.13% Change-Id: I8d34ab59f154b28f20c6f9e416b976bfce339baa Reviewed-on: https://go-review.googlesource.com/c/go/+/216220 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-21 20:53:30 -08:00
if *genLog {
rr.add(stmtf("logRule(%q)", rule.Loc))
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
}
cmd/compile: remove chunking of rewrite rules We added chunking of rewrite rules to speed up compiling package SSA. This series of changes has significantly shrunk the number of rewrite rules, and they are no longer being added nearly as fast. Now that we are sharing v.Args across multiple rewrite rules, there is additional benefit to having more rules in a single function. Removing chunking now has an incidental impact on compiling package SSA, marginally speeds up other compilation, shrinks the cmd/compile binary, and simplifies the code. name old time/op new time/op delta Template 211ms ± 2% 210ms ± 2% -0.50% (p=0.000 n=91+97) Unicode 81.9ms ± 3% 81.8ms ± 3% ~ (p=0.179 n=96+91) GoTypes 731ms ± 2% 731ms ± 1% ~ (p=0.442 n=94+96) Compiler 3.43s ± 2% 3.41s ± 2% -0.36% (p=0.001 n=98+94) SSA 8.30s ± 2% 8.32s ± 2% +0.19% (p=0.034 n=94+95) Flate 135ms ± 2% 134ms ± 1% -0.30% (p=0.006 n=98+94) GoParser 167ms ± 1% 167ms ± 1% -0.22% (p=0.001 n=92+94) Reflect 453ms ± 2% 453ms ± 3% ~ (p=0.306 n=98+97) Tar 184ms ± 2% 183ms ± 2% -0.31% (p=0.012 n=94+94) XML 249ms ± 2% 248ms ± 1% -0.26% (p=0.002 n=96+92) [Geo mean] 419ms 418ms -0.21% name old user-time/op new user-time/op delta Template 273ms ± 2% 272ms ± 2% -0.46% (p=0.000 n=93+96) Unicode 116ms ± 4% 117ms ± 4% ~ (p=0.433 n=98+98) GoTypes 977ms ± 2% 977ms ± 1% ~ (p=0.971 n=92+99) Compiler 4.56s ± 6% 4.53s ± 6% ~ (p=0.081 n=100+100) SSA 11.1s ± 2% 11.1s ± 2% ~ (p=0.064 n=99+96) Flate 167ms ± 2% 167ms ± 1% -0.24% (p=0.004 n=95+96) GoParser 203ms ± 1% 203ms ± 2% -0.14% (p=0.049 n=96+97) Reflect 595ms ± 2% 595ms ± 2% ~ (p=0.544 n=95+92) Tar 225ms ± 2% 224ms ± 2% ~ (p=0.562 n=99+99) XML 312ms ± 2% 311ms ± 1% ~ (p=0.050 n=97+93) [Geo mean] 543ms 542ms -0.13% Change-Id: I8d34ab59f154b28f20c6f9e416b976bfce339baa Reviewed-on: https://go-review.googlesource.com/c/go/+/216220 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-21 20:53:30 -08:00
fn.add(rr)
}
if rr.CanFail {
cmd/compile: remove chunking of rewrite rules We added chunking of rewrite rules to speed up compiling package SSA. This series of changes has significantly shrunk the number of rewrite rules, and they are no longer being added nearly as fast. Now that we are sharing v.Args across multiple rewrite rules, there is additional benefit to having more rules in a single function. Removing chunking now has an incidental impact on compiling package SSA, marginally speeds up other compilation, shrinks the cmd/compile binary, and simplifies the code. name old time/op new time/op delta Template 211ms ± 2% 210ms ± 2% -0.50% (p=0.000 n=91+97) Unicode 81.9ms ± 3% 81.8ms ± 3% ~ (p=0.179 n=96+91) GoTypes 731ms ± 2% 731ms ± 1% ~ (p=0.442 n=94+96) Compiler 3.43s ± 2% 3.41s ± 2% -0.36% (p=0.001 n=98+94) SSA 8.30s ± 2% 8.32s ± 2% +0.19% (p=0.034 n=94+95) Flate 135ms ± 2% 134ms ± 1% -0.30% (p=0.006 n=98+94) GoParser 167ms ± 1% 167ms ± 1% -0.22% (p=0.001 n=92+94) Reflect 453ms ± 2% 453ms ± 3% ~ (p=0.306 n=98+97) Tar 184ms ± 2% 183ms ± 2% -0.31% (p=0.012 n=94+94) XML 249ms ± 2% 248ms ± 1% -0.26% (p=0.002 n=96+92) [Geo mean] 419ms 418ms -0.21% name old user-time/op new user-time/op delta Template 273ms ± 2% 272ms ± 2% -0.46% (p=0.000 n=93+96) Unicode 116ms ± 4% 117ms ± 4% ~ (p=0.433 n=98+98) GoTypes 977ms ± 2% 977ms ± 1% ~ (p=0.971 n=92+99) Compiler 4.56s ± 6% 4.53s ± 6% ~ (p=0.081 n=100+100) SSA 11.1s ± 2% 11.1s ± 2% ~ (p=0.064 n=99+96) Flate 167ms ± 2% 167ms ± 1% -0.24% (p=0.004 n=95+96) GoParser 203ms ± 1% 203ms ± 2% -0.14% (p=0.049 n=96+97) Reflect 595ms ± 2% 595ms ± 2% ~ (p=0.544 n=95+92) Tar 225ms ± 2% 224ms ± 2% ~ (p=0.562 n=99+99) XML 312ms ± 2% 311ms ± 1% ~ (p=0.050 n=97+93) [Geo mean] 543ms 542ms -0.13% Change-Id: I8d34ab59f154b28f20c6f9e416b976bfce339baa Reviewed-on: https://go-review.googlesource.com/c/go/+/216220 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-21 20:53:30 -08:00
fn.add(stmtf("return false"))
}
cmd/compile: remove chunking of rewrite rules We added chunking of rewrite rules to speed up compiling package SSA. This series of changes has significantly shrunk the number of rewrite rules, and they are no longer being added nearly as fast. Now that we are sharing v.Args across multiple rewrite rules, there is additional benefit to having more rules in a single function. Removing chunking now has an incidental impact on compiling package SSA, marginally speeds up other compilation, shrinks the cmd/compile binary, and simplifies the code. name old time/op new time/op delta Template 211ms ± 2% 210ms ± 2% -0.50% (p=0.000 n=91+97) Unicode 81.9ms ± 3% 81.8ms ± 3% ~ (p=0.179 n=96+91) GoTypes 731ms ± 2% 731ms ± 1% ~ (p=0.442 n=94+96) Compiler 3.43s ± 2% 3.41s ± 2% -0.36% (p=0.001 n=98+94) SSA 8.30s ± 2% 8.32s ± 2% +0.19% (p=0.034 n=94+95) Flate 135ms ± 2% 134ms ± 1% -0.30% (p=0.006 n=98+94) GoParser 167ms ± 1% 167ms ± 1% -0.22% (p=0.001 n=92+94) Reflect 453ms ± 2% 453ms ± 3% ~ (p=0.306 n=98+97) Tar 184ms ± 2% 183ms ± 2% -0.31% (p=0.012 n=94+94) XML 249ms ± 2% 248ms ± 1% -0.26% (p=0.002 n=96+92) [Geo mean] 419ms 418ms -0.21% name old user-time/op new user-time/op delta Template 273ms ± 2% 272ms ± 2% -0.46% (p=0.000 n=93+96) Unicode 116ms ± 4% 117ms ± 4% ~ (p=0.433 n=98+98) GoTypes 977ms ± 2% 977ms ± 1% ~ (p=0.971 n=92+99) Compiler 4.56s ± 6% 4.53s ± 6% ~ (p=0.081 n=100+100) SSA 11.1s ± 2% 11.1s ± 2% ~ (p=0.064 n=99+96) Flate 167ms ± 2% 167ms ± 1% -0.24% (p=0.004 n=95+96) GoParser 203ms ± 1% 203ms ± 2% -0.14% (p=0.049 n=96+97) Reflect 595ms ± 2% 595ms ± 2% ~ (p=0.544 n=95+92) Tar 225ms ± 2% 224ms ± 2% ~ (p=0.562 n=99+99) XML 312ms ± 2% 311ms ± 1% ~ (p=0.050 n=97+93) [Geo mean] 543ms 542ms -0.13% Change-Id: I8d34ab59f154b28f20c6f9e416b976bfce339baa Reviewed-on: https://go-review.googlesource.com/c/go/+/216220 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-21 20:53:30 -08:00
genFile.add(fn)
}
// Generate block rewrite function. There are only a few block types
// so we can make this one function with a switch.
fn = &Func{Kind: "Block"}
cmd/compile: disallow rewrite rules from declaring reserved names If I change a rule in ARM64.rules to use the variable name "b" in a conflicting way, rulegen would previously not complain, and the compiler would later give a confusing error: $ go run *.go && go build cmd/compile/internal/ssa # cmd/compile/internal/ssa ../rewriteARM64.go:24236:10: b.NewValue0 undefined (type int64 has no field or method NewValue0) Make rulegen complain early about those cases. Sometimes they might happen to be harmless, but in general they can easily cause confusion or unintended effect due to shadowing. After the change, with the same conflicting rule: $ go run *.go && go build cmd/compile/internal/ssa 2021/03/22 11:31:49 rule ARM64.rules:495 uses the reserved name b exit status 1 Note that 24 existing rules were using reserved names. It seems like the shadowing was harmless, as it wasn't causing typechecking issues nor did it seem to cause unintended behavior when the rule rewrite code ran. The bool values "b" were renamed "t", since that seems to have a precedent in other rules and in the fmt package. Sequential values like "a b c" were renamed to "x y z", since "b" is reserved. Finally, "typ" was renamed to "_typ", since there doesn't seem to be an obviously better answer. Passes all three of: $ GOARCH=amd64 go build -toolexec 'toolstash -cmp' -a std $ GOARCH=arm64 go build -toolexec 'toolstash -cmp' -a std $ GOARCH=mips64 go build -toolexec 'toolstash -cmp' -a std Fixes #45154. Change-Id: I1cce194dc7b477886a9c218c17973e996bcedccf Reviewed-on: https://go-review.googlesource.com/c/go/+/303549 Trust: Daniel Martí <mvdan@mvdan.cc> Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2021-03-22 11:35:02 +01:00
fn.add(declReserved("config", "b.Func.Config"))
fn.add(declReserved("typ", "&b.Func.Config.Types"))
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
sw = &Switch{Expr: exprf("b.Kind")}
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
ops = ops[:0]
for op := range blockrules {
ops = append(ops, op)
}
sort.Strings(ops)
for _, op := range ops {
cmd/compile: allow multiple SSA block control values Control values are used to choose which successor of a block is jumped to. Typically a control value takes the form of a 'flags' value that represents the result of a comparison. Some architectures however use a variable in a register as a control value. Up until now we have managed with a single control value per block. However some architectures (e.g. s390x and riscv64) have combined compare-and-branch instructions that take two variables in registers as parameters. To generate these instructions we need to support 2 control values per block. This CL allows up to 2 control values to be used in a block in order to support the addition of compare-and-branch instructions. I have implemented s390x compare-and-branch instructions in a different CL. Passes toolstash-check -all. Results of compilebench: name old time/op new time/op delta Template 208ms ± 1% 209ms ± 1% ~ (p=0.289 n=20+20) Unicode 83.7ms ± 1% 83.3ms ± 3% -0.49% (p=0.017 n=18+18) GoTypes 748ms ± 1% 748ms ± 0% ~ (p=0.460 n=20+18) Compiler 3.47s ± 1% 3.48s ± 1% ~ (p=0.070 n=19+18) SSA 11.5s ± 1% 11.7s ± 1% +1.64% (p=0.000 n=19+18) Flate 130ms ± 1% 130ms ± 1% ~ (p=0.588 n=19+20) GoParser 160ms ± 1% 161ms ± 1% ~ (p=0.211 n=20+20) Reflect 465ms ± 1% 467ms ± 1% +0.42% (p=0.007 n=20+20) Tar 184ms ± 1% 185ms ± 2% ~ (p=0.087 n=18+20) XML 253ms ± 1% 253ms ± 1% ~ (p=0.377 n=20+18) LinkCompiler 769ms ± 2% 774ms ± 2% ~ (p=0.070 n=19+19) ExternalLinkCompiler 3.59s ±11% 3.68s ± 6% ~ (p=0.072 n=20+20) LinkWithoutDebugCompiler 446ms ± 5% 454ms ± 3% +1.79% (p=0.002 n=19+20) StdCmd 26.0s ± 2% 26.0s ± 2% ~ (p=0.799 n=20+20) name old user-time/op new user-time/op delta Template 238ms ± 5% 240ms ± 5% ~ (p=0.142 n=20+20) Unicode 105ms ±11% 106ms ±10% ~ (p=0.512 n=20+20) GoTypes 876ms ± 2% 873ms ± 4% ~ (p=0.647 n=20+19) Compiler 4.17s ± 2% 4.19s ± 1% ~ (p=0.093 n=20+18) SSA 13.9s ± 1% 14.1s ± 1% +1.45% (p=0.000 n=18+18) Flate 145ms ±13% 146ms ± 5% ~ (p=0.851 n=20+18) GoParser 185ms ± 5% 188ms ± 7% ~ (p=0.174 n=20+20) Reflect 534ms ± 3% 538ms ± 2% ~ (p=0.105 n=20+18) Tar 215ms ± 4% 211ms ± 9% ~ (p=0.079 n=19+20) XML 295ms ± 6% 295ms ± 5% ~ (p=0.968 n=20+20) LinkCompiler 832ms ± 4% 837ms ± 7% ~ (p=0.707 n=17+20) ExternalLinkCompiler 1.58s ± 8% 1.60s ± 4% ~ (p=0.296 n=20+19) LinkWithoutDebugCompiler 478ms ±12% 489ms ±10% ~ (p=0.429 n=20+20) name old object-bytes new object-bytes delta Template 559kB ± 0% 559kB ± 0% ~ (all equal) Unicode 216kB ± 0% 216kB ± 0% ~ (all equal) GoTypes 2.03MB ± 0% 2.03MB ± 0% ~ (all equal) Compiler 8.07MB ± 0% 8.07MB ± 0% -0.06% (p=0.000 n=20+20) SSA 27.1MB ± 0% 27.3MB ± 0% +0.89% (p=0.000 n=20+20) Flate 343kB ± 0% 343kB ± 0% ~ (all equal) GoParser 441kB ± 0% 441kB ± 0% ~ (all equal) Reflect 1.36MB ± 0% 1.36MB ± 0% ~ (all equal) Tar 487kB ± 0% 487kB ± 0% ~ (all equal) XML 632kB ± 0% 632kB ± 0% ~ (all equal) name old export-bytes new export-bytes delta Template 18.5kB ± 0% 18.5kB ± 0% ~ (all equal) Unicode 7.92kB ± 0% 7.92kB ± 0% ~ (all equal) GoTypes 35.0kB ± 0% 35.0kB ± 0% ~ (all equal) Compiler 109kB ± 0% 110kB ± 0% +0.72% (p=0.000 n=20+20) SSA 137kB ± 0% 138kB ± 0% +0.58% (p=0.000 n=20+20) Flate 4.89kB ± 0% 4.89kB ± 0% ~ (all equal) GoParser 8.49kB ± 0% 8.49kB ± 0% ~ (all equal) Reflect 11.4kB ± 0% 11.4kB ± 0% ~ (all equal) Tar 10.5kB ± 0% 10.5kB ± 0% ~ (all equal) XML 16.7kB ± 0% 16.7kB ± 0% ~ (all equal) name old text-bytes new text-bytes delta HelloSize 761kB ± 0% 761kB ± 0% ~ (all equal) CmdGoSize 10.8MB ± 0% 10.8MB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 10.7kB ± 0% 10.7kB ± 0% ~ (all equal) CmdGoSize 312kB ± 0% 312kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 122kB ± 0% 122kB ± 0% ~ (all equal) CmdGoSize 146kB ± 0% 146kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.13MB ± 0% 1.13MB ± 0% ~ (all equal) CmdGoSize 15.1MB ± 0% 15.1MB ± 0% ~ (all equal) Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f Reviewed-on: https://go-review.googlesource.com/c/go/+/196557 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-12 20:19:58 +01:00
name, data := getBlockInfo(op, arch)
swc := &Case{Expr: exprf("%s", name)}
for _, rule := range blockrules[op] {
cmd/compile: allow multiple SSA block control values Control values are used to choose which successor of a block is jumped to. Typically a control value takes the form of a 'flags' value that represents the result of a comparison. Some architectures however use a variable in a register as a control value. Up until now we have managed with a single control value per block. However some architectures (e.g. s390x and riscv64) have combined compare-and-branch instructions that take two variables in registers as parameters. To generate these instructions we need to support 2 control values per block. This CL allows up to 2 control values to be used in a block in order to support the addition of compare-and-branch instructions. I have implemented s390x compare-and-branch instructions in a different CL. Passes toolstash-check -all. Results of compilebench: name old time/op new time/op delta Template 208ms ± 1% 209ms ± 1% ~ (p=0.289 n=20+20) Unicode 83.7ms ± 1% 83.3ms ± 3% -0.49% (p=0.017 n=18+18) GoTypes 748ms ± 1% 748ms ± 0% ~ (p=0.460 n=20+18) Compiler 3.47s ± 1% 3.48s ± 1% ~ (p=0.070 n=19+18) SSA 11.5s ± 1% 11.7s ± 1% +1.64% (p=0.000 n=19+18) Flate 130ms ± 1% 130ms ± 1% ~ (p=0.588 n=19+20) GoParser 160ms ± 1% 161ms ± 1% ~ (p=0.211 n=20+20) Reflect 465ms ± 1% 467ms ± 1% +0.42% (p=0.007 n=20+20) Tar 184ms ± 1% 185ms ± 2% ~ (p=0.087 n=18+20) XML 253ms ± 1% 253ms ± 1% ~ (p=0.377 n=20+18) LinkCompiler 769ms ± 2% 774ms ± 2% ~ (p=0.070 n=19+19) ExternalLinkCompiler 3.59s ±11% 3.68s ± 6% ~ (p=0.072 n=20+20) LinkWithoutDebugCompiler 446ms ± 5% 454ms ± 3% +1.79% (p=0.002 n=19+20) StdCmd 26.0s ± 2% 26.0s ± 2% ~ (p=0.799 n=20+20) name old user-time/op new user-time/op delta Template 238ms ± 5% 240ms ± 5% ~ (p=0.142 n=20+20) Unicode 105ms ±11% 106ms ±10% ~ (p=0.512 n=20+20) GoTypes 876ms ± 2% 873ms ± 4% ~ (p=0.647 n=20+19) Compiler 4.17s ± 2% 4.19s ± 1% ~ (p=0.093 n=20+18) SSA 13.9s ± 1% 14.1s ± 1% +1.45% (p=0.000 n=18+18) Flate 145ms ±13% 146ms ± 5% ~ (p=0.851 n=20+18) GoParser 185ms ± 5% 188ms ± 7% ~ (p=0.174 n=20+20) Reflect 534ms ± 3% 538ms ± 2% ~ (p=0.105 n=20+18) Tar 215ms ± 4% 211ms ± 9% ~ (p=0.079 n=19+20) XML 295ms ± 6% 295ms ± 5% ~ (p=0.968 n=20+20) LinkCompiler 832ms ± 4% 837ms ± 7% ~ (p=0.707 n=17+20) ExternalLinkCompiler 1.58s ± 8% 1.60s ± 4% ~ (p=0.296 n=20+19) LinkWithoutDebugCompiler 478ms ±12% 489ms ±10% ~ (p=0.429 n=20+20) name old object-bytes new object-bytes delta Template 559kB ± 0% 559kB ± 0% ~ (all equal) Unicode 216kB ± 0% 216kB ± 0% ~ (all equal) GoTypes 2.03MB ± 0% 2.03MB ± 0% ~ (all equal) Compiler 8.07MB ± 0% 8.07MB ± 0% -0.06% (p=0.000 n=20+20) SSA 27.1MB ± 0% 27.3MB ± 0% +0.89% (p=0.000 n=20+20) Flate 343kB ± 0% 343kB ± 0% ~ (all equal) GoParser 441kB ± 0% 441kB ± 0% ~ (all equal) Reflect 1.36MB ± 0% 1.36MB ± 0% ~ (all equal) Tar 487kB ± 0% 487kB ± 0% ~ (all equal) XML 632kB ± 0% 632kB ± 0% ~ (all equal) name old export-bytes new export-bytes delta Template 18.5kB ± 0% 18.5kB ± 0% ~ (all equal) Unicode 7.92kB ± 0% 7.92kB ± 0% ~ (all equal) GoTypes 35.0kB ± 0% 35.0kB ± 0% ~ (all equal) Compiler 109kB ± 0% 110kB ± 0% +0.72% (p=0.000 n=20+20) SSA 137kB ± 0% 138kB ± 0% +0.58% (p=0.000 n=20+20) Flate 4.89kB ± 0% 4.89kB ± 0% ~ (all equal) GoParser 8.49kB ± 0% 8.49kB ± 0% ~ (all equal) Reflect 11.4kB ± 0% 11.4kB ± 0% ~ (all equal) Tar 10.5kB ± 0% 10.5kB ± 0% ~ (all equal) XML 16.7kB ± 0% 16.7kB ± 0% ~ (all equal) name old text-bytes new text-bytes delta HelloSize 761kB ± 0% 761kB ± 0% ~ (all equal) CmdGoSize 10.8MB ± 0% 10.8MB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 10.7kB ± 0% 10.7kB ± 0% ~ (all equal) CmdGoSize 312kB ± 0% 312kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 122kB ± 0% 122kB ± 0% ~ (all equal) CmdGoSize 146kB ± 0% 146kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.13MB ± 0% 1.13MB ± 0% ~ (all equal) CmdGoSize 15.1MB ± 0% 15.1MB ± 0% ~ (all equal) Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f Reviewed-on: https://go-review.googlesource.com/c/go/+/196557 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-12 20:19:58 +01:00
swc.add(genBlockRewrite(rule, arch, data))
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
}
sw.add(swc)
}
if len(sw.List) > 0 { // skip if empty
fn.add(sw)
}
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
fn.add(stmtf("return false"))
cmd/compile: teach rulegen to remove unused decls First, add cpu and memory profiling flags, as these are useful to see where rulegen is spending its time. It now takes many seconds to run on a recent laptop, so we have to keep an eye on what it's doing. Second, stop writing '_ = var' lines to keep imports and variables used at all times. Now that rulegen removes all such unused names, they're unnecessary. To perform the removal, lean on go/types to first detect what names are unused. We can configure it to give us all the type-checking errors in a file, so we can collect all "declared but not used" errors in a single pass. We then use astutil.Apply to remove the relevant nodes based on the line information from each unused error. This allows us to apply the changes without having to do extra parser+printer roundtrips to plaintext, which are far too expensive. We need to do multiple such passes, as removing an unused variable declaration might then make another declaration unused. Two passes are enough to clean every file at the moment, so add a limit of three passes for now to avoid eating cpu uncontrollably by accident. The resulting performance of the changes above is a ~30% loss across the table, since go/types is fairly expensive. The numbers were obtained with 'benchcmd Rulegen go run *.go', which involves compiling rulegen itself, but that seems reflective of how the program is used. name old time/op new time/op delta Rulegen 5.61s ± 0% 7.36s ± 0% +31.17% (p=0.016 n=5+4) name old user-time/op new user-time/op delta Rulegen 7.20s ± 1% 9.92s ± 1% +37.76% (p=0.016 n=5+4) name old sys-time/op new sys-time/op delta Rulegen 135ms ±19% 169ms ±17% +25.66% (p=0.032 n=5+5) name old peak-RSS-bytes new peak-RSS-bytes delta Rulegen 71.0MB ± 2% 85.6MB ± 2% +20.56% (p=0.008 n=5+5) We can live with a bit more resource usage, but the time/op getting close to 10s isn't good. To win that back, introduce concurrency in main.go. This further increases resource usage a bit, but the real time on this quad-core laptop is greatly reduced. The final benchstat is as follows: name old time/op new time/op delta Rulegen 5.61s ± 0% 3.97s ± 1% -29.26% (p=0.008 n=5+5) name old user-time/op new user-time/op delta Rulegen 7.20s ± 1% 13.91s ± 1% +93.09% (p=0.008 n=5+5) name old sys-time/op new sys-time/op delta Rulegen 135ms ±19% 269ms ± 9% +99.17% (p=0.008 n=5+5) name old peak-RSS-bytes new peak-RSS-bytes delta Rulegen 71.0MB ± 2% 226.3MB ± 1% +218.72% (p=0.008 n=5+5) It might be possible to reduce the cpu or memory usage in the future, such as configuring go/types to do less work, or taking shortcuts to avoid having to run it many times. For now, ~2x cpu and ~4x memory usage seems like a fair trade for a faster and better rulegen. Finally, we can remove the old code that tried to remove some unused variables in a hacky and unmaintainable way. Change-Id: Iff9e83e3f253babf5a1bd48cc993033b8550cee6 Reviewed-on: https://go-review.googlesource.com/c/go/+/189798 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-10 19:27:45 +02:00
genFile.add(fn)
cmd/compile: teach rulegen to remove unused decls First, add cpu and memory profiling flags, as these are useful to see where rulegen is spending its time. It now takes many seconds to run on a recent laptop, so we have to keep an eye on what it's doing. Second, stop writing '_ = var' lines to keep imports and variables used at all times. Now that rulegen removes all such unused names, they're unnecessary. To perform the removal, lean on go/types to first detect what names are unused. We can configure it to give us all the type-checking errors in a file, so we can collect all "declared but not used" errors in a single pass. We then use astutil.Apply to remove the relevant nodes based on the line information from each unused error. This allows us to apply the changes without having to do extra parser+printer roundtrips to plaintext, which are far too expensive. We need to do multiple such passes, as removing an unused variable declaration might then make another declaration unused. Two passes are enough to clean every file at the moment, so add a limit of three passes for now to avoid eating cpu uncontrollably by accident. The resulting performance of the changes above is a ~30% loss across the table, since go/types is fairly expensive. The numbers were obtained with 'benchcmd Rulegen go run *.go', which involves compiling rulegen itself, but that seems reflective of how the program is used. name old time/op new time/op delta Rulegen 5.61s ± 0% 7.36s ± 0% +31.17% (p=0.016 n=5+4) name old user-time/op new user-time/op delta Rulegen 7.20s ± 1% 9.92s ± 1% +37.76% (p=0.016 n=5+4) name old sys-time/op new sys-time/op delta Rulegen 135ms ±19% 169ms ±17% +25.66% (p=0.032 n=5+5) name old peak-RSS-bytes new peak-RSS-bytes delta Rulegen 71.0MB ± 2% 85.6MB ± 2% +20.56% (p=0.008 n=5+5) We can live with a bit more resource usage, but the time/op getting close to 10s isn't good. To win that back, introduce concurrency in main.go. This further increases resource usage a bit, but the real time on this quad-core laptop is greatly reduced. The final benchstat is as follows: name old time/op new time/op delta Rulegen 5.61s ± 0% 3.97s ± 1% -29.26% (p=0.008 n=5+5) name old user-time/op new user-time/op delta Rulegen 7.20s ± 1% 13.91s ± 1% +93.09% (p=0.008 n=5+5) name old sys-time/op new sys-time/op delta Rulegen 135ms ±19% 269ms ± 9% +99.17% (p=0.008 n=5+5) name old peak-RSS-bytes new peak-RSS-bytes delta Rulegen 71.0MB ± 2% 226.3MB ± 1% +218.72% (p=0.008 n=5+5) It might be possible to reduce the cpu or memory usage in the future, such as configuring go/types to do less work, or taking shortcuts to avoid having to run it many times. For now, ~2x cpu and ~4x memory usage seems like a fair trade for a faster and better rulegen. Finally, we can remove the old code that tried to remove some unused variables in a hacky and unmaintainable way. Change-Id: Iff9e83e3f253babf5a1bd48cc993033b8550cee6 Reviewed-on: https://go-review.googlesource.com/c/go/+/189798 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-10 19:27:45 +02:00
// Remove unused imports and variables.
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
buf := new(bytes.Buffer)
cmd/compile: teach rulegen to remove unused decls First, add cpu and memory profiling flags, as these are useful to see where rulegen is spending its time. It now takes many seconds to run on a recent laptop, so we have to keep an eye on what it's doing. Second, stop writing '_ = var' lines to keep imports and variables used at all times. Now that rulegen removes all such unused names, they're unnecessary. To perform the removal, lean on go/types to first detect what names are unused. We can configure it to give us all the type-checking errors in a file, so we can collect all "declared but not used" errors in a single pass. We then use astutil.Apply to remove the relevant nodes based on the line information from each unused error. This allows us to apply the changes without having to do extra parser+printer roundtrips to plaintext, which are far too expensive. We need to do multiple such passes, as removing an unused variable declaration might then make another declaration unused. Two passes are enough to clean every file at the moment, so add a limit of three passes for now to avoid eating cpu uncontrollably by accident. The resulting performance of the changes above is a ~30% loss across the table, since go/types is fairly expensive. The numbers were obtained with 'benchcmd Rulegen go run *.go', which involves compiling rulegen itself, but that seems reflective of how the program is used. name old time/op new time/op delta Rulegen 5.61s ± 0% 7.36s ± 0% +31.17% (p=0.016 n=5+4) name old user-time/op new user-time/op delta Rulegen 7.20s ± 1% 9.92s ± 1% +37.76% (p=0.016 n=5+4) name old sys-time/op new sys-time/op delta Rulegen 135ms ±19% 169ms ±17% +25.66% (p=0.032 n=5+5) name old peak-RSS-bytes new peak-RSS-bytes delta Rulegen 71.0MB ± 2% 85.6MB ± 2% +20.56% (p=0.008 n=5+5) We can live with a bit more resource usage, but the time/op getting close to 10s isn't good. To win that back, introduce concurrency in main.go. This further increases resource usage a bit, but the real time on this quad-core laptop is greatly reduced. The final benchstat is as follows: name old time/op new time/op delta Rulegen 5.61s ± 0% 3.97s ± 1% -29.26% (p=0.008 n=5+5) name old user-time/op new user-time/op delta Rulegen 7.20s ± 1% 13.91s ± 1% +93.09% (p=0.008 n=5+5) name old sys-time/op new sys-time/op delta Rulegen 135ms ±19% 269ms ± 9% +99.17% (p=0.008 n=5+5) name old peak-RSS-bytes new peak-RSS-bytes delta Rulegen 71.0MB ± 2% 226.3MB ± 1% +218.72% (p=0.008 n=5+5) It might be possible to reduce the cpu or memory usage in the future, such as configuring go/types to do less work, or taking shortcuts to avoid having to run it many times. For now, ~2x cpu and ~4x memory usage seems like a fair trade for a faster and better rulegen. Finally, we can remove the old code that tried to remove some unused variables in a hacky and unmaintainable way. Change-Id: Iff9e83e3f253babf5a1bd48cc993033b8550cee6 Reviewed-on: https://go-review.googlesource.com/c/go/+/189798 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-10 19:27:45 +02:00
fprint(buf, genFile)
fset := token.NewFileSet()
file, err := parser.ParseFile(fset, "", buf, parser.ParseComments)
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
if err != nil {
filename := fmt.Sprintf("%s_broken.go", arch.name)
if err := os.WriteFile(filename, buf.Bytes(), 0644); err != nil {
log.Printf("failed to dump broken code to %s: %v", filename, err)
} else {
log.Printf("dumped broken code to %s", filename)
}
log.Fatalf("failed to parse generated code for arch %s: %v", arch.name, err)
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
}
cmd/compile: teach rulegen to remove unused decls First, add cpu and memory profiling flags, as these are useful to see where rulegen is spending its time. It now takes many seconds to run on a recent laptop, so we have to keep an eye on what it's doing. Second, stop writing '_ = var' lines to keep imports and variables used at all times. Now that rulegen removes all such unused names, they're unnecessary. To perform the removal, lean on go/types to first detect what names are unused. We can configure it to give us all the type-checking errors in a file, so we can collect all "declared but not used" errors in a single pass. We then use astutil.Apply to remove the relevant nodes based on the line information from each unused error. This allows us to apply the changes without having to do extra parser+printer roundtrips to plaintext, which are far too expensive. We need to do multiple such passes, as removing an unused variable declaration might then make another declaration unused. Two passes are enough to clean every file at the moment, so add a limit of three passes for now to avoid eating cpu uncontrollably by accident. The resulting performance of the changes above is a ~30% loss across the table, since go/types is fairly expensive. The numbers were obtained with 'benchcmd Rulegen go run *.go', which involves compiling rulegen itself, but that seems reflective of how the program is used. name old time/op new time/op delta Rulegen 5.61s ± 0% 7.36s ± 0% +31.17% (p=0.016 n=5+4) name old user-time/op new user-time/op delta Rulegen 7.20s ± 1% 9.92s ± 1% +37.76% (p=0.016 n=5+4) name old sys-time/op new sys-time/op delta Rulegen 135ms ±19% 169ms ±17% +25.66% (p=0.032 n=5+5) name old peak-RSS-bytes new peak-RSS-bytes delta Rulegen 71.0MB ± 2% 85.6MB ± 2% +20.56% (p=0.008 n=5+5) We can live with a bit more resource usage, but the time/op getting close to 10s isn't good. To win that back, introduce concurrency in main.go. This further increases resource usage a bit, but the real time on this quad-core laptop is greatly reduced. The final benchstat is as follows: name old time/op new time/op delta Rulegen 5.61s ± 0% 3.97s ± 1% -29.26% (p=0.008 n=5+5) name old user-time/op new user-time/op delta Rulegen 7.20s ± 1% 13.91s ± 1% +93.09% (p=0.008 n=5+5) name old sys-time/op new sys-time/op delta Rulegen 135ms ±19% 269ms ± 9% +99.17% (p=0.008 n=5+5) name old peak-RSS-bytes new peak-RSS-bytes delta Rulegen 71.0MB ± 2% 226.3MB ± 1% +218.72% (p=0.008 n=5+5) It might be possible to reduce the cpu or memory usage in the future, such as configuring go/types to do less work, or taking shortcuts to avoid having to run it many times. For now, ~2x cpu and ~4x memory usage seems like a fair trade for a faster and better rulegen. Finally, we can remove the old code that tried to remove some unused variables in a hacky and unmaintainable way. Change-Id: Iff9e83e3f253babf5a1bd48cc993033b8550cee6 Reviewed-on: https://go-review.googlesource.com/c/go/+/189798 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-10 19:27:45 +02:00
tfile := fset.File(file.Pos())
cmd/compile: stop using go/types in rulegen Using go/types to get rid of all unused variables in CL 189798 was a neat idea, but it was pretty expensive. go/types is a full typechecker, which does a lot more work than we actually need. Moreover, we had to run it multiple times, to catch variables that became unused after removing existing unused variables. Instead, write our own little detector for unused imports and variables. It doesn't use ast.Walk, as we need to know what fields we're inspecting. For example, in "foo := bar", "foo" is declared, and "bar" is used, yet they both appear as simple *ast.Ident cases under ast.Walk. The code is documented to explain how unused variables are detected in a single syntax tree pass. Since this happens after we've generated a complete go/ast.File, we don't need to worry about our own simplified node types. The generated code is the same, but rulegen is much faster and uses less memory at its peak, so it should scale better with time. With 'benchcmd Rulegen go run *.go' on perflock, we get: name old time/op new time/op delta Rulegen 4.00s ± 0% 3.41s ± 1% -14.70% (p=0.008 n=5+5) name old user-time/op new user-time/op delta Rulegen 14.1s ± 1% 10.6s ± 1% -24.62% (p=0.008 n=5+5) name old sys-time/op new sys-time/op delta Rulegen 318ms ±26% 263ms ± 9% ~ (p=0.056 n=5+5) name old peak-RSS-bytes new peak-RSS-bytes delta Rulegen 231MB ± 4% 181MB ± 3% -21.69% (p=0.008 n=5+5) Change-Id: I8387d52818f6131357868ad348dac8c96d926191 Reviewed-on: https://go-review.googlesource.com/c/go/+/191782 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-27 22:47:37 +02:00
// First, use unusedInspector to find the unused declarations by their
// start position.
u := unusedInspector{unused: make(map[token.Pos]bool)}
u.node(file)
// Then, delete said nodes via astutil.Apply.
pre := func(c *astutil.Cursor) bool {
node := c.Node()
if node == nil {
cmd/compile: teach rulegen to remove unused decls First, add cpu and memory profiling flags, as these are useful to see where rulegen is spending its time. It now takes many seconds to run on a recent laptop, so we have to keep an eye on what it's doing. Second, stop writing '_ = var' lines to keep imports and variables used at all times. Now that rulegen removes all such unused names, they're unnecessary. To perform the removal, lean on go/types to first detect what names are unused. We can configure it to give us all the type-checking errors in a file, so we can collect all "declared but not used" errors in a single pass. We then use astutil.Apply to remove the relevant nodes based on the line information from each unused error. This allows us to apply the changes without having to do extra parser+printer roundtrips to plaintext, which are far too expensive. We need to do multiple such passes, as removing an unused variable declaration might then make another declaration unused. Two passes are enough to clean every file at the moment, so add a limit of three passes for now to avoid eating cpu uncontrollably by accident. The resulting performance of the changes above is a ~30% loss across the table, since go/types is fairly expensive. The numbers were obtained with 'benchcmd Rulegen go run *.go', which involves compiling rulegen itself, but that seems reflective of how the program is used. name old time/op new time/op delta Rulegen 5.61s ± 0% 7.36s ± 0% +31.17% (p=0.016 n=5+4) name old user-time/op new user-time/op delta Rulegen 7.20s ± 1% 9.92s ± 1% +37.76% (p=0.016 n=5+4) name old sys-time/op new sys-time/op delta Rulegen 135ms ±19% 169ms ±17% +25.66% (p=0.032 n=5+5) name old peak-RSS-bytes new peak-RSS-bytes delta Rulegen 71.0MB ± 2% 85.6MB ± 2% +20.56% (p=0.008 n=5+5) We can live with a bit more resource usage, but the time/op getting close to 10s isn't good. To win that back, introduce concurrency in main.go. This further increases resource usage a bit, but the real time on this quad-core laptop is greatly reduced. The final benchstat is as follows: name old time/op new time/op delta Rulegen 5.61s ± 0% 3.97s ± 1% -29.26% (p=0.008 n=5+5) name old user-time/op new user-time/op delta Rulegen 7.20s ± 1% 13.91s ± 1% +93.09% (p=0.008 n=5+5) name old sys-time/op new sys-time/op delta Rulegen 135ms ±19% 269ms ± 9% +99.17% (p=0.008 n=5+5) name old peak-RSS-bytes new peak-RSS-bytes delta Rulegen 71.0MB ± 2% 226.3MB ± 1% +218.72% (p=0.008 n=5+5) It might be possible to reduce the cpu or memory usage in the future, such as configuring go/types to do less work, or taking shortcuts to avoid having to run it many times. For now, ~2x cpu and ~4x memory usage seems like a fair trade for a faster and better rulegen. Finally, we can remove the old code that tried to remove some unused variables in a hacky and unmaintainable way. Change-Id: Iff9e83e3f253babf5a1bd48cc993033b8550cee6 Reviewed-on: https://go-review.googlesource.com/c/go/+/189798 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-10 19:27:45 +02:00
return true
}
cmd/compile: stop using go/types in rulegen Using go/types to get rid of all unused variables in CL 189798 was a neat idea, but it was pretty expensive. go/types is a full typechecker, which does a lot more work than we actually need. Moreover, we had to run it multiple times, to catch variables that became unused after removing existing unused variables. Instead, write our own little detector for unused imports and variables. It doesn't use ast.Walk, as we need to know what fields we're inspecting. For example, in "foo := bar", "foo" is declared, and "bar" is used, yet they both appear as simple *ast.Ident cases under ast.Walk. The code is documented to explain how unused variables are detected in a single syntax tree pass. Since this happens after we've generated a complete go/ast.File, we don't need to worry about our own simplified node types. The generated code is the same, but rulegen is much faster and uses less memory at its peak, so it should scale better with time. With 'benchcmd Rulegen go run *.go' on perflock, we get: name old time/op new time/op delta Rulegen 4.00s ± 0% 3.41s ± 1% -14.70% (p=0.008 n=5+5) name old user-time/op new user-time/op delta Rulegen 14.1s ± 1% 10.6s ± 1% -24.62% (p=0.008 n=5+5) name old sys-time/op new sys-time/op delta Rulegen 318ms ±26% 263ms ± 9% ~ (p=0.056 n=5+5) name old peak-RSS-bytes new peak-RSS-bytes delta Rulegen 231MB ± 4% 181MB ± 3% -21.69% (p=0.008 n=5+5) Change-Id: I8387d52818f6131357868ad348dac8c96d926191 Reviewed-on: https://go-review.googlesource.com/c/go/+/191782 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-27 22:47:37 +02:00
if u.unused[node.Pos()] {
c.Delete()
// Unused imports and declarations use exactly
// one line. Prevent leaving an empty line.
tfile.MergeLine(tfile.Position(node.Pos()).Line)
return false
}
return true
}
post := func(c *astutil.Cursor) bool {
switch node := c.Node().(type) {
case *ast.GenDecl:
if len(node.Specs) == 0 {
// Don't leave a broken or empty GenDecl behind,
// such as "import ()".
c.Delete()
cmd/compile: teach rulegen to remove unused decls First, add cpu and memory profiling flags, as these are useful to see where rulegen is spending its time. It now takes many seconds to run on a recent laptop, so we have to keep an eye on what it's doing. Second, stop writing '_ = var' lines to keep imports and variables used at all times. Now that rulegen removes all such unused names, they're unnecessary. To perform the removal, lean on go/types to first detect what names are unused. We can configure it to give us all the type-checking errors in a file, so we can collect all "declared but not used" errors in a single pass. We then use astutil.Apply to remove the relevant nodes based on the line information from each unused error. This allows us to apply the changes without having to do extra parser+printer roundtrips to plaintext, which are far too expensive. We need to do multiple such passes, as removing an unused variable declaration might then make another declaration unused. Two passes are enough to clean every file at the moment, so add a limit of three passes for now to avoid eating cpu uncontrollably by accident. The resulting performance of the changes above is a ~30% loss across the table, since go/types is fairly expensive. The numbers were obtained with 'benchcmd Rulegen go run *.go', which involves compiling rulegen itself, but that seems reflective of how the program is used. name old time/op new time/op delta Rulegen 5.61s ± 0% 7.36s ± 0% +31.17% (p=0.016 n=5+4) name old user-time/op new user-time/op delta Rulegen 7.20s ± 1% 9.92s ± 1% +37.76% (p=0.016 n=5+4) name old sys-time/op new sys-time/op delta Rulegen 135ms ±19% 169ms ±17% +25.66% (p=0.032 n=5+5) name old peak-RSS-bytes new peak-RSS-bytes delta Rulegen 71.0MB ± 2% 85.6MB ± 2% +20.56% (p=0.008 n=5+5) We can live with a bit more resource usage, but the time/op getting close to 10s isn't good. To win that back, introduce concurrency in main.go. This further increases resource usage a bit, but the real time on this quad-core laptop is greatly reduced. The final benchstat is as follows: name old time/op new time/op delta Rulegen 5.61s ± 0% 3.97s ± 1% -29.26% (p=0.008 n=5+5) name old user-time/op new user-time/op delta Rulegen 7.20s ± 1% 13.91s ± 1% +93.09% (p=0.008 n=5+5) name old sys-time/op new sys-time/op delta Rulegen 135ms ±19% 269ms ± 9% +99.17% (p=0.008 n=5+5) name old peak-RSS-bytes new peak-RSS-bytes delta Rulegen 71.0MB ± 2% 226.3MB ± 1% +218.72% (p=0.008 n=5+5) It might be possible to reduce the cpu or memory usage in the future, such as configuring go/types to do less work, or taking shortcuts to avoid having to run it many times. For now, ~2x cpu and ~4x memory usage seems like a fair trade for a faster and better rulegen. Finally, we can remove the old code that tried to remove some unused variables in a hacky and unmaintainable way. Change-Id: Iff9e83e3f253babf5a1bd48cc993033b8550cee6 Reviewed-on: https://go-review.googlesource.com/c/go/+/189798 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-10 19:27:45 +02:00
}
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
}
cmd/compile: stop using go/types in rulegen Using go/types to get rid of all unused variables in CL 189798 was a neat idea, but it was pretty expensive. go/types is a full typechecker, which does a lot more work than we actually need. Moreover, we had to run it multiple times, to catch variables that became unused after removing existing unused variables. Instead, write our own little detector for unused imports and variables. It doesn't use ast.Walk, as we need to know what fields we're inspecting. For example, in "foo := bar", "foo" is declared, and "bar" is used, yet they both appear as simple *ast.Ident cases under ast.Walk. The code is documented to explain how unused variables are detected in a single syntax tree pass. Since this happens after we've generated a complete go/ast.File, we don't need to worry about our own simplified node types. The generated code is the same, but rulegen is much faster and uses less memory at its peak, so it should scale better with time. With 'benchcmd Rulegen go run *.go' on perflock, we get: name old time/op new time/op delta Rulegen 4.00s ± 0% 3.41s ± 1% -14.70% (p=0.008 n=5+5) name old user-time/op new user-time/op delta Rulegen 14.1s ± 1% 10.6s ± 1% -24.62% (p=0.008 n=5+5) name old sys-time/op new sys-time/op delta Rulegen 318ms ±26% 263ms ± 9% ~ (p=0.056 n=5+5) name old peak-RSS-bytes new peak-RSS-bytes delta Rulegen 231MB ± 4% 181MB ± 3% -21.69% (p=0.008 n=5+5) Change-Id: I8387d52818f6131357868ad348dac8c96d926191 Reviewed-on: https://go-review.googlesource.com/c/go/+/191782 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-27 22:47:37 +02:00
return true
cmd/compile: teach rulegen to remove unused decls First, add cpu and memory profiling flags, as these are useful to see where rulegen is spending its time. It now takes many seconds to run on a recent laptop, so we have to keep an eye on what it's doing. Second, stop writing '_ = var' lines to keep imports and variables used at all times. Now that rulegen removes all such unused names, they're unnecessary. To perform the removal, lean on go/types to first detect what names are unused. We can configure it to give us all the type-checking errors in a file, so we can collect all "declared but not used" errors in a single pass. We then use astutil.Apply to remove the relevant nodes based on the line information from each unused error. This allows us to apply the changes without having to do extra parser+printer roundtrips to plaintext, which are far too expensive. We need to do multiple such passes, as removing an unused variable declaration might then make another declaration unused. Two passes are enough to clean every file at the moment, so add a limit of three passes for now to avoid eating cpu uncontrollably by accident. The resulting performance of the changes above is a ~30% loss across the table, since go/types is fairly expensive. The numbers were obtained with 'benchcmd Rulegen go run *.go', which involves compiling rulegen itself, but that seems reflective of how the program is used. name old time/op new time/op delta Rulegen 5.61s ± 0% 7.36s ± 0% +31.17% (p=0.016 n=5+4) name old user-time/op new user-time/op delta Rulegen 7.20s ± 1% 9.92s ± 1% +37.76% (p=0.016 n=5+4) name old sys-time/op new sys-time/op delta Rulegen 135ms ±19% 169ms ±17% +25.66% (p=0.032 n=5+5) name old peak-RSS-bytes new peak-RSS-bytes delta Rulegen 71.0MB ± 2% 85.6MB ± 2% +20.56% (p=0.008 n=5+5) We can live with a bit more resource usage, but the time/op getting close to 10s isn't good. To win that back, introduce concurrency in main.go. This further increases resource usage a bit, but the real time on this quad-core laptop is greatly reduced. The final benchstat is as follows: name old time/op new time/op delta Rulegen 5.61s ± 0% 3.97s ± 1% -29.26% (p=0.008 n=5+5) name old user-time/op new user-time/op delta Rulegen 7.20s ± 1% 13.91s ± 1% +93.09% (p=0.008 n=5+5) name old sys-time/op new sys-time/op delta Rulegen 135ms ±19% 269ms ± 9% +99.17% (p=0.008 n=5+5) name old peak-RSS-bytes new peak-RSS-bytes delta Rulegen 71.0MB ± 2% 226.3MB ± 1% +218.72% (p=0.008 n=5+5) It might be possible to reduce the cpu or memory usage in the future, such as configuring go/types to do less work, or taking shortcuts to avoid having to run it many times. For now, ~2x cpu and ~4x memory usage seems like a fair trade for a faster and better rulegen. Finally, we can remove the old code that tried to remove some unused variables in a hacky and unmaintainable way. Change-Id: Iff9e83e3f253babf5a1bd48cc993033b8550cee6 Reviewed-on: https://go-review.googlesource.com/c/go/+/189798 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-10 19:27:45 +02:00
}
cmd/compile: stop using go/types in rulegen Using go/types to get rid of all unused variables in CL 189798 was a neat idea, but it was pretty expensive. go/types is a full typechecker, which does a lot more work than we actually need. Moreover, we had to run it multiple times, to catch variables that became unused after removing existing unused variables. Instead, write our own little detector for unused imports and variables. It doesn't use ast.Walk, as we need to know what fields we're inspecting. For example, in "foo := bar", "foo" is declared, and "bar" is used, yet they both appear as simple *ast.Ident cases under ast.Walk. The code is documented to explain how unused variables are detected in a single syntax tree pass. Since this happens after we've generated a complete go/ast.File, we don't need to worry about our own simplified node types. The generated code is the same, but rulegen is much faster and uses less memory at its peak, so it should scale better with time. With 'benchcmd Rulegen go run *.go' on perflock, we get: name old time/op new time/op delta Rulegen 4.00s ± 0% 3.41s ± 1% -14.70% (p=0.008 n=5+5) name old user-time/op new user-time/op delta Rulegen 14.1s ± 1% 10.6s ± 1% -24.62% (p=0.008 n=5+5) name old sys-time/op new sys-time/op delta Rulegen 318ms ±26% 263ms ± 9% ~ (p=0.056 n=5+5) name old peak-RSS-bytes new peak-RSS-bytes delta Rulegen 231MB ± 4% 181MB ± 3% -21.69% (p=0.008 n=5+5) Change-Id: I8387d52818f6131357868ad348dac8c96d926191 Reviewed-on: https://go-review.googlesource.com/c/go/+/191782 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-27 22:47:37 +02:00
file = astutil.Apply(file, pre, post).(*ast.File)
cmd/compile: teach rulegen to remove unused decls First, add cpu and memory profiling flags, as these are useful to see where rulegen is spending its time. It now takes many seconds to run on a recent laptop, so we have to keep an eye on what it's doing. Second, stop writing '_ = var' lines to keep imports and variables used at all times. Now that rulegen removes all such unused names, they're unnecessary. To perform the removal, lean on go/types to first detect what names are unused. We can configure it to give us all the type-checking errors in a file, so we can collect all "declared but not used" errors in a single pass. We then use astutil.Apply to remove the relevant nodes based on the line information from each unused error. This allows us to apply the changes without having to do extra parser+printer roundtrips to plaintext, which are far too expensive. We need to do multiple such passes, as removing an unused variable declaration might then make another declaration unused. Two passes are enough to clean every file at the moment, so add a limit of three passes for now to avoid eating cpu uncontrollably by accident. The resulting performance of the changes above is a ~30% loss across the table, since go/types is fairly expensive. The numbers were obtained with 'benchcmd Rulegen go run *.go', which involves compiling rulegen itself, but that seems reflective of how the program is used. name old time/op new time/op delta Rulegen 5.61s ± 0% 7.36s ± 0% +31.17% (p=0.016 n=5+4) name old user-time/op new user-time/op delta Rulegen 7.20s ± 1% 9.92s ± 1% +37.76% (p=0.016 n=5+4) name old sys-time/op new sys-time/op delta Rulegen 135ms ±19% 169ms ±17% +25.66% (p=0.032 n=5+5) name old peak-RSS-bytes new peak-RSS-bytes delta Rulegen 71.0MB ± 2% 85.6MB ± 2% +20.56% (p=0.008 n=5+5) We can live with a bit more resource usage, but the time/op getting close to 10s isn't good. To win that back, introduce concurrency in main.go. This further increases resource usage a bit, but the real time on this quad-core laptop is greatly reduced. The final benchstat is as follows: name old time/op new time/op delta Rulegen 5.61s ± 0% 3.97s ± 1% -29.26% (p=0.008 n=5+5) name old user-time/op new user-time/op delta Rulegen 7.20s ± 1% 13.91s ± 1% +93.09% (p=0.008 n=5+5) name old sys-time/op new sys-time/op delta Rulegen 135ms ±19% 269ms ± 9% +99.17% (p=0.008 n=5+5) name old peak-RSS-bytes new peak-RSS-bytes delta Rulegen 71.0MB ± 2% 226.3MB ± 1% +218.72% (p=0.008 n=5+5) It might be possible to reduce the cpu or memory usage in the future, such as configuring go/types to do less work, or taking shortcuts to avoid having to run it many times. For now, ~2x cpu and ~4x memory usage seems like a fair trade for a faster and better rulegen. Finally, we can remove the old code that tried to remove some unused variables in a hacky and unmaintainable way. Change-Id: Iff9e83e3f253babf5a1bd48cc993033b8550cee6 Reviewed-on: https://go-review.googlesource.com/c/go/+/189798 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-10 19:27:45 +02:00
// Write the well-formatted source to file
f, err := os.Create("../rewrite" + arch.name + suff + ".go")
if err != nil {
log.Fatalf("can't write output: %v", err)
}
defer f.Close()
// gofmt result; use a buffered writer, as otherwise go/format spends
// far too much time in syscalls.
bw := bufio.NewWriter(f)
if err := format.Node(bw, fset, file); err != nil {
log.Fatalf("can't format output: %v", err)
}
if err := bw.Flush(); err != nil {
log.Fatalf("can't write output: %v", err)
}
if err := f.Close(); err != nil {
log.Fatalf("can't write output: %v", err)
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
}
}
cmd/compile: stop using go/types in rulegen Using go/types to get rid of all unused variables in CL 189798 was a neat idea, but it was pretty expensive. go/types is a full typechecker, which does a lot more work than we actually need. Moreover, we had to run it multiple times, to catch variables that became unused after removing existing unused variables. Instead, write our own little detector for unused imports and variables. It doesn't use ast.Walk, as we need to know what fields we're inspecting. For example, in "foo := bar", "foo" is declared, and "bar" is used, yet they both appear as simple *ast.Ident cases under ast.Walk. The code is documented to explain how unused variables are detected in a single syntax tree pass. Since this happens after we've generated a complete go/ast.File, we don't need to worry about our own simplified node types. The generated code is the same, but rulegen is much faster and uses less memory at its peak, so it should scale better with time. With 'benchcmd Rulegen go run *.go' on perflock, we get: name old time/op new time/op delta Rulegen 4.00s ± 0% 3.41s ± 1% -14.70% (p=0.008 n=5+5) name old user-time/op new user-time/op delta Rulegen 14.1s ± 1% 10.6s ± 1% -24.62% (p=0.008 n=5+5) name old sys-time/op new sys-time/op delta Rulegen 318ms ±26% 263ms ± 9% ~ (p=0.056 n=5+5) name old peak-RSS-bytes new peak-RSS-bytes delta Rulegen 231MB ± 4% 181MB ± 3% -21.69% (p=0.008 n=5+5) Change-Id: I8387d52818f6131357868ad348dac8c96d926191 Reviewed-on: https://go-review.googlesource.com/c/go/+/191782 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-27 22:47:37 +02:00
// unusedInspector can be used to detect unused variables and imports in an
// ast.Node via its node method. The result is available in the "unused" map.
//
// note that unusedInspector is lazy and best-effort; it only supports the node
// types and patterns used by the rulegen program.
type unusedInspector struct {
// scope is the current scope, which can never be nil when a declaration
// is encountered. That is, the unusedInspector.node entrypoint should
// generally be an entire file or block.
scope *scope
// unused is the resulting set of unused declared names, indexed by the
// starting position of the node that declared the name.
unused map[token.Pos]bool
// defining is the object currently being defined; this is useful so
// that if "foo := bar" is unused and removed, we can then detect if
// "bar" becomes unused as well.
defining *object
}
// scoped opens a new scope when called, and returns a function which closes
// that same scope. When a scope is closed, unused variables are recorded.
func (u *unusedInspector) scoped() func() {
outer := u.scope
u.scope = &scope{outer: outer, objects: map[string]*object{}}
return func() {
for anyUnused := true; anyUnused; {
anyUnused = false
for _, obj := range u.scope.objects {
if obj.numUses > 0 {
continue
}
u.unused[obj.pos] = true
for _, used := range obj.used {
if used.numUses--; used.numUses == 0 {
anyUnused = true
}
}
// We've decremented numUses for each of the
// objects in used. Zero this slice too, to keep
// everything consistent.
obj.used = nil
}
}
u.scope = outer
}
}
func (u *unusedInspector) exprs(list []ast.Expr) {
for _, x := range list {
u.node(x)
}
}
func (u *unusedInspector) node(node ast.Node) {
switch node := node.(type) {
case *ast.File:
defer u.scoped()()
for _, decl := range node.Decls {
u.node(decl)
}
cmd/compile: stop using go/types in rulegen Using go/types to get rid of all unused variables in CL 189798 was a neat idea, but it was pretty expensive. go/types is a full typechecker, which does a lot more work than we actually need. Moreover, we had to run it multiple times, to catch variables that became unused after removing existing unused variables. Instead, write our own little detector for unused imports and variables. It doesn't use ast.Walk, as we need to know what fields we're inspecting. For example, in "foo := bar", "foo" is declared, and "bar" is used, yet they both appear as simple *ast.Ident cases under ast.Walk. The code is documented to explain how unused variables are detected in a single syntax tree pass. Since this happens after we've generated a complete go/ast.File, we don't need to worry about our own simplified node types. The generated code is the same, but rulegen is much faster and uses less memory at its peak, so it should scale better with time. With 'benchcmd Rulegen go run *.go' on perflock, we get: name old time/op new time/op delta Rulegen 4.00s ± 0% 3.41s ± 1% -14.70% (p=0.008 n=5+5) name old user-time/op new user-time/op delta Rulegen 14.1s ± 1% 10.6s ± 1% -24.62% (p=0.008 n=5+5) name old sys-time/op new sys-time/op delta Rulegen 318ms ±26% 263ms ± 9% ~ (p=0.056 n=5+5) name old peak-RSS-bytes new peak-RSS-bytes delta Rulegen 231MB ± 4% 181MB ± 3% -21.69% (p=0.008 n=5+5) Change-Id: I8387d52818f6131357868ad348dac8c96d926191 Reviewed-on: https://go-review.googlesource.com/c/go/+/191782 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-27 22:47:37 +02:00
case *ast.GenDecl:
for _, spec := range node.Specs {
u.node(spec)
}
case *ast.ImportSpec:
impPath, _ := strconv.Unquote(node.Path.Value)
name := path.Base(impPath)
u.scope.objects[name] = &object{
name: name,
pos: node.Pos(),
}
case *ast.FuncDecl:
u.node(node.Type)
if node.Body != nil {
u.node(node.Body)
}
case *ast.FuncType:
if node.Params != nil {
u.node(node.Params)
}
if node.Results != nil {
u.node(node.Results)
}
case *ast.FieldList:
for _, field := range node.List {
u.node(field)
}
case *ast.Field:
u.node(node.Type)
// statements
case *ast.BlockStmt:
defer u.scoped()()
for _, stmt := range node.List {
u.node(stmt)
}
cmd/compile: start implementing strongly typed aux and auxint fields Right now the Aux and AuxInt fields of ssa.Values are typed as interface{} and int64, respectively. Each rule that uses these values must cast them to the type they actually are (*obj.LSym, or int32, or ValAndOff, etc.), use them, and then cast them back to interface{} or int64. We know for each opcode what the types of the Aux and AuxInt fields should be. So let's modify the rule generator to declare the types to be what we know they should be, autoconverting to and from the generic types for us. That way we can make the rules more type safe. It's difficult to make a single CL for this, so I've coopted the "=>" token to indicate a rule that is strongly typed. "->" rules are processed as before. That will let us migrate a few rules at a time in separate CLs. Hopefully we can reach a state where all rules are strongly typed and we can drop the distinction. This CL changes just a few rules to get a feel for what this transition would look like. I've decided not to put explicit types in the rules. I think it makes the rules somewhat clearer, but definitely more verbose. In particular, the passthrough rules that don't modify the fields in question are verbose for no real reason. Change-Id: I63a1b789ac5702e7caf7934cd49f784235d1d73d Reviewed-on: https://go-review.googlesource.com/c/go/+/190197 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2020-03-19 16:25:08 -07:00
case *ast.DeclStmt:
u.node(node.Decl)
cmd/compile: stop using go/types in rulegen Using go/types to get rid of all unused variables in CL 189798 was a neat idea, but it was pretty expensive. go/types is a full typechecker, which does a lot more work than we actually need. Moreover, we had to run it multiple times, to catch variables that became unused after removing existing unused variables. Instead, write our own little detector for unused imports and variables. It doesn't use ast.Walk, as we need to know what fields we're inspecting. For example, in "foo := bar", "foo" is declared, and "bar" is used, yet they both appear as simple *ast.Ident cases under ast.Walk. The code is documented to explain how unused variables are detected in a single syntax tree pass. Since this happens after we've generated a complete go/ast.File, we don't need to worry about our own simplified node types. The generated code is the same, but rulegen is much faster and uses less memory at its peak, so it should scale better with time. With 'benchcmd Rulegen go run *.go' on perflock, we get: name old time/op new time/op delta Rulegen 4.00s ± 0% 3.41s ± 1% -14.70% (p=0.008 n=5+5) name old user-time/op new user-time/op delta Rulegen 14.1s ± 1% 10.6s ± 1% -24.62% (p=0.008 n=5+5) name old sys-time/op new sys-time/op delta Rulegen 318ms ±26% 263ms ± 9% ~ (p=0.056 n=5+5) name old peak-RSS-bytes new peak-RSS-bytes delta Rulegen 231MB ± 4% 181MB ± 3% -21.69% (p=0.008 n=5+5) Change-Id: I8387d52818f6131357868ad348dac8c96d926191 Reviewed-on: https://go-review.googlesource.com/c/go/+/191782 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-27 22:47:37 +02:00
case *ast.IfStmt:
if node.Init != nil {
u.node(node.Init)
}
u.node(node.Cond)
u.node(node.Body)
if node.Else != nil {
u.node(node.Else)
}
case *ast.ForStmt:
if node.Init != nil {
u.node(node.Init)
}
if node.Cond != nil {
u.node(node.Cond)
}
if node.Post != nil {
u.node(node.Post)
}
u.node(node.Body)
case *ast.SwitchStmt:
if node.Init != nil {
u.node(node.Init)
}
if node.Tag != nil {
u.node(node.Tag)
}
u.node(node.Body)
case *ast.CaseClause:
u.exprs(node.List)
defer u.scoped()()
for _, stmt := range node.Body {
u.node(stmt)
}
cmd/compile: stop using go/types in rulegen Using go/types to get rid of all unused variables in CL 189798 was a neat idea, but it was pretty expensive. go/types is a full typechecker, which does a lot more work than we actually need. Moreover, we had to run it multiple times, to catch variables that became unused after removing existing unused variables. Instead, write our own little detector for unused imports and variables. It doesn't use ast.Walk, as we need to know what fields we're inspecting. For example, in "foo := bar", "foo" is declared, and "bar" is used, yet they both appear as simple *ast.Ident cases under ast.Walk. The code is documented to explain how unused variables are detected in a single syntax tree pass. Since this happens after we've generated a complete go/ast.File, we don't need to worry about our own simplified node types. The generated code is the same, but rulegen is much faster and uses less memory at its peak, so it should scale better with time. With 'benchcmd Rulegen go run *.go' on perflock, we get: name old time/op new time/op delta Rulegen 4.00s ± 0% 3.41s ± 1% -14.70% (p=0.008 n=5+5) name old user-time/op new user-time/op delta Rulegen 14.1s ± 1% 10.6s ± 1% -24.62% (p=0.008 n=5+5) name old sys-time/op new sys-time/op delta Rulegen 318ms ±26% 263ms ± 9% ~ (p=0.056 n=5+5) name old peak-RSS-bytes new peak-RSS-bytes delta Rulegen 231MB ± 4% 181MB ± 3% -21.69% (p=0.008 n=5+5) Change-Id: I8387d52818f6131357868ad348dac8c96d926191 Reviewed-on: https://go-review.googlesource.com/c/go/+/191782 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-27 22:47:37 +02:00
case *ast.BranchStmt:
case *ast.ExprStmt:
u.node(node.X)
case *ast.AssignStmt:
if node.Tok != token.DEFINE {
u.exprs(node.Rhs)
u.exprs(node.Lhs)
break
}
lhs := node.Lhs
if len(lhs) == 2 && lhs[1].(*ast.Ident).Name == "_" {
lhs = lhs[:1]
}
if len(lhs) != 1 {
cmd/compile: stop using go/types in rulegen Using go/types to get rid of all unused variables in CL 189798 was a neat idea, but it was pretty expensive. go/types is a full typechecker, which does a lot more work than we actually need. Moreover, we had to run it multiple times, to catch variables that became unused after removing existing unused variables. Instead, write our own little detector for unused imports and variables. It doesn't use ast.Walk, as we need to know what fields we're inspecting. For example, in "foo := bar", "foo" is declared, and "bar" is used, yet they both appear as simple *ast.Ident cases under ast.Walk. The code is documented to explain how unused variables are detected in a single syntax tree pass. Since this happens after we've generated a complete go/ast.File, we don't need to worry about our own simplified node types. The generated code is the same, but rulegen is much faster and uses less memory at its peak, so it should scale better with time. With 'benchcmd Rulegen go run *.go' on perflock, we get: name old time/op new time/op delta Rulegen 4.00s ± 0% 3.41s ± 1% -14.70% (p=0.008 n=5+5) name old user-time/op new user-time/op delta Rulegen 14.1s ± 1% 10.6s ± 1% -24.62% (p=0.008 n=5+5) name old sys-time/op new sys-time/op delta Rulegen 318ms ±26% 263ms ± 9% ~ (p=0.056 n=5+5) name old peak-RSS-bytes new peak-RSS-bytes delta Rulegen 231MB ± 4% 181MB ± 3% -21.69% (p=0.008 n=5+5) Change-Id: I8387d52818f6131357868ad348dac8c96d926191 Reviewed-on: https://go-review.googlesource.com/c/go/+/191782 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-27 22:47:37 +02:00
panic("no support for := with multiple names")
}
name := lhs[0].(*ast.Ident)
cmd/compile: stop using go/types in rulegen Using go/types to get rid of all unused variables in CL 189798 was a neat idea, but it was pretty expensive. go/types is a full typechecker, which does a lot more work than we actually need. Moreover, we had to run it multiple times, to catch variables that became unused after removing existing unused variables. Instead, write our own little detector for unused imports and variables. It doesn't use ast.Walk, as we need to know what fields we're inspecting. For example, in "foo := bar", "foo" is declared, and "bar" is used, yet they both appear as simple *ast.Ident cases under ast.Walk. The code is documented to explain how unused variables are detected in a single syntax tree pass. Since this happens after we've generated a complete go/ast.File, we don't need to worry about our own simplified node types. The generated code is the same, but rulegen is much faster and uses less memory at its peak, so it should scale better with time. With 'benchcmd Rulegen go run *.go' on perflock, we get: name old time/op new time/op delta Rulegen 4.00s ± 0% 3.41s ± 1% -14.70% (p=0.008 n=5+5) name old user-time/op new user-time/op delta Rulegen 14.1s ± 1% 10.6s ± 1% -24.62% (p=0.008 n=5+5) name old sys-time/op new sys-time/op delta Rulegen 318ms ±26% 263ms ± 9% ~ (p=0.056 n=5+5) name old peak-RSS-bytes new peak-RSS-bytes delta Rulegen 231MB ± 4% 181MB ± 3% -21.69% (p=0.008 n=5+5) Change-Id: I8387d52818f6131357868ad348dac8c96d926191 Reviewed-on: https://go-review.googlesource.com/c/go/+/191782 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-27 22:47:37 +02:00
obj := &object{
name: name.Name,
pos: name.NamePos,
}
old := u.defining
u.defining = obj
u.exprs(node.Rhs)
u.defining = old
u.scope.objects[name.Name] = obj
case *ast.ReturnStmt:
u.exprs(node.Results)
cmd/compile: use loops to handle commutative ops in rules Prior to this change, we generated additional rules at rulegen time for all possible combinations of args to commutative ops. This is simple and works well, but leads to lots of generated rules. This in turn has increased the size of the compiler, made it hard to compile package ssa on small machines, and provided a disincentive to mark some ops as commutative. This change reworks how we handle commutative ops. Instead of generating a rule per argument permutation, we generate a series of nested loops, one for each commutative op. Each loop tries both possible argument orderings. I also considered attempting to canonicalize the inputs to the rewrite rules. However, because either or both arguments might be nothing more than an identifier, and because there can be arbitrary conditions to evaluate during matching, I did not see how to proceed. The duplicate rule detection now sorts arguments to commutative ops, so that it can detect commutative-only duplicates. There may be further optimizations to the new generated code. In particular, we may not be removing as many bounds checks as before; I have not investigated deeply. If more work here is needed, we could do it with more hints or with improvements to the prove pass. This change has almost no impact on the generated code. It does not pass toolstash-check, however. In a handful of functions, for reasons I do not understand, there are minor position changes. For the entire series ending at this change, there is negligible compiler performance impact. The compiler binary shrinks by about 15%, and package ssa shrinks by about 25%. Package ssa also compiles ~25% faster with ~25% less memory. Change-Id: Ia2ee9ceae7be08a17342319d4e31b0bb238a2ee4 Reviewed-on: https://go-review.googlesource.com/c/go/+/213703 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-06 22:24:02 -08:00
case *ast.IncDecStmt:
u.node(node.X)
cmd/compile: stop using go/types in rulegen Using go/types to get rid of all unused variables in CL 189798 was a neat idea, but it was pretty expensive. go/types is a full typechecker, which does a lot more work than we actually need. Moreover, we had to run it multiple times, to catch variables that became unused after removing existing unused variables. Instead, write our own little detector for unused imports and variables. It doesn't use ast.Walk, as we need to know what fields we're inspecting. For example, in "foo := bar", "foo" is declared, and "bar" is used, yet they both appear as simple *ast.Ident cases under ast.Walk. The code is documented to explain how unused variables are detected in a single syntax tree pass. Since this happens after we've generated a complete go/ast.File, we don't need to worry about our own simplified node types. The generated code is the same, but rulegen is much faster and uses less memory at its peak, so it should scale better with time. With 'benchcmd Rulegen go run *.go' on perflock, we get: name old time/op new time/op delta Rulegen 4.00s ± 0% 3.41s ± 1% -14.70% (p=0.008 n=5+5) name old user-time/op new user-time/op delta Rulegen 14.1s ± 1% 10.6s ± 1% -24.62% (p=0.008 n=5+5) name old sys-time/op new sys-time/op delta Rulegen 318ms ±26% 263ms ± 9% ~ (p=0.056 n=5+5) name old peak-RSS-bytes new peak-RSS-bytes delta Rulegen 231MB ± 4% 181MB ± 3% -21.69% (p=0.008 n=5+5) Change-Id: I8387d52818f6131357868ad348dac8c96d926191 Reviewed-on: https://go-review.googlesource.com/c/go/+/191782 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-27 22:47:37 +02:00
// expressions
case *ast.CallExpr:
u.node(node.Fun)
u.exprs(node.Args)
case *ast.SelectorExpr:
u.node(node.X)
case *ast.UnaryExpr:
u.node(node.X)
case *ast.BinaryExpr:
u.node(node.X)
u.node(node.Y)
case *ast.StarExpr:
u.node(node.X)
case *ast.ParenExpr:
u.node(node.X)
case *ast.IndexExpr:
u.node(node.X)
u.node(node.Index)
case *ast.TypeAssertExpr:
u.node(node.X)
u.node(node.Type)
case *ast.Ident:
if obj := u.scope.Lookup(node.Name); obj != nil {
obj.numUses++
if u.defining != nil {
u.defining.used = append(u.defining.used, obj)
}
}
case *ast.BasicLit:
cmd/compile: start implementing strongly typed aux and auxint fields Right now the Aux and AuxInt fields of ssa.Values are typed as interface{} and int64, respectively. Each rule that uses these values must cast them to the type they actually are (*obj.LSym, or int32, or ValAndOff, etc.), use them, and then cast them back to interface{} or int64. We know for each opcode what the types of the Aux and AuxInt fields should be. So let's modify the rule generator to declare the types to be what we know they should be, autoconverting to and from the generic types for us. That way we can make the rules more type safe. It's difficult to make a single CL for this, so I've coopted the "=>" token to indicate a rule that is strongly typed. "->" rules are processed as before. That will let us migrate a few rules at a time in separate CLs. Hopefully we can reach a state where all rules are strongly typed and we can drop the distinction. This CL changes just a few rules to get a feel for what this transition would look like. I've decided not to put explicit types in the rules. I think it makes the rules somewhat clearer, but definitely more verbose. In particular, the passthrough rules that don't modify the fields in question are verbose for no real reason. Change-Id: I63a1b789ac5702e7caf7934cd49f784235d1d73d Reviewed-on: https://go-review.googlesource.com/c/go/+/190197 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2020-03-19 16:25:08 -07:00
case *ast.ValueSpec:
u.exprs(node.Values)
cmd/compile: stop using go/types in rulegen Using go/types to get rid of all unused variables in CL 189798 was a neat idea, but it was pretty expensive. go/types is a full typechecker, which does a lot more work than we actually need. Moreover, we had to run it multiple times, to catch variables that became unused after removing existing unused variables. Instead, write our own little detector for unused imports and variables. It doesn't use ast.Walk, as we need to know what fields we're inspecting. For example, in "foo := bar", "foo" is declared, and "bar" is used, yet they both appear as simple *ast.Ident cases under ast.Walk. The code is documented to explain how unused variables are detected in a single syntax tree pass. Since this happens after we've generated a complete go/ast.File, we don't need to worry about our own simplified node types. The generated code is the same, but rulegen is much faster and uses less memory at its peak, so it should scale better with time. With 'benchcmd Rulegen go run *.go' on perflock, we get: name old time/op new time/op delta Rulegen 4.00s ± 0% 3.41s ± 1% -14.70% (p=0.008 n=5+5) name old user-time/op new user-time/op delta Rulegen 14.1s ± 1% 10.6s ± 1% -24.62% (p=0.008 n=5+5) name old sys-time/op new sys-time/op delta Rulegen 318ms ±26% 263ms ± 9% ~ (p=0.056 n=5+5) name old peak-RSS-bytes new peak-RSS-bytes delta Rulegen 231MB ± 4% 181MB ± 3% -21.69% (p=0.008 n=5+5) Change-Id: I8387d52818f6131357868ad348dac8c96d926191 Reviewed-on: https://go-review.googlesource.com/c/go/+/191782 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-27 22:47:37 +02:00
default:
panic(fmt.Sprintf("unhandled node: %T", node))
}
}
// scope keeps track of a certain scope and its declared names, as well as the
// outer (parent) scope.
type scope struct {
outer *scope // can be nil, if this is the top-level scope
objects map[string]*object // indexed by each declared name
}
func (s *scope) Lookup(name string) *object {
if obj := s.objects[name]; obj != nil {
return obj
}
if s.outer == nil {
return nil
}
return s.outer.Lookup(name)
}
// object keeps track of a declared name, such as a variable or import.
type object struct {
name string
pos token.Pos // start position of the node declaring the object
numUses int // number of times this object is used
used []*object // objects that its declaration makes use of
}
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
func fprint(w io.Writer, n Node) {
switch n := n.(type) {
case *File:
cmd/compile: use loops to handle commutative ops in rules Prior to this change, we generated additional rules at rulegen time for all possible combinations of args to commutative ops. This is simple and works well, but leads to lots of generated rules. This in turn has increased the size of the compiler, made it hard to compile package ssa on small machines, and provided a disincentive to mark some ops as commutative. This change reworks how we handle commutative ops. Instead of generating a rule per argument permutation, we generate a series of nested loops, one for each commutative op. Each loop tries both possible argument orderings. I also considered attempting to canonicalize the inputs to the rewrite rules. However, because either or both arguments might be nothing more than an identifier, and because there can be arbitrary conditions to evaluate during matching, I did not see how to proceed. The duplicate rule detection now sorts arguments to commutative ops, so that it can detect commutative-only duplicates. There may be further optimizations to the new generated code. In particular, we may not be removing as many bounds checks as before; I have not investigated deeply. If more work here is needed, we could do it with more hints or with improvements to the prove pass. This change has almost no impact on the generated code. It does not pass toolstash-check, however. In a handful of functions, for reasons I do not understand, there are minor position changes. For the entire series ending at this change, there is negligible compiler performance impact. The compiler binary shrinks by about 15%, and package ssa shrinks by about 25%. Package ssa also compiles ~25% faster with ~25% less memory. Change-Id: Ia2ee9ceae7be08a17342319d4e31b0bb238a2ee4 Reviewed-on: https://go-review.googlesource.com/c/go/+/213703 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-06 22:24:02 -08:00
file := n
seenRewrite := make(map[[3]string]string)
fmt.Fprintf(w, "// Code generated from gen/%s%s.rules; DO NOT EDIT.\n", n.Arch.name, n.Suffix)
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
fmt.Fprintf(w, "// generated with: cd gen; go run *.go\n")
fmt.Fprintf(w, "\npackage ssa\n")
for _, path := range append([]string{
"fmt",
"internal/buildcfg",
"math",
"cmd/internal/obj",
"cmd/compile/internal/base",
cmd/compile: teach rulegen to remove unused decls First, add cpu and memory profiling flags, as these are useful to see where rulegen is spending its time. It now takes many seconds to run on a recent laptop, so we have to keep an eye on what it's doing. Second, stop writing '_ = var' lines to keep imports and variables used at all times. Now that rulegen removes all such unused names, they're unnecessary. To perform the removal, lean on go/types to first detect what names are unused. We can configure it to give us all the type-checking errors in a file, so we can collect all "declared but not used" errors in a single pass. We then use astutil.Apply to remove the relevant nodes based on the line information from each unused error. This allows us to apply the changes without having to do extra parser+printer roundtrips to plaintext, which are far too expensive. We need to do multiple such passes, as removing an unused variable declaration might then make another declaration unused. Two passes are enough to clean every file at the moment, so add a limit of three passes for now to avoid eating cpu uncontrollably by accident. The resulting performance of the changes above is a ~30% loss across the table, since go/types is fairly expensive. The numbers were obtained with 'benchcmd Rulegen go run *.go', which involves compiling rulegen itself, but that seems reflective of how the program is used. name old time/op new time/op delta Rulegen 5.61s ± 0% 7.36s ± 0% +31.17% (p=0.016 n=5+4) name old user-time/op new user-time/op delta Rulegen 7.20s ± 1% 9.92s ± 1% +37.76% (p=0.016 n=5+4) name old sys-time/op new sys-time/op delta Rulegen 135ms ±19% 169ms ±17% +25.66% (p=0.032 n=5+5) name old peak-RSS-bytes new peak-RSS-bytes delta Rulegen 71.0MB ± 2% 85.6MB ± 2% +20.56% (p=0.008 n=5+5) We can live with a bit more resource usage, but the time/op getting close to 10s isn't good. To win that back, introduce concurrency in main.go. This further increases resource usage a bit, but the real time on this quad-core laptop is greatly reduced. The final benchstat is as follows: name old time/op new time/op delta Rulegen 5.61s ± 0% 3.97s ± 1% -29.26% (p=0.008 n=5+5) name old user-time/op new user-time/op delta Rulegen 7.20s ± 1% 13.91s ± 1% +93.09% (p=0.008 n=5+5) name old sys-time/op new sys-time/op delta Rulegen 135ms ±19% 269ms ± 9% +99.17% (p=0.008 n=5+5) name old peak-RSS-bytes new peak-RSS-bytes delta Rulegen 71.0MB ± 2% 226.3MB ± 1% +218.72% (p=0.008 n=5+5) It might be possible to reduce the cpu or memory usage in the future, such as configuring go/types to do less work, or taking shortcuts to avoid having to run it many times. For now, ~2x cpu and ~4x memory usage seems like a fair trade for a faster and better rulegen. Finally, we can remove the old code that tried to remove some unused variables in a hacky and unmaintainable way. Change-Id: Iff9e83e3f253babf5a1bd48cc993033b8550cee6 Reviewed-on: https://go-review.googlesource.com/c/go/+/189798 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-10 19:27:45 +02:00
"cmd/compile/internal/types",
}, n.Arch.imports...) {
cmd/compile: teach rulegen to remove unused decls First, add cpu and memory profiling flags, as these are useful to see where rulegen is spending its time. It now takes many seconds to run on a recent laptop, so we have to keep an eye on what it's doing. Second, stop writing '_ = var' lines to keep imports and variables used at all times. Now that rulegen removes all such unused names, they're unnecessary. To perform the removal, lean on go/types to first detect what names are unused. We can configure it to give us all the type-checking errors in a file, so we can collect all "declared but not used" errors in a single pass. We then use astutil.Apply to remove the relevant nodes based on the line information from each unused error. This allows us to apply the changes without having to do extra parser+printer roundtrips to plaintext, which are far too expensive. We need to do multiple such passes, as removing an unused variable declaration might then make another declaration unused. Two passes are enough to clean every file at the moment, so add a limit of three passes for now to avoid eating cpu uncontrollably by accident. The resulting performance of the changes above is a ~30% loss across the table, since go/types is fairly expensive. The numbers were obtained with 'benchcmd Rulegen go run *.go', which involves compiling rulegen itself, but that seems reflective of how the program is used. name old time/op new time/op delta Rulegen 5.61s ± 0% 7.36s ± 0% +31.17% (p=0.016 n=5+4) name old user-time/op new user-time/op delta Rulegen 7.20s ± 1% 9.92s ± 1% +37.76% (p=0.016 n=5+4) name old sys-time/op new sys-time/op delta Rulegen 135ms ±19% 169ms ±17% +25.66% (p=0.032 n=5+5) name old peak-RSS-bytes new peak-RSS-bytes delta Rulegen 71.0MB ± 2% 85.6MB ± 2% +20.56% (p=0.008 n=5+5) We can live with a bit more resource usage, but the time/op getting close to 10s isn't good. To win that back, introduce concurrency in main.go. This further increases resource usage a bit, but the real time on this quad-core laptop is greatly reduced. The final benchstat is as follows: name old time/op new time/op delta Rulegen 5.61s ± 0% 3.97s ± 1% -29.26% (p=0.008 n=5+5) name old user-time/op new user-time/op delta Rulegen 7.20s ± 1% 13.91s ± 1% +93.09% (p=0.008 n=5+5) name old sys-time/op new sys-time/op delta Rulegen 135ms ±19% 269ms ± 9% +99.17% (p=0.008 n=5+5) name old peak-RSS-bytes new peak-RSS-bytes delta Rulegen 71.0MB ± 2% 226.3MB ± 1% +218.72% (p=0.008 n=5+5) It might be possible to reduce the cpu or memory usage in the future, such as configuring go/types to do less work, or taking shortcuts to avoid having to run it many times. For now, ~2x cpu and ~4x memory usage seems like a fair trade for a faster and better rulegen. Finally, we can remove the old code that tried to remove some unused variables in a hacky and unmaintainable way. Change-Id: Iff9e83e3f253babf5a1bd48cc993033b8550cee6 Reviewed-on: https://go-review.googlesource.com/c/go/+/189798 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-10 19:27:45 +02:00
fmt.Fprintf(w, "import %q\n", path)
}
for _, f := range n.List {
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
f := f.(*Func)
fmt.Fprintf(w, "func rewrite%s%s%s%s(", f.Kind, n.Arch.name, n.Suffix, f.Suffix)
fmt.Fprintf(w, "%c *%s) bool {\n", strings.ToLower(f.Kind)[0], f.Kind)
if f.Kind == "Value" && f.ArgLen > 0 {
for i := f.ArgLen - 1; i >= 0; i-- {
cmd/compile: reduce bounds checks in generated rewrite rules CL 213703 converted generated rewrite rules for commutative ops to use loops instead of duplicated code. However, it loaded args using expressions like v.Args[i] and v.Args[i^1], which the compiler could not eliminate bounds for (including with all outstanding prove CLs). Also, given a series of separate rewrite rules for the same op, we generated bounds checks for every rewrite rule, even though we were repeatedly loading the same set of args. This change reduces both sets of bounds checks. Instead of loading v.Args[i] and v.Args[i^1] for commutative loops, we now preload v.Args[0] and v.Args[1] into local variables, and then swap them (as needed) in the commutative loop post statement. And we now load all top level v.Args into local variables at the beginning of every rewrite rule function. The second optimization is the more significant, but the first helps a little, and they play together nicely from the perspective of generating the code. This does increase register pressure, but the reduced bounds checks more than compensate. Note that the vast majority of rewrite rules evaluated are not applied, so the prologue is the most important part of the rewrite rules. There is one subtle aspect to the new generated code. Because the top level v.Args are shared across rewrite rules, and rule evaluation can swap v_0 and v_1, v_0 and v_1 can end up being swapped from one rule to the next. That is OK, because any time a rule does not get applied, they will have been swapped exactly twice. Passes toolstash-check -all. name old time/op new time/op delta Template 213ms ± 2% 211ms ± 2% -0.85% (p=0.000 n=92+96) Unicode 83.5ms ± 2% 83.2ms ± 2% -0.41% (p=0.004 n=95+90) GoTypes 737ms ± 2% 733ms ± 2% -0.51% (p=0.000 n=91+94) Compiler 3.45s ± 2% 3.43s ± 2% -0.44% (p=0.000 n=99+100) SSA 8.54s ± 1% 8.32s ± 2% -2.56% (p=0.000 n=96+99) Flate 136ms ± 2% 135ms ± 1% -0.47% (p=0.000 n=96+96) GoParser 169ms ± 1% 168ms ± 1% -0.33% (p=0.000 n=96+93) Reflect 456ms ± 3% 455ms ± 3% ~ (p=0.261 n=95+94) Tar 186ms ± 2% 185ms ± 2% -0.48% (p=0.000 n=94+95) XML 251ms ± 1% 250ms ± 1% -0.51% (p=0.000 n=91+94) [Geo mean] 424ms 421ms -0.68% name old user-time/op new user-time/op delta Template 275ms ± 1% 274ms ± 2% -0.55% (p=0.000 n=95+98) Unicode 118ms ± 4% 118ms ± 4% ~ (p=0.642 n=98+90) GoTypes 983ms ± 1% 980ms ± 1% -0.30% (p=0.000 n=93+93) Compiler 4.56s ± 6% 4.52s ± 6% -0.72% (p=0.003 n=100+100) SSA 11.4s ± 1% 11.1s ± 1% -2.50% (p=0.000 n=96+97) Flate 168ms ± 1% 167ms ± 1% -0.49% (p=0.000 n=92+92) GoParser 204ms ± 1% 204ms ± 2% -0.27% (p=0.003 n=99+96) Reflect 599ms ± 2% 598ms ± 2% ~ (p=0.116 n=95+92) Tar 227ms ± 2% 225ms ± 2% -0.57% (p=0.000 n=95+98) XML 313ms ± 2% 312ms ± 1% -0.37% (p=0.000 n=89+95) [Geo mean] 547ms 544ms -0.61% file before after Δ % compile 21113112 21109016 -4096 -0.019% total 131704940 131700844 -4096 -0.003% Change-Id: Id6c39e0367e597c0c75b8a4b1eb14cc3cbd11956 Reviewed-on: https://go-review.googlesource.com/c/go/+/216218 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-20 20:09:41 -08:00
fmt.Fprintf(w, "v_%d := v.Args[%d]\n", i, i)
}
}
for _, n := range f.List {
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
fprint(w, n)
if rr, ok := n.(*RuleRewrite); ok {
cmd/compile: use loops to handle commutative ops in rules Prior to this change, we generated additional rules at rulegen time for all possible combinations of args to commutative ops. This is simple and works well, but leads to lots of generated rules. This in turn has increased the size of the compiler, made it hard to compile package ssa on small machines, and provided a disincentive to mark some ops as commutative. This change reworks how we handle commutative ops. Instead of generating a rule per argument permutation, we generate a series of nested loops, one for each commutative op. Each loop tries both possible argument orderings. I also considered attempting to canonicalize the inputs to the rewrite rules. However, because either or both arguments might be nothing more than an identifier, and because there can be arbitrary conditions to evaluate during matching, I did not see how to proceed. The duplicate rule detection now sorts arguments to commutative ops, so that it can detect commutative-only duplicates. There may be further optimizations to the new generated code. In particular, we may not be removing as many bounds checks as before; I have not investigated deeply. If more work here is needed, we could do it with more hints or with improvements to the prove pass. This change has almost no impact on the generated code. It does not pass toolstash-check, however. In a handful of functions, for reasons I do not understand, there are minor position changes. For the entire series ending at this change, there is negligible compiler performance impact. The compiler binary shrinks by about 15%, and package ssa shrinks by about 25%. Package ssa also compiles ~25% faster with ~25% less memory. Change-Id: Ia2ee9ceae7be08a17342319d4e31b0bb238a2ee4 Reviewed-on: https://go-review.googlesource.com/c/go/+/213703 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-06 22:24:02 -08:00
k := [3]string{
normalizeMatch(rr.Match, file.Arch),
normalizeWhitespace(rr.Cond),
normalizeWhitespace(rr.Result),
cmd/compile: use loops to handle commutative ops in rules Prior to this change, we generated additional rules at rulegen time for all possible combinations of args to commutative ops. This is simple and works well, but leads to lots of generated rules. This in turn has increased the size of the compiler, made it hard to compile package ssa on small machines, and provided a disincentive to mark some ops as commutative. This change reworks how we handle commutative ops. Instead of generating a rule per argument permutation, we generate a series of nested loops, one for each commutative op. Each loop tries both possible argument orderings. I also considered attempting to canonicalize the inputs to the rewrite rules. However, because either or both arguments might be nothing more than an identifier, and because there can be arbitrary conditions to evaluate during matching, I did not see how to proceed. The duplicate rule detection now sorts arguments to commutative ops, so that it can detect commutative-only duplicates. There may be further optimizations to the new generated code. In particular, we may not be removing as many bounds checks as before; I have not investigated deeply. If more work here is needed, we could do it with more hints or with improvements to the prove pass. This change has almost no impact on the generated code. It does not pass toolstash-check, however. In a handful of functions, for reasons I do not understand, there are minor position changes. For the entire series ending at this change, there is negligible compiler performance impact. The compiler binary shrinks by about 15%, and package ssa shrinks by about 25%. Package ssa also compiles ~25% faster with ~25% less memory. Change-Id: Ia2ee9ceae7be08a17342319d4e31b0bb238a2ee4 Reviewed-on: https://go-review.googlesource.com/c/go/+/213703 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-06 22:24:02 -08:00
}
if prev, ok := seenRewrite[k]; ok {
log.Fatalf("duplicate rule %s, previously seen at %s\n", rr.Loc, prev)
}
seenRewrite[k] = rr.Loc
}
}
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
fmt.Fprintf(w, "}\n")
}
case *Switch:
fmt.Fprintf(w, "switch ")
fprint(w, n.Expr)
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
fmt.Fprintf(w, " {\n")
for _, n := range n.List {
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
fprint(w, n)
}
fmt.Fprintf(w, "}\n")
case *Case:
fmt.Fprintf(w, "case ")
fprint(w, n.Expr)
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
fmt.Fprintf(w, ":\n")
for _, n := range n.List {
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
fprint(w, n)
}
case *RuleRewrite:
if *addLine {
fmt.Fprintf(w, "// %s\n", n.Loc)
}
fmt.Fprintf(w, "// match: %s\n", n.Match)
if n.Cond != "" {
fmt.Fprintf(w, "// cond: %s\n", n.Cond)
}
fmt.Fprintf(w, "// result: %s\n", n.Result)
fmt.Fprintf(w, "for %s {\n", n.Check)
cmd/compile: use loops to handle commutative ops in rules Prior to this change, we generated additional rules at rulegen time for all possible combinations of args to commutative ops. This is simple and works well, but leads to lots of generated rules. This in turn has increased the size of the compiler, made it hard to compile package ssa on small machines, and provided a disincentive to mark some ops as commutative. This change reworks how we handle commutative ops. Instead of generating a rule per argument permutation, we generate a series of nested loops, one for each commutative op. Each loop tries both possible argument orderings. I also considered attempting to canonicalize the inputs to the rewrite rules. However, because either or both arguments might be nothing more than an identifier, and because there can be arbitrary conditions to evaluate during matching, I did not see how to proceed. The duplicate rule detection now sorts arguments to commutative ops, so that it can detect commutative-only duplicates. There may be further optimizations to the new generated code. In particular, we may not be removing as many bounds checks as before; I have not investigated deeply. If more work here is needed, we could do it with more hints or with improvements to the prove pass. This change has almost no impact on the generated code. It does not pass toolstash-check, however. In a handful of functions, for reasons I do not understand, there are minor position changes. For the entire series ending at this change, there is negligible compiler performance impact. The compiler binary shrinks by about 15%, and package ssa shrinks by about 25%. Package ssa also compiles ~25% faster with ~25% less memory. Change-Id: Ia2ee9ceae7be08a17342319d4e31b0bb238a2ee4 Reviewed-on: https://go-review.googlesource.com/c/go/+/213703 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-06 22:24:02 -08:00
nCommutative := 0
for _, n := range n.List {
cmd/compile: use loops to handle commutative ops in rules Prior to this change, we generated additional rules at rulegen time for all possible combinations of args to commutative ops. This is simple and works well, but leads to lots of generated rules. This in turn has increased the size of the compiler, made it hard to compile package ssa on small machines, and provided a disincentive to mark some ops as commutative. This change reworks how we handle commutative ops. Instead of generating a rule per argument permutation, we generate a series of nested loops, one for each commutative op. Each loop tries both possible argument orderings. I also considered attempting to canonicalize the inputs to the rewrite rules. However, because either or both arguments might be nothing more than an identifier, and because there can be arbitrary conditions to evaluate during matching, I did not see how to proceed. The duplicate rule detection now sorts arguments to commutative ops, so that it can detect commutative-only duplicates. There may be further optimizations to the new generated code. In particular, we may not be removing as many bounds checks as before; I have not investigated deeply. If more work here is needed, we could do it with more hints or with improvements to the prove pass. This change has almost no impact on the generated code. It does not pass toolstash-check, however. In a handful of functions, for reasons I do not understand, there are minor position changes. For the entire series ending at this change, there is negligible compiler performance impact. The compiler binary shrinks by about 15%, and package ssa shrinks by about 25%. Package ssa also compiles ~25% faster with ~25% less memory. Change-Id: Ia2ee9ceae7be08a17342319d4e31b0bb238a2ee4 Reviewed-on: https://go-review.googlesource.com/c/go/+/213703 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-06 22:24:02 -08:00
if b, ok := n.(*CondBreak); ok {
b.InsideCommuteLoop = nCommutative > 0
cmd/compile: use loops to handle commutative ops in rules Prior to this change, we generated additional rules at rulegen time for all possible combinations of args to commutative ops. This is simple and works well, but leads to lots of generated rules. This in turn has increased the size of the compiler, made it hard to compile package ssa on small machines, and provided a disincentive to mark some ops as commutative. This change reworks how we handle commutative ops. Instead of generating a rule per argument permutation, we generate a series of nested loops, one for each commutative op. Each loop tries both possible argument orderings. I also considered attempting to canonicalize the inputs to the rewrite rules. However, because either or both arguments might be nothing more than an identifier, and because there can be arbitrary conditions to evaluate during matching, I did not see how to proceed. The duplicate rule detection now sorts arguments to commutative ops, so that it can detect commutative-only duplicates. There may be further optimizations to the new generated code. In particular, we may not be removing as many bounds checks as before; I have not investigated deeply. If more work here is needed, we could do it with more hints or with improvements to the prove pass. This change has almost no impact on the generated code. It does not pass toolstash-check, however. In a handful of functions, for reasons I do not understand, there are minor position changes. For the entire series ending at this change, there is negligible compiler performance impact. The compiler binary shrinks by about 15%, and package ssa shrinks by about 25%. Package ssa also compiles ~25% faster with ~25% less memory. Change-Id: Ia2ee9ceae7be08a17342319d4e31b0bb238a2ee4 Reviewed-on: https://go-review.googlesource.com/c/go/+/213703 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-06 22:24:02 -08:00
}
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
fprint(w, n)
cmd/compile: use loops to handle commutative ops in rules Prior to this change, we generated additional rules at rulegen time for all possible combinations of args to commutative ops. This is simple and works well, but leads to lots of generated rules. This in turn has increased the size of the compiler, made it hard to compile package ssa on small machines, and provided a disincentive to mark some ops as commutative. This change reworks how we handle commutative ops. Instead of generating a rule per argument permutation, we generate a series of nested loops, one for each commutative op. Each loop tries both possible argument orderings. I also considered attempting to canonicalize the inputs to the rewrite rules. However, because either or both arguments might be nothing more than an identifier, and because there can be arbitrary conditions to evaluate during matching, I did not see how to proceed. The duplicate rule detection now sorts arguments to commutative ops, so that it can detect commutative-only duplicates. There may be further optimizations to the new generated code. In particular, we may not be removing as many bounds checks as before; I have not investigated deeply. If more work here is needed, we could do it with more hints or with improvements to the prove pass. This change has almost no impact on the generated code. It does not pass toolstash-check, however. In a handful of functions, for reasons I do not understand, there are minor position changes. For the entire series ending at this change, there is negligible compiler performance impact. The compiler binary shrinks by about 15%, and package ssa shrinks by about 25%. Package ssa also compiles ~25% faster with ~25% less memory. Change-Id: Ia2ee9ceae7be08a17342319d4e31b0bb238a2ee4 Reviewed-on: https://go-review.googlesource.com/c/go/+/213703 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-06 22:24:02 -08:00
if loop, ok := n.(StartCommuteLoop); ok {
if nCommutative != loop.Depth {
cmd/compile: use loops to handle commutative ops in rules Prior to this change, we generated additional rules at rulegen time for all possible combinations of args to commutative ops. This is simple and works well, but leads to lots of generated rules. This in turn has increased the size of the compiler, made it hard to compile package ssa on small machines, and provided a disincentive to mark some ops as commutative. This change reworks how we handle commutative ops. Instead of generating a rule per argument permutation, we generate a series of nested loops, one for each commutative op. Each loop tries both possible argument orderings. I also considered attempting to canonicalize the inputs to the rewrite rules. However, because either or both arguments might be nothing more than an identifier, and because there can be arbitrary conditions to evaluate during matching, I did not see how to proceed. The duplicate rule detection now sorts arguments to commutative ops, so that it can detect commutative-only duplicates. There may be further optimizations to the new generated code. In particular, we may not be removing as many bounds checks as before; I have not investigated deeply. If more work here is needed, we could do it with more hints or with improvements to the prove pass. This change has almost no impact on the generated code. It does not pass toolstash-check, however. In a handful of functions, for reasons I do not understand, there are minor position changes. For the entire series ending at this change, there is negligible compiler performance impact. The compiler binary shrinks by about 15%, and package ssa shrinks by about 25%. Package ssa also compiles ~25% faster with ~25% less memory. Change-Id: Ia2ee9ceae7be08a17342319d4e31b0bb238a2ee4 Reviewed-on: https://go-review.googlesource.com/c/go/+/213703 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-06 22:24:02 -08:00
panic("mismatch commute loop depth")
}
nCommutative++
}
}
fmt.Fprintf(w, "return true\n")
for i := 0; i < nCommutative; i++ {
fmt.Fprintln(w, "}")
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
}
if n.CommuteDepth > 0 && n.CanFail {
cmd/compile: use loops to handle commutative ops in rules Prior to this change, we generated additional rules at rulegen time for all possible combinations of args to commutative ops. This is simple and works well, but leads to lots of generated rules. This in turn has increased the size of the compiler, made it hard to compile package ssa on small machines, and provided a disincentive to mark some ops as commutative. This change reworks how we handle commutative ops. Instead of generating a rule per argument permutation, we generate a series of nested loops, one for each commutative op. Each loop tries both possible argument orderings. I also considered attempting to canonicalize the inputs to the rewrite rules. However, because either or both arguments might be nothing more than an identifier, and because there can be arbitrary conditions to evaluate during matching, I did not see how to proceed. The duplicate rule detection now sorts arguments to commutative ops, so that it can detect commutative-only duplicates. There may be further optimizations to the new generated code. In particular, we may not be removing as many bounds checks as before; I have not investigated deeply. If more work here is needed, we could do it with more hints or with improvements to the prove pass. This change has almost no impact on the generated code. It does not pass toolstash-check, however. In a handful of functions, for reasons I do not understand, there are minor position changes. For the entire series ending at this change, there is negligible compiler performance impact. The compiler binary shrinks by about 15%, and package ssa shrinks by about 25%. Package ssa also compiles ~25% faster with ~25% less memory. Change-Id: Ia2ee9ceae7be08a17342319d4e31b0bb238a2ee4 Reviewed-on: https://go-review.googlesource.com/c/go/+/213703 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-06 22:24:02 -08:00
fmt.Fprint(w, "break\n")
}
fmt.Fprintf(w, "}\n")
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
case *Declare:
fmt.Fprintf(w, "%s := ", n.Name)
fprint(w, n.Value)
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
fmt.Fprintln(w)
case *CondBreak:
fmt.Fprintf(w, "if ")
fprint(w, n.Cond)
cmd/compile: use loops to handle commutative ops in rules Prior to this change, we generated additional rules at rulegen time for all possible combinations of args to commutative ops. This is simple and works well, but leads to lots of generated rules. This in turn has increased the size of the compiler, made it hard to compile package ssa on small machines, and provided a disincentive to mark some ops as commutative. This change reworks how we handle commutative ops. Instead of generating a rule per argument permutation, we generate a series of nested loops, one for each commutative op. Each loop tries both possible argument orderings. I also considered attempting to canonicalize the inputs to the rewrite rules. However, because either or both arguments might be nothing more than an identifier, and because there can be arbitrary conditions to evaluate during matching, I did not see how to proceed. The duplicate rule detection now sorts arguments to commutative ops, so that it can detect commutative-only duplicates. There may be further optimizations to the new generated code. In particular, we may not be removing as many bounds checks as before; I have not investigated deeply. If more work here is needed, we could do it with more hints or with improvements to the prove pass. This change has almost no impact on the generated code. It does not pass toolstash-check, however. In a handful of functions, for reasons I do not understand, there are minor position changes. For the entire series ending at this change, there is negligible compiler performance impact. The compiler binary shrinks by about 15%, and package ssa shrinks by about 25%. Package ssa also compiles ~25% faster with ~25% less memory. Change-Id: Ia2ee9ceae7be08a17342319d4e31b0bb238a2ee4 Reviewed-on: https://go-review.googlesource.com/c/go/+/213703 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-06 22:24:02 -08:00
fmt.Fprintf(w, " {\n")
if n.InsideCommuteLoop {
cmd/compile: use loops to handle commutative ops in rules Prior to this change, we generated additional rules at rulegen time for all possible combinations of args to commutative ops. This is simple and works well, but leads to lots of generated rules. This in turn has increased the size of the compiler, made it hard to compile package ssa on small machines, and provided a disincentive to mark some ops as commutative. This change reworks how we handle commutative ops. Instead of generating a rule per argument permutation, we generate a series of nested loops, one for each commutative op. Each loop tries both possible argument orderings. I also considered attempting to canonicalize the inputs to the rewrite rules. However, because either or both arguments might be nothing more than an identifier, and because there can be arbitrary conditions to evaluate during matching, I did not see how to proceed. The duplicate rule detection now sorts arguments to commutative ops, so that it can detect commutative-only duplicates. There may be further optimizations to the new generated code. In particular, we may not be removing as many bounds checks as before; I have not investigated deeply. If more work here is needed, we could do it with more hints or with improvements to the prove pass. This change has almost no impact on the generated code. It does not pass toolstash-check, however. In a handful of functions, for reasons I do not understand, there are minor position changes. For the entire series ending at this change, there is negligible compiler performance impact. The compiler binary shrinks by about 15%, and package ssa shrinks by about 25%. Package ssa also compiles ~25% faster with ~25% less memory. Change-Id: Ia2ee9ceae7be08a17342319d4e31b0bb238a2ee4 Reviewed-on: https://go-review.googlesource.com/c/go/+/213703 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-06 22:24:02 -08:00
fmt.Fprintf(w, "continue")
} else {
fmt.Fprintf(w, "break")
}
fmt.Fprintf(w, "\n}\n")
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
case ast.Node:
printConfig.Fprint(w, emptyFset, n)
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
if _, ok := n.(ast.Stmt); ok {
fmt.Fprintln(w)
}
cmd/compile: use loops to handle commutative ops in rules Prior to this change, we generated additional rules at rulegen time for all possible combinations of args to commutative ops. This is simple and works well, but leads to lots of generated rules. This in turn has increased the size of the compiler, made it hard to compile package ssa on small machines, and provided a disincentive to mark some ops as commutative. This change reworks how we handle commutative ops. Instead of generating a rule per argument permutation, we generate a series of nested loops, one for each commutative op. Each loop tries both possible argument orderings. I also considered attempting to canonicalize the inputs to the rewrite rules. However, because either or both arguments might be nothing more than an identifier, and because there can be arbitrary conditions to evaluate during matching, I did not see how to proceed. The duplicate rule detection now sorts arguments to commutative ops, so that it can detect commutative-only duplicates. There may be further optimizations to the new generated code. In particular, we may not be removing as many bounds checks as before; I have not investigated deeply. If more work here is needed, we could do it with more hints or with improvements to the prove pass. This change has almost no impact on the generated code. It does not pass toolstash-check, however. In a handful of functions, for reasons I do not understand, there are minor position changes. For the entire series ending at this change, there is negligible compiler performance impact. The compiler binary shrinks by about 15%, and package ssa shrinks by about 25%. Package ssa also compiles ~25% faster with ~25% less memory. Change-Id: Ia2ee9ceae7be08a17342319d4e31b0bb238a2ee4 Reviewed-on: https://go-review.googlesource.com/c/go/+/213703 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-06 22:24:02 -08:00
case StartCommuteLoop:
fmt.Fprintf(w, "for _i%[1]d := 0; _i%[1]d <= 1; _i%[1]d, %[2]s_0, %[2]s_1 = _i%[1]d + 1, %[2]s_1, %[2]s_0 {\n", n.Depth, n.V)
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
default:
log.Fatalf("cannot print %T", n)
}
}
var printConfig = printer.Config{
Mode: printer.RawFormat, // we use go/format later, so skip work here
}
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
var emptyFset = token.NewFileSet()
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
// Node can be a Statement or an ast.Expr.
type Node interface{}
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
// Statement can be one of our high-level statement struct types, or an
// ast.Stmt under some limited circumstances.
type Statement interface{}
// BodyBase is shared by all of our statement pseudo-node types which can
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
// contain other statements.
type BodyBase struct {
List []Statement
CanFail bool
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
}
func (w *BodyBase) add(node Statement) {
var last Statement
if len(w.List) > 0 {
last = w.List[len(w.List)-1]
}
if node, ok := node.(*CondBreak); ok {
w.CanFail = true
if last, ok := last.(*CondBreak); ok {
// Add to the previous "if <cond> { break }" via a
// logical OR, which will save verbosity.
last.Cond = &ast.BinaryExpr{
Op: token.LOR,
X: last.Cond,
Y: node.Cond,
}
return
}
}
w.List = append(w.List, node)
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
}
// predeclared contains globally known tokens that should not be redefined.
var predeclared = map[string]bool{
"nil": true,
"false": true,
"true": true,
}
// declared reports if the body contains a Declare with the given name.
func (w *BodyBase) declared(name string) bool {
if predeclared[name] {
// Treat predeclared names as having already been declared.
// This lets us use nil to match an aux field or
// true and false to match an auxint field.
cmd/compile: convert 386 port to use addressing modes pass (take 2) Retrying CL 222782, with a fix that will hopefully stop the random crashing. The issue with the previous CL is that it does pointer arithmetic in a way that may briefly generate an out-of-bounds pointer. If an interrupt happens to occur in that state, the referenced object may be collected incorrectly. Suppose there was code that did s[x+c]. The previous CL had a rule to the effect of ptr + (x + c) -> c + (ptr + x). But ptr+x is not guaranteed to point to the same object as ptr. In contrast, ptr+(x+c) is guaranteed to point to the same object as ptr, because we would have already checked that x+c is in bounds. For example, strconv.trim used to have this code: MOVZX -0x1(BX)(DX*1), BP CMPL $0x30, AL After CL 222782, it had this code: LEAL 0(BX)(DX*1), BP CMPB $0x30, -0x1(BP) An interrupt between those last two instructions could see BP pointing outside the backing store of the slice involved. It's really hard to actually demonstrate a bug. First, you need to have an interrupt occur at exactly the right time. Then, there must be no other pointers to the object in question. Since the interrupted frame will be scanned conservatively, there can't even be a dead pointer in another register or on the stack. (In the example above, a bug can't happen because BX still holds the original pointer.) Then, the object in question needs to be collected (or at least scanned?) before the interrupted code continues. This CL needs to handle load combining somewhat differently than CL 222782 because of the new restriction on arithmetic. That's the only real difference (other than removing the bad rules) from that old CL. This bug is also present in the amd64 rewrite rules, and we haven't seen any crashing as a result. I will fix up that code similarly to this one in a separate CL. Update #37881 Change-Id: I5f0d584d9bef4696bfe89a61ef0a27c8d507329f Reviewed-on: https://go-review.googlesource.com/c/go/+/225798 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-03-24 13:39:44 -07:00
return true
}
for _, s := range w.List {
if decl, ok := s.(*Declare); ok && decl.Name == name {
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
return true
}
}
return false
}
// These types define some high-level statement struct types, which can be used
// as a Statement. This allows us to keep some node structs simpler, and have
// higher-level nodes such as an entire rule rewrite.
//
// Note that ast.Expr is always used as-is; we don't declare our own expression
// nodes.
type (
File struct {
BodyBase // []*Func
Arch arch
Suffix string
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
}
Func struct {
BodyBase
Kind string // "Value" or "Block"
Suffix string
ArgLen int32 // if kind == "Value", number of args for this op
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
}
Switch struct {
BodyBase // []*Case
Expr ast.Expr
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
}
Case struct {
BodyBase
Expr ast.Expr
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
}
RuleRewrite struct {
BodyBase
Match, Cond, Result string // top comments
Check string // top-level boolean expression
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
Alloc int // for unique var names
Loc string // file name & line number of the original rule
CommuteDepth int // used to track depth of commute loops
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
}
Declare struct {
Name string
Value ast.Expr
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
}
CondBreak struct {
Cond ast.Expr
InsideCommuteLoop bool
cmd/compile: use loops to handle commutative ops in rules Prior to this change, we generated additional rules at rulegen time for all possible combinations of args to commutative ops. This is simple and works well, but leads to lots of generated rules. This in turn has increased the size of the compiler, made it hard to compile package ssa on small machines, and provided a disincentive to mark some ops as commutative. This change reworks how we handle commutative ops. Instead of generating a rule per argument permutation, we generate a series of nested loops, one for each commutative op. Each loop tries both possible argument orderings. I also considered attempting to canonicalize the inputs to the rewrite rules. However, because either or both arguments might be nothing more than an identifier, and because there can be arbitrary conditions to evaluate during matching, I did not see how to proceed. The duplicate rule detection now sorts arguments to commutative ops, so that it can detect commutative-only duplicates. There may be further optimizations to the new generated code. In particular, we may not be removing as many bounds checks as before; I have not investigated deeply. If more work here is needed, we could do it with more hints or with improvements to the prove pass. This change has almost no impact on the generated code. It does not pass toolstash-check, however. In a handful of functions, for reasons I do not understand, there are minor position changes. For the entire series ending at this change, there is negligible compiler performance impact. The compiler binary shrinks by about 15%, and package ssa shrinks by about 25%. Package ssa also compiles ~25% faster with ~25% less memory. Change-Id: Ia2ee9ceae7be08a17342319d4e31b0bb238a2ee4 Reviewed-on: https://go-review.googlesource.com/c/go/+/213703 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-06 22:24:02 -08:00
}
StartCommuteLoop struct {
Depth int
V string
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
}
)
// exprf parses a Go expression generated from fmt.Sprintf, panicking if an
// error occurs.
func exprf(format string, a ...interface{}) ast.Expr {
src := fmt.Sprintf(format, a...)
expr, err := parser.ParseExpr(src)
if err != nil {
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
log.Fatalf("expr parse error on %q: %v", src, err)
}
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
return expr
}
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
// stmtf parses a Go statement generated from fmt.Sprintf. This function is only
// meant for simple statements that don't have a custom Statement node declared
// in this package, such as ast.ReturnStmt or ast.ExprStmt.
func stmtf(format string, a ...interface{}) Statement {
src := fmt.Sprintf(format, a...)
fsrc := "package p\nfunc _() {\n" + src + "\n}\n"
file, err := parser.ParseFile(token.NewFileSet(), "", fsrc, 0)
if err != nil {
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
log.Fatalf("stmt parse error on %q: %v", src, err)
}
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
return file.Decls[0].(*ast.FuncDecl).Body.List[0]
}
cmd/compile: disallow rewrite rules from declaring reserved names If I change a rule in ARM64.rules to use the variable name "b" in a conflicting way, rulegen would previously not complain, and the compiler would later give a confusing error: $ go run *.go && go build cmd/compile/internal/ssa # cmd/compile/internal/ssa ../rewriteARM64.go:24236:10: b.NewValue0 undefined (type int64 has no field or method NewValue0) Make rulegen complain early about those cases. Sometimes they might happen to be harmless, but in general they can easily cause confusion or unintended effect due to shadowing. After the change, with the same conflicting rule: $ go run *.go && go build cmd/compile/internal/ssa 2021/03/22 11:31:49 rule ARM64.rules:495 uses the reserved name b exit status 1 Note that 24 existing rules were using reserved names. It seems like the shadowing was harmless, as it wasn't causing typechecking issues nor did it seem to cause unintended behavior when the rule rewrite code ran. The bool values "b" were renamed "t", since that seems to have a precedent in other rules and in the fmt package. Sequential values like "a b c" were renamed to "x y z", since "b" is reserved. Finally, "typ" was renamed to "_typ", since there doesn't seem to be an obviously better answer. Passes all three of: $ GOARCH=amd64 go build -toolexec 'toolstash -cmp' -a std $ GOARCH=arm64 go build -toolexec 'toolstash -cmp' -a std $ GOARCH=mips64 go build -toolexec 'toolstash -cmp' -a std Fixes #45154. Change-Id: I1cce194dc7b477886a9c218c17973e996bcedccf Reviewed-on: https://go-review.googlesource.com/c/go/+/303549 Trust: Daniel Martí <mvdan@mvdan.cc> Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2021-03-22 11:35:02 +01:00
var reservedNames = map[string]bool{
"v": true, // Values[i], etc
"b": true, // v.Block
"config": true, // b.Func.Config
"fe": true, // b.Func.fe
"typ": true, // &b.Func.Config.Types
}
// declf constructs a simple "name := value" declaration,
// using exprf for its value.
//
// name must not be one of reservedNames.
// This helps prevent unintended shadowing and name clashes.
// To declare a reserved name, use declReserved.
func declf(loc, name, format string, a ...interface{}) *Declare {
if reservedNames[name] {
log.Fatalf("rule %s uses the reserved name %s", loc, name)
}
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
return &Declare{name, exprf(format, a...)}
}
cmd/compile: disallow rewrite rules from declaring reserved names If I change a rule in ARM64.rules to use the variable name "b" in a conflicting way, rulegen would previously not complain, and the compiler would later give a confusing error: $ go run *.go && go build cmd/compile/internal/ssa # cmd/compile/internal/ssa ../rewriteARM64.go:24236:10: b.NewValue0 undefined (type int64 has no field or method NewValue0) Make rulegen complain early about those cases. Sometimes they might happen to be harmless, but in general they can easily cause confusion or unintended effect due to shadowing. After the change, with the same conflicting rule: $ go run *.go && go build cmd/compile/internal/ssa 2021/03/22 11:31:49 rule ARM64.rules:495 uses the reserved name b exit status 1 Note that 24 existing rules were using reserved names. It seems like the shadowing was harmless, as it wasn't causing typechecking issues nor did it seem to cause unintended behavior when the rule rewrite code ran. The bool values "b" were renamed "t", since that seems to have a precedent in other rules and in the fmt package. Sequential values like "a b c" were renamed to "x y z", since "b" is reserved. Finally, "typ" was renamed to "_typ", since there doesn't seem to be an obviously better answer. Passes all three of: $ GOARCH=amd64 go build -toolexec 'toolstash -cmp' -a std $ GOARCH=arm64 go build -toolexec 'toolstash -cmp' -a std $ GOARCH=mips64 go build -toolexec 'toolstash -cmp' -a std Fixes #45154. Change-Id: I1cce194dc7b477886a9c218c17973e996bcedccf Reviewed-on: https://go-review.googlesource.com/c/go/+/303549 Trust: Daniel Martí <mvdan@mvdan.cc> Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2021-03-22 11:35:02 +01:00
// declReserved is like declf, but the name must be one of reservedNames.
// Calls to declReserved should generally be static and top-level.
func declReserved(name, value string) *Declare {
if !reservedNames[name] {
panic(fmt.Sprintf("declReserved call does not use a reserved name: %q", name))
}
return &Declare{name, exprf(value)}
}
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
// breakf constructs a simple "if cond { break }" statement, using exprf for its
// condition.
func breakf(format string, a ...interface{}) *CondBreak {
return &CondBreak{Cond: exprf(format, a...)}
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
}
cmd/compile: allow multiple SSA block control values Control values are used to choose which successor of a block is jumped to. Typically a control value takes the form of a 'flags' value that represents the result of a comparison. Some architectures however use a variable in a register as a control value. Up until now we have managed with a single control value per block. However some architectures (e.g. s390x and riscv64) have combined compare-and-branch instructions that take two variables in registers as parameters. To generate these instructions we need to support 2 control values per block. This CL allows up to 2 control values to be used in a block in order to support the addition of compare-and-branch instructions. I have implemented s390x compare-and-branch instructions in a different CL. Passes toolstash-check -all. Results of compilebench: name old time/op new time/op delta Template 208ms ± 1% 209ms ± 1% ~ (p=0.289 n=20+20) Unicode 83.7ms ± 1% 83.3ms ± 3% -0.49% (p=0.017 n=18+18) GoTypes 748ms ± 1% 748ms ± 0% ~ (p=0.460 n=20+18) Compiler 3.47s ± 1% 3.48s ± 1% ~ (p=0.070 n=19+18) SSA 11.5s ± 1% 11.7s ± 1% +1.64% (p=0.000 n=19+18) Flate 130ms ± 1% 130ms ± 1% ~ (p=0.588 n=19+20) GoParser 160ms ± 1% 161ms ± 1% ~ (p=0.211 n=20+20) Reflect 465ms ± 1% 467ms ± 1% +0.42% (p=0.007 n=20+20) Tar 184ms ± 1% 185ms ± 2% ~ (p=0.087 n=18+20) XML 253ms ± 1% 253ms ± 1% ~ (p=0.377 n=20+18) LinkCompiler 769ms ± 2% 774ms ± 2% ~ (p=0.070 n=19+19) ExternalLinkCompiler 3.59s ±11% 3.68s ± 6% ~ (p=0.072 n=20+20) LinkWithoutDebugCompiler 446ms ± 5% 454ms ± 3% +1.79% (p=0.002 n=19+20) StdCmd 26.0s ± 2% 26.0s ± 2% ~ (p=0.799 n=20+20) name old user-time/op new user-time/op delta Template 238ms ± 5% 240ms ± 5% ~ (p=0.142 n=20+20) Unicode 105ms ±11% 106ms ±10% ~ (p=0.512 n=20+20) GoTypes 876ms ± 2% 873ms ± 4% ~ (p=0.647 n=20+19) Compiler 4.17s ± 2% 4.19s ± 1% ~ (p=0.093 n=20+18) SSA 13.9s ± 1% 14.1s ± 1% +1.45% (p=0.000 n=18+18) Flate 145ms ±13% 146ms ± 5% ~ (p=0.851 n=20+18) GoParser 185ms ± 5% 188ms ± 7% ~ (p=0.174 n=20+20) Reflect 534ms ± 3% 538ms ± 2% ~ (p=0.105 n=20+18) Tar 215ms ± 4% 211ms ± 9% ~ (p=0.079 n=19+20) XML 295ms ± 6% 295ms ± 5% ~ (p=0.968 n=20+20) LinkCompiler 832ms ± 4% 837ms ± 7% ~ (p=0.707 n=17+20) ExternalLinkCompiler 1.58s ± 8% 1.60s ± 4% ~ (p=0.296 n=20+19) LinkWithoutDebugCompiler 478ms ±12% 489ms ±10% ~ (p=0.429 n=20+20) name old object-bytes new object-bytes delta Template 559kB ± 0% 559kB ± 0% ~ (all equal) Unicode 216kB ± 0% 216kB ± 0% ~ (all equal) GoTypes 2.03MB ± 0% 2.03MB ± 0% ~ (all equal) Compiler 8.07MB ± 0% 8.07MB ± 0% -0.06% (p=0.000 n=20+20) SSA 27.1MB ± 0% 27.3MB ± 0% +0.89% (p=0.000 n=20+20) Flate 343kB ± 0% 343kB ± 0% ~ (all equal) GoParser 441kB ± 0% 441kB ± 0% ~ (all equal) Reflect 1.36MB ± 0% 1.36MB ± 0% ~ (all equal) Tar 487kB ± 0% 487kB ± 0% ~ (all equal) XML 632kB ± 0% 632kB ± 0% ~ (all equal) name old export-bytes new export-bytes delta Template 18.5kB ± 0% 18.5kB ± 0% ~ (all equal) Unicode 7.92kB ± 0% 7.92kB ± 0% ~ (all equal) GoTypes 35.0kB ± 0% 35.0kB ± 0% ~ (all equal) Compiler 109kB ± 0% 110kB ± 0% +0.72% (p=0.000 n=20+20) SSA 137kB ± 0% 138kB ± 0% +0.58% (p=0.000 n=20+20) Flate 4.89kB ± 0% 4.89kB ± 0% ~ (all equal) GoParser 8.49kB ± 0% 8.49kB ± 0% ~ (all equal) Reflect 11.4kB ± 0% 11.4kB ± 0% ~ (all equal) Tar 10.5kB ± 0% 10.5kB ± 0% ~ (all equal) XML 16.7kB ± 0% 16.7kB ± 0% ~ (all equal) name old text-bytes new text-bytes delta HelloSize 761kB ± 0% 761kB ± 0% ~ (all equal) CmdGoSize 10.8MB ± 0% 10.8MB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 10.7kB ± 0% 10.7kB ± 0% ~ (all equal) CmdGoSize 312kB ± 0% 312kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 122kB ± 0% 122kB ± 0% ~ (all equal) CmdGoSize 146kB ± 0% 146kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.13MB ± 0% 1.13MB ± 0% ~ (all equal) CmdGoSize 15.1MB ± 0% 15.1MB ± 0% ~ (all equal) Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f Reviewed-on: https://go-review.googlesource.com/c/go/+/196557 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-12 20:19:58 +01:00
func genBlockRewrite(rule Rule, arch arch, data blockData) *RuleRewrite {
rr := &RuleRewrite{Loc: rule.Loc}
rr.Match, rr.Cond, rr.Result = rule.parse()
_, _, auxint, aux, s := extract(rr.Match) // remove parens, then split
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
cmd/compile: allow multiple SSA block control values Control values are used to choose which successor of a block is jumped to. Typically a control value takes the form of a 'flags' value that represents the result of a comparison. Some architectures however use a variable in a register as a control value. Up until now we have managed with a single control value per block. However some architectures (e.g. s390x and riscv64) have combined compare-and-branch instructions that take two variables in registers as parameters. To generate these instructions we need to support 2 control values per block. This CL allows up to 2 control values to be used in a block in order to support the addition of compare-and-branch instructions. I have implemented s390x compare-and-branch instructions in a different CL. Passes toolstash-check -all. Results of compilebench: name old time/op new time/op delta Template 208ms ± 1% 209ms ± 1% ~ (p=0.289 n=20+20) Unicode 83.7ms ± 1% 83.3ms ± 3% -0.49% (p=0.017 n=18+18) GoTypes 748ms ± 1% 748ms ± 0% ~ (p=0.460 n=20+18) Compiler 3.47s ± 1% 3.48s ± 1% ~ (p=0.070 n=19+18) SSA 11.5s ± 1% 11.7s ± 1% +1.64% (p=0.000 n=19+18) Flate 130ms ± 1% 130ms ± 1% ~ (p=0.588 n=19+20) GoParser 160ms ± 1% 161ms ± 1% ~ (p=0.211 n=20+20) Reflect 465ms ± 1% 467ms ± 1% +0.42% (p=0.007 n=20+20) Tar 184ms ± 1% 185ms ± 2% ~ (p=0.087 n=18+20) XML 253ms ± 1% 253ms ± 1% ~ (p=0.377 n=20+18) LinkCompiler 769ms ± 2% 774ms ± 2% ~ (p=0.070 n=19+19) ExternalLinkCompiler 3.59s ±11% 3.68s ± 6% ~ (p=0.072 n=20+20) LinkWithoutDebugCompiler 446ms ± 5% 454ms ± 3% +1.79% (p=0.002 n=19+20) StdCmd 26.0s ± 2% 26.0s ± 2% ~ (p=0.799 n=20+20) name old user-time/op new user-time/op delta Template 238ms ± 5% 240ms ± 5% ~ (p=0.142 n=20+20) Unicode 105ms ±11% 106ms ±10% ~ (p=0.512 n=20+20) GoTypes 876ms ± 2% 873ms ± 4% ~ (p=0.647 n=20+19) Compiler 4.17s ± 2% 4.19s ± 1% ~ (p=0.093 n=20+18) SSA 13.9s ± 1% 14.1s ± 1% +1.45% (p=0.000 n=18+18) Flate 145ms ±13% 146ms ± 5% ~ (p=0.851 n=20+18) GoParser 185ms ± 5% 188ms ± 7% ~ (p=0.174 n=20+20) Reflect 534ms ± 3% 538ms ± 2% ~ (p=0.105 n=20+18) Tar 215ms ± 4% 211ms ± 9% ~ (p=0.079 n=19+20) XML 295ms ± 6% 295ms ± 5% ~ (p=0.968 n=20+20) LinkCompiler 832ms ± 4% 837ms ± 7% ~ (p=0.707 n=17+20) ExternalLinkCompiler 1.58s ± 8% 1.60s ± 4% ~ (p=0.296 n=20+19) LinkWithoutDebugCompiler 478ms ±12% 489ms ±10% ~ (p=0.429 n=20+20) name old object-bytes new object-bytes delta Template 559kB ± 0% 559kB ± 0% ~ (all equal) Unicode 216kB ± 0% 216kB ± 0% ~ (all equal) GoTypes 2.03MB ± 0% 2.03MB ± 0% ~ (all equal) Compiler 8.07MB ± 0% 8.07MB ± 0% -0.06% (p=0.000 n=20+20) SSA 27.1MB ± 0% 27.3MB ± 0% +0.89% (p=0.000 n=20+20) Flate 343kB ± 0% 343kB ± 0% ~ (all equal) GoParser 441kB ± 0% 441kB ± 0% ~ (all equal) Reflect 1.36MB ± 0% 1.36MB ± 0% ~ (all equal) Tar 487kB ± 0% 487kB ± 0% ~ (all equal) XML 632kB ± 0% 632kB ± 0% ~ (all equal) name old export-bytes new export-bytes delta Template 18.5kB ± 0% 18.5kB ± 0% ~ (all equal) Unicode 7.92kB ± 0% 7.92kB ± 0% ~ (all equal) GoTypes 35.0kB ± 0% 35.0kB ± 0% ~ (all equal) Compiler 109kB ± 0% 110kB ± 0% +0.72% (p=0.000 n=20+20) SSA 137kB ± 0% 138kB ± 0% +0.58% (p=0.000 n=20+20) Flate 4.89kB ± 0% 4.89kB ± 0% ~ (all equal) GoParser 8.49kB ± 0% 8.49kB ± 0% ~ (all equal) Reflect 11.4kB ± 0% 11.4kB ± 0% ~ (all equal) Tar 10.5kB ± 0% 10.5kB ± 0% ~ (all equal) XML 16.7kB ± 0% 16.7kB ± 0% ~ (all equal) name old text-bytes new text-bytes delta HelloSize 761kB ± 0% 761kB ± 0% ~ (all equal) CmdGoSize 10.8MB ± 0% 10.8MB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 10.7kB ± 0% 10.7kB ± 0% ~ (all equal) CmdGoSize 312kB ± 0% 312kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 122kB ± 0% 122kB ± 0% ~ (all equal) CmdGoSize 146kB ± 0% 146kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.13MB ± 0% 1.13MB ± 0% ~ (all equal) CmdGoSize 15.1MB ± 0% 15.1MB ± 0% ~ (all equal) Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f Reviewed-on: https://go-review.googlesource.com/c/go/+/196557 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-12 20:19:58 +01:00
// check match of control values
if len(s) < data.controls {
log.Fatalf("incorrect number of arguments in %s, got %v wanted at least %v", rule, len(s), data.controls)
}
controls := s[:data.controls]
pos := make([]string, data.controls)
for i, arg := range controls {
cname := fmt.Sprintf("b.Controls[%v]", i)
cmd/compile: allow multiple SSA block control values Control values are used to choose which successor of a block is jumped to. Typically a control value takes the form of a 'flags' value that represents the result of a comparison. Some architectures however use a variable in a register as a control value. Up until now we have managed with a single control value per block. However some architectures (e.g. s390x and riscv64) have combined compare-and-branch instructions that take two variables in registers as parameters. To generate these instructions we need to support 2 control values per block. This CL allows up to 2 control values to be used in a block in order to support the addition of compare-and-branch instructions. I have implemented s390x compare-and-branch instructions in a different CL. Passes toolstash-check -all. Results of compilebench: name old time/op new time/op delta Template 208ms ± 1% 209ms ± 1% ~ (p=0.289 n=20+20) Unicode 83.7ms ± 1% 83.3ms ± 3% -0.49% (p=0.017 n=18+18) GoTypes 748ms ± 1% 748ms ± 0% ~ (p=0.460 n=20+18) Compiler 3.47s ± 1% 3.48s ± 1% ~ (p=0.070 n=19+18) SSA 11.5s ± 1% 11.7s ± 1% +1.64% (p=0.000 n=19+18) Flate 130ms ± 1% 130ms ± 1% ~ (p=0.588 n=19+20) GoParser 160ms ± 1% 161ms ± 1% ~ (p=0.211 n=20+20) Reflect 465ms ± 1% 467ms ± 1% +0.42% (p=0.007 n=20+20) Tar 184ms ± 1% 185ms ± 2% ~ (p=0.087 n=18+20) XML 253ms ± 1% 253ms ± 1% ~ (p=0.377 n=20+18) LinkCompiler 769ms ± 2% 774ms ± 2% ~ (p=0.070 n=19+19) ExternalLinkCompiler 3.59s ±11% 3.68s ± 6% ~ (p=0.072 n=20+20) LinkWithoutDebugCompiler 446ms ± 5% 454ms ± 3% +1.79% (p=0.002 n=19+20) StdCmd 26.0s ± 2% 26.0s ± 2% ~ (p=0.799 n=20+20) name old user-time/op new user-time/op delta Template 238ms ± 5% 240ms ± 5% ~ (p=0.142 n=20+20) Unicode 105ms ±11% 106ms ±10% ~ (p=0.512 n=20+20) GoTypes 876ms ± 2% 873ms ± 4% ~ (p=0.647 n=20+19) Compiler 4.17s ± 2% 4.19s ± 1% ~ (p=0.093 n=20+18) SSA 13.9s ± 1% 14.1s ± 1% +1.45% (p=0.000 n=18+18) Flate 145ms ±13% 146ms ± 5% ~ (p=0.851 n=20+18) GoParser 185ms ± 5% 188ms ± 7% ~ (p=0.174 n=20+20) Reflect 534ms ± 3% 538ms ± 2% ~ (p=0.105 n=20+18) Tar 215ms ± 4% 211ms ± 9% ~ (p=0.079 n=19+20) XML 295ms ± 6% 295ms ± 5% ~ (p=0.968 n=20+20) LinkCompiler 832ms ± 4% 837ms ± 7% ~ (p=0.707 n=17+20) ExternalLinkCompiler 1.58s ± 8% 1.60s ± 4% ~ (p=0.296 n=20+19) LinkWithoutDebugCompiler 478ms ±12% 489ms ±10% ~ (p=0.429 n=20+20) name old object-bytes new object-bytes delta Template 559kB ± 0% 559kB ± 0% ~ (all equal) Unicode 216kB ± 0% 216kB ± 0% ~ (all equal) GoTypes 2.03MB ± 0% 2.03MB ± 0% ~ (all equal) Compiler 8.07MB ± 0% 8.07MB ± 0% -0.06% (p=0.000 n=20+20) SSA 27.1MB ± 0% 27.3MB ± 0% +0.89% (p=0.000 n=20+20) Flate 343kB ± 0% 343kB ± 0% ~ (all equal) GoParser 441kB ± 0% 441kB ± 0% ~ (all equal) Reflect 1.36MB ± 0% 1.36MB ± 0% ~ (all equal) Tar 487kB ± 0% 487kB ± 0% ~ (all equal) XML 632kB ± 0% 632kB ± 0% ~ (all equal) name old export-bytes new export-bytes delta Template 18.5kB ± 0% 18.5kB ± 0% ~ (all equal) Unicode 7.92kB ± 0% 7.92kB ± 0% ~ (all equal) GoTypes 35.0kB ± 0% 35.0kB ± 0% ~ (all equal) Compiler 109kB ± 0% 110kB ± 0% +0.72% (p=0.000 n=20+20) SSA 137kB ± 0% 138kB ± 0% +0.58% (p=0.000 n=20+20) Flate 4.89kB ± 0% 4.89kB ± 0% ~ (all equal) GoParser 8.49kB ± 0% 8.49kB ± 0% ~ (all equal) Reflect 11.4kB ± 0% 11.4kB ± 0% ~ (all equal) Tar 10.5kB ± 0% 10.5kB ± 0% ~ (all equal) XML 16.7kB ± 0% 16.7kB ± 0% ~ (all equal) name old text-bytes new text-bytes delta HelloSize 761kB ± 0% 761kB ± 0% ~ (all equal) CmdGoSize 10.8MB ± 0% 10.8MB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 10.7kB ± 0% 10.7kB ± 0% ~ (all equal) CmdGoSize 312kB ± 0% 312kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 122kB ± 0% 122kB ± 0% ~ (all equal) CmdGoSize 146kB ± 0% 146kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.13MB ± 0% 1.13MB ± 0% ~ (all equal) CmdGoSize 15.1MB ± 0% 15.1MB ± 0% ~ (all equal) Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f Reviewed-on: https://go-review.googlesource.com/c/go/+/196557 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-12 20:19:58 +01:00
if strings.Contains(arg, "(") {
vname, expr := splitNameExpr(arg)
if vname == "" {
vname = fmt.Sprintf("v_%v", i)
}
cmd/compile: disallow rewrite rules from declaring reserved names If I change a rule in ARM64.rules to use the variable name "b" in a conflicting way, rulegen would previously not complain, and the compiler would later give a confusing error: $ go run *.go && go build cmd/compile/internal/ssa # cmd/compile/internal/ssa ../rewriteARM64.go:24236:10: b.NewValue0 undefined (type int64 has no field or method NewValue0) Make rulegen complain early about those cases. Sometimes they might happen to be harmless, but in general they can easily cause confusion or unintended effect due to shadowing. After the change, with the same conflicting rule: $ go run *.go && go build cmd/compile/internal/ssa 2021/03/22 11:31:49 rule ARM64.rules:495 uses the reserved name b exit status 1 Note that 24 existing rules were using reserved names. It seems like the shadowing was harmless, as it wasn't causing typechecking issues nor did it seem to cause unintended behavior when the rule rewrite code ran. The bool values "b" were renamed "t", since that seems to have a precedent in other rules and in the fmt package. Sequential values like "a b c" were renamed to "x y z", since "b" is reserved. Finally, "typ" was renamed to "_typ", since there doesn't seem to be an obviously better answer. Passes all three of: $ GOARCH=amd64 go build -toolexec 'toolstash -cmp' -a std $ GOARCH=arm64 go build -toolexec 'toolstash -cmp' -a std $ GOARCH=mips64 go build -toolexec 'toolstash -cmp' -a std Fixes #45154. Change-Id: I1cce194dc7b477886a9c218c17973e996bcedccf Reviewed-on: https://go-review.googlesource.com/c/go/+/303549 Trust: Daniel Martí <mvdan@mvdan.cc> Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2021-03-22 11:35:02 +01:00
rr.add(declf(rr.Loc, vname, cname))
p, op := genMatch0(rr, arch, expr, vname, nil, false) // TODO: pass non-nil cnt?
cmd/compile: allow multiple SSA block control values Control values are used to choose which successor of a block is jumped to. Typically a control value takes the form of a 'flags' value that represents the result of a comparison. Some architectures however use a variable in a register as a control value. Up until now we have managed with a single control value per block. However some architectures (e.g. s390x and riscv64) have combined compare-and-branch instructions that take two variables in registers as parameters. To generate these instructions we need to support 2 control values per block. This CL allows up to 2 control values to be used in a block in order to support the addition of compare-and-branch instructions. I have implemented s390x compare-and-branch instructions in a different CL. Passes toolstash-check -all. Results of compilebench: name old time/op new time/op delta Template 208ms ± 1% 209ms ± 1% ~ (p=0.289 n=20+20) Unicode 83.7ms ± 1% 83.3ms ± 3% -0.49% (p=0.017 n=18+18) GoTypes 748ms ± 1% 748ms ± 0% ~ (p=0.460 n=20+18) Compiler 3.47s ± 1% 3.48s ± 1% ~ (p=0.070 n=19+18) SSA 11.5s ± 1% 11.7s ± 1% +1.64% (p=0.000 n=19+18) Flate 130ms ± 1% 130ms ± 1% ~ (p=0.588 n=19+20) GoParser 160ms ± 1% 161ms ± 1% ~ (p=0.211 n=20+20) Reflect 465ms ± 1% 467ms ± 1% +0.42% (p=0.007 n=20+20) Tar 184ms ± 1% 185ms ± 2% ~ (p=0.087 n=18+20) XML 253ms ± 1% 253ms ± 1% ~ (p=0.377 n=20+18) LinkCompiler 769ms ± 2% 774ms ± 2% ~ (p=0.070 n=19+19) ExternalLinkCompiler 3.59s ±11% 3.68s ± 6% ~ (p=0.072 n=20+20) LinkWithoutDebugCompiler 446ms ± 5% 454ms ± 3% +1.79% (p=0.002 n=19+20) StdCmd 26.0s ± 2% 26.0s ± 2% ~ (p=0.799 n=20+20) name old user-time/op new user-time/op delta Template 238ms ± 5% 240ms ± 5% ~ (p=0.142 n=20+20) Unicode 105ms ±11% 106ms ±10% ~ (p=0.512 n=20+20) GoTypes 876ms ± 2% 873ms ± 4% ~ (p=0.647 n=20+19) Compiler 4.17s ± 2% 4.19s ± 1% ~ (p=0.093 n=20+18) SSA 13.9s ± 1% 14.1s ± 1% +1.45% (p=0.000 n=18+18) Flate 145ms ±13% 146ms ± 5% ~ (p=0.851 n=20+18) GoParser 185ms ± 5% 188ms ± 7% ~ (p=0.174 n=20+20) Reflect 534ms ± 3% 538ms ± 2% ~ (p=0.105 n=20+18) Tar 215ms ± 4% 211ms ± 9% ~ (p=0.079 n=19+20) XML 295ms ± 6% 295ms ± 5% ~ (p=0.968 n=20+20) LinkCompiler 832ms ± 4% 837ms ± 7% ~ (p=0.707 n=17+20) ExternalLinkCompiler 1.58s ± 8% 1.60s ± 4% ~ (p=0.296 n=20+19) LinkWithoutDebugCompiler 478ms ±12% 489ms ±10% ~ (p=0.429 n=20+20) name old object-bytes new object-bytes delta Template 559kB ± 0% 559kB ± 0% ~ (all equal) Unicode 216kB ± 0% 216kB ± 0% ~ (all equal) GoTypes 2.03MB ± 0% 2.03MB ± 0% ~ (all equal) Compiler 8.07MB ± 0% 8.07MB ± 0% -0.06% (p=0.000 n=20+20) SSA 27.1MB ± 0% 27.3MB ± 0% +0.89% (p=0.000 n=20+20) Flate 343kB ± 0% 343kB ± 0% ~ (all equal) GoParser 441kB ± 0% 441kB ± 0% ~ (all equal) Reflect 1.36MB ± 0% 1.36MB ± 0% ~ (all equal) Tar 487kB ± 0% 487kB ± 0% ~ (all equal) XML 632kB ± 0% 632kB ± 0% ~ (all equal) name old export-bytes new export-bytes delta Template 18.5kB ± 0% 18.5kB ± 0% ~ (all equal) Unicode 7.92kB ± 0% 7.92kB ± 0% ~ (all equal) GoTypes 35.0kB ± 0% 35.0kB ± 0% ~ (all equal) Compiler 109kB ± 0% 110kB ± 0% +0.72% (p=0.000 n=20+20) SSA 137kB ± 0% 138kB ± 0% +0.58% (p=0.000 n=20+20) Flate 4.89kB ± 0% 4.89kB ± 0% ~ (all equal) GoParser 8.49kB ± 0% 8.49kB ± 0% ~ (all equal) Reflect 11.4kB ± 0% 11.4kB ± 0% ~ (all equal) Tar 10.5kB ± 0% 10.5kB ± 0% ~ (all equal) XML 16.7kB ± 0% 16.7kB ± 0% ~ (all equal) name old text-bytes new text-bytes delta HelloSize 761kB ± 0% 761kB ± 0% ~ (all equal) CmdGoSize 10.8MB ± 0% 10.8MB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 10.7kB ± 0% 10.7kB ± 0% ~ (all equal) CmdGoSize 312kB ± 0% 312kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 122kB ± 0% 122kB ± 0% ~ (all equal) CmdGoSize 146kB ± 0% 146kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.13MB ± 0% 1.13MB ± 0% ~ (all equal) CmdGoSize 15.1MB ± 0% 15.1MB ± 0% ~ (all equal) Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f Reviewed-on: https://go-review.googlesource.com/c/go/+/196557 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-12 20:19:58 +01:00
if op != "" {
check := fmt.Sprintf("%s.Op == %s", cname, op)
if rr.Check == "" {
rr.Check = check
cmd/compile: allow multiple SSA block control values Control values are used to choose which successor of a block is jumped to. Typically a control value takes the form of a 'flags' value that represents the result of a comparison. Some architectures however use a variable in a register as a control value. Up until now we have managed with a single control value per block. However some architectures (e.g. s390x and riscv64) have combined compare-and-branch instructions that take two variables in registers as parameters. To generate these instructions we need to support 2 control values per block. This CL allows up to 2 control values to be used in a block in order to support the addition of compare-and-branch instructions. I have implemented s390x compare-and-branch instructions in a different CL. Passes toolstash-check -all. Results of compilebench: name old time/op new time/op delta Template 208ms ± 1% 209ms ± 1% ~ (p=0.289 n=20+20) Unicode 83.7ms ± 1% 83.3ms ± 3% -0.49% (p=0.017 n=18+18) GoTypes 748ms ± 1% 748ms ± 0% ~ (p=0.460 n=20+18) Compiler 3.47s ± 1% 3.48s ± 1% ~ (p=0.070 n=19+18) SSA 11.5s ± 1% 11.7s ± 1% +1.64% (p=0.000 n=19+18) Flate 130ms ± 1% 130ms ± 1% ~ (p=0.588 n=19+20) GoParser 160ms ± 1% 161ms ± 1% ~ (p=0.211 n=20+20) Reflect 465ms ± 1% 467ms ± 1% +0.42% (p=0.007 n=20+20) Tar 184ms ± 1% 185ms ± 2% ~ (p=0.087 n=18+20) XML 253ms ± 1% 253ms ± 1% ~ (p=0.377 n=20+18) LinkCompiler 769ms ± 2% 774ms ± 2% ~ (p=0.070 n=19+19) ExternalLinkCompiler 3.59s ±11% 3.68s ± 6% ~ (p=0.072 n=20+20) LinkWithoutDebugCompiler 446ms ± 5% 454ms ± 3% +1.79% (p=0.002 n=19+20) StdCmd 26.0s ± 2% 26.0s ± 2% ~ (p=0.799 n=20+20) name old user-time/op new user-time/op delta Template 238ms ± 5% 240ms ± 5% ~ (p=0.142 n=20+20) Unicode 105ms ±11% 106ms ±10% ~ (p=0.512 n=20+20) GoTypes 876ms ± 2% 873ms ± 4% ~ (p=0.647 n=20+19) Compiler 4.17s ± 2% 4.19s ± 1% ~ (p=0.093 n=20+18) SSA 13.9s ± 1% 14.1s ± 1% +1.45% (p=0.000 n=18+18) Flate 145ms ±13% 146ms ± 5% ~ (p=0.851 n=20+18) GoParser 185ms ± 5% 188ms ± 7% ~ (p=0.174 n=20+20) Reflect 534ms ± 3% 538ms ± 2% ~ (p=0.105 n=20+18) Tar 215ms ± 4% 211ms ± 9% ~ (p=0.079 n=19+20) XML 295ms ± 6% 295ms ± 5% ~ (p=0.968 n=20+20) LinkCompiler 832ms ± 4% 837ms ± 7% ~ (p=0.707 n=17+20) ExternalLinkCompiler 1.58s ± 8% 1.60s ± 4% ~ (p=0.296 n=20+19) LinkWithoutDebugCompiler 478ms ±12% 489ms ±10% ~ (p=0.429 n=20+20) name old object-bytes new object-bytes delta Template 559kB ± 0% 559kB ± 0% ~ (all equal) Unicode 216kB ± 0% 216kB ± 0% ~ (all equal) GoTypes 2.03MB ± 0% 2.03MB ± 0% ~ (all equal) Compiler 8.07MB ± 0% 8.07MB ± 0% -0.06% (p=0.000 n=20+20) SSA 27.1MB ± 0% 27.3MB ± 0% +0.89% (p=0.000 n=20+20) Flate 343kB ± 0% 343kB ± 0% ~ (all equal) GoParser 441kB ± 0% 441kB ± 0% ~ (all equal) Reflect 1.36MB ± 0% 1.36MB ± 0% ~ (all equal) Tar 487kB ± 0% 487kB ± 0% ~ (all equal) XML 632kB ± 0% 632kB ± 0% ~ (all equal) name old export-bytes new export-bytes delta Template 18.5kB ± 0% 18.5kB ± 0% ~ (all equal) Unicode 7.92kB ± 0% 7.92kB ± 0% ~ (all equal) GoTypes 35.0kB ± 0% 35.0kB ± 0% ~ (all equal) Compiler 109kB ± 0% 110kB ± 0% +0.72% (p=0.000 n=20+20) SSA 137kB ± 0% 138kB ± 0% +0.58% (p=0.000 n=20+20) Flate 4.89kB ± 0% 4.89kB ± 0% ~ (all equal) GoParser 8.49kB ± 0% 8.49kB ± 0% ~ (all equal) Reflect 11.4kB ± 0% 11.4kB ± 0% ~ (all equal) Tar 10.5kB ± 0% 10.5kB ± 0% ~ (all equal) XML 16.7kB ± 0% 16.7kB ± 0% ~ (all equal) name old text-bytes new text-bytes delta HelloSize 761kB ± 0% 761kB ± 0% ~ (all equal) CmdGoSize 10.8MB ± 0% 10.8MB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 10.7kB ± 0% 10.7kB ± 0% ~ (all equal) CmdGoSize 312kB ± 0% 312kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 122kB ± 0% 122kB ± 0% ~ (all equal) CmdGoSize 146kB ± 0% 146kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.13MB ± 0% 1.13MB ± 0% ~ (all equal) CmdGoSize 15.1MB ± 0% 15.1MB ± 0% ~ (all equal) Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f Reviewed-on: https://go-review.googlesource.com/c/go/+/196557 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-12 20:19:58 +01:00
} else {
rr.Check += " && " + check
cmd/compile: allow multiple SSA block control values Control values are used to choose which successor of a block is jumped to. Typically a control value takes the form of a 'flags' value that represents the result of a comparison. Some architectures however use a variable in a register as a control value. Up until now we have managed with a single control value per block. However some architectures (e.g. s390x and riscv64) have combined compare-and-branch instructions that take two variables in registers as parameters. To generate these instructions we need to support 2 control values per block. This CL allows up to 2 control values to be used in a block in order to support the addition of compare-and-branch instructions. I have implemented s390x compare-and-branch instructions in a different CL. Passes toolstash-check -all. Results of compilebench: name old time/op new time/op delta Template 208ms ± 1% 209ms ± 1% ~ (p=0.289 n=20+20) Unicode 83.7ms ± 1% 83.3ms ± 3% -0.49% (p=0.017 n=18+18) GoTypes 748ms ± 1% 748ms ± 0% ~ (p=0.460 n=20+18) Compiler 3.47s ± 1% 3.48s ± 1% ~ (p=0.070 n=19+18) SSA 11.5s ± 1% 11.7s ± 1% +1.64% (p=0.000 n=19+18) Flate 130ms ± 1% 130ms ± 1% ~ (p=0.588 n=19+20) GoParser 160ms ± 1% 161ms ± 1% ~ (p=0.211 n=20+20) Reflect 465ms ± 1% 467ms ± 1% +0.42% (p=0.007 n=20+20) Tar 184ms ± 1% 185ms ± 2% ~ (p=0.087 n=18+20) XML 253ms ± 1% 253ms ± 1% ~ (p=0.377 n=20+18) LinkCompiler 769ms ± 2% 774ms ± 2% ~ (p=0.070 n=19+19) ExternalLinkCompiler 3.59s ±11% 3.68s ± 6% ~ (p=0.072 n=20+20) LinkWithoutDebugCompiler 446ms ± 5% 454ms ± 3% +1.79% (p=0.002 n=19+20) StdCmd 26.0s ± 2% 26.0s ± 2% ~ (p=0.799 n=20+20) name old user-time/op new user-time/op delta Template 238ms ± 5% 240ms ± 5% ~ (p=0.142 n=20+20) Unicode 105ms ±11% 106ms ±10% ~ (p=0.512 n=20+20) GoTypes 876ms ± 2% 873ms ± 4% ~ (p=0.647 n=20+19) Compiler 4.17s ± 2% 4.19s ± 1% ~ (p=0.093 n=20+18) SSA 13.9s ± 1% 14.1s ± 1% +1.45% (p=0.000 n=18+18) Flate 145ms ±13% 146ms ± 5% ~ (p=0.851 n=20+18) GoParser 185ms ± 5% 188ms ± 7% ~ (p=0.174 n=20+20) Reflect 534ms ± 3% 538ms ± 2% ~ (p=0.105 n=20+18) Tar 215ms ± 4% 211ms ± 9% ~ (p=0.079 n=19+20) XML 295ms ± 6% 295ms ± 5% ~ (p=0.968 n=20+20) LinkCompiler 832ms ± 4% 837ms ± 7% ~ (p=0.707 n=17+20) ExternalLinkCompiler 1.58s ± 8% 1.60s ± 4% ~ (p=0.296 n=20+19) LinkWithoutDebugCompiler 478ms ±12% 489ms ±10% ~ (p=0.429 n=20+20) name old object-bytes new object-bytes delta Template 559kB ± 0% 559kB ± 0% ~ (all equal) Unicode 216kB ± 0% 216kB ± 0% ~ (all equal) GoTypes 2.03MB ± 0% 2.03MB ± 0% ~ (all equal) Compiler 8.07MB ± 0% 8.07MB ± 0% -0.06% (p=0.000 n=20+20) SSA 27.1MB ± 0% 27.3MB ± 0% +0.89% (p=0.000 n=20+20) Flate 343kB ± 0% 343kB ± 0% ~ (all equal) GoParser 441kB ± 0% 441kB ± 0% ~ (all equal) Reflect 1.36MB ± 0% 1.36MB ± 0% ~ (all equal) Tar 487kB ± 0% 487kB ± 0% ~ (all equal) XML 632kB ± 0% 632kB ± 0% ~ (all equal) name old export-bytes new export-bytes delta Template 18.5kB ± 0% 18.5kB ± 0% ~ (all equal) Unicode 7.92kB ± 0% 7.92kB ± 0% ~ (all equal) GoTypes 35.0kB ± 0% 35.0kB ± 0% ~ (all equal) Compiler 109kB ± 0% 110kB ± 0% +0.72% (p=0.000 n=20+20) SSA 137kB ± 0% 138kB ± 0% +0.58% (p=0.000 n=20+20) Flate 4.89kB ± 0% 4.89kB ± 0% ~ (all equal) GoParser 8.49kB ± 0% 8.49kB ± 0% ~ (all equal) Reflect 11.4kB ± 0% 11.4kB ± 0% ~ (all equal) Tar 10.5kB ± 0% 10.5kB ± 0% ~ (all equal) XML 16.7kB ± 0% 16.7kB ± 0% ~ (all equal) name old text-bytes new text-bytes delta HelloSize 761kB ± 0% 761kB ± 0% ~ (all equal) CmdGoSize 10.8MB ± 0% 10.8MB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 10.7kB ± 0% 10.7kB ± 0% ~ (all equal) CmdGoSize 312kB ± 0% 312kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 122kB ± 0% 122kB ± 0% ~ (all equal) CmdGoSize 146kB ± 0% 146kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.13MB ± 0% 1.13MB ± 0% ~ (all equal) CmdGoSize 15.1MB ± 0% 15.1MB ± 0% ~ (all equal) Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f Reviewed-on: https://go-review.googlesource.com/c/go/+/196557 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-12 20:19:58 +01:00
}
}
if p == "" {
p = vname + ".Pos"
}
pos[i] = p
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
} else {
cmd/compile: disallow rewrite rules from declaring reserved names If I change a rule in ARM64.rules to use the variable name "b" in a conflicting way, rulegen would previously not complain, and the compiler would later give a confusing error: $ go run *.go && go build cmd/compile/internal/ssa # cmd/compile/internal/ssa ../rewriteARM64.go:24236:10: b.NewValue0 undefined (type int64 has no field or method NewValue0) Make rulegen complain early about those cases. Sometimes they might happen to be harmless, but in general they can easily cause confusion or unintended effect due to shadowing. After the change, with the same conflicting rule: $ go run *.go && go build cmd/compile/internal/ssa 2021/03/22 11:31:49 rule ARM64.rules:495 uses the reserved name b exit status 1 Note that 24 existing rules were using reserved names. It seems like the shadowing was harmless, as it wasn't causing typechecking issues nor did it seem to cause unintended behavior when the rule rewrite code ran. The bool values "b" were renamed "t", since that seems to have a precedent in other rules and in the fmt package. Sequential values like "a b c" were renamed to "x y z", since "b" is reserved. Finally, "typ" was renamed to "_typ", since there doesn't seem to be an obviously better answer. Passes all three of: $ GOARCH=amd64 go build -toolexec 'toolstash -cmp' -a std $ GOARCH=arm64 go build -toolexec 'toolstash -cmp' -a std $ GOARCH=mips64 go build -toolexec 'toolstash -cmp' -a std Fixes #45154. Change-Id: I1cce194dc7b477886a9c218c17973e996bcedccf Reviewed-on: https://go-review.googlesource.com/c/go/+/303549 Trust: Daniel Martí <mvdan@mvdan.cc> Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2021-03-22 11:35:02 +01:00
rr.add(declf(rr.Loc, arg, cname))
cmd/compile: allow multiple SSA block control values Control values are used to choose which successor of a block is jumped to. Typically a control value takes the form of a 'flags' value that represents the result of a comparison. Some architectures however use a variable in a register as a control value. Up until now we have managed with a single control value per block. However some architectures (e.g. s390x and riscv64) have combined compare-and-branch instructions that take two variables in registers as parameters. To generate these instructions we need to support 2 control values per block. This CL allows up to 2 control values to be used in a block in order to support the addition of compare-and-branch instructions. I have implemented s390x compare-and-branch instructions in a different CL. Passes toolstash-check -all. Results of compilebench: name old time/op new time/op delta Template 208ms ± 1% 209ms ± 1% ~ (p=0.289 n=20+20) Unicode 83.7ms ± 1% 83.3ms ± 3% -0.49% (p=0.017 n=18+18) GoTypes 748ms ± 1% 748ms ± 0% ~ (p=0.460 n=20+18) Compiler 3.47s ± 1% 3.48s ± 1% ~ (p=0.070 n=19+18) SSA 11.5s ± 1% 11.7s ± 1% +1.64% (p=0.000 n=19+18) Flate 130ms ± 1% 130ms ± 1% ~ (p=0.588 n=19+20) GoParser 160ms ± 1% 161ms ± 1% ~ (p=0.211 n=20+20) Reflect 465ms ± 1% 467ms ± 1% +0.42% (p=0.007 n=20+20) Tar 184ms ± 1% 185ms ± 2% ~ (p=0.087 n=18+20) XML 253ms ± 1% 253ms ± 1% ~ (p=0.377 n=20+18) LinkCompiler 769ms ± 2% 774ms ± 2% ~ (p=0.070 n=19+19) ExternalLinkCompiler 3.59s ±11% 3.68s ± 6% ~ (p=0.072 n=20+20) LinkWithoutDebugCompiler 446ms ± 5% 454ms ± 3% +1.79% (p=0.002 n=19+20) StdCmd 26.0s ± 2% 26.0s ± 2% ~ (p=0.799 n=20+20) name old user-time/op new user-time/op delta Template 238ms ± 5% 240ms ± 5% ~ (p=0.142 n=20+20) Unicode 105ms ±11% 106ms ±10% ~ (p=0.512 n=20+20) GoTypes 876ms ± 2% 873ms ± 4% ~ (p=0.647 n=20+19) Compiler 4.17s ± 2% 4.19s ± 1% ~ (p=0.093 n=20+18) SSA 13.9s ± 1% 14.1s ± 1% +1.45% (p=0.000 n=18+18) Flate 145ms ±13% 146ms ± 5% ~ (p=0.851 n=20+18) GoParser 185ms ± 5% 188ms ± 7% ~ (p=0.174 n=20+20) Reflect 534ms ± 3% 538ms ± 2% ~ (p=0.105 n=20+18) Tar 215ms ± 4% 211ms ± 9% ~ (p=0.079 n=19+20) XML 295ms ± 6% 295ms ± 5% ~ (p=0.968 n=20+20) LinkCompiler 832ms ± 4% 837ms ± 7% ~ (p=0.707 n=17+20) ExternalLinkCompiler 1.58s ± 8% 1.60s ± 4% ~ (p=0.296 n=20+19) LinkWithoutDebugCompiler 478ms ±12% 489ms ±10% ~ (p=0.429 n=20+20) name old object-bytes new object-bytes delta Template 559kB ± 0% 559kB ± 0% ~ (all equal) Unicode 216kB ± 0% 216kB ± 0% ~ (all equal) GoTypes 2.03MB ± 0% 2.03MB ± 0% ~ (all equal) Compiler 8.07MB ± 0% 8.07MB ± 0% -0.06% (p=0.000 n=20+20) SSA 27.1MB ± 0% 27.3MB ± 0% +0.89% (p=0.000 n=20+20) Flate 343kB ± 0% 343kB ± 0% ~ (all equal) GoParser 441kB ± 0% 441kB ± 0% ~ (all equal) Reflect 1.36MB ± 0% 1.36MB ± 0% ~ (all equal) Tar 487kB ± 0% 487kB ± 0% ~ (all equal) XML 632kB ± 0% 632kB ± 0% ~ (all equal) name old export-bytes new export-bytes delta Template 18.5kB ± 0% 18.5kB ± 0% ~ (all equal) Unicode 7.92kB ± 0% 7.92kB ± 0% ~ (all equal) GoTypes 35.0kB ± 0% 35.0kB ± 0% ~ (all equal) Compiler 109kB ± 0% 110kB ± 0% +0.72% (p=0.000 n=20+20) SSA 137kB ± 0% 138kB ± 0% +0.58% (p=0.000 n=20+20) Flate 4.89kB ± 0% 4.89kB ± 0% ~ (all equal) GoParser 8.49kB ± 0% 8.49kB ± 0% ~ (all equal) Reflect 11.4kB ± 0% 11.4kB ± 0% ~ (all equal) Tar 10.5kB ± 0% 10.5kB ± 0% ~ (all equal) XML 16.7kB ± 0% 16.7kB ± 0% ~ (all equal) name old text-bytes new text-bytes delta HelloSize 761kB ± 0% 761kB ± 0% ~ (all equal) CmdGoSize 10.8MB ± 0% 10.8MB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 10.7kB ± 0% 10.7kB ± 0% ~ (all equal) CmdGoSize 312kB ± 0% 312kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 122kB ± 0% 122kB ± 0% ~ (all equal) CmdGoSize 146kB ± 0% 146kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.13MB ± 0% 1.13MB ± 0% ~ (all equal) CmdGoSize 15.1MB ± 0% 15.1MB ± 0% ~ (all equal) Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f Reviewed-on: https://go-review.googlesource.com/c/go/+/196557 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-12 20:19:58 +01:00
pos[i] = arg + ".Pos"
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
}
}
cmd/compile: add SSA rules for s390x compare-and-branch instructions This commit adds SSA rules for the s390x combined compare-and-branch instructions. These have a shorter encoding than separate compare and branch instructions and they also don't clobber the condition code (a.k.a. flag register) reducing pressure on the flag allocator. I have deleted the 'loop_test.go' file and replaced it with a new codegen test which performs a wider range of checks. Object sizes from compilebench: name old object-bytes new object-bytes delta Template 562kB ± 0% 561kB ± 0% -0.28% (p=0.000 n=10+10) Unicode 217kB ± 0% 217kB ± 0% -0.17% (p=0.000 n=10+10) GoTypes 2.03MB ± 0% 2.02MB ± 0% -0.59% (p=0.000 n=10+10) Compiler 8.16MB ± 0% 8.11MB ± 0% -0.62% (p=0.000 n=10+10) SSA 27.4MB ± 0% 27.0MB ± 0% -1.45% (p=0.000 n=10+10) Flate 356kB ± 0% 356kB ± 0% -0.12% (p=0.000 n=10+10) GoParser 438kB ± 0% 436kB ± 0% -0.51% (p=0.000 n=10+10) Reflect 1.37MB ± 0% 1.37MB ± 0% -0.42% (p=0.000 n=10+10) Tar 485kB ± 0% 483kB ± 0% -0.39% (p=0.000 n=10+10) XML 630kB ± 0% 621kB ± 0% -1.45% (p=0.000 n=10+10) [Geo mean] 1.14MB 1.13MB -0.60% name old text-bytes new text-bytes delta HelloSize 763kB ± 0% 754kB ± 0% -1.30% (p=0.000 n=10+10) CmdGoSize 10.7MB ± 0% 10.6MB ± 0% -0.91% (p=0.000 n=10+10) [Geo mean] 2.86MB 2.82MB -1.10% Change-Id: Ibca55d9c0aa1254aee69433731ab5d26a43a7c18 Reviewed-on: https://go-review.googlesource.com/c/go/+/198037 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-09-17 07:29:31 -07:00
for _, e := range []struct {
name, field, dclType string
cmd/compile: add SSA rules for s390x compare-and-branch instructions This commit adds SSA rules for the s390x combined compare-and-branch instructions. These have a shorter encoding than separate compare and branch instructions and they also don't clobber the condition code (a.k.a. flag register) reducing pressure on the flag allocator. I have deleted the 'loop_test.go' file and replaced it with a new codegen test which performs a wider range of checks. Object sizes from compilebench: name old object-bytes new object-bytes delta Template 562kB ± 0% 561kB ± 0% -0.28% (p=0.000 n=10+10) Unicode 217kB ± 0% 217kB ± 0% -0.17% (p=0.000 n=10+10) GoTypes 2.03MB ± 0% 2.02MB ± 0% -0.59% (p=0.000 n=10+10) Compiler 8.16MB ± 0% 8.11MB ± 0% -0.62% (p=0.000 n=10+10) SSA 27.4MB ± 0% 27.0MB ± 0% -1.45% (p=0.000 n=10+10) Flate 356kB ± 0% 356kB ± 0% -0.12% (p=0.000 n=10+10) GoParser 438kB ± 0% 436kB ± 0% -0.51% (p=0.000 n=10+10) Reflect 1.37MB ± 0% 1.37MB ± 0% -0.42% (p=0.000 n=10+10) Tar 485kB ± 0% 483kB ± 0% -0.39% (p=0.000 n=10+10) XML 630kB ± 0% 621kB ± 0% -1.45% (p=0.000 n=10+10) [Geo mean] 1.14MB 1.13MB -0.60% name old text-bytes new text-bytes delta HelloSize 763kB ± 0% 754kB ± 0% -1.30% (p=0.000 n=10+10) CmdGoSize 10.7MB ± 0% 10.6MB ± 0% -0.91% (p=0.000 n=10+10) [Geo mean] 2.86MB 2.82MB -1.10% Change-Id: Ibca55d9c0aa1254aee69433731ab5d26a43a7c18 Reviewed-on: https://go-review.googlesource.com/c/go/+/198037 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-09-17 07:29:31 -07:00
}{
{auxint, "AuxInt", data.auxIntType()},
{aux, "Aux", data.auxType()},
cmd/compile: add SSA rules for s390x compare-and-branch instructions This commit adds SSA rules for the s390x combined compare-and-branch instructions. These have a shorter encoding than separate compare and branch instructions and they also don't clobber the condition code (a.k.a. flag register) reducing pressure on the flag allocator. I have deleted the 'loop_test.go' file and replaced it with a new codegen test which performs a wider range of checks. Object sizes from compilebench: name old object-bytes new object-bytes delta Template 562kB ± 0% 561kB ± 0% -0.28% (p=0.000 n=10+10) Unicode 217kB ± 0% 217kB ± 0% -0.17% (p=0.000 n=10+10) GoTypes 2.03MB ± 0% 2.02MB ± 0% -0.59% (p=0.000 n=10+10) Compiler 8.16MB ± 0% 8.11MB ± 0% -0.62% (p=0.000 n=10+10) SSA 27.4MB ± 0% 27.0MB ± 0% -1.45% (p=0.000 n=10+10) Flate 356kB ± 0% 356kB ± 0% -0.12% (p=0.000 n=10+10) GoParser 438kB ± 0% 436kB ± 0% -0.51% (p=0.000 n=10+10) Reflect 1.37MB ± 0% 1.37MB ± 0% -0.42% (p=0.000 n=10+10) Tar 485kB ± 0% 483kB ± 0% -0.39% (p=0.000 n=10+10) XML 630kB ± 0% 621kB ± 0% -1.45% (p=0.000 n=10+10) [Geo mean] 1.14MB 1.13MB -0.60% name old text-bytes new text-bytes delta HelloSize 763kB ± 0% 754kB ± 0% -1.30% (p=0.000 n=10+10) CmdGoSize 10.7MB ± 0% 10.6MB ± 0% -0.91% (p=0.000 n=10+10) [Geo mean] 2.86MB 2.82MB -1.10% Change-Id: Ibca55d9c0aa1254aee69433731ab5d26a43a7c18 Reviewed-on: https://go-review.googlesource.com/c/go/+/198037 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-09-17 07:29:31 -07:00
} {
if e.name == "" {
continue
}
if e.dclType == "" {
log.Fatalf("op %s has no declared type for %s", data.name, e.field)
}
cmd/compile: add SSA rules for s390x compare-and-branch instructions This commit adds SSA rules for the s390x combined compare-and-branch instructions. These have a shorter encoding than separate compare and branch instructions and they also don't clobber the condition code (a.k.a. flag register) reducing pressure on the flag allocator. I have deleted the 'loop_test.go' file and replaced it with a new codegen test which performs a wider range of checks. Object sizes from compilebench: name old object-bytes new object-bytes delta Template 562kB ± 0% 561kB ± 0% -0.28% (p=0.000 n=10+10) Unicode 217kB ± 0% 217kB ± 0% -0.17% (p=0.000 n=10+10) GoTypes 2.03MB ± 0% 2.02MB ± 0% -0.59% (p=0.000 n=10+10) Compiler 8.16MB ± 0% 8.11MB ± 0% -0.62% (p=0.000 n=10+10) SSA 27.4MB ± 0% 27.0MB ± 0% -1.45% (p=0.000 n=10+10) Flate 356kB ± 0% 356kB ± 0% -0.12% (p=0.000 n=10+10) GoParser 438kB ± 0% 436kB ± 0% -0.51% (p=0.000 n=10+10) Reflect 1.37MB ± 0% 1.37MB ± 0% -0.42% (p=0.000 n=10+10) Tar 485kB ± 0% 483kB ± 0% -0.39% (p=0.000 n=10+10) XML 630kB ± 0% 621kB ± 0% -1.45% (p=0.000 n=10+10) [Geo mean] 1.14MB 1.13MB -0.60% name old text-bytes new text-bytes delta HelloSize 763kB ± 0% 754kB ± 0% -1.30% (p=0.000 n=10+10) CmdGoSize 10.7MB ± 0% 10.6MB ± 0% -0.91% (p=0.000 n=10+10) [Geo mean] 2.86MB 2.82MB -1.10% Change-Id: Ibca55d9c0aa1254aee69433731ab5d26a43a7c18 Reviewed-on: https://go-review.googlesource.com/c/go/+/198037 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-09-17 07:29:31 -07:00
if !token.IsIdentifier(e.name) || rr.declared(e.name) {
rr.add(breakf("%sTo%s(b.%s) != %s", unTitle(e.field), title(e.dclType), e.field, e.name))
cmd/compile: add SSA rules for s390x compare-and-branch instructions This commit adds SSA rules for the s390x combined compare-and-branch instructions. These have a shorter encoding than separate compare and branch instructions and they also don't clobber the condition code (a.k.a. flag register) reducing pressure on the flag allocator. I have deleted the 'loop_test.go' file and replaced it with a new codegen test which performs a wider range of checks. Object sizes from compilebench: name old object-bytes new object-bytes delta Template 562kB ± 0% 561kB ± 0% -0.28% (p=0.000 n=10+10) Unicode 217kB ± 0% 217kB ± 0% -0.17% (p=0.000 n=10+10) GoTypes 2.03MB ± 0% 2.02MB ± 0% -0.59% (p=0.000 n=10+10) Compiler 8.16MB ± 0% 8.11MB ± 0% -0.62% (p=0.000 n=10+10) SSA 27.4MB ± 0% 27.0MB ± 0% -1.45% (p=0.000 n=10+10) Flate 356kB ± 0% 356kB ± 0% -0.12% (p=0.000 n=10+10) GoParser 438kB ± 0% 436kB ± 0% -0.51% (p=0.000 n=10+10) Reflect 1.37MB ± 0% 1.37MB ± 0% -0.42% (p=0.000 n=10+10) Tar 485kB ± 0% 483kB ± 0% -0.39% (p=0.000 n=10+10) XML 630kB ± 0% 621kB ± 0% -1.45% (p=0.000 n=10+10) [Geo mean] 1.14MB 1.13MB -0.60% name old text-bytes new text-bytes delta HelloSize 763kB ± 0% 754kB ± 0% -1.30% (p=0.000 n=10+10) CmdGoSize 10.7MB ± 0% 10.6MB ± 0% -0.91% (p=0.000 n=10+10) [Geo mean] 2.86MB 2.82MB -1.10% Change-Id: Ibca55d9c0aa1254aee69433731ab5d26a43a7c18 Reviewed-on: https://go-review.googlesource.com/c/go/+/198037 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-09-17 07:29:31 -07:00
} else {
cmd/compile: disallow rewrite rules from declaring reserved names If I change a rule in ARM64.rules to use the variable name "b" in a conflicting way, rulegen would previously not complain, and the compiler would later give a confusing error: $ go run *.go && go build cmd/compile/internal/ssa # cmd/compile/internal/ssa ../rewriteARM64.go:24236:10: b.NewValue0 undefined (type int64 has no field or method NewValue0) Make rulegen complain early about those cases. Sometimes they might happen to be harmless, but in general they can easily cause confusion or unintended effect due to shadowing. After the change, with the same conflicting rule: $ go run *.go && go build cmd/compile/internal/ssa 2021/03/22 11:31:49 rule ARM64.rules:495 uses the reserved name b exit status 1 Note that 24 existing rules were using reserved names. It seems like the shadowing was harmless, as it wasn't causing typechecking issues nor did it seem to cause unintended behavior when the rule rewrite code ran. The bool values "b" were renamed "t", since that seems to have a precedent in other rules and in the fmt package. Sequential values like "a b c" were renamed to "x y z", since "b" is reserved. Finally, "typ" was renamed to "_typ", since there doesn't seem to be an obviously better answer. Passes all three of: $ GOARCH=amd64 go build -toolexec 'toolstash -cmp' -a std $ GOARCH=arm64 go build -toolexec 'toolstash -cmp' -a std $ GOARCH=mips64 go build -toolexec 'toolstash -cmp' -a std Fixes #45154. Change-Id: I1cce194dc7b477886a9c218c17973e996bcedccf Reviewed-on: https://go-review.googlesource.com/c/go/+/303549 Trust: Daniel Martí <mvdan@mvdan.cc> Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2021-03-22 11:35:02 +01:00
rr.add(declf(rr.Loc, e.name, "%sTo%s(b.%s)", unTitle(e.field), title(e.dclType), e.field))
cmd/compile: add SSA rules for s390x compare-and-branch instructions This commit adds SSA rules for the s390x combined compare-and-branch instructions. These have a shorter encoding than separate compare and branch instructions and they also don't clobber the condition code (a.k.a. flag register) reducing pressure on the flag allocator. I have deleted the 'loop_test.go' file and replaced it with a new codegen test which performs a wider range of checks. Object sizes from compilebench: name old object-bytes new object-bytes delta Template 562kB ± 0% 561kB ± 0% -0.28% (p=0.000 n=10+10) Unicode 217kB ± 0% 217kB ± 0% -0.17% (p=0.000 n=10+10) GoTypes 2.03MB ± 0% 2.02MB ± 0% -0.59% (p=0.000 n=10+10) Compiler 8.16MB ± 0% 8.11MB ± 0% -0.62% (p=0.000 n=10+10) SSA 27.4MB ± 0% 27.0MB ± 0% -1.45% (p=0.000 n=10+10) Flate 356kB ± 0% 356kB ± 0% -0.12% (p=0.000 n=10+10) GoParser 438kB ± 0% 436kB ± 0% -0.51% (p=0.000 n=10+10) Reflect 1.37MB ± 0% 1.37MB ± 0% -0.42% (p=0.000 n=10+10) Tar 485kB ± 0% 483kB ± 0% -0.39% (p=0.000 n=10+10) XML 630kB ± 0% 621kB ± 0% -1.45% (p=0.000 n=10+10) [Geo mean] 1.14MB 1.13MB -0.60% name old text-bytes new text-bytes delta HelloSize 763kB ± 0% 754kB ± 0% -1.30% (p=0.000 n=10+10) CmdGoSize 10.7MB ± 0% 10.6MB ± 0% -0.91% (p=0.000 n=10+10) [Geo mean] 2.86MB 2.82MB -1.10% Change-Id: Ibca55d9c0aa1254aee69433731ab5d26a43a7c18 Reviewed-on: https://go-review.googlesource.com/c/go/+/198037 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-09-17 07:29:31 -07:00
}
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
}
if rr.Cond != "" {
rr.add(breakf("!(%s)", rr.Cond))
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
}
// Rule matches. Generate result.
outop, _, auxint, aux, t := extract(rr.Result) // remove parens, then split
blockName, outdata := getBlockInfo(outop, arch)
cmd/compile: allow multiple SSA block control values Control values are used to choose which successor of a block is jumped to. Typically a control value takes the form of a 'flags' value that represents the result of a comparison. Some architectures however use a variable in a register as a control value. Up until now we have managed with a single control value per block. However some architectures (e.g. s390x and riscv64) have combined compare-and-branch instructions that take two variables in registers as parameters. To generate these instructions we need to support 2 control values per block. This CL allows up to 2 control values to be used in a block in order to support the addition of compare-and-branch instructions. I have implemented s390x compare-and-branch instructions in a different CL. Passes toolstash-check -all. Results of compilebench: name old time/op new time/op delta Template 208ms ± 1% 209ms ± 1% ~ (p=0.289 n=20+20) Unicode 83.7ms ± 1% 83.3ms ± 3% -0.49% (p=0.017 n=18+18) GoTypes 748ms ± 1% 748ms ± 0% ~ (p=0.460 n=20+18) Compiler 3.47s ± 1% 3.48s ± 1% ~ (p=0.070 n=19+18) SSA 11.5s ± 1% 11.7s ± 1% +1.64% (p=0.000 n=19+18) Flate 130ms ± 1% 130ms ± 1% ~ (p=0.588 n=19+20) GoParser 160ms ± 1% 161ms ± 1% ~ (p=0.211 n=20+20) Reflect 465ms ± 1% 467ms ± 1% +0.42% (p=0.007 n=20+20) Tar 184ms ± 1% 185ms ± 2% ~ (p=0.087 n=18+20) XML 253ms ± 1% 253ms ± 1% ~ (p=0.377 n=20+18) LinkCompiler 769ms ± 2% 774ms ± 2% ~ (p=0.070 n=19+19) ExternalLinkCompiler 3.59s ±11% 3.68s ± 6% ~ (p=0.072 n=20+20) LinkWithoutDebugCompiler 446ms ± 5% 454ms ± 3% +1.79% (p=0.002 n=19+20) StdCmd 26.0s ± 2% 26.0s ± 2% ~ (p=0.799 n=20+20) name old user-time/op new user-time/op delta Template 238ms ± 5% 240ms ± 5% ~ (p=0.142 n=20+20) Unicode 105ms ±11% 106ms ±10% ~ (p=0.512 n=20+20) GoTypes 876ms ± 2% 873ms ± 4% ~ (p=0.647 n=20+19) Compiler 4.17s ± 2% 4.19s ± 1% ~ (p=0.093 n=20+18) SSA 13.9s ± 1% 14.1s ± 1% +1.45% (p=0.000 n=18+18) Flate 145ms ±13% 146ms ± 5% ~ (p=0.851 n=20+18) GoParser 185ms ± 5% 188ms ± 7% ~ (p=0.174 n=20+20) Reflect 534ms ± 3% 538ms ± 2% ~ (p=0.105 n=20+18) Tar 215ms ± 4% 211ms ± 9% ~ (p=0.079 n=19+20) XML 295ms ± 6% 295ms ± 5% ~ (p=0.968 n=20+20) LinkCompiler 832ms ± 4% 837ms ± 7% ~ (p=0.707 n=17+20) ExternalLinkCompiler 1.58s ± 8% 1.60s ± 4% ~ (p=0.296 n=20+19) LinkWithoutDebugCompiler 478ms ±12% 489ms ±10% ~ (p=0.429 n=20+20) name old object-bytes new object-bytes delta Template 559kB ± 0% 559kB ± 0% ~ (all equal) Unicode 216kB ± 0% 216kB ± 0% ~ (all equal) GoTypes 2.03MB ± 0% 2.03MB ± 0% ~ (all equal) Compiler 8.07MB ± 0% 8.07MB ± 0% -0.06% (p=0.000 n=20+20) SSA 27.1MB ± 0% 27.3MB ± 0% +0.89% (p=0.000 n=20+20) Flate 343kB ± 0% 343kB ± 0% ~ (all equal) GoParser 441kB ± 0% 441kB ± 0% ~ (all equal) Reflect 1.36MB ± 0% 1.36MB ± 0% ~ (all equal) Tar 487kB ± 0% 487kB ± 0% ~ (all equal) XML 632kB ± 0% 632kB ± 0% ~ (all equal) name old export-bytes new export-bytes delta Template 18.5kB ± 0% 18.5kB ± 0% ~ (all equal) Unicode 7.92kB ± 0% 7.92kB ± 0% ~ (all equal) GoTypes 35.0kB ± 0% 35.0kB ± 0% ~ (all equal) Compiler 109kB ± 0% 110kB ± 0% +0.72% (p=0.000 n=20+20) SSA 137kB ± 0% 138kB ± 0% +0.58% (p=0.000 n=20+20) Flate 4.89kB ± 0% 4.89kB ± 0% ~ (all equal) GoParser 8.49kB ± 0% 8.49kB ± 0% ~ (all equal) Reflect 11.4kB ± 0% 11.4kB ± 0% ~ (all equal) Tar 10.5kB ± 0% 10.5kB ± 0% ~ (all equal) XML 16.7kB ± 0% 16.7kB ± 0% ~ (all equal) name old text-bytes new text-bytes delta HelloSize 761kB ± 0% 761kB ± 0% ~ (all equal) CmdGoSize 10.8MB ± 0% 10.8MB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 10.7kB ± 0% 10.7kB ± 0% ~ (all equal) CmdGoSize 312kB ± 0% 312kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 122kB ± 0% 122kB ± 0% ~ (all equal) CmdGoSize 146kB ± 0% 146kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.13MB ± 0% 1.13MB ± 0% ~ (all equal) CmdGoSize 15.1MB ± 0% 15.1MB ± 0% ~ (all equal) Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f Reviewed-on: https://go-review.googlesource.com/c/go/+/196557 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-12 20:19:58 +01:00
if len(t) < outdata.controls {
log.Fatalf("incorrect number of output arguments in %s, got %v wanted at least %v", rule, len(s), outdata.controls)
}
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
// Check if newsuccs is the same set as succs.
cmd/compile: allow multiple SSA block control values Control values are used to choose which successor of a block is jumped to. Typically a control value takes the form of a 'flags' value that represents the result of a comparison. Some architectures however use a variable in a register as a control value. Up until now we have managed with a single control value per block. However some architectures (e.g. s390x and riscv64) have combined compare-and-branch instructions that take two variables in registers as parameters. To generate these instructions we need to support 2 control values per block. This CL allows up to 2 control values to be used in a block in order to support the addition of compare-and-branch instructions. I have implemented s390x compare-and-branch instructions in a different CL. Passes toolstash-check -all. Results of compilebench: name old time/op new time/op delta Template 208ms ± 1% 209ms ± 1% ~ (p=0.289 n=20+20) Unicode 83.7ms ± 1% 83.3ms ± 3% -0.49% (p=0.017 n=18+18) GoTypes 748ms ± 1% 748ms ± 0% ~ (p=0.460 n=20+18) Compiler 3.47s ± 1% 3.48s ± 1% ~ (p=0.070 n=19+18) SSA 11.5s ± 1% 11.7s ± 1% +1.64% (p=0.000 n=19+18) Flate 130ms ± 1% 130ms ± 1% ~ (p=0.588 n=19+20) GoParser 160ms ± 1% 161ms ± 1% ~ (p=0.211 n=20+20) Reflect 465ms ± 1% 467ms ± 1% +0.42% (p=0.007 n=20+20) Tar 184ms ± 1% 185ms ± 2% ~ (p=0.087 n=18+20) XML 253ms ± 1% 253ms ± 1% ~ (p=0.377 n=20+18) LinkCompiler 769ms ± 2% 774ms ± 2% ~ (p=0.070 n=19+19) ExternalLinkCompiler 3.59s ±11% 3.68s ± 6% ~ (p=0.072 n=20+20) LinkWithoutDebugCompiler 446ms ± 5% 454ms ± 3% +1.79% (p=0.002 n=19+20) StdCmd 26.0s ± 2% 26.0s ± 2% ~ (p=0.799 n=20+20) name old user-time/op new user-time/op delta Template 238ms ± 5% 240ms ± 5% ~ (p=0.142 n=20+20) Unicode 105ms ±11% 106ms ±10% ~ (p=0.512 n=20+20) GoTypes 876ms ± 2% 873ms ± 4% ~ (p=0.647 n=20+19) Compiler 4.17s ± 2% 4.19s ± 1% ~ (p=0.093 n=20+18) SSA 13.9s ± 1% 14.1s ± 1% +1.45% (p=0.000 n=18+18) Flate 145ms ±13% 146ms ± 5% ~ (p=0.851 n=20+18) GoParser 185ms ± 5% 188ms ± 7% ~ (p=0.174 n=20+20) Reflect 534ms ± 3% 538ms ± 2% ~ (p=0.105 n=20+18) Tar 215ms ± 4% 211ms ± 9% ~ (p=0.079 n=19+20) XML 295ms ± 6% 295ms ± 5% ~ (p=0.968 n=20+20) LinkCompiler 832ms ± 4% 837ms ± 7% ~ (p=0.707 n=17+20) ExternalLinkCompiler 1.58s ± 8% 1.60s ± 4% ~ (p=0.296 n=20+19) LinkWithoutDebugCompiler 478ms ±12% 489ms ±10% ~ (p=0.429 n=20+20) name old object-bytes new object-bytes delta Template 559kB ± 0% 559kB ± 0% ~ (all equal) Unicode 216kB ± 0% 216kB ± 0% ~ (all equal) GoTypes 2.03MB ± 0% 2.03MB ± 0% ~ (all equal) Compiler 8.07MB ± 0% 8.07MB ± 0% -0.06% (p=0.000 n=20+20) SSA 27.1MB ± 0% 27.3MB ± 0% +0.89% (p=0.000 n=20+20) Flate 343kB ± 0% 343kB ± 0% ~ (all equal) GoParser 441kB ± 0% 441kB ± 0% ~ (all equal) Reflect 1.36MB ± 0% 1.36MB ± 0% ~ (all equal) Tar 487kB ± 0% 487kB ± 0% ~ (all equal) XML 632kB ± 0% 632kB ± 0% ~ (all equal) name old export-bytes new export-bytes delta Template 18.5kB ± 0% 18.5kB ± 0% ~ (all equal) Unicode 7.92kB ± 0% 7.92kB ± 0% ~ (all equal) GoTypes 35.0kB ± 0% 35.0kB ± 0% ~ (all equal) Compiler 109kB ± 0% 110kB ± 0% +0.72% (p=0.000 n=20+20) SSA 137kB ± 0% 138kB ± 0% +0.58% (p=0.000 n=20+20) Flate 4.89kB ± 0% 4.89kB ± 0% ~ (all equal) GoParser 8.49kB ± 0% 8.49kB ± 0% ~ (all equal) Reflect 11.4kB ± 0% 11.4kB ± 0% ~ (all equal) Tar 10.5kB ± 0% 10.5kB ± 0% ~ (all equal) XML 16.7kB ± 0% 16.7kB ± 0% ~ (all equal) name old text-bytes new text-bytes delta HelloSize 761kB ± 0% 761kB ± 0% ~ (all equal) CmdGoSize 10.8MB ± 0% 10.8MB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 10.7kB ± 0% 10.7kB ± 0% ~ (all equal) CmdGoSize 312kB ± 0% 312kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 122kB ± 0% 122kB ± 0% ~ (all equal) CmdGoSize 146kB ± 0% 146kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.13MB ± 0% 1.13MB ± 0% ~ (all equal) CmdGoSize 15.1MB ± 0% 15.1MB ± 0% ~ (all equal) Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f Reviewed-on: https://go-review.googlesource.com/c/go/+/196557 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-12 20:19:58 +01:00
succs := s[data.controls:]
newsuccs := t[outdata.controls:]
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
m := map[string]bool{}
for _, succ := range succs {
if m[succ] {
log.Fatalf("can't have a repeat successor name %s in %s", succ, rule)
}
m[succ] = true
}
for _, succ := range newsuccs {
if !m[succ] {
log.Fatalf("unknown successor %s in %s", succ, rule)
}
delete(m, succ)
}
if len(m) != 0 {
log.Fatalf("unmatched successors %v in %s", m, rule)
}
var genControls [2]string
cmd/compile: allow multiple SSA block control values Control values are used to choose which successor of a block is jumped to. Typically a control value takes the form of a 'flags' value that represents the result of a comparison. Some architectures however use a variable in a register as a control value. Up until now we have managed with a single control value per block. However some architectures (e.g. s390x and riscv64) have combined compare-and-branch instructions that take two variables in registers as parameters. To generate these instructions we need to support 2 control values per block. This CL allows up to 2 control values to be used in a block in order to support the addition of compare-and-branch instructions. I have implemented s390x compare-and-branch instructions in a different CL. Passes toolstash-check -all. Results of compilebench: name old time/op new time/op delta Template 208ms ± 1% 209ms ± 1% ~ (p=0.289 n=20+20) Unicode 83.7ms ± 1% 83.3ms ± 3% -0.49% (p=0.017 n=18+18) GoTypes 748ms ± 1% 748ms ± 0% ~ (p=0.460 n=20+18) Compiler 3.47s ± 1% 3.48s ± 1% ~ (p=0.070 n=19+18) SSA 11.5s ± 1% 11.7s ± 1% +1.64% (p=0.000 n=19+18) Flate 130ms ± 1% 130ms ± 1% ~ (p=0.588 n=19+20) GoParser 160ms ± 1% 161ms ± 1% ~ (p=0.211 n=20+20) Reflect 465ms ± 1% 467ms ± 1% +0.42% (p=0.007 n=20+20) Tar 184ms ± 1% 185ms ± 2% ~ (p=0.087 n=18+20) XML 253ms ± 1% 253ms ± 1% ~ (p=0.377 n=20+18) LinkCompiler 769ms ± 2% 774ms ± 2% ~ (p=0.070 n=19+19) ExternalLinkCompiler 3.59s ±11% 3.68s ± 6% ~ (p=0.072 n=20+20) LinkWithoutDebugCompiler 446ms ± 5% 454ms ± 3% +1.79% (p=0.002 n=19+20) StdCmd 26.0s ± 2% 26.0s ± 2% ~ (p=0.799 n=20+20) name old user-time/op new user-time/op delta Template 238ms ± 5% 240ms ± 5% ~ (p=0.142 n=20+20) Unicode 105ms ±11% 106ms ±10% ~ (p=0.512 n=20+20) GoTypes 876ms ± 2% 873ms ± 4% ~ (p=0.647 n=20+19) Compiler 4.17s ± 2% 4.19s ± 1% ~ (p=0.093 n=20+18) SSA 13.9s ± 1% 14.1s ± 1% +1.45% (p=0.000 n=18+18) Flate 145ms ±13% 146ms ± 5% ~ (p=0.851 n=20+18) GoParser 185ms ± 5% 188ms ± 7% ~ (p=0.174 n=20+20) Reflect 534ms ± 3% 538ms ± 2% ~ (p=0.105 n=20+18) Tar 215ms ± 4% 211ms ± 9% ~ (p=0.079 n=19+20) XML 295ms ± 6% 295ms ± 5% ~ (p=0.968 n=20+20) LinkCompiler 832ms ± 4% 837ms ± 7% ~ (p=0.707 n=17+20) ExternalLinkCompiler 1.58s ± 8% 1.60s ± 4% ~ (p=0.296 n=20+19) LinkWithoutDebugCompiler 478ms ±12% 489ms ±10% ~ (p=0.429 n=20+20) name old object-bytes new object-bytes delta Template 559kB ± 0% 559kB ± 0% ~ (all equal) Unicode 216kB ± 0% 216kB ± 0% ~ (all equal) GoTypes 2.03MB ± 0% 2.03MB ± 0% ~ (all equal) Compiler 8.07MB ± 0% 8.07MB ± 0% -0.06% (p=0.000 n=20+20) SSA 27.1MB ± 0% 27.3MB ± 0% +0.89% (p=0.000 n=20+20) Flate 343kB ± 0% 343kB ± 0% ~ (all equal) GoParser 441kB ± 0% 441kB ± 0% ~ (all equal) Reflect 1.36MB ± 0% 1.36MB ± 0% ~ (all equal) Tar 487kB ± 0% 487kB ± 0% ~ (all equal) XML 632kB ± 0% 632kB ± 0% ~ (all equal) name old export-bytes new export-bytes delta Template 18.5kB ± 0% 18.5kB ± 0% ~ (all equal) Unicode 7.92kB ± 0% 7.92kB ± 0% ~ (all equal) GoTypes 35.0kB ± 0% 35.0kB ± 0% ~ (all equal) Compiler 109kB ± 0% 110kB ± 0% +0.72% (p=0.000 n=20+20) SSA 137kB ± 0% 138kB ± 0% +0.58% (p=0.000 n=20+20) Flate 4.89kB ± 0% 4.89kB ± 0% ~ (all equal) GoParser 8.49kB ± 0% 8.49kB ± 0% ~ (all equal) Reflect 11.4kB ± 0% 11.4kB ± 0% ~ (all equal) Tar 10.5kB ± 0% 10.5kB ± 0% ~ (all equal) XML 16.7kB ± 0% 16.7kB ± 0% ~ (all equal) name old text-bytes new text-bytes delta HelloSize 761kB ± 0% 761kB ± 0% ~ (all equal) CmdGoSize 10.8MB ± 0% 10.8MB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 10.7kB ± 0% 10.7kB ± 0% ~ (all equal) CmdGoSize 312kB ± 0% 312kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 122kB ± 0% 122kB ± 0% ~ (all equal) CmdGoSize 146kB ± 0% 146kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.13MB ± 0% 1.13MB ± 0% ~ (all equal) CmdGoSize 15.1MB ± 0% 15.1MB ± 0% ~ (all equal) Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f Reviewed-on: https://go-review.googlesource.com/c/go/+/196557 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-12 20:19:58 +01:00
for i, control := range t[:outdata.controls] {
// Select a source position for any new control values.
// TODO: does it always make sense to use the source position
// of the original control values or should we be using the
// block's source position in some cases?
newpos := "b.Pos" // default to block's source position
if i < len(pos) && pos[i] != "" {
// Use the previous control value's source position.
newpos = pos[i]
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
}
cmd/compile: allow multiple SSA block control values Control values are used to choose which successor of a block is jumped to. Typically a control value takes the form of a 'flags' value that represents the result of a comparison. Some architectures however use a variable in a register as a control value. Up until now we have managed with a single control value per block. However some architectures (e.g. s390x and riscv64) have combined compare-and-branch instructions that take two variables in registers as parameters. To generate these instructions we need to support 2 control values per block. This CL allows up to 2 control values to be used in a block in order to support the addition of compare-and-branch instructions. I have implemented s390x compare-and-branch instructions in a different CL. Passes toolstash-check -all. Results of compilebench: name old time/op new time/op delta Template 208ms ± 1% 209ms ± 1% ~ (p=0.289 n=20+20) Unicode 83.7ms ± 1% 83.3ms ± 3% -0.49% (p=0.017 n=18+18) GoTypes 748ms ± 1% 748ms ± 0% ~ (p=0.460 n=20+18) Compiler 3.47s ± 1% 3.48s ± 1% ~ (p=0.070 n=19+18) SSA 11.5s ± 1% 11.7s ± 1% +1.64% (p=0.000 n=19+18) Flate 130ms ± 1% 130ms ± 1% ~ (p=0.588 n=19+20) GoParser 160ms ± 1% 161ms ± 1% ~ (p=0.211 n=20+20) Reflect 465ms ± 1% 467ms ± 1% +0.42% (p=0.007 n=20+20) Tar 184ms ± 1% 185ms ± 2% ~ (p=0.087 n=18+20) XML 253ms ± 1% 253ms ± 1% ~ (p=0.377 n=20+18) LinkCompiler 769ms ± 2% 774ms ± 2% ~ (p=0.070 n=19+19) ExternalLinkCompiler 3.59s ±11% 3.68s ± 6% ~ (p=0.072 n=20+20) LinkWithoutDebugCompiler 446ms ± 5% 454ms ± 3% +1.79% (p=0.002 n=19+20) StdCmd 26.0s ± 2% 26.0s ± 2% ~ (p=0.799 n=20+20) name old user-time/op new user-time/op delta Template 238ms ± 5% 240ms ± 5% ~ (p=0.142 n=20+20) Unicode 105ms ±11% 106ms ±10% ~ (p=0.512 n=20+20) GoTypes 876ms ± 2% 873ms ± 4% ~ (p=0.647 n=20+19) Compiler 4.17s ± 2% 4.19s ± 1% ~ (p=0.093 n=20+18) SSA 13.9s ± 1% 14.1s ± 1% +1.45% (p=0.000 n=18+18) Flate 145ms ±13% 146ms ± 5% ~ (p=0.851 n=20+18) GoParser 185ms ± 5% 188ms ± 7% ~ (p=0.174 n=20+20) Reflect 534ms ± 3% 538ms ± 2% ~ (p=0.105 n=20+18) Tar 215ms ± 4% 211ms ± 9% ~ (p=0.079 n=19+20) XML 295ms ± 6% 295ms ± 5% ~ (p=0.968 n=20+20) LinkCompiler 832ms ± 4% 837ms ± 7% ~ (p=0.707 n=17+20) ExternalLinkCompiler 1.58s ± 8% 1.60s ± 4% ~ (p=0.296 n=20+19) LinkWithoutDebugCompiler 478ms ±12% 489ms ±10% ~ (p=0.429 n=20+20) name old object-bytes new object-bytes delta Template 559kB ± 0% 559kB ± 0% ~ (all equal) Unicode 216kB ± 0% 216kB ± 0% ~ (all equal) GoTypes 2.03MB ± 0% 2.03MB ± 0% ~ (all equal) Compiler 8.07MB ± 0% 8.07MB ± 0% -0.06% (p=0.000 n=20+20) SSA 27.1MB ± 0% 27.3MB ± 0% +0.89% (p=0.000 n=20+20) Flate 343kB ± 0% 343kB ± 0% ~ (all equal) GoParser 441kB ± 0% 441kB ± 0% ~ (all equal) Reflect 1.36MB ± 0% 1.36MB ± 0% ~ (all equal) Tar 487kB ± 0% 487kB ± 0% ~ (all equal) XML 632kB ± 0% 632kB ± 0% ~ (all equal) name old export-bytes new export-bytes delta Template 18.5kB ± 0% 18.5kB ± 0% ~ (all equal) Unicode 7.92kB ± 0% 7.92kB ± 0% ~ (all equal) GoTypes 35.0kB ± 0% 35.0kB ± 0% ~ (all equal) Compiler 109kB ± 0% 110kB ± 0% +0.72% (p=0.000 n=20+20) SSA 137kB ± 0% 138kB ± 0% +0.58% (p=0.000 n=20+20) Flate 4.89kB ± 0% 4.89kB ± 0% ~ (all equal) GoParser 8.49kB ± 0% 8.49kB ± 0% ~ (all equal) Reflect 11.4kB ± 0% 11.4kB ± 0% ~ (all equal) Tar 10.5kB ± 0% 10.5kB ± 0% ~ (all equal) XML 16.7kB ± 0% 16.7kB ± 0% ~ (all equal) name old text-bytes new text-bytes delta HelloSize 761kB ± 0% 761kB ± 0% ~ (all equal) CmdGoSize 10.8MB ± 0% 10.8MB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 10.7kB ± 0% 10.7kB ± 0% ~ (all equal) CmdGoSize 312kB ± 0% 312kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 122kB ± 0% 122kB ± 0% ~ (all equal) CmdGoSize 146kB ± 0% 146kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.13MB ± 0% 1.13MB ± 0% ~ (all equal) CmdGoSize 15.1MB ± 0% 15.1MB ± 0% ~ (all equal) Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f Reviewed-on: https://go-review.googlesource.com/c/go/+/196557 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-12 20:19:58 +01:00
// Generate a new control value (or copy an existing value).
genControls[i] = genResult0(rr, arch, control, false, false, newpos, nil)
}
switch outdata.controls {
case 0:
rr.add(stmtf("b.Reset(%s)", blockName))
case 1:
rr.add(stmtf("b.resetWithControl(%s, %s)", blockName, genControls[0]))
case 2:
rr.add(stmtf("b.resetWithControl2(%s, %s, %s)", blockName, genControls[0], genControls[1]))
default:
log.Fatalf("too many controls: %d", outdata.controls)
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
}
cmd/compile: add SSA rules for s390x compare-and-branch instructions This commit adds SSA rules for the s390x combined compare-and-branch instructions. These have a shorter encoding than separate compare and branch instructions and they also don't clobber the condition code (a.k.a. flag register) reducing pressure on the flag allocator. I have deleted the 'loop_test.go' file and replaced it with a new codegen test which performs a wider range of checks. Object sizes from compilebench: name old object-bytes new object-bytes delta Template 562kB ± 0% 561kB ± 0% -0.28% (p=0.000 n=10+10) Unicode 217kB ± 0% 217kB ± 0% -0.17% (p=0.000 n=10+10) GoTypes 2.03MB ± 0% 2.02MB ± 0% -0.59% (p=0.000 n=10+10) Compiler 8.16MB ± 0% 8.11MB ± 0% -0.62% (p=0.000 n=10+10) SSA 27.4MB ± 0% 27.0MB ± 0% -1.45% (p=0.000 n=10+10) Flate 356kB ± 0% 356kB ± 0% -0.12% (p=0.000 n=10+10) GoParser 438kB ± 0% 436kB ± 0% -0.51% (p=0.000 n=10+10) Reflect 1.37MB ± 0% 1.37MB ± 0% -0.42% (p=0.000 n=10+10) Tar 485kB ± 0% 483kB ± 0% -0.39% (p=0.000 n=10+10) XML 630kB ± 0% 621kB ± 0% -1.45% (p=0.000 n=10+10) [Geo mean] 1.14MB 1.13MB -0.60% name old text-bytes new text-bytes delta HelloSize 763kB ± 0% 754kB ± 0% -1.30% (p=0.000 n=10+10) CmdGoSize 10.7MB ± 0% 10.6MB ± 0% -0.91% (p=0.000 n=10+10) [Geo mean] 2.86MB 2.82MB -1.10% Change-Id: Ibca55d9c0aa1254aee69433731ab5d26a43a7c18 Reviewed-on: https://go-review.googlesource.com/c/go/+/198037 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-09-17 07:29:31 -07:00
if auxint != "" {
// Make sure auxint value has the right type.
rr.add(stmtf("b.AuxInt = %sToAuxInt(%s)", unTitle(outdata.auxIntType()), auxint))
cmd/compile: add SSA rules for s390x compare-and-branch instructions This commit adds SSA rules for the s390x combined compare-and-branch instructions. These have a shorter encoding than separate compare and branch instructions and they also don't clobber the condition code (a.k.a. flag register) reducing pressure on the flag allocator. I have deleted the 'loop_test.go' file and replaced it with a new codegen test which performs a wider range of checks. Object sizes from compilebench: name old object-bytes new object-bytes delta Template 562kB ± 0% 561kB ± 0% -0.28% (p=0.000 n=10+10) Unicode 217kB ± 0% 217kB ± 0% -0.17% (p=0.000 n=10+10) GoTypes 2.03MB ± 0% 2.02MB ± 0% -0.59% (p=0.000 n=10+10) Compiler 8.16MB ± 0% 8.11MB ± 0% -0.62% (p=0.000 n=10+10) SSA 27.4MB ± 0% 27.0MB ± 0% -1.45% (p=0.000 n=10+10) Flate 356kB ± 0% 356kB ± 0% -0.12% (p=0.000 n=10+10) GoParser 438kB ± 0% 436kB ± 0% -0.51% (p=0.000 n=10+10) Reflect 1.37MB ± 0% 1.37MB ± 0% -0.42% (p=0.000 n=10+10) Tar 485kB ± 0% 483kB ± 0% -0.39% (p=0.000 n=10+10) XML 630kB ± 0% 621kB ± 0% -1.45% (p=0.000 n=10+10) [Geo mean] 1.14MB 1.13MB -0.60% name old text-bytes new text-bytes delta HelloSize 763kB ± 0% 754kB ± 0% -1.30% (p=0.000 n=10+10) CmdGoSize 10.7MB ± 0% 10.6MB ± 0% -0.91% (p=0.000 n=10+10) [Geo mean] 2.86MB 2.82MB -1.10% Change-Id: Ibca55d9c0aa1254aee69433731ab5d26a43a7c18 Reviewed-on: https://go-review.googlesource.com/c/go/+/198037 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-09-17 07:29:31 -07:00
}
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
if aux != "" {
// Make sure aux value has the right type.
rr.add(stmtf("b.Aux = %sToAux(%s)", unTitle(outdata.auxType()), aux))
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
}
succChanged := false
for i := 0; i < len(succs); i++ {
if succs[i] != newsuccs[i] {
succChanged = true
}
}
if succChanged {
if len(succs) != 2 {
log.Fatalf("changed successors, len!=2 in %s", rule)
}
if succs[0] != newsuccs[1] || succs[1] != newsuccs[0] {
log.Fatalf("can only handle swapped successors in %s", rule)
}
rr.add(stmtf("b.swapSuccessors()"))
}
if *genLog {
rr.add(stmtf("logRule(%q)", rule.Loc))
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
}
return rr
}
// genMatch returns the variable whose source position should be used for the
// result (or "" if no opinion), and a boolean that reports whether the match can fail.
cmd/compile: reduce bounds checks in generated rewrite rules CL 213703 converted generated rewrite rules for commutative ops to use loops instead of duplicated code. However, it loaded args using expressions like v.Args[i] and v.Args[i^1], which the compiler could not eliminate bounds for (including with all outstanding prove CLs). Also, given a series of separate rewrite rules for the same op, we generated bounds checks for every rewrite rule, even though we were repeatedly loading the same set of args. This change reduces both sets of bounds checks. Instead of loading v.Args[i] and v.Args[i^1] for commutative loops, we now preload v.Args[0] and v.Args[1] into local variables, and then swap them (as needed) in the commutative loop post statement. And we now load all top level v.Args into local variables at the beginning of every rewrite rule function. The second optimization is the more significant, but the first helps a little, and they play together nicely from the perspective of generating the code. This does increase register pressure, but the reduced bounds checks more than compensate. Note that the vast majority of rewrite rules evaluated are not applied, so the prologue is the most important part of the rewrite rules. There is one subtle aspect to the new generated code. Because the top level v.Args are shared across rewrite rules, and rule evaluation can swap v_0 and v_1, v_0 and v_1 can end up being swapped from one rule to the next. That is OK, because any time a rule does not get applied, they will have been swapped exactly twice. Passes toolstash-check -all. name old time/op new time/op delta Template 213ms ± 2% 211ms ± 2% -0.85% (p=0.000 n=92+96) Unicode 83.5ms ± 2% 83.2ms ± 2% -0.41% (p=0.004 n=95+90) GoTypes 737ms ± 2% 733ms ± 2% -0.51% (p=0.000 n=91+94) Compiler 3.45s ± 2% 3.43s ± 2% -0.44% (p=0.000 n=99+100) SSA 8.54s ± 1% 8.32s ± 2% -2.56% (p=0.000 n=96+99) Flate 136ms ± 2% 135ms ± 1% -0.47% (p=0.000 n=96+96) GoParser 169ms ± 1% 168ms ± 1% -0.33% (p=0.000 n=96+93) Reflect 456ms ± 3% 455ms ± 3% ~ (p=0.261 n=95+94) Tar 186ms ± 2% 185ms ± 2% -0.48% (p=0.000 n=94+95) XML 251ms ± 1% 250ms ± 1% -0.51% (p=0.000 n=91+94) [Geo mean] 424ms 421ms -0.68% name old user-time/op new user-time/op delta Template 275ms ± 1% 274ms ± 2% -0.55% (p=0.000 n=95+98) Unicode 118ms ± 4% 118ms ± 4% ~ (p=0.642 n=98+90) GoTypes 983ms ± 1% 980ms ± 1% -0.30% (p=0.000 n=93+93) Compiler 4.56s ± 6% 4.52s ± 6% -0.72% (p=0.003 n=100+100) SSA 11.4s ± 1% 11.1s ± 1% -2.50% (p=0.000 n=96+97) Flate 168ms ± 1% 167ms ± 1% -0.49% (p=0.000 n=92+92) GoParser 204ms ± 1% 204ms ± 2% -0.27% (p=0.003 n=99+96) Reflect 599ms ± 2% 598ms ± 2% ~ (p=0.116 n=95+92) Tar 227ms ± 2% 225ms ± 2% -0.57% (p=0.000 n=95+98) XML 313ms ± 2% 312ms ± 1% -0.37% (p=0.000 n=89+95) [Geo mean] 547ms 544ms -0.61% file before after Δ % compile 21113112 21109016 -4096 -0.019% total 131704940 131700844 -4096 -0.003% Change-Id: Id6c39e0367e597c0c75b8a4b1eb14cc3cbd11956 Reviewed-on: https://go-review.googlesource.com/c/go/+/216218 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-20 20:09:41 -08:00
func genMatch(rr *RuleRewrite, arch arch, match string, pregenTop bool) (pos, checkOp string) {
cnt := varCount(rr)
cmd/compile: reduce bounds checks in generated rewrite rules CL 213703 converted generated rewrite rules for commutative ops to use loops instead of duplicated code. However, it loaded args using expressions like v.Args[i] and v.Args[i^1], which the compiler could not eliminate bounds for (including with all outstanding prove CLs). Also, given a series of separate rewrite rules for the same op, we generated bounds checks for every rewrite rule, even though we were repeatedly loading the same set of args. This change reduces both sets of bounds checks. Instead of loading v.Args[i] and v.Args[i^1] for commutative loops, we now preload v.Args[0] and v.Args[1] into local variables, and then swap them (as needed) in the commutative loop post statement. And we now load all top level v.Args into local variables at the beginning of every rewrite rule function. The second optimization is the more significant, but the first helps a little, and they play together nicely from the perspective of generating the code. This does increase register pressure, but the reduced bounds checks more than compensate. Note that the vast majority of rewrite rules evaluated are not applied, so the prologue is the most important part of the rewrite rules. There is one subtle aspect to the new generated code. Because the top level v.Args are shared across rewrite rules, and rule evaluation can swap v_0 and v_1, v_0 and v_1 can end up being swapped from one rule to the next. That is OK, because any time a rule does not get applied, they will have been swapped exactly twice. Passes toolstash-check -all. name old time/op new time/op delta Template 213ms ± 2% 211ms ± 2% -0.85% (p=0.000 n=92+96) Unicode 83.5ms ± 2% 83.2ms ± 2% -0.41% (p=0.004 n=95+90) GoTypes 737ms ± 2% 733ms ± 2% -0.51% (p=0.000 n=91+94) Compiler 3.45s ± 2% 3.43s ± 2% -0.44% (p=0.000 n=99+100) SSA 8.54s ± 1% 8.32s ± 2% -2.56% (p=0.000 n=96+99) Flate 136ms ± 2% 135ms ± 1% -0.47% (p=0.000 n=96+96) GoParser 169ms ± 1% 168ms ± 1% -0.33% (p=0.000 n=96+93) Reflect 456ms ± 3% 455ms ± 3% ~ (p=0.261 n=95+94) Tar 186ms ± 2% 185ms ± 2% -0.48% (p=0.000 n=94+95) XML 251ms ± 1% 250ms ± 1% -0.51% (p=0.000 n=91+94) [Geo mean] 424ms 421ms -0.68% name old user-time/op new user-time/op delta Template 275ms ± 1% 274ms ± 2% -0.55% (p=0.000 n=95+98) Unicode 118ms ± 4% 118ms ± 4% ~ (p=0.642 n=98+90) GoTypes 983ms ± 1% 980ms ± 1% -0.30% (p=0.000 n=93+93) Compiler 4.56s ± 6% 4.52s ± 6% -0.72% (p=0.003 n=100+100) SSA 11.4s ± 1% 11.1s ± 1% -2.50% (p=0.000 n=96+97) Flate 168ms ± 1% 167ms ± 1% -0.49% (p=0.000 n=92+92) GoParser 204ms ± 1% 204ms ± 2% -0.27% (p=0.003 n=99+96) Reflect 599ms ± 2% 598ms ± 2% ~ (p=0.116 n=95+92) Tar 227ms ± 2% 225ms ± 2% -0.57% (p=0.000 n=95+98) XML 313ms ± 2% 312ms ± 1% -0.37% (p=0.000 n=89+95) [Geo mean] 547ms 544ms -0.61% file before after Δ % compile 21113112 21109016 -4096 -0.019% total 131704940 131700844 -4096 -0.003% Change-Id: Id6c39e0367e597c0c75b8a4b1eb14cc3cbd11956 Reviewed-on: https://go-review.googlesource.com/c/go/+/216218 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-20 20:09:41 -08:00
return genMatch0(rr, arch, match, "v", cnt, pregenTop)
}
cmd/compile: reduce bounds checks in generated rewrite rules CL 213703 converted generated rewrite rules for commutative ops to use loops instead of duplicated code. However, it loaded args using expressions like v.Args[i] and v.Args[i^1], which the compiler could not eliminate bounds for (including with all outstanding prove CLs). Also, given a series of separate rewrite rules for the same op, we generated bounds checks for every rewrite rule, even though we were repeatedly loading the same set of args. This change reduces both sets of bounds checks. Instead of loading v.Args[i] and v.Args[i^1] for commutative loops, we now preload v.Args[0] and v.Args[1] into local variables, and then swap them (as needed) in the commutative loop post statement. And we now load all top level v.Args into local variables at the beginning of every rewrite rule function. The second optimization is the more significant, but the first helps a little, and they play together nicely from the perspective of generating the code. This does increase register pressure, but the reduced bounds checks more than compensate. Note that the vast majority of rewrite rules evaluated are not applied, so the prologue is the most important part of the rewrite rules. There is one subtle aspect to the new generated code. Because the top level v.Args are shared across rewrite rules, and rule evaluation can swap v_0 and v_1, v_0 and v_1 can end up being swapped from one rule to the next. That is OK, because any time a rule does not get applied, they will have been swapped exactly twice. Passes toolstash-check -all. name old time/op new time/op delta Template 213ms ± 2% 211ms ± 2% -0.85% (p=0.000 n=92+96) Unicode 83.5ms ± 2% 83.2ms ± 2% -0.41% (p=0.004 n=95+90) GoTypes 737ms ± 2% 733ms ± 2% -0.51% (p=0.000 n=91+94) Compiler 3.45s ± 2% 3.43s ± 2% -0.44% (p=0.000 n=99+100) SSA 8.54s ± 1% 8.32s ± 2% -2.56% (p=0.000 n=96+99) Flate 136ms ± 2% 135ms ± 1% -0.47% (p=0.000 n=96+96) GoParser 169ms ± 1% 168ms ± 1% -0.33% (p=0.000 n=96+93) Reflect 456ms ± 3% 455ms ± 3% ~ (p=0.261 n=95+94) Tar 186ms ± 2% 185ms ± 2% -0.48% (p=0.000 n=94+95) XML 251ms ± 1% 250ms ± 1% -0.51% (p=0.000 n=91+94) [Geo mean] 424ms 421ms -0.68% name old user-time/op new user-time/op delta Template 275ms ± 1% 274ms ± 2% -0.55% (p=0.000 n=95+98) Unicode 118ms ± 4% 118ms ± 4% ~ (p=0.642 n=98+90) GoTypes 983ms ± 1% 980ms ± 1% -0.30% (p=0.000 n=93+93) Compiler 4.56s ± 6% 4.52s ± 6% -0.72% (p=0.003 n=100+100) SSA 11.4s ± 1% 11.1s ± 1% -2.50% (p=0.000 n=96+97) Flate 168ms ± 1% 167ms ± 1% -0.49% (p=0.000 n=92+92) GoParser 204ms ± 1% 204ms ± 2% -0.27% (p=0.003 n=99+96) Reflect 599ms ± 2% 598ms ± 2% ~ (p=0.116 n=95+92) Tar 227ms ± 2% 225ms ± 2% -0.57% (p=0.000 n=95+98) XML 313ms ± 2% 312ms ± 1% -0.37% (p=0.000 n=89+95) [Geo mean] 547ms 544ms -0.61% file before after Δ % compile 21113112 21109016 -4096 -0.019% total 131704940 131700844 -4096 -0.003% Change-Id: Id6c39e0367e597c0c75b8a4b1eb14cc3cbd11956 Reviewed-on: https://go-review.googlesource.com/c/go/+/216218 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-20 20:09:41 -08:00
func genMatch0(rr *RuleRewrite, arch arch, match, v string, cnt map[string]int, pregenTop bool) (pos, checkOp string) {
if match[0] != '(' || match[len(match)-1] != ')' {
log.Fatalf("%s: non-compound expr in genMatch0: %q", rr.Loc, match)
}
op, oparch, typ, auxint, aux, args := parseValue(match, arch, rr.Loc)
checkOp = fmt.Sprintf("Op%s%s", oparch, op.name)
if op.faultOnNilArg0 || op.faultOnNilArg1 {
// Prefer the position of an instruction which could fault.
pos = v + ".Pos"
}
// If the last argument is ___, it means "don't care about trailing arguments, really"
// The likely/intended use is for rewrites that are too tricky to express in the existing pattern language
// Do a length check early because long patterns fed short (ultimately not-matching) inputs will
// do an indexing error in pattern-matching.
if op.argLength == -1 {
l := len(args)
if l == 0 || args[l-1] != "___" {
rr.add(breakf("len(%s.Args) != %d", v, l))
} else if l > 1 && args[l-1] == "___" {
rr.add(breakf("len(%s.Args) < %d", v, l-1))
}
}
for _, e := range []struct {
cmd/compile: start implementing strongly typed aux and auxint fields Right now the Aux and AuxInt fields of ssa.Values are typed as interface{} and int64, respectively. Each rule that uses these values must cast them to the type they actually are (*obj.LSym, or int32, or ValAndOff, etc.), use them, and then cast them back to interface{} or int64. We know for each opcode what the types of the Aux and AuxInt fields should be. So let's modify the rule generator to declare the types to be what we know they should be, autoconverting to and from the generic types for us. That way we can make the rules more type safe. It's difficult to make a single CL for this, so I've coopted the "=>" token to indicate a rule that is strongly typed. "->" rules are processed as before. That will let us migrate a few rules at a time in separate CLs. Hopefully we can reach a state where all rules are strongly typed and we can drop the distinction. This CL changes just a few rules to get a feel for what this transition would look like. I've decided not to put explicit types in the rules. I think it makes the rules somewhat clearer, but definitely more verbose. In particular, the passthrough rules that don't modify the fields in question are verbose for no real reason. Change-Id: I63a1b789ac5702e7caf7934cd49f784235d1d73d Reviewed-on: https://go-review.googlesource.com/c/go/+/190197 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2020-03-19 16:25:08 -07:00
name, field, dclType string
}{
cmd/compile: start implementing strongly typed aux and auxint fields Right now the Aux and AuxInt fields of ssa.Values are typed as interface{} and int64, respectively. Each rule that uses these values must cast them to the type they actually are (*obj.LSym, or int32, or ValAndOff, etc.), use them, and then cast them back to interface{} or int64. We know for each opcode what the types of the Aux and AuxInt fields should be. So let's modify the rule generator to declare the types to be what we know they should be, autoconverting to and from the generic types for us. That way we can make the rules more type safe. It's difficult to make a single CL for this, so I've coopted the "=>" token to indicate a rule that is strongly typed. "->" rules are processed as before. That will let us migrate a few rules at a time in separate CLs. Hopefully we can reach a state where all rules are strongly typed and we can drop the distinction. This CL changes just a few rules to get a feel for what this transition would look like. I've decided not to put explicit types in the rules. I think it makes the rules somewhat clearer, but definitely more verbose. In particular, the passthrough rules that don't modify the fields in question are verbose for no real reason. Change-Id: I63a1b789ac5702e7caf7934cd49f784235d1d73d Reviewed-on: https://go-review.googlesource.com/c/go/+/190197 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2020-03-19 16:25:08 -07:00
{typ, "Type", "*types.Type"},
{auxint, "AuxInt", op.auxIntType()},
{aux, "Aux", op.auxType()},
} {
if e.name == "" {
continue
[dev.ssa] cmd/compile: refactor out rulegen value parsing Previously, genMatch0 and genResult0 contained lots of duplication: locating the op, parsing the value, validation, etc. Parsing and validation was mixed in with code gen. Extract a helper, parseValue. It is responsible for parsing the value, locating the op, and doing shared validation. As a bonus (and possibly as my original motivation), make op selection pay attention to the number of args present. This allows arch-specific ops to share a name with generic ops as long as there is no ambiguity. It also detects and reports unresolved ambiguity, unlike before, where it would simply always pick the generic op, with no warning. Also use parseValue when generating the top-level op dispatch, to ensure its opinion about ops matches genMatch0 and genResult0. The order of statements in the generated code used to depend on the exact rule. It is now somewhat independent of the rule. That is the source of some of the generated code changes in this CL. See rewritedec64 and rewritegeneric for examples. It is a one-time change. The op dispatch switch and functions used to be sorted by opname without architecture. The sort now includes the architecture, leading to further generated code changes. See rewriteARM and rewriteAMD64 for examples. Again, it is a one-time change. There are no functional changes. Change-Id: I22c989183ad5651741ebdc0566349c5fd6c6b23c Reviewed-on: https://go-review.googlesource.com/24649 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2016-07-01 11:05:29 -07:00
}
cmd/compile: start implementing strongly typed aux and auxint fields Right now the Aux and AuxInt fields of ssa.Values are typed as interface{} and int64, respectively. Each rule that uses these values must cast them to the type they actually are (*obj.LSym, or int32, or ValAndOff, etc.), use them, and then cast them back to interface{} or int64. We know for each opcode what the types of the Aux and AuxInt fields should be. So let's modify the rule generator to declare the types to be what we know they should be, autoconverting to and from the generic types for us. That way we can make the rules more type safe. It's difficult to make a single CL for this, so I've coopted the "=>" token to indicate a rule that is strongly typed. "->" rules are processed as before. That will let us migrate a few rules at a time in separate CLs. Hopefully we can reach a state where all rules are strongly typed and we can drop the distinction. This CL changes just a few rules to get a feel for what this transition would look like. I've decided not to put explicit types in the rules. I think it makes the rules somewhat clearer, but definitely more verbose. In particular, the passthrough rules that don't modify the fields in question are verbose for no real reason. Change-Id: I63a1b789ac5702e7caf7934cd49f784235d1d73d Reviewed-on: https://go-review.googlesource.com/c/go/+/190197 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2020-03-19 16:25:08 -07:00
if e.dclType == "" {
log.Fatalf("op %s has no declared type for %s", op.name, e.field)
}
if !token.IsIdentifier(e.name) || rr.declared(e.name) {
cmd/compile: start implementing strongly typed aux and auxint fields Right now the Aux and AuxInt fields of ssa.Values are typed as interface{} and int64, respectively. Each rule that uses these values must cast them to the type they actually are (*obj.LSym, or int32, or ValAndOff, etc.), use them, and then cast them back to interface{} or int64. We know for each opcode what the types of the Aux and AuxInt fields should be. So let's modify the rule generator to declare the types to be what we know they should be, autoconverting to and from the generic types for us. That way we can make the rules more type safe. It's difficult to make a single CL for this, so I've coopted the "=>" token to indicate a rule that is strongly typed. "->" rules are processed as before. That will let us migrate a few rules at a time in separate CLs. Hopefully we can reach a state where all rules are strongly typed and we can drop the distinction. This CL changes just a few rules to get a feel for what this transition would look like. I've decided not to put explicit types in the rules. I think it makes the rules somewhat clearer, but definitely more verbose. In particular, the passthrough rules that don't modify the fields in question are verbose for no real reason. Change-Id: I63a1b789ac5702e7caf7934cd49f784235d1d73d Reviewed-on: https://go-review.googlesource.com/c/go/+/190197 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2020-03-19 16:25:08 -07:00
switch e.field {
case "Aux":
rr.add(breakf("auxTo%s(%s.%s) != %s", title(e.dclType), v, e.field, e.name))
cmd/compile: start implementing strongly typed aux and auxint fields Right now the Aux and AuxInt fields of ssa.Values are typed as interface{} and int64, respectively. Each rule that uses these values must cast them to the type they actually are (*obj.LSym, or int32, or ValAndOff, etc.), use them, and then cast them back to interface{} or int64. We know for each opcode what the types of the Aux and AuxInt fields should be. So let's modify the rule generator to declare the types to be what we know they should be, autoconverting to and from the generic types for us. That way we can make the rules more type safe. It's difficult to make a single CL for this, so I've coopted the "=>" token to indicate a rule that is strongly typed. "->" rules are processed as before. That will let us migrate a few rules at a time in separate CLs. Hopefully we can reach a state where all rules are strongly typed and we can drop the distinction. This CL changes just a few rules to get a feel for what this transition would look like. I've decided not to put explicit types in the rules. I think it makes the rules somewhat clearer, but definitely more verbose. In particular, the passthrough rules that don't modify the fields in question are verbose for no real reason. Change-Id: I63a1b789ac5702e7caf7934cd49f784235d1d73d Reviewed-on: https://go-review.googlesource.com/c/go/+/190197 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2020-03-19 16:25:08 -07:00
case "AuxInt":
rr.add(breakf("auxIntTo%s(%s.%s) != %s", title(e.dclType), v, e.field, e.name))
cmd/compile: start implementing strongly typed aux and auxint fields Right now the Aux and AuxInt fields of ssa.Values are typed as interface{} and int64, respectively. Each rule that uses these values must cast them to the type they actually are (*obj.LSym, or int32, or ValAndOff, etc.), use them, and then cast them back to interface{} or int64. We know for each opcode what the types of the Aux and AuxInt fields should be. So let's modify the rule generator to declare the types to be what we know they should be, autoconverting to and from the generic types for us. That way we can make the rules more type safe. It's difficult to make a single CL for this, so I've coopted the "=>" token to indicate a rule that is strongly typed. "->" rules are processed as before. That will let us migrate a few rules at a time in separate CLs. Hopefully we can reach a state where all rules are strongly typed and we can drop the distinction. This CL changes just a few rules to get a feel for what this transition would look like. I've decided not to put explicit types in the rules. I think it makes the rules somewhat clearer, but definitely more verbose. In particular, the passthrough rules that don't modify the fields in question are verbose for no real reason. Change-Id: I63a1b789ac5702e7caf7934cd49f784235d1d73d Reviewed-on: https://go-review.googlesource.com/c/go/+/190197 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2020-03-19 16:25:08 -07:00
case "Type":
rr.add(breakf("%s.%s != %s", v, e.field, e.name))
}
[dev.ssa] cmd/compile: refactor out rulegen value parsing Previously, genMatch0 and genResult0 contained lots of duplication: locating the op, parsing the value, validation, etc. Parsing and validation was mixed in with code gen. Extract a helper, parseValue. It is responsible for parsing the value, locating the op, and doing shared validation. As a bonus (and possibly as my original motivation), make op selection pay attention to the number of args present. This allows arch-specific ops to share a name with generic ops as long as there is no ambiguity. It also detects and reports unresolved ambiguity, unlike before, where it would simply always pick the generic op, with no warning. Also use parseValue when generating the top-level op dispatch, to ensure its opinion about ops matches genMatch0 and genResult0. The order of statements in the generated code used to depend on the exact rule. It is now somewhat independent of the rule. That is the source of some of the generated code changes in this CL. See rewritedec64 and rewritegeneric for examples. It is a one-time change. The op dispatch switch and functions used to be sorted by opname without architecture. The sort now includes the architecture, leading to further generated code changes. See rewriteARM and rewriteAMD64 for examples. Again, it is a one-time change. There are no functional changes. Change-Id: I22c989183ad5651741ebdc0566349c5fd6c6b23c Reviewed-on: https://go-review.googlesource.com/24649 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2016-07-01 11:05:29 -07:00
} else {
cmd/compile: start implementing strongly typed aux and auxint fields Right now the Aux and AuxInt fields of ssa.Values are typed as interface{} and int64, respectively. Each rule that uses these values must cast them to the type they actually are (*obj.LSym, or int32, or ValAndOff, etc.), use them, and then cast them back to interface{} or int64. We know for each opcode what the types of the Aux and AuxInt fields should be. So let's modify the rule generator to declare the types to be what we know they should be, autoconverting to and from the generic types for us. That way we can make the rules more type safe. It's difficult to make a single CL for this, so I've coopted the "=>" token to indicate a rule that is strongly typed. "->" rules are processed as before. That will let us migrate a few rules at a time in separate CLs. Hopefully we can reach a state where all rules are strongly typed and we can drop the distinction. This CL changes just a few rules to get a feel for what this transition would look like. I've decided not to put explicit types in the rules. I think it makes the rules somewhat clearer, but definitely more verbose. In particular, the passthrough rules that don't modify the fields in question are verbose for no real reason. Change-Id: I63a1b789ac5702e7caf7934cd49f784235d1d73d Reviewed-on: https://go-review.googlesource.com/c/go/+/190197 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2020-03-19 16:25:08 -07:00
switch e.field {
case "Aux":
cmd/compile: disallow rewrite rules from declaring reserved names If I change a rule in ARM64.rules to use the variable name "b" in a conflicting way, rulegen would previously not complain, and the compiler would later give a confusing error: $ go run *.go && go build cmd/compile/internal/ssa # cmd/compile/internal/ssa ../rewriteARM64.go:24236:10: b.NewValue0 undefined (type int64 has no field or method NewValue0) Make rulegen complain early about those cases. Sometimes they might happen to be harmless, but in general they can easily cause confusion or unintended effect due to shadowing. After the change, with the same conflicting rule: $ go run *.go && go build cmd/compile/internal/ssa 2021/03/22 11:31:49 rule ARM64.rules:495 uses the reserved name b exit status 1 Note that 24 existing rules were using reserved names. It seems like the shadowing was harmless, as it wasn't causing typechecking issues nor did it seem to cause unintended behavior when the rule rewrite code ran. The bool values "b" were renamed "t", since that seems to have a precedent in other rules and in the fmt package. Sequential values like "a b c" were renamed to "x y z", since "b" is reserved. Finally, "typ" was renamed to "_typ", since there doesn't seem to be an obviously better answer. Passes all three of: $ GOARCH=amd64 go build -toolexec 'toolstash -cmp' -a std $ GOARCH=arm64 go build -toolexec 'toolstash -cmp' -a std $ GOARCH=mips64 go build -toolexec 'toolstash -cmp' -a std Fixes #45154. Change-Id: I1cce194dc7b477886a9c218c17973e996bcedccf Reviewed-on: https://go-review.googlesource.com/c/go/+/303549 Trust: Daniel Martí <mvdan@mvdan.cc> Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2021-03-22 11:35:02 +01:00
rr.add(declf(rr.Loc, e.name, "auxTo%s(%s.%s)", title(e.dclType), v, e.field))
cmd/compile: start implementing strongly typed aux and auxint fields Right now the Aux and AuxInt fields of ssa.Values are typed as interface{} and int64, respectively. Each rule that uses these values must cast them to the type they actually are (*obj.LSym, or int32, or ValAndOff, etc.), use them, and then cast them back to interface{} or int64. We know for each opcode what the types of the Aux and AuxInt fields should be. So let's modify the rule generator to declare the types to be what we know they should be, autoconverting to and from the generic types for us. That way we can make the rules more type safe. It's difficult to make a single CL for this, so I've coopted the "=>" token to indicate a rule that is strongly typed. "->" rules are processed as before. That will let us migrate a few rules at a time in separate CLs. Hopefully we can reach a state where all rules are strongly typed and we can drop the distinction. This CL changes just a few rules to get a feel for what this transition would look like. I've decided not to put explicit types in the rules. I think it makes the rules somewhat clearer, but definitely more verbose. In particular, the passthrough rules that don't modify the fields in question are verbose for no real reason. Change-Id: I63a1b789ac5702e7caf7934cd49f784235d1d73d Reviewed-on: https://go-review.googlesource.com/c/go/+/190197 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2020-03-19 16:25:08 -07:00
case "AuxInt":
cmd/compile: disallow rewrite rules from declaring reserved names If I change a rule in ARM64.rules to use the variable name "b" in a conflicting way, rulegen would previously not complain, and the compiler would later give a confusing error: $ go run *.go && go build cmd/compile/internal/ssa # cmd/compile/internal/ssa ../rewriteARM64.go:24236:10: b.NewValue0 undefined (type int64 has no field or method NewValue0) Make rulegen complain early about those cases. Sometimes they might happen to be harmless, but in general they can easily cause confusion or unintended effect due to shadowing. After the change, with the same conflicting rule: $ go run *.go && go build cmd/compile/internal/ssa 2021/03/22 11:31:49 rule ARM64.rules:495 uses the reserved name b exit status 1 Note that 24 existing rules were using reserved names. It seems like the shadowing was harmless, as it wasn't causing typechecking issues nor did it seem to cause unintended behavior when the rule rewrite code ran. The bool values "b" were renamed "t", since that seems to have a precedent in other rules and in the fmt package. Sequential values like "a b c" were renamed to "x y z", since "b" is reserved. Finally, "typ" was renamed to "_typ", since there doesn't seem to be an obviously better answer. Passes all three of: $ GOARCH=amd64 go build -toolexec 'toolstash -cmp' -a std $ GOARCH=arm64 go build -toolexec 'toolstash -cmp' -a std $ GOARCH=mips64 go build -toolexec 'toolstash -cmp' -a std Fixes #45154. Change-Id: I1cce194dc7b477886a9c218c17973e996bcedccf Reviewed-on: https://go-review.googlesource.com/c/go/+/303549 Trust: Daniel Martí <mvdan@mvdan.cc> Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2021-03-22 11:35:02 +01:00
rr.add(declf(rr.Loc, e.name, "auxIntTo%s(%s.%s)", title(e.dclType), v, e.field))
cmd/compile: start implementing strongly typed aux and auxint fields Right now the Aux and AuxInt fields of ssa.Values are typed as interface{} and int64, respectively. Each rule that uses these values must cast them to the type they actually are (*obj.LSym, or int32, or ValAndOff, etc.), use them, and then cast them back to interface{} or int64. We know for each opcode what the types of the Aux and AuxInt fields should be. So let's modify the rule generator to declare the types to be what we know they should be, autoconverting to and from the generic types for us. That way we can make the rules more type safe. It's difficult to make a single CL for this, so I've coopted the "=>" token to indicate a rule that is strongly typed. "->" rules are processed as before. That will let us migrate a few rules at a time in separate CLs. Hopefully we can reach a state where all rules are strongly typed and we can drop the distinction. This CL changes just a few rules to get a feel for what this transition would look like. I've decided not to put explicit types in the rules. I think it makes the rules somewhat clearer, but definitely more verbose. In particular, the passthrough rules that don't modify the fields in question are verbose for no real reason. Change-Id: I63a1b789ac5702e7caf7934cd49f784235d1d73d Reviewed-on: https://go-review.googlesource.com/c/go/+/190197 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2020-03-19 16:25:08 -07:00
case "Type":
cmd/compile: disallow rewrite rules from declaring reserved names If I change a rule in ARM64.rules to use the variable name "b" in a conflicting way, rulegen would previously not complain, and the compiler would later give a confusing error: $ go run *.go && go build cmd/compile/internal/ssa # cmd/compile/internal/ssa ../rewriteARM64.go:24236:10: b.NewValue0 undefined (type int64 has no field or method NewValue0) Make rulegen complain early about those cases. Sometimes they might happen to be harmless, but in general they can easily cause confusion or unintended effect due to shadowing. After the change, with the same conflicting rule: $ go run *.go && go build cmd/compile/internal/ssa 2021/03/22 11:31:49 rule ARM64.rules:495 uses the reserved name b exit status 1 Note that 24 existing rules were using reserved names. It seems like the shadowing was harmless, as it wasn't causing typechecking issues nor did it seem to cause unintended behavior when the rule rewrite code ran. The bool values "b" were renamed "t", since that seems to have a precedent in other rules and in the fmt package. Sequential values like "a b c" were renamed to "x y z", since "b" is reserved. Finally, "typ" was renamed to "_typ", since there doesn't seem to be an obviously better answer. Passes all three of: $ GOARCH=amd64 go build -toolexec 'toolstash -cmp' -a std $ GOARCH=arm64 go build -toolexec 'toolstash -cmp' -a std $ GOARCH=mips64 go build -toolexec 'toolstash -cmp' -a std Fixes #45154. Change-Id: I1cce194dc7b477886a9c218c17973e996bcedccf Reviewed-on: https://go-review.googlesource.com/c/go/+/303549 Trust: Daniel Martí <mvdan@mvdan.cc> Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2021-03-22 11:35:02 +01:00
rr.add(declf(rr.Loc, e.name, "%s.%s", v, e.field))
cmd/compile: start implementing strongly typed aux and auxint fields Right now the Aux and AuxInt fields of ssa.Values are typed as interface{} and int64, respectively. Each rule that uses these values must cast them to the type they actually are (*obj.LSym, or int32, or ValAndOff, etc.), use them, and then cast them back to interface{} or int64. We know for each opcode what the types of the Aux and AuxInt fields should be. So let's modify the rule generator to declare the types to be what we know they should be, autoconverting to and from the generic types for us. That way we can make the rules more type safe. It's difficult to make a single CL for this, so I've coopted the "=>" token to indicate a rule that is strongly typed. "->" rules are processed as before. That will let us migrate a few rules at a time in separate CLs. Hopefully we can reach a state where all rules are strongly typed and we can drop the distinction. This CL changes just a few rules to get a feel for what this transition would look like. I've decided not to put explicit types in the rules. I think it makes the rules somewhat clearer, but definitely more verbose. In particular, the passthrough rules that don't modify the fields in question are verbose for no real reason. Change-Id: I63a1b789ac5702e7caf7934cd49f784235d1d73d Reviewed-on: https://go-review.googlesource.com/c/go/+/190197 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2020-03-19 16:25:08 -07:00
}
[dev.ssa] cmd/compile: refactor out rulegen value parsing Previously, genMatch0 and genResult0 contained lots of duplication: locating the op, parsing the value, validation, etc. Parsing and validation was mixed in with code gen. Extract a helper, parseValue. It is responsible for parsing the value, locating the op, and doing shared validation. As a bonus (and possibly as my original motivation), make op selection pay attention to the number of args present. This allows arch-specific ops to share a name with generic ops as long as there is no ambiguity. It also detects and reports unresolved ambiguity, unlike before, where it would simply always pick the generic op, with no warning. Also use parseValue when generating the top-level op dispatch, to ensure its opinion about ops matches genMatch0 and genResult0. The order of statements in the generated code used to depend on the exact rule. It is now somewhat independent of the rule. That is the source of some of the generated code changes in this CL. See rewritedec64 and rewritegeneric for examples. It is a one-time change. The op dispatch switch and functions used to be sorted by opname without architecture. The sort now includes the architecture, leading to further generated code changes. See rewriteARM and rewriteAMD64 for examples. Again, it is a one-time change. There are no functional changes. Change-Id: I22c989183ad5651741ebdc0566349c5fd6c6b23c Reviewed-on: https://go-review.googlesource.com/24649 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2016-07-01 11:05:29 -07:00
}
}
cmd/compile: use loops to handle commutative ops in rules Prior to this change, we generated additional rules at rulegen time for all possible combinations of args to commutative ops. This is simple and works well, but leads to lots of generated rules. This in turn has increased the size of the compiler, made it hard to compile package ssa on small machines, and provided a disincentive to mark some ops as commutative. This change reworks how we handle commutative ops. Instead of generating a rule per argument permutation, we generate a series of nested loops, one for each commutative op. Each loop tries both possible argument orderings. I also considered attempting to canonicalize the inputs to the rewrite rules. However, because either or both arguments might be nothing more than an identifier, and because there can be arbitrary conditions to evaluate during matching, I did not see how to proceed. The duplicate rule detection now sorts arguments to commutative ops, so that it can detect commutative-only duplicates. There may be further optimizations to the new generated code. In particular, we may not be removing as many bounds checks as before; I have not investigated deeply. If more work here is needed, we could do it with more hints or with improvements to the prove pass. This change has almost no impact on the generated code. It does not pass toolstash-check, however. In a handful of functions, for reasons I do not understand, there are minor position changes. For the entire series ending at this change, there is negligible compiler performance impact. The compiler binary shrinks by about 15%, and package ssa shrinks by about 25%. Package ssa also compiles ~25% faster with ~25% less memory. Change-Id: Ia2ee9ceae7be08a17342319d4e31b0bb238a2ee4 Reviewed-on: https://go-review.googlesource.com/c/go/+/213703 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-06 22:24:02 -08:00
commutative := op.commutative
if commutative {
if args[0] == args[1] {
// When we have (Add x x), for any x,
// even if there are other uses of x besides these two,
// and even if x is not a variable,
// we can skip the commutative match.
cmd/compile: use loops to handle commutative ops in rules Prior to this change, we generated additional rules at rulegen time for all possible combinations of args to commutative ops. This is simple and works well, but leads to lots of generated rules. This in turn has increased the size of the compiler, made it hard to compile package ssa on small machines, and provided a disincentive to mark some ops as commutative. This change reworks how we handle commutative ops. Instead of generating a rule per argument permutation, we generate a series of nested loops, one for each commutative op. Each loop tries both possible argument orderings. I also considered attempting to canonicalize the inputs to the rewrite rules. However, because either or both arguments might be nothing more than an identifier, and because there can be arbitrary conditions to evaluate during matching, I did not see how to proceed. The duplicate rule detection now sorts arguments to commutative ops, so that it can detect commutative-only duplicates. There may be further optimizations to the new generated code. In particular, we may not be removing as many bounds checks as before; I have not investigated deeply. If more work here is needed, we could do it with more hints or with improvements to the prove pass. This change has almost no impact on the generated code. It does not pass toolstash-check, however. In a handful of functions, for reasons I do not understand, there are minor position changes. For the entire series ending at this change, there is negligible compiler performance impact. The compiler binary shrinks by about 15%, and package ssa shrinks by about 25%. Package ssa also compiles ~25% faster with ~25% less memory. Change-Id: Ia2ee9ceae7be08a17342319d4e31b0bb238a2ee4 Reviewed-on: https://go-review.googlesource.com/c/go/+/213703 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-06 22:24:02 -08:00
commutative = false
}
if cnt[args[0]] == 1 && cnt[args[1]] == 1 {
// When we have (Add x y) with no other uses
// of x and y in the matching rule and condition,
// then we can skip the commutative match (Add y x).
cmd/compile: use loops to handle commutative ops in rules Prior to this change, we generated additional rules at rulegen time for all possible combinations of args to commutative ops. This is simple and works well, but leads to lots of generated rules. This in turn has increased the size of the compiler, made it hard to compile package ssa on small machines, and provided a disincentive to mark some ops as commutative. This change reworks how we handle commutative ops. Instead of generating a rule per argument permutation, we generate a series of nested loops, one for each commutative op. Each loop tries both possible argument orderings. I also considered attempting to canonicalize the inputs to the rewrite rules. However, because either or both arguments might be nothing more than an identifier, and because there can be arbitrary conditions to evaluate during matching, I did not see how to proceed. The duplicate rule detection now sorts arguments to commutative ops, so that it can detect commutative-only duplicates. There may be further optimizations to the new generated code. In particular, we may not be removing as many bounds checks as before; I have not investigated deeply. If more work here is needed, we could do it with more hints or with improvements to the prove pass. This change has almost no impact on the generated code. It does not pass toolstash-check, however. In a handful of functions, for reasons I do not understand, there are minor position changes. For the entire series ending at this change, there is negligible compiler performance impact. The compiler binary shrinks by about 15%, and package ssa shrinks by about 25%. Package ssa also compiles ~25% faster with ~25% less memory. Change-Id: Ia2ee9ceae7be08a17342319d4e31b0bb238a2ee4 Reviewed-on: https://go-review.googlesource.com/c/go/+/213703 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-06 22:24:02 -08:00
commutative = false
}
}
if !pregenTop {
// Access last argument first to minimize bounds checks.
for n := len(args) - 1; n > 0; n-- {
a := args[n]
if a == "_" {
continue
}
if !rr.declared(a) && token.IsIdentifier(a) && !(commutative && len(args) == 2) {
cmd/compile: disallow rewrite rules from declaring reserved names If I change a rule in ARM64.rules to use the variable name "b" in a conflicting way, rulegen would previously not complain, and the compiler would later give a confusing error: $ go run *.go && go build cmd/compile/internal/ssa # cmd/compile/internal/ssa ../rewriteARM64.go:24236:10: b.NewValue0 undefined (type int64 has no field or method NewValue0) Make rulegen complain early about those cases. Sometimes they might happen to be harmless, but in general they can easily cause confusion or unintended effect due to shadowing. After the change, with the same conflicting rule: $ go run *.go && go build cmd/compile/internal/ssa 2021/03/22 11:31:49 rule ARM64.rules:495 uses the reserved name b exit status 1 Note that 24 existing rules were using reserved names. It seems like the shadowing was harmless, as it wasn't causing typechecking issues nor did it seem to cause unintended behavior when the rule rewrite code ran. The bool values "b" were renamed "t", since that seems to have a precedent in other rules and in the fmt package. Sequential values like "a b c" were renamed to "x y z", since "b" is reserved. Finally, "typ" was renamed to "_typ", since there doesn't seem to be an obviously better answer. Passes all three of: $ GOARCH=amd64 go build -toolexec 'toolstash -cmp' -a std $ GOARCH=arm64 go build -toolexec 'toolstash -cmp' -a std $ GOARCH=mips64 go build -toolexec 'toolstash -cmp' -a std Fixes #45154. Change-Id: I1cce194dc7b477886a9c218c17973e996bcedccf Reviewed-on: https://go-review.googlesource.com/c/go/+/303549 Trust: Daniel Martí <mvdan@mvdan.cc> Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2021-03-22 11:35:02 +01:00
rr.add(declf(rr.Loc, a, "%s.Args[%d]", v, n))
// delete the last argument so it is not reprocessed
args = args[:n]
} else {
rr.add(stmtf("_ = %s.Args[%d]", v, n))
}
break
}
}
cmd/compile: reduce bounds checks in generated rewrite rules CL 213703 converted generated rewrite rules for commutative ops to use loops instead of duplicated code. However, it loaded args using expressions like v.Args[i] and v.Args[i^1], which the compiler could not eliminate bounds for (including with all outstanding prove CLs). Also, given a series of separate rewrite rules for the same op, we generated bounds checks for every rewrite rule, even though we were repeatedly loading the same set of args. This change reduces both sets of bounds checks. Instead of loading v.Args[i] and v.Args[i^1] for commutative loops, we now preload v.Args[0] and v.Args[1] into local variables, and then swap them (as needed) in the commutative loop post statement. And we now load all top level v.Args into local variables at the beginning of every rewrite rule function. The second optimization is the more significant, but the first helps a little, and they play together nicely from the perspective of generating the code. This does increase register pressure, but the reduced bounds checks more than compensate. Note that the vast majority of rewrite rules evaluated are not applied, so the prologue is the most important part of the rewrite rules. There is one subtle aspect to the new generated code. Because the top level v.Args are shared across rewrite rules, and rule evaluation can swap v_0 and v_1, v_0 and v_1 can end up being swapped from one rule to the next. That is OK, because any time a rule does not get applied, they will have been swapped exactly twice. Passes toolstash-check -all. name old time/op new time/op delta Template 213ms ± 2% 211ms ± 2% -0.85% (p=0.000 n=92+96) Unicode 83.5ms ± 2% 83.2ms ± 2% -0.41% (p=0.004 n=95+90) GoTypes 737ms ± 2% 733ms ± 2% -0.51% (p=0.000 n=91+94) Compiler 3.45s ± 2% 3.43s ± 2% -0.44% (p=0.000 n=99+100) SSA 8.54s ± 1% 8.32s ± 2% -2.56% (p=0.000 n=96+99) Flate 136ms ± 2% 135ms ± 1% -0.47% (p=0.000 n=96+96) GoParser 169ms ± 1% 168ms ± 1% -0.33% (p=0.000 n=96+93) Reflect 456ms ± 3% 455ms ± 3% ~ (p=0.261 n=95+94) Tar 186ms ± 2% 185ms ± 2% -0.48% (p=0.000 n=94+95) XML 251ms ± 1% 250ms ± 1% -0.51% (p=0.000 n=91+94) [Geo mean] 424ms 421ms -0.68% name old user-time/op new user-time/op delta Template 275ms ± 1% 274ms ± 2% -0.55% (p=0.000 n=95+98) Unicode 118ms ± 4% 118ms ± 4% ~ (p=0.642 n=98+90) GoTypes 983ms ± 1% 980ms ± 1% -0.30% (p=0.000 n=93+93) Compiler 4.56s ± 6% 4.52s ± 6% -0.72% (p=0.003 n=100+100) SSA 11.4s ± 1% 11.1s ± 1% -2.50% (p=0.000 n=96+97) Flate 168ms ± 1% 167ms ± 1% -0.49% (p=0.000 n=92+92) GoParser 204ms ± 1% 204ms ± 2% -0.27% (p=0.003 n=99+96) Reflect 599ms ± 2% 598ms ± 2% ~ (p=0.116 n=95+92) Tar 227ms ± 2% 225ms ± 2% -0.57% (p=0.000 n=95+98) XML 313ms ± 2% 312ms ± 1% -0.37% (p=0.000 n=89+95) [Geo mean] 547ms 544ms -0.61% file before after Δ % compile 21113112 21109016 -4096 -0.019% total 131704940 131700844 -4096 -0.003% Change-Id: Id6c39e0367e597c0c75b8a4b1eb14cc3cbd11956 Reviewed-on: https://go-review.googlesource.com/c/go/+/216218 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-20 20:09:41 -08:00
if commutative && !pregenTop {
for i := 0; i <= 1; i++ {
vname := fmt.Sprintf("%s_%d", v, i)
cmd/compile: disallow rewrite rules from declaring reserved names If I change a rule in ARM64.rules to use the variable name "b" in a conflicting way, rulegen would previously not complain, and the compiler would later give a confusing error: $ go run *.go && go build cmd/compile/internal/ssa # cmd/compile/internal/ssa ../rewriteARM64.go:24236:10: b.NewValue0 undefined (type int64 has no field or method NewValue0) Make rulegen complain early about those cases. Sometimes they might happen to be harmless, but in general they can easily cause confusion or unintended effect due to shadowing. After the change, with the same conflicting rule: $ go run *.go && go build cmd/compile/internal/ssa 2021/03/22 11:31:49 rule ARM64.rules:495 uses the reserved name b exit status 1 Note that 24 existing rules were using reserved names. It seems like the shadowing was harmless, as it wasn't causing typechecking issues nor did it seem to cause unintended behavior when the rule rewrite code ran. The bool values "b" were renamed "t", since that seems to have a precedent in other rules and in the fmt package. Sequential values like "a b c" were renamed to "x y z", since "b" is reserved. Finally, "typ" was renamed to "_typ", since there doesn't seem to be an obviously better answer. Passes all three of: $ GOARCH=amd64 go build -toolexec 'toolstash -cmp' -a std $ GOARCH=arm64 go build -toolexec 'toolstash -cmp' -a std $ GOARCH=mips64 go build -toolexec 'toolstash -cmp' -a std Fixes #45154. Change-Id: I1cce194dc7b477886a9c218c17973e996bcedccf Reviewed-on: https://go-review.googlesource.com/c/go/+/303549 Trust: Daniel Martí <mvdan@mvdan.cc> Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2021-03-22 11:35:02 +01:00
rr.add(declf(rr.Loc, vname, "%s.Args[%d]", v, i))
cmd/compile: reduce bounds checks in generated rewrite rules CL 213703 converted generated rewrite rules for commutative ops to use loops instead of duplicated code. However, it loaded args using expressions like v.Args[i] and v.Args[i^1], which the compiler could not eliminate bounds for (including with all outstanding prove CLs). Also, given a series of separate rewrite rules for the same op, we generated bounds checks for every rewrite rule, even though we were repeatedly loading the same set of args. This change reduces both sets of bounds checks. Instead of loading v.Args[i] and v.Args[i^1] for commutative loops, we now preload v.Args[0] and v.Args[1] into local variables, and then swap them (as needed) in the commutative loop post statement. And we now load all top level v.Args into local variables at the beginning of every rewrite rule function. The second optimization is the more significant, but the first helps a little, and they play together nicely from the perspective of generating the code. This does increase register pressure, but the reduced bounds checks more than compensate. Note that the vast majority of rewrite rules evaluated are not applied, so the prologue is the most important part of the rewrite rules. There is one subtle aspect to the new generated code. Because the top level v.Args are shared across rewrite rules, and rule evaluation can swap v_0 and v_1, v_0 and v_1 can end up being swapped from one rule to the next. That is OK, because any time a rule does not get applied, they will have been swapped exactly twice. Passes toolstash-check -all. name old time/op new time/op delta Template 213ms ± 2% 211ms ± 2% -0.85% (p=0.000 n=92+96) Unicode 83.5ms ± 2% 83.2ms ± 2% -0.41% (p=0.004 n=95+90) GoTypes 737ms ± 2% 733ms ± 2% -0.51% (p=0.000 n=91+94) Compiler 3.45s ± 2% 3.43s ± 2% -0.44% (p=0.000 n=99+100) SSA 8.54s ± 1% 8.32s ± 2% -2.56% (p=0.000 n=96+99) Flate 136ms ± 2% 135ms ± 1% -0.47% (p=0.000 n=96+96) GoParser 169ms ± 1% 168ms ± 1% -0.33% (p=0.000 n=96+93) Reflect 456ms ± 3% 455ms ± 3% ~ (p=0.261 n=95+94) Tar 186ms ± 2% 185ms ± 2% -0.48% (p=0.000 n=94+95) XML 251ms ± 1% 250ms ± 1% -0.51% (p=0.000 n=91+94) [Geo mean] 424ms 421ms -0.68% name old user-time/op new user-time/op delta Template 275ms ± 1% 274ms ± 2% -0.55% (p=0.000 n=95+98) Unicode 118ms ± 4% 118ms ± 4% ~ (p=0.642 n=98+90) GoTypes 983ms ± 1% 980ms ± 1% -0.30% (p=0.000 n=93+93) Compiler 4.56s ± 6% 4.52s ± 6% -0.72% (p=0.003 n=100+100) SSA 11.4s ± 1% 11.1s ± 1% -2.50% (p=0.000 n=96+97) Flate 168ms ± 1% 167ms ± 1% -0.49% (p=0.000 n=92+92) GoParser 204ms ± 1% 204ms ± 2% -0.27% (p=0.003 n=99+96) Reflect 599ms ± 2% 598ms ± 2% ~ (p=0.116 n=95+92) Tar 227ms ± 2% 225ms ± 2% -0.57% (p=0.000 n=95+98) XML 313ms ± 2% 312ms ± 1% -0.37% (p=0.000 n=89+95) [Geo mean] 547ms 544ms -0.61% file before after Δ % compile 21113112 21109016 -4096 -0.019% total 131704940 131700844 -4096 -0.003% Change-Id: Id6c39e0367e597c0c75b8a4b1eb14cc3cbd11956 Reviewed-on: https://go-review.googlesource.com/c/go/+/216218 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-20 20:09:41 -08:00
}
}
cmd/compile: use loops to handle commutative ops in rules Prior to this change, we generated additional rules at rulegen time for all possible combinations of args to commutative ops. This is simple and works well, but leads to lots of generated rules. This in turn has increased the size of the compiler, made it hard to compile package ssa on small machines, and provided a disincentive to mark some ops as commutative. This change reworks how we handle commutative ops. Instead of generating a rule per argument permutation, we generate a series of nested loops, one for each commutative op. Each loop tries both possible argument orderings. I also considered attempting to canonicalize the inputs to the rewrite rules. However, because either or both arguments might be nothing more than an identifier, and because there can be arbitrary conditions to evaluate during matching, I did not see how to proceed. The duplicate rule detection now sorts arguments to commutative ops, so that it can detect commutative-only duplicates. There may be further optimizations to the new generated code. In particular, we may not be removing as many bounds checks as before; I have not investigated deeply. If more work here is needed, we could do it with more hints or with improvements to the prove pass. This change has almost no impact on the generated code. It does not pass toolstash-check, however. In a handful of functions, for reasons I do not understand, there are minor position changes. For the entire series ending at this change, there is negligible compiler performance impact. The compiler binary shrinks by about 15%, and package ssa shrinks by about 25%. Package ssa also compiles ~25% faster with ~25% less memory. Change-Id: Ia2ee9ceae7be08a17342319d4e31b0bb238a2ee4 Reviewed-on: https://go-review.googlesource.com/c/go/+/213703 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-06 22:24:02 -08:00
if commutative {
rr.add(StartCommuteLoop{rr.CommuteDepth, v})
rr.CommuteDepth++
cmd/compile: use loops to handle commutative ops in rules Prior to this change, we generated additional rules at rulegen time for all possible combinations of args to commutative ops. This is simple and works well, but leads to lots of generated rules. This in turn has increased the size of the compiler, made it hard to compile package ssa on small machines, and provided a disincentive to mark some ops as commutative. This change reworks how we handle commutative ops. Instead of generating a rule per argument permutation, we generate a series of nested loops, one for each commutative op. Each loop tries both possible argument orderings. I also considered attempting to canonicalize the inputs to the rewrite rules. However, because either or both arguments might be nothing more than an identifier, and because there can be arbitrary conditions to evaluate during matching, I did not see how to proceed. The duplicate rule detection now sorts arguments to commutative ops, so that it can detect commutative-only duplicates. There may be further optimizations to the new generated code. In particular, we may not be removing as many bounds checks as before; I have not investigated deeply. If more work here is needed, we could do it with more hints or with improvements to the prove pass. This change has almost no impact on the generated code. It does not pass toolstash-check, however. In a handful of functions, for reasons I do not understand, there are minor position changes. For the entire series ending at this change, there is negligible compiler performance impact. The compiler binary shrinks by about 15%, and package ssa shrinks by about 25%. Package ssa also compiles ~25% faster with ~25% less memory. Change-Id: Ia2ee9ceae7be08a17342319d4e31b0bb238a2ee4 Reviewed-on: https://go-review.googlesource.com/c/go/+/213703 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-06 22:24:02 -08:00
}
[dev.ssa] cmd/compile: refactor out rulegen value parsing Previously, genMatch0 and genResult0 contained lots of duplication: locating the op, parsing the value, validation, etc. Parsing and validation was mixed in with code gen. Extract a helper, parseValue. It is responsible for parsing the value, locating the op, and doing shared validation. As a bonus (and possibly as my original motivation), make op selection pay attention to the number of args present. This allows arch-specific ops to share a name with generic ops as long as there is no ambiguity. It also detects and reports unresolved ambiguity, unlike before, where it would simply always pick the generic op, with no warning. Also use parseValue when generating the top-level op dispatch, to ensure its opinion about ops matches genMatch0 and genResult0. The order of statements in the generated code used to depend on the exact rule. It is now somewhat independent of the rule. That is the source of some of the generated code changes in this CL. See rewritedec64 and rewritegeneric for examples. It is a one-time change. The op dispatch switch and functions used to be sorted by opname without architecture. The sort now includes the architecture, leading to further generated code changes. See rewriteARM and rewriteAMD64 for examples. Again, it is a one-time change. There are no functional changes. Change-Id: I22c989183ad5651741ebdc0566349c5fd6c6b23c Reviewed-on: https://go-review.googlesource.com/24649 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2016-07-01 11:05:29 -07:00
for i, arg := range args {
if arg == "_" {
continue
}
cmd/compile: reduce bounds checks in generated rewrite rules CL 213703 converted generated rewrite rules for commutative ops to use loops instead of duplicated code. However, it loaded args using expressions like v.Args[i] and v.Args[i^1], which the compiler could not eliminate bounds for (including with all outstanding prove CLs). Also, given a series of separate rewrite rules for the same op, we generated bounds checks for every rewrite rule, even though we were repeatedly loading the same set of args. This change reduces both sets of bounds checks. Instead of loading v.Args[i] and v.Args[i^1] for commutative loops, we now preload v.Args[0] and v.Args[1] into local variables, and then swap them (as needed) in the commutative loop post statement. And we now load all top level v.Args into local variables at the beginning of every rewrite rule function. The second optimization is the more significant, but the first helps a little, and they play together nicely from the perspective of generating the code. This does increase register pressure, but the reduced bounds checks more than compensate. Note that the vast majority of rewrite rules evaluated are not applied, so the prologue is the most important part of the rewrite rules. There is one subtle aspect to the new generated code. Because the top level v.Args are shared across rewrite rules, and rule evaluation can swap v_0 and v_1, v_0 and v_1 can end up being swapped from one rule to the next. That is OK, because any time a rule does not get applied, they will have been swapped exactly twice. Passes toolstash-check -all. name old time/op new time/op delta Template 213ms ± 2% 211ms ± 2% -0.85% (p=0.000 n=92+96) Unicode 83.5ms ± 2% 83.2ms ± 2% -0.41% (p=0.004 n=95+90) GoTypes 737ms ± 2% 733ms ± 2% -0.51% (p=0.000 n=91+94) Compiler 3.45s ± 2% 3.43s ± 2% -0.44% (p=0.000 n=99+100) SSA 8.54s ± 1% 8.32s ± 2% -2.56% (p=0.000 n=96+99) Flate 136ms ± 2% 135ms ± 1% -0.47% (p=0.000 n=96+96) GoParser 169ms ± 1% 168ms ± 1% -0.33% (p=0.000 n=96+93) Reflect 456ms ± 3% 455ms ± 3% ~ (p=0.261 n=95+94) Tar 186ms ± 2% 185ms ± 2% -0.48% (p=0.000 n=94+95) XML 251ms ± 1% 250ms ± 1% -0.51% (p=0.000 n=91+94) [Geo mean] 424ms 421ms -0.68% name old user-time/op new user-time/op delta Template 275ms ± 1% 274ms ± 2% -0.55% (p=0.000 n=95+98) Unicode 118ms ± 4% 118ms ± 4% ~ (p=0.642 n=98+90) GoTypes 983ms ± 1% 980ms ± 1% -0.30% (p=0.000 n=93+93) Compiler 4.56s ± 6% 4.52s ± 6% -0.72% (p=0.003 n=100+100) SSA 11.4s ± 1% 11.1s ± 1% -2.50% (p=0.000 n=96+97) Flate 168ms ± 1% 167ms ± 1% -0.49% (p=0.000 n=92+92) GoParser 204ms ± 1% 204ms ± 2% -0.27% (p=0.003 n=99+96) Reflect 599ms ± 2% 598ms ± 2% ~ (p=0.116 n=95+92) Tar 227ms ± 2% 225ms ± 2% -0.57% (p=0.000 n=95+98) XML 313ms ± 2% 312ms ± 1% -0.37% (p=0.000 n=89+95) [Geo mean] 547ms 544ms -0.61% file before after Δ % compile 21113112 21109016 -4096 -0.019% total 131704940 131700844 -4096 -0.003% Change-Id: Id6c39e0367e597c0c75b8a4b1eb14cc3cbd11956 Reviewed-on: https://go-review.googlesource.com/c/go/+/216218 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-20 20:09:41 -08:00
var rhs string
if (commutative && i < 2) || pregenTop {
rhs = fmt.Sprintf("%s_%d", v, i)
} else {
rhs = fmt.Sprintf("%s.Args[%d]", v, i)
}
[dev.ssa] cmd/compile: refactor out rulegen value parsing Previously, genMatch0 and genResult0 contained lots of duplication: locating the op, parsing the value, validation, etc. Parsing and validation was mixed in with code gen. Extract a helper, parseValue. It is responsible for parsing the value, locating the op, and doing shared validation. As a bonus (and possibly as my original motivation), make op selection pay attention to the number of args present. This allows arch-specific ops to share a name with generic ops as long as there is no ambiguity. It also detects and reports unresolved ambiguity, unlike before, where it would simply always pick the generic op, with no warning. Also use parseValue when generating the top-level op dispatch, to ensure its opinion about ops matches genMatch0 and genResult0. The order of statements in the generated code used to depend on the exact rule. It is now somewhat independent of the rule. That is the source of some of the generated code changes in this CL. See rewritedec64 and rewritegeneric for examples. It is a one-time change. The op dispatch switch and functions used to be sorted by opname without architecture. The sort now includes the architecture, leading to further generated code changes. See rewriteARM and rewriteAMD64 for examples. Again, it is a one-time change. There are no functional changes. Change-Id: I22c989183ad5651741ebdc0566349c5fd6c6b23c Reviewed-on: https://go-review.googlesource.com/24649 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2016-07-01 11:05:29 -07:00
if !strings.Contains(arg, "(") {
// leaf variable
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
if rr.declared(arg) {
// variable already has a definition. Check whether
// the old definition and the new definition match.
// For example, (add x x). Equality is just pointer equality
// on Values (so cse is important to do before lowering).
cmd/compile: reduce bounds checks in generated rewrite rules CL 213703 converted generated rewrite rules for commutative ops to use loops instead of duplicated code. However, it loaded args using expressions like v.Args[i] and v.Args[i^1], which the compiler could not eliminate bounds for (including with all outstanding prove CLs). Also, given a series of separate rewrite rules for the same op, we generated bounds checks for every rewrite rule, even though we were repeatedly loading the same set of args. This change reduces both sets of bounds checks. Instead of loading v.Args[i] and v.Args[i^1] for commutative loops, we now preload v.Args[0] and v.Args[1] into local variables, and then swap them (as needed) in the commutative loop post statement. And we now load all top level v.Args into local variables at the beginning of every rewrite rule function. The second optimization is the more significant, but the first helps a little, and they play together nicely from the perspective of generating the code. This does increase register pressure, but the reduced bounds checks more than compensate. Note that the vast majority of rewrite rules evaluated are not applied, so the prologue is the most important part of the rewrite rules. There is one subtle aspect to the new generated code. Because the top level v.Args are shared across rewrite rules, and rule evaluation can swap v_0 and v_1, v_0 and v_1 can end up being swapped from one rule to the next. That is OK, because any time a rule does not get applied, they will have been swapped exactly twice. Passes toolstash-check -all. name old time/op new time/op delta Template 213ms ± 2% 211ms ± 2% -0.85% (p=0.000 n=92+96) Unicode 83.5ms ± 2% 83.2ms ± 2% -0.41% (p=0.004 n=95+90) GoTypes 737ms ± 2% 733ms ± 2% -0.51% (p=0.000 n=91+94) Compiler 3.45s ± 2% 3.43s ± 2% -0.44% (p=0.000 n=99+100) SSA 8.54s ± 1% 8.32s ± 2% -2.56% (p=0.000 n=96+99) Flate 136ms ± 2% 135ms ± 1% -0.47% (p=0.000 n=96+96) GoParser 169ms ± 1% 168ms ± 1% -0.33% (p=0.000 n=96+93) Reflect 456ms ± 3% 455ms ± 3% ~ (p=0.261 n=95+94) Tar 186ms ± 2% 185ms ± 2% -0.48% (p=0.000 n=94+95) XML 251ms ± 1% 250ms ± 1% -0.51% (p=0.000 n=91+94) [Geo mean] 424ms 421ms -0.68% name old user-time/op new user-time/op delta Template 275ms ± 1% 274ms ± 2% -0.55% (p=0.000 n=95+98) Unicode 118ms ± 4% 118ms ± 4% ~ (p=0.642 n=98+90) GoTypes 983ms ± 1% 980ms ± 1% -0.30% (p=0.000 n=93+93) Compiler 4.56s ± 6% 4.52s ± 6% -0.72% (p=0.003 n=100+100) SSA 11.4s ± 1% 11.1s ± 1% -2.50% (p=0.000 n=96+97) Flate 168ms ± 1% 167ms ± 1% -0.49% (p=0.000 n=92+92) GoParser 204ms ± 1% 204ms ± 2% -0.27% (p=0.003 n=99+96) Reflect 599ms ± 2% 598ms ± 2% ~ (p=0.116 n=95+92) Tar 227ms ± 2% 225ms ± 2% -0.57% (p=0.000 n=95+98) XML 313ms ± 2% 312ms ± 1% -0.37% (p=0.000 n=89+95) [Geo mean] 547ms 544ms -0.61% file before after Δ % compile 21113112 21109016 -4096 -0.019% total 131704940 131700844 -4096 -0.003% Change-Id: Id6c39e0367e597c0c75b8a4b1eb14cc3cbd11956 Reviewed-on: https://go-review.googlesource.com/c/go/+/216218 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-20 20:09:41 -08:00
rr.add(breakf("%s != %s", arg, rhs))
} else {
cmd/compile: reduce bounds checks in generated rewrite rules CL 213703 converted generated rewrite rules for commutative ops to use loops instead of duplicated code. However, it loaded args using expressions like v.Args[i] and v.Args[i^1], which the compiler could not eliminate bounds for (including with all outstanding prove CLs). Also, given a series of separate rewrite rules for the same op, we generated bounds checks for every rewrite rule, even though we were repeatedly loading the same set of args. This change reduces both sets of bounds checks. Instead of loading v.Args[i] and v.Args[i^1] for commutative loops, we now preload v.Args[0] and v.Args[1] into local variables, and then swap them (as needed) in the commutative loop post statement. And we now load all top level v.Args into local variables at the beginning of every rewrite rule function. The second optimization is the more significant, but the first helps a little, and they play together nicely from the perspective of generating the code. This does increase register pressure, but the reduced bounds checks more than compensate. Note that the vast majority of rewrite rules evaluated are not applied, so the prologue is the most important part of the rewrite rules. There is one subtle aspect to the new generated code. Because the top level v.Args are shared across rewrite rules, and rule evaluation can swap v_0 and v_1, v_0 and v_1 can end up being swapped from one rule to the next. That is OK, because any time a rule does not get applied, they will have been swapped exactly twice. Passes toolstash-check -all. name old time/op new time/op delta Template 213ms ± 2% 211ms ± 2% -0.85% (p=0.000 n=92+96) Unicode 83.5ms ± 2% 83.2ms ± 2% -0.41% (p=0.004 n=95+90) GoTypes 737ms ± 2% 733ms ± 2% -0.51% (p=0.000 n=91+94) Compiler 3.45s ± 2% 3.43s ± 2% -0.44% (p=0.000 n=99+100) SSA 8.54s ± 1% 8.32s ± 2% -2.56% (p=0.000 n=96+99) Flate 136ms ± 2% 135ms ± 1% -0.47% (p=0.000 n=96+96) GoParser 169ms ± 1% 168ms ± 1% -0.33% (p=0.000 n=96+93) Reflect 456ms ± 3% 455ms ± 3% ~ (p=0.261 n=95+94) Tar 186ms ± 2% 185ms ± 2% -0.48% (p=0.000 n=94+95) XML 251ms ± 1% 250ms ± 1% -0.51% (p=0.000 n=91+94) [Geo mean] 424ms 421ms -0.68% name old user-time/op new user-time/op delta Template 275ms ± 1% 274ms ± 2% -0.55% (p=0.000 n=95+98) Unicode 118ms ± 4% 118ms ± 4% ~ (p=0.642 n=98+90) GoTypes 983ms ± 1% 980ms ± 1% -0.30% (p=0.000 n=93+93) Compiler 4.56s ± 6% 4.52s ± 6% -0.72% (p=0.003 n=100+100) SSA 11.4s ± 1% 11.1s ± 1% -2.50% (p=0.000 n=96+97) Flate 168ms ± 1% 167ms ± 1% -0.49% (p=0.000 n=92+92) GoParser 204ms ± 1% 204ms ± 2% -0.27% (p=0.003 n=99+96) Reflect 599ms ± 2% 598ms ± 2% ~ (p=0.116 n=95+92) Tar 227ms ± 2% 225ms ± 2% -0.57% (p=0.000 n=95+98) XML 313ms ± 2% 312ms ± 1% -0.37% (p=0.000 n=89+95) [Geo mean] 547ms 544ms -0.61% file before after Δ % compile 21113112 21109016 -4096 -0.019% total 131704940 131700844 -4096 -0.003% Change-Id: Id6c39e0367e597c0c75b8a4b1eb14cc3cbd11956 Reviewed-on: https://go-review.googlesource.com/c/go/+/216218 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-20 20:09:41 -08:00
if arg != rhs {
cmd/compile: disallow rewrite rules from declaring reserved names If I change a rule in ARM64.rules to use the variable name "b" in a conflicting way, rulegen would previously not complain, and the compiler would later give a confusing error: $ go run *.go && go build cmd/compile/internal/ssa # cmd/compile/internal/ssa ../rewriteARM64.go:24236:10: b.NewValue0 undefined (type int64 has no field or method NewValue0) Make rulegen complain early about those cases. Sometimes they might happen to be harmless, but in general they can easily cause confusion or unintended effect due to shadowing. After the change, with the same conflicting rule: $ go run *.go && go build cmd/compile/internal/ssa 2021/03/22 11:31:49 rule ARM64.rules:495 uses the reserved name b exit status 1 Note that 24 existing rules were using reserved names. It seems like the shadowing was harmless, as it wasn't causing typechecking issues nor did it seem to cause unintended behavior when the rule rewrite code ran. The bool values "b" were renamed "t", since that seems to have a precedent in other rules and in the fmt package. Sequential values like "a b c" were renamed to "x y z", since "b" is reserved. Finally, "typ" was renamed to "_typ", since there doesn't seem to be an obviously better answer. Passes all three of: $ GOARCH=amd64 go build -toolexec 'toolstash -cmp' -a std $ GOARCH=arm64 go build -toolexec 'toolstash -cmp' -a std $ GOARCH=mips64 go build -toolexec 'toolstash -cmp' -a std Fixes #45154. Change-Id: I1cce194dc7b477886a9c218c17973e996bcedccf Reviewed-on: https://go-review.googlesource.com/c/go/+/303549 Trust: Daniel Martí <mvdan@mvdan.cc> Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2021-03-22 11:35:02 +01:00
rr.add(declf(rr.Loc, arg, "%s", rhs))
cmd/compile: reduce bounds checks in generated rewrite rules CL 213703 converted generated rewrite rules for commutative ops to use loops instead of duplicated code. However, it loaded args using expressions like v.Args[i] and v.Args[i^1], which the compiler could not eliminate bounds for (including with all outstanding prove CLs). Also, given a series of separate rewrite rules for the same op, we generated bounds checks for every rewrite rule, even though we were repeatedly loading the same set of args. This change reduces both sets of bounds checks. Instead of loading v.Args[i] and v.Args[i^1] for commutative loops, we now preload v.Args[0] and v.Args[1] into local variables, and then swap them (as needed) in the commutative loop post statement. And we now load all top level v.Args into local variables at the beginning of every rewrite rule function. The second optimization is the more significant, but the first helps a little, and they play together nicely from the perspective of generating the code. This does increase register pressure, but the reduced bounds checks more than compensate. Note that the vast majority of rewrite rules evaluated are not applied, so the prologue is the most important part of the rewrite rules. There is one subtle aspect to the new generated code. Because the top level v.Args are shared across rewrite rules, and rule evaluation can swap v_0 and v_1, v_0 and v_1 can end up being swapped from one rule to the next. That is OK, because any time a rule does not get applied, they will have been swapped exactly twice. Passes toolstash-check -all. name old time/op new time/op delta Template 213ms ± 2% 211ms ± 2% -0.85% (p=0.000 n=92+96) Unicode 83.5ms ± 2% 83.2ms ± 2% -0.41% (p=0.004 n=95+90) GoTypes 737ms ± 2% 733ms ± 2% -0.51% (p=0.000 n=91+94) Compiler 3.45s ± 2% 3.43s ± 2% -0.44% (p=0.000 n=99+100) SSA 8.54s ± 1% 8.32s ± 2% -2.56% (p=0.000 n=96+99) Flate 136ms ± 2% 135ms ± 1% -0.47% (p=0.000 n=96+96) GoParser 169ms ± 1% 168ms ± 1% -0.33% (p=0.000 n=96+93) Reflect 456ms ± 3% 455ms ± 3% ~ (p=0.261 n=95+94) Tar 186ms ± 2% 185ms ± 2% -0.48% (p=0.000 n=94+95) XML 251ms ± 1% 250ms ± 1% -0.51% (p=0.000 n=91+94) [Geo mean] 424ms 421ms -0.68% name old user-time/op new user-time/op delta Template 275ms ± 1% 274ms ± 2% -0.55% (p=0.000 n=95+98) Unicode 118ms ± 4% 118ms ± 4% ~ (p=0.642 n=98+90) GoTypes 983ms ± 1% 980ms ± 1% -0.30% (p=0.000 n=93+93) Compiler 4.56s ± 6% 4.52s ± 6% -0.72% (p=0.003 n=100+100) SSA 11.4s ± 1% 11.1s ± 1% -2.50% (p=0.000 n=96+97) Flate 168ms ± 1% 167ms ± 1% -0.49% (p=0.000 n=92+92) GoParser 204ms ± 1% 204ms ± 2% -0.27% (p=0.003 n=99+96) Reflect 599ms ± 2% 598ms ± 2% ~ (p=0.116 n=95+92) Tar 227ms ± 2% 225ms ± 2% -0.57% (p=0.000 n=95+98) XML 313ms ± 2% 312ms ± 1% -0.37% (p=0.000 n=89+95) [Geo mean] 547ms 544ms -0.61% file before after Δ % compile 21113112 21109016 -4096 -0.019% total 131704940 131700844 -4096 -0.003% Change-Id: Id6c39e0367e597c0c75b8a4b1eb14cc3cbd11956 Reviewed-on: https://go-review.googlesource.com/c/go/+/216218 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-20 20:09:41 -08:00
}
}
[dev.ssa] cmd/compile: refactor out rulegen value parsing Previously, genMatch0 and genResult0 contained lots of duplication: locating the op, parsing the value, validation, etc. Parsing and validation was mixed in with code gen. Extract a helper, parseValue. It is responsible for parsing the value, locating the op, and doing shared validation. As a bonus (and possibly as my original motivation), make op selection pay attention to the number of args present. This allows arch-specific ops to share a name with generic ops as long as there is no ambiguity. It also detects and reports unresolved ambiguity, unlike before, where it would simply always pick the generic op, with no warning. Also use parseValue when generating the top-level op dispatch, to ensure its opinion about ops matches genMatch0 and genResult0. The order of statements in the generated code used to depend on the exact rule. It is now somewhat independent of the rule. That is the source of some of the generated code changes in this CL. See rewritedec64 and rewritegeneric for examples. It is a one-time change. The op dispatch switch and functions used to be sorted by opname without architecture. The sort now includes the architecture, leading to further generated code changes. See rewriteARM and rewriteAMD64 for examples. Again, it is a one-time change. There are no functional changes. Change-Id: I22c989183ad5651741ebdc0566349c5fd6c6b23c Reviewed-on: https://go-review.googlesource.com/24649 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2016-07-01 11:05:29 -07:00
continue
}
// compound sexpr
argname, expr := splitNameExpr(arg)
if argname == "" {
argname = fmt.Sprintf("%s_%d", v, i)
[dev.ssa] cmd/compile: refactor out rulegen value parsing Previously, genMatch0 and genResult0 contained lots of duplication: locating the op, parsing the value, validation, etc. Parsing and validation was mixed in with code gen. Extract a helper, parseValue. It is responsible for parsing the value, locating the op, and doing shared validation. As a bonus (and possibly as my original motivation), make op selection pay attention to the number of args present. This allows arch-specific ops to share a name with generic ops as long as there is no ambiguity. It also detects and reports unresolved ambiguity, unlike before, where it would simply always pick the generic op, with no warning. Also use parseValue when generating the top-level op dispatch, to ensure its opinion about ops matches genMatch0 and genResult0. The order of statements in the generated code used to depend on the exact rule. It is now somewhat independent of the rule. That is the source of some of the generated code changes in this CL. See rewritedec64 and rewritegeneric for examples. It is a one-time change. The op dispatch switch and functions used to be sorted by opname without architecture. The sort now includes the architecture, leading to further generated code changes. See rewriteARM and rewriteAMD64 for examples. Again, it is a one-time change. There are no functional changes. Change-Id: I22c989183ad5651741ebdc0566349c5fd6c6b23c Reviewed-on: https://go-review.googlesource.com/24649 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2016-07-01 11:05:29 -07:00
}
if argname == "b" {
log.Fatalf("don't name args 'b', it is ambiguous with blocks")
}
cmd/compile: reduce bounds checks in generated rewrite rules CL 213703 converted generated rewrite rules for commutative ops to use loops instead of duplicated code. However, it loaded args using expressions like v.Args[i] and v.Args[i^1], which the compiler could not eliminate bounds for (including with all outstanding prove CLs). Also, given a series of separate rewrite rules for the same op, we generated bounds checks for every rewrite rule, even though we were repeatedly loading the same set of args. This change reduces both sets of bounds checks. Instead of loading v.Args[i] and v.Args[i^1] for commutative loops, we now preload v.Args[0] and v.Args[1] into local variables, and then swap them (as needed) in the commutative loop post statement. And we now load all top level v.Args into local variables at the beginning of every rewrite rule function. The second optimization is the more significant, but the first helps a little, and they play together nicely from the perspective of generating the code. This does increase register pressure, but the reduced bounds checks more than compensate. Note that the vast majority of rewrite rules evaluated are not applied, so the prologue is the most important part of the rewrite rules. There is one subtle aspect to the new generated code. Because the top level v.Args are shared across rewrite rules, and rule evaluation can swap v_0 and v_1, v_0 and v_1 can end up being swapped from one rule to the next. That is OK, because any time a rule does not get applied, they will have been swapped exactly twice. Passes toolstash-check -all. name old time/op new time/op delta Template 213ms ± 2% 211ms ± 2% -0.85% (p=0.000 n=92+96) Unicode 83.5ms ± 2% 83.2ms ± 2% -0.41% (p=0.004 n=95+90) GoTypes 737ms ± 2% 733ms ± 2% -0.51% (p=0.000 n=91+94) Compiler 3.45s ± 2% 3.43s ± 2% -0.44% (p=0.000 n=99+100) SSA 8.54s ± 1% 8.32s ± 2% -2.56% (p=0.000 n=96+99) Flate 136ms ± 2% 135ms ± 1% -0.47% (p=0.000 n=96+96) GoParser 169ms ± 1% 168ms ± 1% -0.33% (p=0.000 n=96+93) Reflect 456ms ± 3% 455ms ± 3% ~ (p=0.261 n=95+94) Tar 186ms ± 2% 185ms ± 2% -0.48% (p=0.000 n=94+95) XML 251ms ± 1% 250ms ± 1% -0.51% (p=0.000 n=91+94) [Geo mean] 424ms 421ms -0.68% name old user-time/op new user-time/op delta Template 275ms ± 1% 274ms ± 2% -0.55% (p=0.000 n=95+98) Unicode 118ms ± 4% 118ms ± 4% ~ (p=0.642 n=98+90) GoTypes 983ms ± 1% 980ms ± 1% -0.30% (p=0.000 n=93+93) Compiler 4.56s ± 6% 4.52s ± 6% -0.72% (p=0.003 n=100+100) SSA 11.4s ± 1% 11.1s ± 1% -2.50% (p=0.000 n=96+97) Flate 168ms ± 1% 167ms ± 1% -0.49% (p=0.000 n=92+92) GoParser 204ms ± 1% 204ms ± 2% -0.27% (p=0.003 n=99+96) Reflect 599ms ± 2% 598ms ± 2% ~ (p=0.116 n=95+92) Tar 227ms ± 2% 225ms ± 2% -0.57% (p=0.000 n=95+98) XML 313ms ± 2% 312ms ± 1% -0.37% (p=0.000 n=89+95) [Geo mean] 547ms 544ms -0.61% file before after Δ % compile 21113112 21109016 -4096 -0.019% total 131704940 131700844 -4096 -0.003% Change-Id: Id6c39e0367e597c0c75b8a4b1eb14cc3cbd11956 Reviewed-on: https://go-review.googlesource.com/c/go/+/216218 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-20 20:09:41 -08:00
if argname != rhs {
cmd/compile: disallow rewrite rules from declaring reserved names If I change a rule in ARM64.rules to use the variable name "b" in a conflicting way, rulegen would previously not complain, and the compiler would later give a confusing error: $ go run *.go && go build cmd/compile/internal/ssa # cmd/compile/internal/ssa ../rewriteARM64.go:24236:10: b.NewValue0 undefined (type int64 has no field or method NewValue0) Make rulegen complain early about those cases. Sometimes they might happen to be harmless, but in general they can easily cause confusion or unintended effect due to shadowing. After the change, with the same conflicting rule: $ go run *.go && go build cmd/compile/internal/ssa 2021/03/22 11:31:49 rule ARM64.rules:495 uses the reserved name b exit status 1 Note that 24 existing rules were using reserved names. It seems like the shadowing was harmless, as it wasn't causing typechecking issues nor did it seem to cause unintended behavior when the rule rewrite code ran. The bool values "b" were renamed "t", since that seems to have a precedent in other rules and in the fmt package. Sequential values like "a b c" were renamed to "x y z", since "b" is reserved. Finally, "typ" was renamed to "_typ", since there doesn't seem to be an obviously better answer. Passes all three of: $ GOARCH=amd64 go build -toolexec 'toolstash -cmp' -a std $ GOARCH=arm64 go build -toolexec 'toolstash -cmp' -a std $ GOARCH=mips64 go build -toolexec 'toolstash -cmp' -a std Fixes #45154. Change-Id: I1cce194dc7b477886a9c218c17973e996bcedccf Reviewed-on: https://go-review.googlesource.com/c/go/+/303549 Trust: Daniel Martí <mvdan@mvdan.cc> Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2021-03-22 11:35:02 +01:00
rr.add(declf(rr.Loc, argname, "%s", rhs))
cmd/compile: reduce bounds checks in generated rewrite rules CL 213703 converted generated rewrite rules for commutative ops to use loops instead of duplicated code. However, it loaded args using expressions like v.Args[i] and v.Args[i^1], which the compiler could not eliminate bounds for (including with all outstanding prove CLs). Also, given a series of separate rewrite rules for the same op, we generated bounds checks for every rewrite rule, even though we were repeatedly loading the same set of args. This change reduces both sets of bounds checks. Instead of loading v.Args[i] and v.Args[i^1] for commutative loops, we now preload v.Args[0] and v.Args[1] into local variables, and then swap them (as needed) in the commutative loop post statement. And we now load all top level v.Args into local variables at the beginning of every rewrite rule function. The second optimization is the more significant, but the first helps a little, and they play together nicely from the perspective of generating the code. This does increase register pressure, but the reduced bounds checks more than compensate. Note that the vast majority of rewrite rules evaluated are not applied, so the prologue is the most important part of the rewrite rules. There is one subtle aspect to the new generated code. Because the top level v.Args are shared across rewrite rules, and rule evaluation can swap v_0 and v_1, v_0 and v_1 can end up being swapped from one rule to the next. That is OK, because any time a rule does not get applied, they will have been swapped exactly twice. Passes toolstash-check -all. name old time/op new time/op delta Template 213ms ± 2% 211ms ± 2% -0.85% (p=0.000 n=92+96) Unicode 83.5ms ± 2% 83.2ms ± 2% -0.41% (p=0.004 n=95+90) GoTypes 737ms ± 2% 733ms ± 2% -0.51% (p=0.000 n=91+94) Compiler 3.45s ± 2% 3.43s ± 2% -0.44% (p=0.000 n=99+100) SSA 8.54s ± 1% 8.32s ± 2% -2.56% (p=0.000 n=96+99) Flate 136ms ± 2% 135ms ± 1% -0.47% (p=0.000 n=96+96) GoParser 169ms ± 1% 168ms ± 1% -0.33% (p=0.000 n=96+93) Reflect 456ms ± 3% 455ms ± 3% ~ (p=0.261 n=95+94) Tar 186ms ± 2% 185ms ± 2% -0.48% (p=0.000 n=94+95) XML 251ms ± 1% 250ms ± 1% -0.51% (p=0.000 n=91+94) [Geo mean] 424ms 421ms -0.68% name old user-time/op new user-time/op delta Template 275ms ± 1% 274ms ± 2% -0.55% (p=0.000 n=95+98) Unicode 118ms ± 4% 118ms ± 4% ~ (p=0.642 n=98+90) GoTypes 983ms ± 1% 980ms ± 1% -0.30% (p=0.000 n=93+93) Compiler 4.56s ± 6% 4.52s ± 6% -0.72% (p=0.003 n=100+100) SSA 11.4s ± 1% 11.1s ± 1% -2.50% (p=0.000 n=96+97) Flate 168ms ± 1% 167ms ± 1% -0.49% (p=0.000 n=92+92) GoParser 204ms ± 1% 204ms ± 2% -0.27% (p=0.003 n=99+96) Reflect 599ms ± 2% 598ms ± 2% ~ (p=0.116 n=95+92) Tar 227ms ± 2% 225ms ± 2% -0.57% (p=0.000 n=95+98) XML 313ms ± 2% 312ms ± 1% -0.37% (p=0.000 n=89+95) [Geo mean] 547ms 544ms -0.61% file before after Δ % compile 21113112 21109016 -4096 -0.019% total 131704940 131700844 -4096 -0.003% Change-Id: Id6c39e0367e597c0c75b8a4b1eb14cc3cbd11956 Reviewed-on: https://go-review.googlesource.com/c/go/+/216218 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-20 20:09:41 -08:00
}
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
bexpr := exprf("%s.Op != addLater", argname)
rr.add(&CondBreak{Cond: bexpr})
argPos, argCheckOp := genMatch0(rr, arch, expr, argname, cnt, false)
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
bexpr.(*ast.BinaryExpr).Y.(*ast.Ident).Name = argCheckOp
if argPos != "" {
// Keep the argument in preference to the parent, as the
// argument is normally earlier in program flow.
// Keep the argument in preference to an earlier argument,
// as that prefers the memory argument which is also earlier
// in the program flow.
pos = argPos
}
}
[dev.ssa] cmd/compile: refactor out rulegen value parsing Previously, genMatch0 and genResult0 contained lots of duplication: locating the op, parsing the value, validation, etc. Parsing and validation was mixed in with code gen. Extract a helper, parseValue. It is responsible for parsing the value, locating the op, and doing shared validation. As a bonus (and possibly as my original motivation), make op selection pay attention to the number of args present. This allows arch-specific ops to share a name with generic ops as long as there is no ambiguity. It also detects and reports unresolved ambiguity, unlike before, where it would simply always pick the generic op, with no warning. Also use parseValue when generating the top-level op dispatch, to ensure its opinion about ops matches genMatch0 and genResult0. The order of statements in the generated code used to depend on the exact rule. It is now somewhat independent of the rule. That is the source of some of the generated code changes in this CL. See rewritedec64 and rewritegeneric for examples. It is a one-time change. The op dispatch switch and functions used to be sorted by opname without architecture. The sort now includes the architecture, leading to further generated code changes. See rewriteARM and rewriteAMD64 for examples. Again, it is a one-time change. There are no functional changes. Change-Id: I22c989183ad5651741ebdc0566349c5fd6c6b23c Reviewed-on: https://go-review.googlesource.com/24649 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2016-07-01 11:05:29 -07:00
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
return pos, checkOp
}
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
func genResult(rr *RuleRewrite, arch arch, result, pos string) {
move := result[0] == '@'
if move {
// parse @block directive
s := strings.SplitN(result[1:], " ", 2)
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
rr.add(stmtf("b = %s", s[0]))
result = s[1]
}
cse := make(map[string]string)
genResult0(rr, arch, result, true, move, pos, cse)
}
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
func genResult0(rr *RuleRewrite, arch arch, result string, top, move bool, pos string, cse map[string]string) string {
resname, expr := splitNameExpr(result)
result = expr
// TODO: when generating a constant result, use f.constVal to avoid
// introducing copies just to clean them up again.
if result[0] != '(' {
// variable
if top {
// It in not safe in general to move a variable between blocks
// (and particularly not a phi node).
// Introduce a copy.
rr.add(stmtf("v.copyOf(%s)", result))
}
return result
}
w := normalizeWhitespace(result)
if prev := cse[w]; prev != "" {
return prev
}
op, oparch, typ, auxint, aux, args := parseValue(result, arch, rr.Loc)
// Find the type of the variable.
[dev.ssa] cmd/compile: refactor out rulegen value parsing Previously, genMatch0 and genResult0 contained lots of duplication: locating the op, parsing the value, validation, etc. Parsing and validation was mixed in with code gen. Extract a helper, parseValue. It is responsible for parsing the value, locating the op, and doing shared validation. As a bonus (and possibly as my original motivation), make op selection pay attention to the number of args present. This allows arch-specific ops to share a name with generic ops as long as there is no ambiguity. It also detects and reports unresolved ambiguity, unlike before, where it would simply always pick the generic op, with no warning. Also use parseValue when generating the top-level op dispatch, to ensure its opinion about ops matches genMatch0 and genResult0. The order of statements in the generated code used to depend on the exact rule. It is now somewhat independent of the rule. That is the source of some of the generated code changes in this CL. See rewritedec64 and rewritegeneric for examples. It is a one-time change. The op dispatch switch and functions used to be sorted by opname without architecture. The sort now includes the architecture, leading to further generated code changes. See rewriteARM and rewriteAMD64 for examples. Again, it is a one-time change. There are no functional changes. Change-Id: I22c989183ad5651741ebdc0566349c5fd6c6b23c Reviewed-on: https://go-review.googlesource.com/24649 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2016-07-01 11:05:29 -07:00
typeOverride := typ != ""
if typ == "" && op.typ != "" {
typ = typeName(op.typ)
}
[dev.ssa] cmd/compile: refactor out rulegen value parsing Previously, genMatch0 and genResult0 contained lots of duplication: locating the op, parsing the value, validation, etc. Parsing and validation was mixed in with code gen. Extract a helper, parseValue. It is responsible for parsing the value, locating the op, and doing shared validation. As a bonus (and possibly as my original motivation), make op selection pay attention to the number of args present. This allows arch-specific ops to share a name with generic ops as long as there is no ambiguity. It also detects and reports unresolved ambiguity, unlike before, where it would simply always pick the generic op, with no warning. Also use parseValue when generating the top-level op dispatch, to ensure its opinion about ops matches genMatch0 and genResult0. The order of statements in the generated code used to depend on the exact rule. It is now somewhat independent of the rule. That is the source of some of the generated code changes in this CL. See rewritedec64 and rewritegeneric for examples. It is a one-time change. The op dispatch switch and functions used to be sorted by opname without architecture. The sort now includes the architecture, leading to further generated code changes. See rewriteARM and rewriteAMD64 for examples. Again, it is a one-time change. There are no functional changes. Change-Id: I22c989183ad5651741ebdc0566349c5fd6c6b23c Reviewed-on: https://go-review.googlesource.com/24649 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2016-07-01 11:05:29 -07:00
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
v := "v"
if top && !move {
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
rr.add(stmtf("v.reset(Op%s%s)", oparch, op.name))
if typeOverride {
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
rr.add(stmtf("v.Type = %s", typ))
}
} else {
[dev.ssa] cmd/compile: refactor out rulegen value parsing Previously, genMatch0 and genResult0 contained lots of duplication: locating the op, parsing the value, validation, etc. Parsing and validation was mixed in with code gen. Extract a helper, parseValue. It is responsible for parsing the value, locating the op, and doing shared validation. As a bonus (and possibly as my original motivation), make op selection pay attention to the number of args present. This allows arch-specific ops to share a name with generic ops as long as there is no ambiguity. It also detects and reports unresolved ambiguity, unlike before, where it would simply always pick the generic op, with no warning. Also use parseValue when generating the top-level op dispatch, to ensure its opinion about ops matches genMatch0 and genResult0. The order of statements in the generated code used to depend on the exact rule. It is now somewhat independent of the rule. That is the source of some of the generated code changes in this CL. See rewritedec64 and rewritegeneric for examples. It is a one-time change. The op dispatch switch and functions used to be sorted by opname without architecture. The sort now includes the architecture, leading to further generated code changes. See rewriteARM and rewriteAMD64 for examples. Again, it is a one-time change. There are no functional changes. Change-Id: I22c989183ad5651741ebdc0566349c5fd6c6b23c Reviewed-on: https://go-review.googlesource.com/24649 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2016-07-01 11:05:29 -07:00
if typ == "" {
log.Fatalf("sub-expression %s (op=Op%s%s) at %s must have a type", result, oparch, op.name, rr.Loc)
}
if resname == "" {
v = fmt.Sprintf("v%d", rr.Alloc)
} else {
v = resname
}
rr.Alloc++
cmd/compile: disallow rewrite rules from declaring reserved names If I change a rule in ARM64.rules to use the variable name "b" in a conflicting way, rulegen would previously not complain, and the compiler would later give a confusing error: $ go run *.go && go build cmd/compile/internal/ssa # cmd/compile/internal/ssa ../rewriteARM64.go:24236:10: b.NewValue0 undefined (type int64 has no field or method NewValue0) Make rulegen complain early about those cases. Sometimes they might happen to be harmless, but in general they can easily cause confusion or unintended effect due to shadowing. After the change, with the same conflicting rule: $ go run *.go && go build cmd/compile/internal/ssa 2021/03/22 11:31:49 rule ARM64.rules:495 uses the reserved name b exit status 1 Note that 24 existing rules were using reserved names. It seems like the shadowing was harmless, as it wasn't causing typechecking issues nor did it seem to cause unintended behavior when the rule rewrite code ran. The bool values "b" were renamed "t", since that seems to have a precedent in other rules and in the fmt package. Sequential values like "a b c" were renamed to "x y z", since "b" is reserved. Finally, "typ" was renamed to "_typ", since there doesn't seem to be an obviously better answer. Passes all three of: $ GOARCH=amd64 go build -toolexec 'toolstash -cmp' -a std $ GOARCH=arm64 go build -toolexec 'toolstash -cmp' -a std $ GOARCH=mips64 go build -toolexec 'toolstash -cmp' -a std Fixes #45154. Change-Id: I1cce194dc7b477886a9c218c17973e996bcedccf Reviewed-on: https://go-review.googlesource.com/c/go/+/303549 Trust: Daniel Martí <mvdan@mvdan.cc> Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2021-03-22 11:35:02 +01:00
rr.add(declf(rr.Loc, v, "b.NewValue0(%s, Op%s%s, %s)", pos, oparch, op.name, typ))
if move && top {
// Rewrite original into a copy
rr.add(stmtf("v.copyOf(%s)", v))
}
}
[dev.ssa] cmd/compile: refactor out rulegen value parsing Previously, genMatch0 and genResult0 contained lots of duplication: locating the op, parsing the value, validation, etc. Parsing and validation was mixed in with code gen. Extract a helper, parseValue. It is responsible for parsing the value, locating the op, and doing shared validation. As a bonus (and possibly as my original motivation), make op selection pay attention to the number of args present. This allows arch-specific ops to share a name with generic ops as long as there is no ambiguity. It also detects and reports unresolved ambiguity, unlike before, where it would simply always pick the generic op, with no warning. Also use parseValue when generating the top-level op dispatch, to ensure its opinion about ops matches genMatch0 and genResult0. The order of statements in the generated code used to depend on the exact rule. It is now somewhat independent of the rule. That is the source of some of the generated code changes in this CL. See rewritedec64 and rewritegeneric for examples. It is a one-time change. The op dispatch switch and functions used to be sorted by opname without architecture. The sort now includes the architecture, leading to further generated code changes. See rewriteARM and rewriteAMD64 for examples. Again, it is a one-time change. There are no functional changes. Change-Id: I22c989183ad5651741ebdc0566349c5fd6c6b23c Reviewed-on: https://go-review.googlesource.com/24649 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2016-07-01 11:05:29 -07:00
if auxint != "" {
// Make sure auxint value has the right type.
rr.add(stmtf("%s.AuxInt = %sToAuxInt(%s)", v, unTitle(op.auxIntType()), auxint))
}
[dev.ssa] cmd/compile: refactor out rulegen value parsing Previously, genMatch0 and genResult0 contained lots of duplication: locating the op, parsing the value, validation, etc. Parsing and validation was mixed in with code gen. Extract a helper, parseValue. It is responsible for parsing the value, locating the op, and doing shared validation. As a bonus (and possibly as my original motivation), make op selection pay attention to the number of args present. This allows arch-specific ops to share a name with generic ops as long as there is no ambiguity. It also detects and reports unresolved ambiguity, unlike before, where it would simply always pick the generic op, with no warning. Also use parseValue when generating the top-level op dispatch, to ensure its opinion about ops matches genMatch0 and genResult0. The order of statements in the generated code used to depend on the exact rule. It is now somewhat independent of the rule. That is the source of some of the generated code changes in this CL. See rewritedec64 and rewritegeneric for examples. It is a one-time change. The op dispatch switch and functions used to be sorted by opname without architecture. The sort now includes the architecture, leading to further generated code changes. See rewriteARM and rewriteAMD64 for examples. Again, it is a one-time change. There are no functional changes. Change-Id: I22c989183ad5651741ebdc0566349c5fd6c6b23c Reviewed-on: https://go-review.googlesource.com/24649 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2016-07-01 11:05:29 -07:00
if aux != "" {
// Make sure aux value has the right type.
rr.add(stmtf("%s.Aux = %sToAux(%s)", v, unTitle(op.auxType()), aux))
[dev.ssa] cmd/compile: refactor out rulegen value parsing Previously, genMatch0 and genResult0 contained lots of duplication: locating the op, parsing the value, validation, etc. Parsing and validation was mixed in with code gen. Extract a helper, parseValue. It is responsible for parsing the value, locating the op, and doing shared validation. As a bonus (and possibly as my original motivation), make op selection pay attention to the number of args present. This allows arch-specific ops to share a name with generic ops as long as there is no ambiguity. It also detects and reports unresolved ambiguity, unlike before, where it would simply always pick the generic op, with no warning. Also use parseValue when generating the top-level op dispatch, to ensure its opinion about ops matches genMatch0 and genResult0. The order of statements in the generated code used to depend on the exact rule. It is now somewhat independent of the rule. That is the source of some of the generated code changes in this CL. See rewritedec64 and rewritegeneric for examples. It is a one-time change. The op dispatch switch and functions used to be sorted by opname without architecture. The sort now includes the architecture, leading to further generated code changes. See rewriteARM and rewriteAMD64 for examples. Again, it is a one-time change. There are no functional changes. Change-Id: I22c989183ad5651741ebdc0566349c5fd6c6b23c Reviewed-on: https://go-review.googlesource.com/24649 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2016-07-01 11:05:29 -07:00
}
cmd/compile: add specialized AddArgN functions for rewrite rules This shrinks the compiler without impacting performance. (The performance-sensitive part of rewrite rules is the non-match case.) Passes toolstash-check -all. Executable size: file before after Δ % compile 20356168 20163960 -192208 -0.944% total 115599376 115407168 -192208 -0.166% Text size: file before after Δ % cmd/compile/internal/ssa.s 3928309 3778774 -149535 -3.807% total 18862943 18713408 -149535 -0.793% Memory allocated compiling package SSA: SSA 12.7M ± 0% 12.5M ± 0% -1.74% (p=0.008 n=5+5) Compiler speed impact: name old time/op new time/op delta Template 211ms ± 1% 211ms ± 2% ~ (p=0.832 n=49+49) Unicode 82.8ms ± 2% 83.2ms ± 2% +0.44% (p=0.022 n=46+49) GoTypes 726ms ± 1% 728ms ± 2% ~ (p=0.076 n=46+48) Compiler 3.39s ± 2% 3.40s ± 2% ~ (p=0.633 n=48+49) SSA 7.71s ± 1% 7.65s ± 1% -0.78% (p=0.000 n=45+44) Flate 134ms ± 1% 134ms ± 1% ~ (p=0.195 n=50+49) GoParser 167ms ± 1% 167ms ± 1% ~ (p=0.390 n=47+47) Reflect 453ms ± 3% 452ms ± 2% ~ (p=0.492 n=48+49) Tar 184ms ± 3% 184ms ± 2% ~ (p=0.862 n=50+48) XML 248ms ± 2% 248ms ± 2% ~ (p=0.096 n=49+47) [Geo mean] 415ms 415ms -0.03% name old user-time/op new user-time/op delta Template 273ms ± 1% 273ms ± 2% ~ (p=0.711 n=48+48) Unicode 117ms ± 6% 117ms ± 5% ~ (p=0.633 n=50+50) GoTypes 972ms ± 2% 974ms ± 1% +0.29% (p=0.016 n=47+49) Compiler 4.46s ± 6% 4.51s ± 6% ~ (p=0.093 n=50+50) SSA 10.4s ± 1% 10.3s ± 2% -0.94% (p=0.000 n=45+50) Flate 166ms ± 2% 167ms ± 2% ~ (p=0.148 n=49+48) GoParser 202ms ± 1% 202ms ± 2% -0.28% (p=0.014 n=47+49) Reflect 594ms ± 2% 594ms ± 2% ~ (p=0.717 n=48+49) Tar 224ms ± 2% 224ms ± 2% ~ (p=0.805 n=50+49) XML 311ms ± 1% 310ms ± 1% ~ (p=0.177 n=49+48) [Geo mean] 537ms 537ms +0.01% Change-Id: I562b9f349b34ddcff01771769e6dbbc80604da7a Reviewed-on: https://go-review.googlesource.com/c/go/+/221237 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-02-26 11:29:34 -08:00
all := new(strings.Builder)
for i, arg := range args {
x := genResult0(rr, arch, arg, false, move, pos, cse)
cmd/compile: add specialized AddArgN functions for rewrite rules This shrinks the compiler without impacting performance. (The performance-sensitive part of rewrite rules is the non-match case.) Passes toolstash-check -all. Executable size: file before after Δ % compile 20356168 20163960 -192208 -0.944% total 115599376 115407168 -192208 -0.166% Text size: file before after Δ % cmd/compile/internal/ssa.s 3928309 3778774 -149535 -3.807% total 18862943 18713408 -149535 -0.793% Memory allocated compiling package SSA: SSA 12.7M ± 0% 12.5M ± 0% -1.74% (p=0.008 n=5+5) Compiler speed impact: name old time/op new time/op delta Template 211ms ± 1% 211ms ± 2% ~ (p=0.832 n=49+49) Unicode 82.8ms ± 2% 83.2ms ± 2% +0.44% (p=0.022 n=46+49) GoTypes 726ms ± 1% 728ms ± 2% ~ (p=0.076 n=46+48) Compiler 3.39s ± 2% 3.40s ± 2% ~ (p=0.633 n=48+49) SSA 7.71s ± 1% 7.65s ± 1% -0.78% (p=0.000 n=45+44) Flate 134ms ± 1% 134ms ± 1% ~ (p=0.195 n=50+49) GoParser 167ms ± 1% 167ms ± 1% ~ (p=0.390 n=47+47) Reflect 453ms ± 3% 452ms ± 2% ~ (p=0.492 n=48+49) Tar 184ms ± 3% 184ms ± 2% ~ (p=0.862 n=50+48) XML 248ms ± 2% 248ms ± 2% ~ (p=0.096 n=49+47) [Geo mean] 415ms 415ms -0.03% name old user-time/op new user-time/op delta Template 273ms ± 1% 273ms ± 2% ~ (p=0.711 n=48+48) Unicode 117ms ± 6% 117ms ± 5% ~ (p=0.633 n=50+50) GoTypes 972ms ± 2% 974ms ± 1% +0.29% (p=0.016 n=47+49) Compiler 4.46s ± 6% 4.51s ± 6% ~ (p=0.093 n=50+50) SSA 10.4s ± 1% 10.3s ± 2% -0.94% (p=0.000 n=45+50) Flate 166ms ± 2% 167ms ± 2% ~ (p=0.148 n=49+48) GoParser 202ms ± 1% 202ms ± 2% -0.28% (p=0.014 n=47+49) Reflect 594ms ± 2% 594ms ± 2% ~ (p=0.717 n=48+49) Tar 224ms ± 2% 224ms ± 2% ~ (p=0.805 n=50+49) XML 311ms ± 1% 310ms ± 1% ~ (p=0.177 n=49+48) [Geo mean] 537ms 537ms +0.01% Change-Id: I562b9f349b34ddcff01771769e6dbbc80604da7a Reviewed-on: https://go-review.googlesource.com/c/go/+/221237 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-02-26 11:29:34 -08:00
if i > 0 {
all.WriteString(", ")
}
all.WriteString(x)
}
switch len(args) {
case 0:
case 1:
rr.add(stmtf("%s.AddArg(%s)", v, all.String()))
default:
rr.add(stmtf("%s.AddArg%d(%s)", v, len(args), all.String()))
}
if cse != nil {
cse[w] = v
}
return v
}
func split(s string) []string {
var r []string
outer:
for s != "" {
d := 0 // depth of ({[<
var open, close byte // opening and closing markers ({[< or )}]>
nonsp := false // found a non-space char so far
for i := 0; i < len(s); i++ {
switch {
case d == 0 && s[i] == '(':
open, close = '(', ')'
d++
case d == 0 && s[i] == '<':
open, close = '<', '>'
d++
case d == 0 && s[i] == '[':
open, close = '[', ']'
d++
case d == 0 && s[i] == '{':
open, close = '{', '}'
d++
case d == 0 && (s[i] == ' ' || s[i] == '\t'):
if nonsp {
r = append(r, strings.TrimSpace(s[:i]))
s = s[i:]
continue outer
}
case d > 0 && s[i] == open:
d++
case d > 0 && s[i] == close:
d--
default:
nonsp = true
}
}
if d != 0 {
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
log.Fatalf("imbalanced expression: %q", s)
}
if nonsp {
r = append(r, strings.TrimSpace(s))
}
break
}
return r
}
// isBlock reports whether this op is a block opcode.
func isBlock(name string, arch arch) bool {
for _, b := range genericBlocks {
if b.name == name {
return true
}
}
for _, b := range arch.blocks {
if b.name == name {
return true
}
}
return false
}
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
func extract(val string) (op, typ, auxint, aux string, args []string) {
[dev.ssa] cmd/compile: refactor out rulegen value parsing Previously, genMatch0 and genResult0 contained lots of duplication: locating the op, parsing the value, validation, etc. Parsing and validation was mixed in with code gen. Extract a helper, parseValue. It is responsible for parsing the value, locating the op, and doing shared validation. As a bonus (and possibly as my original motivation), make op selection pay attention to the number of args present. This allows arch-specific ops to share a name with generic ops as long as there is no ambiguity. It also detects and reports unresolved ambiguity, unlike before, where it would simply always pick the generic op, with no warning. Also use parseValue when generating the top-level op dispatch, to ensure its opinion about ops matches genMatch0 and genResult0. The order of statements in the generated code used to depend on the exact rule. It is now somewhat independent of the rule. That is the source of some of the generated code changes in this CL. See rewritedec64 and rewritegeneric for examples. It is a one-time change. The op dispatch switch and functions used to be sorted by opname without architecture. The sort now includes the architecture, leading to further generated code changes. See rewriteARM and rewriteAMD64 for examples. Again, it is a one-time change. There are no functional changes. Change-Id: I22c989183ad5651741ebdc0566349c5fd6c6b23c Reviewed-on: https://go-review.googlesource.com/24649 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2016-07-01 11:05:29 -07:00
val = val[1 : len(val)-1] // remove ()
// Split val up into regions.
// Split by spaces/tabs, except those contained in (), {}, [], or <>.
s := split(val)
// Extract restrictions and args.
op = s[0]
[dev.ssa] cmd/compile: refactor out rulegen value parsing Previously, genMatch0 and genResult0 contained lots of duplication: locating the op, parsing the value, validation, etc. Parsing and validation was mixed in with code gen. Extract a helper, parseValue. It is responsible for parsing the value, locating the op, and doing shared validation. As a bonus (and possibly as my original motivation), make op selection pay attention to the number of args present. This allows arch-specific ops to share a name with generic ops as long as there is no ambiguity. It also detects and reports unresolved ambiguity, unlike before, where it would simply always pick the generic op, with no warning. Also use parseValue when generating the top-level op dispatch, to ensure its opinion about ops matches genMatch0 and genResult0. The order of statements in the generated code used to depend on the exact rule. It is now somewhat independent of the rule. That is the source of some of the generated code changes in this CL. See rewritedec64 and rewritegeneric for examples. It is a one-time change. The op dispatch switch and functions used to be sorted by opname without architecture. The sort now includes the architecture, leading to further generated code changes. See rewriteARM and rewriteAMD64 for examples. Again, it is a one-time change. There are no functional changes. Change-Id: I22c989183ad5651741ebdc0566349c5fd6c6b23c Reviewed-on: https://go-review.googlesource.com/24649 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2016-07-01 11:05:29 -07:00
for _, a := range s[1:] {
switch a[0] {
case '<':
typ = a[1 : len(a)-1] // remove <>
case '[':
auxint = a[1 : len(a)-1] // remove []
case '{':
aux = a[1 : len(a)-1] // remove {}
default:
args = append(args, a)
}
}
return
}
// parseValue parses a parenthesized value from a rule.
// The value can be from the match or the result side.
// It returns the op and unparsed strings for typ, auxint, and aux restrictions and for all args.
// oparch is the architecture that op is located in, or "" for generic.
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
func parseValue(val string, arch arch, loc string) (op opData, oparch, typ, auxint, aux string, args []string) {
[dev.ssa] cmd/compile: refactor out rulegen value parsing Previously, genMatch0 and genResult0 contained lots of duplication: locating the op, parsing the value, validation, etc. Parsing and validation was mixed in with code gen. Extract a helper, parseValue. It is responsible for parsing the value, locating the op, and doing shared validation. As a bonus (and possibly as my original motivation), make op selection pay attention to the number of args present. This allows arch-specific ops to share a name with generic ops as long as there is no ambiguity. It also detects and reports unresolved ambiguity, unlike before, where it would simply always pick the generic op, with no warning. Also use parseValue when generating the top-level op dispatch, to ensure its opinion about ops matches genMatch0 and genResult0. The order of statements in the generated code used to depend on the exact rule. It is now somewhat independent of the rule. That is the source of some of the generated code changes in this CL. See rewritedec64 and rewritegeneric for examples. It is a one-time change. The op dispatch switch and functions used to be sorted by opname without architecture. The sort now includes the architecture, leading to further generated code changes. See rewriteARM and rewriteAMD64 for examples. Again, it is a one-time change. There are no functional changes. Change-Id: I22c989183ad5651741ebdc0566349c5fd6c6b23c Reviewed-on: https://go-review.googlesource.com/24649 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2016-07-01 11:05:29 -07:00
// Resolve the op.
var s string
s, typ, auxint, aux, args = extract(val)
[dev.ssa] cmd/compile: refactor out rulegen value parsing Previously, genMatch0 and genResult0 contained lots of duplication: locating the op, parsing the value, validation, etc. Parsing and validation was mixed in with code gen. Extract a helper, parseValue. It is responsible for parsing the value, locating the op, and doing shared validation. As a bonus (and possibly as my original motivation), make op selection pay attention to the number of args present. This allows arch-specific ops to share a name with generic ops as long as there is no ambiguity. It also detects and reports unresolved ambiguity, unlike before, where it would simply always pick the generic op, with no warning. Also use parseValue when generating the top-level op dispatch, to ensure its opinion about ops matches genMatch0 and genResult0. The order of statements in the generated code used to depend on the exact rule. It is now somewhat independent of the rule. That is the source of some of the generated code changes in this CL. See rewritedec64 and rewritegeneric for examples. It is a one-time change. The op dispatch switch and functions used to be sorted by opname without architecture. The sort now includes the architecture, leading to further generated code changes. See rewriteARM and rewriteAMD64 for examples. Again, it is a one-time change. There are no functional changes. Change-Id: I22c989183ad5651741ebdc0566349c5fd6c6b23c Reviewed-on: https://go-review.googlesource.com/24649 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2016-07-01 11:05:29 -07:00
// match reports whether x is a good op to select.
// If strict is true, rule generation might succeed.
// If strict is false, rule generation has failed,
// but we're trying to generate a useful error.
// Doing strict=true then strict=false allows
// precise op matching while retaining good error messages.
match := func(x opData, strict bool, archname string) bool {
if x.name != s {
[dev.ssa] cmd/compile: refactor out rulegen value parsing Previously, genMatch0 and genResult0 contained lots of duplication: locating the op, parsing the value, validation, etc. Parsing and validation was mixed in with code gen. Extract a helper, parseValue. It is responsible for parsing the value, locating the op, and doing shared validation. As a bonus (and possibly as my original motivation), make op selection pay attention to the number of args present. This allows arch-specific ops to share a name with generic ops as long as there is no ambiguity. It also detects and reports unresolved ambiguity, unlike before, where it would simply always pick the generic op, with no warning. Also use parseValue when generating the top-level op dispatch, to ensure its opinion about ops matches genMatch0 and genResult0. The order of statements in the generated code used to depend on the exact rule. It is now somewhat independent of the rule. That is the source of some of the generated code changes in this CL. See rewritedec64 and rewritegeneric for examples. It is a one-time change. The op dispatch switch and functions used to be sorted by opname without architecture. The sort now includes the architecture, leading to further generated code changes. See rewriteARM and rewriteAMD64 for examples. Again, it is a one-time change. There are no functional changes. Change-Id: I22c989183ad5651741ebdc0566349c5fd6c6b23c Reviewed-on: https://go-review.googlesource.com/24649 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2016-07-01 11:05:29 -07:00
return false
}
if x.argLength != -1 && int(x.argLength) != len(args) && (len(args) != 1 || args[0] != "...") {
[dev.ssa] cmd/compile: refactor out rulegen value parsing Previously, genMatch0 and genResult0 contained lots of duplication: locating the op, parsing the value, validation, etc. Parsing and validation was mixed in with code gen. Extract a helper, parseValue. It is responsible for parsing the value, locating the op, and doing shared validation. As a bonus (and possibly as my original motivation), make op selection pay attention to the number of args present. This allows arch-specific ops to share a name with generic ops as long as there is no ambiguity. It also detects and reports unresolved ambiguity, unlike before, where it would simply always pick the generic op, with no warning. Also use parseValue when generating the top-level op dispatch, to ensure its opinion about ops matches genMatch0 and genResult0. The order of statements in the generated code used to depend on the exact rule. It is now somewhat independent of the rule. That is the source of some of the generated code changes in this CL. See rewritedec64 and rewritegeneric for examples. It is a one-time change. The op dispatch switch and functions used to be sorted by opname without architecture. The sort now includes the architecture, leading to further generated code changes. See rewriteARM and rewriteAMD64 for examples. Again, it is a one-time change. There are no functional changes. Change-Id: I22c989183ad5651741ebdc0566349c5fd6c6b23c Reviewed-on: https://go-review.googlesource.com/24649 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2016-07-01 11:05:29 -07:00
if strict {
return false
}
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
log.Printf("%s: op %s (%s) should have %d args, has %d", loc, s, archname, x.argLength, len(args))
[dev.ssa] cmd/compile: refactor out rulegen value parsing Previously, genMatch0 and genResult0 contained lots of duplication: locating the op, parsing the value, validation, etc. Parsing and validation was mixed in with code gen. Extract a helper, parseValue. It is responsible for parsing the value, locating the op, and doing shared validation. As a bonus (and possibly as my original motivation), make op selection pay attention to the number of args present. This allows arch-specific ops to share a name with generic ops as long as there is no ambiguity. It also detects and reports unresolved ambiguity, unlike before, where it would simply always pick the generic op, with no warning. Also use parseValue when generating the top-level op dispatch, to ensure its opinion about ops matches genMatch0 and genResult0. The order of statements in the generated code used to depend on the exact rule. It is now somewhat independent of the rule. That is the source of some of the generated code changes in this CL. See rewritedec64 and rewritegeneric for examples. It is a one-time change. The op dispatch switch and functions used to be sorted by opname without architecture. The sort now includes the architecture, leading to further generated code changes. See rewriteARM and rewriteAMD64 for examples. Again, it is a one-time change. There are no functional changes. Change-Id: I22c989183ad5651741ebdc0566349c5fd6c6b23c Reviewed-on: https://go-review.googlesource.com/24649 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2016-07-01 11:05:29 -07:00
}
return true
}
for _, x := range genericOps {
if match(x, true, "generic") {
op = x
break
}
}
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
for _, x := range arch.ops {
if arch.name != "generic" && match(x, true, arch.name) {
if op.name != "" {
log.Fatalf("%s: matches for op %s found in both generic and %s", loc, op.name, arch.name)
[dev.ssa] cmd/compile: refactor out rulegen value parsing Previously, genMatch0 and genResult0 contained lots of duplication: locating the op, parsing the value, validation, etc. Parsing and validation was mixed in with code gen. Extract a helper, parseValue. It is responsible for parsing the value, locating the op, and doing shared validation. As a bonus (and possibly as my original motivation), make op selection pay attention to the number of args present. This allows arch-specific ops to share a name with generic ops as long as there is no ambiguity. It also detects and reports unresolved ambiguity, unlike before, where it would simply always pick the generic op, with no warning. Also use parseValue when generating the top-level op dispatch, to ensure its opinion about ops matches genMatch0 and genResult0. The order of statements in the generated code used to depend on the exact rule. It is now somewhat independent of the rule. That is the source of some of the generated code changes in this CL. See rewritedec64 and rewritegeneric for examples. It is a one-time change. The op dispatch switch and functions used to be sorted by opname without architecture. The sort now includes the architecture, leading to further generated code changes. See rewriteARM and rewriteAMD64 for examples. Again, it is a one-time change. There are no functional changes. Change-Id: I22c989183ad5651741ebdc0566349c5fd6c6b23c Reviewed-on: https://go-review.googlesource.com/24649 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2016-07-01 11:05:29 -07:00
}
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
op = x
oparch = arch.name
break
[dev.ssa] cmd/compile: refactor out rulegen value parsing Previously, genMatch0 and genResult0 contained lots of duplication: locating the op, parsing the value, validation, etc. Parsing and validation was mixed in with code gen. Extract a helper, parseValue. It is responsible for parsing the value, locating the op, and doing shared validation. As a bonus (and possibly as my original motivation), make op selection pay attention to the number of args present. This allows arch-specific ops to share a name with generic ops as long as there is no ambiguity. It also detects and reports unresolved ambiguity, unlike before, where it would simply always pick the generic op, with no warning. Also use parseValue when generating the top-level op dispatch, to ensure its opinion about ops matches genMatch0 and genResult0. The order of statements in the generated code used to depend on the exact rule. It is now somewhat independent of the rule. That is the source of some of the generated code changes in this CL. See rewritedec64 and rewritegeneric for examples. It is a one-time change. The op dispatch switch and functions used to be sorted by opname without architecture. The sort now includes the architecture, leading to further generated code changes. See rewriteARM and rewriteAMD64 for examples. Again, it is a one-time change. There are no functional changes. Change-Id: I22c989183ad5651741ebdc0566349c5fd6c6b23c Reviewed-on: https://go-review.googlesource.com/24649 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2016-07-01 11:05:29 -07:00
}
}
if op.name == "" {
// Failed to find the op.
// Run through everything again with strict=false
// to generate useful diagnosic messages before failing.
for _, x := range genericOps {
match(x, false, "generic")
}
for _, x := range arch.ops {
match(x, false, arch.name)
}
log.Fatalf("%s: unknown op %s", loc, s)
}
// Sanity check aux, auxint.
2020-01-23 14:33:26 -08:00
if auxint != "" && !opHasAuxInt(op) {
log.Fatalf("%s: op %s %s can't have auxint", loc, op.name, op.aux)
}
2020-01-23 14:33:26 -08:00
if aux != "" && !opHasAux(op) {
log.Fatalf("%s: op %s %s can't have aux", loc, op.name, op.aux)
[dev.ssa] cmd/compile: refactor out rulegen value parsing Previously, genMatch0 and genResult0 contained lots of duplication: locating the op, parsing the value, validation, etc. Parsing and validation was mixed in with code gen. Extract a helper, parseValue. It is responsible for parsing the value, locating the op, and doing shared validation. As a bonus (and possibly as my original motivation), make op selection pay attention to the number of args present. This allows arch-specific ops to share a name with generic ops as long as there is no ambiguity. It also detects and reports unresolved ambiguity, unlike before, where it would simply always pick the generic op, with no warning. Also use parseValue when generating the top-level op dispatch, to ensure its opinion about ops matches genMatch0 and genResult0. The order of statements in the generated code used to depend on the exact rule. It is now somewhat independent of the rule. That is the source of some of the generated code changes in this CL. See rewritedec64 and rewritegeneric for examples. It is a one-time change. The op dispatch switch and functions used to be sorted by opname without architecture. The sort now includes the architecture, leading to further generated code changes. See rewriteARM and rewriteAMD64 for examples. Again, it is a one-time change. There are no functional changes. Change-Id: I22c989183ad5651741ebdc0566349c5fd6c6b23c Reviewed-on: https://go-review.googlesource.com/24649 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2016-07-01 11:05:29 -07:00
}
return
}
2020-01-23 14:33:26 -08:00
func opHasAuxInt(op opData) bool {
switch op.aux {
case "Bool", "Int8", "Int16", "Int32", "Int64", "Int128", "UInt8", "Float32", "Float64",
"SymOff", "CallOff", "SymValAndOff", "TypSize", "ARM64BitField", "FlagConstant", "CCop":
2020-01-23 14:33:26 -08:00
return true
}
return false
}
func opHasAux(op opData) bool {
switch op.aux {
case "String", "Sym", "SymOff", "Call", "CallOff", "SymValAndOff", "Typ", "TypSize",
"S390XCCMask", "S390XRotateParams":
2020-01-23 14:33:26 -08:00
return true
}
return false
}
// splitNameExpr splits s-expr arg, possibly prefixed by "name:",
// into name and the unprefixed expression.
// For example, "x:(Foo)" yields "x", "(Foo)",
// and "(Foo)" yields "", "(Foo)".
func splitNameExpr(arg string) (name, expr string) {
colon := strings.Index(arg, ":")
if colon < 0 {
return "", arg
}
openparen := strings.Index(arg, "(")
if openparen < 0 {
log.Fatalf("splitNameExpr(%q): colon but no open parens", arg)
}
if colon > openparen {
// colon is inside the parens, such as in "(Foo x:(Bar))".
return "", arg
}
return arg[:colon], arg[colon+1:]
}
cmd/compile: allow multiple SSA block control values Control values are used to choose which successor of a block is jumped to. Typically a control value takes the form of a 'flags' value that represents the result of a comparison. Some architectures however use a variable in a register as a control value. Up until now we have managed with a single control value per block. However some architectures (e.g. s390x and riscv64) have combined compare-and-branch instructions that take two variables in registers as parameters. To generate these instructions we need to support 2 control values per block. This CL allows up to 2 control values to be used in a block in order to support the addition of compare-and-branch instructions. I have implemented s390x compare-and-branch instructions in a different CL. Passes toolstash-check -all. Results of compilebench: name old time/op new time/op delta Template 208ms ± 1% 209ms ± 1% ~ (p=0.289 n=20+20) Unicode 83.7ms ± 1% 83.3ms ± 3% -0.49% (p=0.017 n=18+18) GoTypes 748ms ± 1% 748ms ± 0% ~ (p=0.460 n=20+18) Compiler 3.47s ± 1% 3.48s ± 1% ~ (p=0.070 n=19+18) SSA 11.5s ± 1% 11.7s ± 1% +1.64% (p=0.000 n=19+18) Flate 130ms ± 1% 130ms ± 1% ~ (p=0.588 n=19+20) GoParser 160ms ± 1% 161ms ± 1% ~ (p=0.211 n=20+20) Reflect 465ms ± 1% 467ms ± 1% +0.42% (p=0.007 n=20+20) Tar 184ms ± 1% 185ms ± 2% ~ (p=0.087 n=18+20) XML 253ms ± 1% 253ms ± 1% ~ (p=0.377 n=20+18) LinkCompiler 769ms ± 2% 774ms ± 2% ~ (p=0.070 n=19+19) ExternalLinkCompiler 3.59s ±11% 3.68s ± 6% ~ (p=0.072 n=20+20) LinkWithoutDebugCompiler 446ms ± 5% 454ms ± 3% +1.79% (p=0.002 n=19+20) StdCmd 26.0s ± 2% 26.0s ± 2% ~ (p=0.799 n=20+20) name old user-time/op new user-time/op delta Template 238ms ± 5% 240ms ± 5% ~ (p=0.142 n=20+20) Unicode 105ms ±11% 106ms ±10% ~ (p=0.512 n=20+20) GoTypes 876ms ± 2% 873ms ± 4% ~ (p=0.647 n=20+19) Compiler 4.17s ± 2% 4.19s ± 1% ~ (p=0.093 n=20+18) SSA 13.9s ± 1% 14.1s ± 1% +1.45% (p=0.000 n=18+18) Flate 145ms ±13% 146ms ± 5% ~ (p=0.851 n=20+18) GoParser 185ms ± 5% 188ms ± 7% ~ (p=0.174 n=20+20) Reflect 534ms ± 3% 538ms ± 2% ~ (p=0.105 n=20+18) Tar 215ms ± 4% 211ms ± 9% ~ (p=0.079 n=19+20) XML 295ms ± 6% 295ms ± 5% ~ (p=0.968 n=20+20) LinkCompiler 832ms ± 4% 837ms ± 7% ~ (p=0.707 n=17+20) ExternalLinkCompiler 1.58s ± 8% 1.60s ± 4% ~ (p=0.296 n=20+19) LinkWithoutDebugCompiler 478ms ±12% 489ms ±10% ~ (p=0.429 n=20+20) name old object-bytes new object-bytes delta Template 559kB ± 0% 559kB ± 0% ~ (all equal) Unicode 216kB ± 0% 216kB ± 0% ~ (all equal) GoTypes 2.03MB ± 0% 2.03MB ± 0% ~ (all equal) Compiler 8.07MB ± 0% 8.07MB ± 0% -0.06% (p=0.000 n=20+20) SSA 27.1MB ± 0% 27.3MB ± 0% +0.89% (p=0.000 n=20+20) Flate 343kB ± 0% 343kB ± 0% ~ (all equal) GoParser 441kB ± 0% 441kB ± 0% ~ (all equal) Reflect 1.36MB ± 0% 1.36MB ± 0% ~ (all equal) Tar 487kB ± 0% 487kB ± 0% ~ (all equal) XML 632kB ± 0% 632kB ± 0% ~ (all equal) name old export-bytes new export-bytes delta Template 18.5kB ± 0% 18.5kB ± 0% ~ (all equal) Unicode 7.92kB ± 0% 7.92kB ± 0% ~ (all equal) GoTypes 35.0kB ± 0% 35.0kB ± 0% ~ (all equal) Compiler 109kB ± 0% 110kB ± 0% +0.72% (p=0.000 n=20+20) SSA 137kB ± 0% 138kB ± 0% +0.58% (p=0.000 n=20+20) Flate 4.89kB ± 0% 4.89kB ± 0% ~ (all equal) GoParser 8.49kB ± 0% 8.49kB ± 0% ~ (all equal) Reflect 11.4kB ± 0% 11.4kB ± 0% ~ (all equal) Tar 10.5kB ± 0% 10.5kB ± 0% ~ (all equal) XML 16.7kB ± 0% 16.7kB ± 0% ~ (all equal) name old text-bytes new text-bytes delta HelloSize 761kB ± 0% 761kB ± 0% ~ (all equal) CmdGoSize 10.8MB ± 0% 10.8MB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 10.7kB ± 0% 10.7kB ± 0% ~ (all equal) CmdGoSize 312kB ± 0% 312kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 122kB ± 0% 122kB ± 0% ~ (all equal) CmdGoSize 146kB ± 0% 146kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.13MB ± 0% 1.13MB ± 0% ~ (all equal) CmdGoSize 15.1MB ± 0% 15.1MB ± 0% ~ (all equal) Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f Reviewed-on: https://go-review.googlesource.com/c/go/+/196557 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-12 20:19:58 +01:00
func getBlockInfo(op string, arch arch) (name string, data blockData) {
for _, b := range genericBlocks {
cmd/compile: allow multiple SSA block control values Control values are used to choose which successor of a block is jumped to. Typically a control value takes the form of a 'flags' value that represents the result of a comparison. Some architectures however use a variable in a register as a control value. Up until now we have managed with a single control value per block. However some architectures (e.g. s390x and riscv64) have combined compare-and-branch instructions that take two variables in registers as parameters. To generate these instructions we need to support 2 control values per block. This CL allows up to 2 control values to be used in a block in order to support the addition of compare-and-branch instructions. I have implemented s390x compare-and-branch instructions in a different CL. Passes toolstash-check -all. Results of compilebench: name old time/op new time/op delta Template 208ms ± 1% 209ms ± 1% ~ (p=0.289 n=20+20) Unicode 83.7ms ± 1% 83.3ms ± 3% -0.49% (p=0.017 n=18+18) GoTypes 748ms ± 1% 748ms ± 0% ~ (p=0.460 n=20+18) Compiler 3.47s ± 1% 3.48s ± 1% ~ (p=0.070 n=19+18) SSA 11.5s ± 1% 11.7s ± 1% +1.64% (p=0.000 n=19+18) Flate 130ms ± 1% 130ms ± 1% ~ (p=0.588 n=19+20) GoParser 160ms ± 1% 161ms ± 1% ~ (p=0.211 n=20+20) Reflect 465ms ± 1% 467ms ± 1% +0.42% (p=0.007 n=20+20) Tar 184ms ± 1% 185ms ± 2% ~ (p=0.087 n=18+20) XML 253ms ± 1% 253ms ± 1% ~ (p=0.377 n=20+18) LinkCompiler 769ms ± 2% 774ms ± 2% ~ (p=0.070 n=19+19) ExternalLinkCompiler 3.59s ±11% 3.68s ± 6% ~ (p=0.072 n=20+20) LinkWithoutDebugCompiler 446ms ± 5% 454ms ± 3% +1.79% (p=0.002 n=19+20) StdCmd 26.0s ± 2% 26.0s ± 2% ~ (p=0.799 n=20+20) name old user-time/op new user-time/op delta Template 238ms ± 5% 240ms ± 5% ~ (p=0.142 n=20+20) Unicode 105ms ±11% 106ms ±10% ~ (p=0.512 n=20+20) GoTypes 876ms ± 2% 873ms ± 4% ~ (p=0.647 n=20+19) Compiler 4.17s ± 2% 4.19s ± 1% ~ (p=0.093 n=20+18) SSA 13.9s ± 1% 14.1s ± 1% +1.45% (p=0.000 n=18+18) Flate 145ms ±13% 146ms ± 5% ~ (p=0.851 n=20+18) GoParser 185ms ± 5% 188ms ± 7% ~ (p=0.174 n=20+20) Reflect 534ms ± 3% 538ms ± 2% ~ (p=0.105 n=20+18) Tar 215ms ± 4% 211ms ± 9% ~ (p=0.079 n=19+20) XML 295ms ± 6% 295ms ± 5% ~ (p=0.968 n=20+20) LinkCompiler 832ms ± 4% 837ms ± 7% ~ (p=0.707 n=17+20) ExternalLinkCompiler 1.58s ± 8% 1.60s ± 4% ~ (p=0.296 n=20+19) LinkWithoutDebugCompiler 478ms ±12% 489ms ±10% ~ (p=0.429 n=20+20) name old object-bytes new object-bytes delta Template 559kB ± 0% 559kB ± 0% ~ (all equal) Unicode 216kB ± 0% 216kB ± 0% ~ (all equal) GoTypes 2.03MB ± 0% 2.03MB ± 0% ~ (all equal) Compiler 8.07MB ± 0% 8.07MB ± 0% -0.06% (p=0.000 n=20+20) SSA 27.1MB ± 0% 27.3MB ± 0% +0.89% (p=0.000 n=20+20) Flate 343kB ± 0% 343kB ± 0% ~ (all equal) GoParser 441kB ± 0% 441kB ± 0% ~ (all equal) Reflect 1.36MB ± 0% 1.36MB ± 0% ~ (all equal) Tar 487kB ± 0% 487kB ± 0% ~ (all equal) XML 632kB ± 0% 632kB ± 0% ~ (all equal) name old export-bytes new export-bytes delta Template 18.5kB ± 0% 18.5kB ± 0% ~ (all equal) Unicode 7.92kB ± 0% 7.92kB ± 0% ~ (all equal) GoTypes 35.0kB ± 0% 35.0kB ± 0% ~ (all equal) Compiler 109kB ± 0% 110kB ± 0% +0.72% (p=0.000 n=20+20) SSA 137kB ± 0% 138kB ± 0% +0.58% (p=0.000 n=20+20) Flate 4.89kB ± 0% 4.89kB ± 0% ~ (all equal) GoParser 8.49kB ± 0% 8.49kB ± 0% ~ (all equal) Reflect 11.4kB ± 0% 11.4kB ± 0% ~ (all equal) Tar 10.5kB ± 0% 10.5kB ± 0% ~ (all equal) XML 16.7kB ± 0% 16.7kB ± 0% ~ (all equal) name old text-bytes new text-bytes delta HelloSize 761kB ± 0% 761kB ± 0% ~ (all equal) CmdGoSize 10.8MB ± 0% 10.8MB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 10.7kB ± 0% 10.7kB ± 0% ~ (all equal) CmdGoSize 312kB ± 0% 312kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 122kB ± 0% 122kB ± 0% ~ (all equal) CmdGoSize 146kB ± 0% 146kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.13MB ± 0% 1.13MB ± 0% ~ (all equal) CmdGoSize 15.1MB ± 0% 15.1MB ± 0% ~ (all equal) Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f Reviewed-on: https://go-review.googlesource.com/c/go/+/196557 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-12 20:19:58 +01:00
if b.name == op {
return "Block" + op, b
}
}
for _, b := range arch.blocks {
if b.name == op {
return "Block" + arch.name + op, b
}
}
cmd/compile: allow multiple SSA block control values Control values are used to choose which successor of a block is jumped to. Typically a control value takes the form of a 'flags' value that represents the result of a comparison. Some architectures however use a variable in a register as a control value. Up until now we have managed with a single control value per block. However some architectures (e.g. s390x and riscv64) have combined compare-and-branch instructions that take two variables in registers as parameters. To generate these instructions we need to support 2 control values per block. This CL allows up to 2 control values to be used in a block in order to support the addition of compare-and-branch instructions. I have implemented s390x compare-and-branch instructions in a different CL. Passes toolstash-check -all. Results of compilebench: name old time/op new time/op delta Template 208ms ± 1% 209ms ± 1% ~ (p=0.289 n=20+20) Unicode 83.7ms ± 1% 83.3ms ± 3% -0.49% (p=0.017 n=18+18) GoTypes 748ms ± 1% 748ms ± 0% ~ (p=0.460 n=20+18) Compiler 3.47s ± 1% 3.48s ± 1% ~ (p=0.070 n=19+18) SSA 11.5s ± 1% 11.7s ± 1% +1.64% (p=0.000 n=19+18) Flate 130ms ± 1% 130ms ± 1% ~ (p=0.588 n=19+20) GoParser 160ms ± 1% 161ms ± 1% ~ (p=0.211 n=20+20) Reflect 465ms ± 1% 467ms ± 1% +0.42% (p=0.007 n=20+20) Tar 184ms ± 1% 185ms ± 2% ~ (p=0.087 n=18+20) XML 253ms ± 1% 253ms ± 1% ~ (p=0.377 n=20+18) LinkCompiler 769ms ± 2% 774ms ± 2% ~ (p=0.070 n=19+19) ExternalLinkCompiler 3.59s ±11% 3.68s ± 6% ~ (p=0.072 n=20+20) LinkWithoutDebugCompiler 446ms ± 5% 454ms ± 3% +1.79% (p=0.002 n=19+20) StdCmd 26.0s ± 2% 26.0s ± 2% ~ (p=0.799 n=20+20) name old user-time/op new user-time/op delta Template 238ms ± 5% 240ms ± 5% ~ (p=0.142 n=20+20) Unicode 105ms ±11% 106ms ±10% ~ (p=0.512 n=20+20) GoTypes 876ms ± 2% 873ms ± 4% ~ (p=0.647 n=20+19) Compiler 4.17s ± 2% 4.19s ± 1% ~ (p=0.093 n=20+18) SSA 13.9s ± 1% 14.1s ± 1% +1.45% (p=0.000 n=18+18) Flate 145ms ±13% 146ms ± 5% ~ (p=0.851 n=20+18) GoParser 185ms ± 5% 188ms ± 7% ~ (p=0.174 n=20+20) Reflect 534ms ± 3% 538ms ± 2% ~ (p=0.105 n=20+18) Tar 215ms ± 4% 211ms ± 9% ~ (p=0.079 n=19+20) XML 295ms ± 6% 295ms ± 5% ~ (p=0.968 n=20+20) LinkCompiler 832ms ± 4% 837ms ± 7% ~ (p=0.707 n=17+20) ExternalLinkCompiler 1.58s ± 8% 1.60s ± 4% ~ (p=0.296 n=20+19) LinkWithoutDebugCompiler 478ms ±12% 489ms ±10% ~ (p=0.429 n=20+20) name old object-bytes new object-bytes delta Template 559kB ± 0% 559kB ± 0% ~ (all equal) Unicode 216kB ± 0% 216kB ± 0% ~ (all equal) GoTypes 2.03MB ± 0% 2.03MB ± 0% ~ (all equal) Compiler 8.07MB ± 0% 8.07MB ± 0% -0.06% (p=0.000 n=20+20) SSA 27.1MB ± 0% 27.3MB ± 0% +0.89% (p=0.000 n=20+20) Flate 343kB ± 0% 343kB ± 0% ~ (all equal) GoParser 441kB ± 0% 441kB ± 0% ~ (all equal) Reflect 1.36MB ± 0% 1.36MB ± 0% ~ (all equal) Tar 487kB ± 0% 487kB ± 0% ~ (all equal) XML 632kB ± 0% 632kB ± 0% ~ (all equal) name old export-bytes new export-bytes delta Template 18.5kB ± 0% 18.5kB ± 0% ~ (all equal) Unicode 7.92kB ± 0% 7.92kB ± 0% ~ (all equal) GoTypes 35.0kB ± 0% 35.0kB ± 0% ~ (all equal) Compiler 109kB ± 0% 110kB ± 0% +0.72% (p=0.000 n=20+20) SSA 137kB ± 0% 138kB ± 0% +0.58% (p=0.000 n=20+20) Flate 4.89kB ± 0% 4.89kB ± 0% ~ (all equal) GoParser 8.49kB ± 0% 8.49kB ± 0% ~ (all equal) Reflect 11.4kB ± 0% 11.4kB ± 0% ~ (all equal) Tar 10.5kB ± 0% 10.5kB ± 0% ~ (all equal) XML 16.7kB ± 0% 16.7kB ± 0% ~ (all equal) name old text-bytes new text-bytes delta HelloSize 761kB ± 0% 761kB ± 0% ~ (all equal) CmdGoSize 10.8MB ± 0% 10.8MB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 10.7kB ± 0% 10.7kB ± 0% ~ (all equal) CmdGoSize 312kB ± 0% 312kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 122kB ± 0% 122kB ± 0% ~ (all equal) CmdGoSize 146kB ± 0% 146kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.13MB ± 0% 1.13MB ± 0% ~ (all equal) CmdGoSize 15.1MB ± 0% 15.1MB ± 0% ~ (all equal) Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f Reviewed-on: https://go-review.googlesource.com/c/go/+/196557 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-12 20:19:58 +01:00
log.Fatalf("could not find block data for %s", op)
panic("unreachable")
}
// typeName returns the string to use to generate a type.
func typeName(typ string) string {
if typ[0] == '(' {
ts := strings.Split(typ[1:len(typ)-1], ",")
if len(ts) != 2 {
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
log.Fatalf("Tuple expect 2 arguments")
}
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
return "types.NewTuple(" + typeName(ts[0]) + ", " + typeName(ts[1]) + ")"
}
switch typ {
case "Flags", "Mem", "Void", "Int128":
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
return "types.Type" + typ
default:
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
return "typ." + typ
}
}
// balance returns the number of unclosed '(' characters in s.
// If a ')' appears without a corresponding '(', balance returns -1.
func balance(s string) int {
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
balance := 0
for _, c := range s {
switch c {
case '(':
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
balance++
case ')':
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
balance--
if balance < 0 {
// don't allow ")(" to return 0
return -1
}
}
}
return balance
}
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
// findAllOpcode is a function to find the opcode portion of s-expressions.
var findAllOpcode = regexp.MustCompile(`[(](\w+[|])+\w+[)]`).FindAllStringIndex
// excludeFromExpansion reports whether the substring s[idx[0]:idx[1]] in a rule
// should be disregarded as a candidate for | expansion.
// It uses simple syntactic checks to see whether the substring
// is inside an AuxInt expression or inside the && conditions.
func excludeFromExpansion(s string, idx []int) bool {
left := s[:idx[0]]
if strings.LastIndexByte(left, '[') > strings.LastIndexByte(left, ']') {
// Inside an AuxInt expression.
return true
}
right := s[idx[1]:]
if strings.Contains(left, "&&") && strings.Contains(right, "=>") {
// Inside && conditions.
return true
}
return false
}
// expandOr converts a rule into multiple rules by expanding | ops.
func expandOr(r string) []string {
// Find every occurrence of |-separated things.
// They look like MOV(B|W|L|Q|SS|SD)load or MOV(Q|L)loadidx(1|8).
// Generate rules selecting one case from each |-form.
// Count width of |-forms. They must match.
n := 1
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
for _, idx := range findAllOpcode(r, -1) {
if excludeFromExpansion(r, idx) {
continue
}
s := r[idx[0]:idx[1]]
c := strings.Count(s, "|") + 1
if c == 1 {
continue
}
if n > 1 && n != c {
log.Fatalf("'|' count doesn't match in %s: both %d and %d\n", r, n, c)
}
n = c
}
if n == 1 {
// No |-form in this rule.
return []string{r}
}
// Build each new rule.
res := make([]string, n)
for i := 0; i < n; i++ {
buf := new(strings.Builder)
x := 0
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
for _, idx := range findAllOpcode(r, -1) {
if excludeFromExpansion(r, idx) {
continue
}
buf.WriteString(r[x:idx[0]]) // write bytes we've skipped over so far
s := r[idx[0]+1 : idx[1]-1] // remove leading "(" and trailing ")"
buf.WriteString(strings.Split(s, "|")[i]) // write the op component for this rule
x = idx[1] // note that we've written more bytes
}
buf.WriteString(r[x:])
res[i] = buf.String()
}
return res
}
cmd/compile: automatically handle commuting ops in rewrite rules Note that this is a redo of an undo of the original buggy CL 38666. We have lots of rewrite rules that vary only in the fact that we have 2 versions for the 2 different orderings of various commuting ops. For example: (ADDL x (MOVLconst [c])) -> (ADDLconst [c] x) (ADDL (MOVLconst [c]) x) -> (ADDLconst [c] x) It can get unwieldly quickly, especially when there is more than one commuting op in a rule. Our existing "fix" for this problem is to have rules that canonicalize the operations first. For example: (Eq64 x (Const64 <t> [c])) && x.Op != OpConst64 -> (Eq64 (Const64 <t> [c]) x) Subsequent rules can then assume if there is a constant arg to Eq64, it will be the first one. This fix kinda works, but it is fragile and only works when we remember to include the required extra rules. The fundamental problem is that the rule matcher doesn't know anything about commuting ops. This CL fixes that fact. We already have information about which ops commute. (The register allocator takes advantage of commutivity.) The rule generator now automatically generates multiple rules for a single source rule when there are commutative ops in the rule. We can now drop all of our almost-duplicate source-level rules and the canonicalization rules. I have some CLs in progress that will be a lot less verbose when the rule generator handles commutivity for me. I had to reorganize the load-combining rules a bit. The 8-way OR rules generated 128 different reorderings, which was causing the generator to put too much code in the rewrite*.go files (the big ones were going from 25K lines to 132K lines). Instead I reorganized the rules to combine pairs of loads at a time. The generated rule files are now actually a bit (5%) smaller. Make.bash times are ~unchanged. Compiler benchmarks are not observably different. Probably because we don't spend much compiler time in rule matching anyway. I've also done a pass over all of our ops adding commutative markings for ops which hadn't had them previously. Fixes #18292 Change-Id: Ic1c0e43fbf579539f459971625f69690c9ab8805 Reviewed-on: https://go-review.googlesource.com/38801 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-03-30 03:30:22 +00:00
// varCount returns a map which counts the number of occurrences of
// Value variables in the s-expression rr.Match and the Go expression rr.Cond.
func varCount(rr *RuleRewrite) map[string]int {
cmd/compile: automatically handle commuting ops in rewrite rules Note that this is a redo of an undo of the original buggy CL 38666. We have lots of rewrite rules that vary only in the fact that we have 2 versions for the 2 different orderings of various commuting ops. For example: (ADDL x (MOVLconst [c])) -> (ADDLconst [c] x) (ADDL (MOVLconst [c]) x) -> (ADDLconst [c] x) It can get unwieldly quickly, especially when there is more than one commuting op in a rule. Our existing "fix" for this problem is to have rules that canonicalize the operations first. For example: (Eq64 x (Const64 <t> [c])) && x.Op != OpConst64 -> (Eq64 (Const64 <t> [c]) x) Subsequent rules can then assume if there is a constant arg to Eq64, it will be the first one. This fix kinda works, but it is fragile and only works when we remember to include the required extra rules. The fundamental problem is that the rule matcher doesn't know anything about commuting ops. This CL fixes that fact. We already have information about which ops commute. (The register allocator takes advantage of commutivity.) The rule generator now automatically generates multiple rules for a single source rule when there are commutative ops in the rule. We can now drop all of our almost-duplicate source-level rules and the canonicalization rules. I have some CLs in progress that will be a lot less verbose when the rule generator handles commutivity for me. I had to reorganize the load-combining rules a bit. The 8-way OR rules generated 128 different reorderings, which was causing the generator to put too much code in the rewrite*.go files (the big ones were going from 25K lines to 132K lines). Instead I reorganized the rules to combine pairs of loads at a time. The generated rule files are now actually a bit (5%) smaller. Make.bash times are ~unchanged. Compiler benchmarks are not observably different. Probably because we don't spend much compiler time in rule matching anyway. I've also done a pass over all of our ops adding commutative markings for ops which hadn't had them previously. Fixes #18292 Change-Id: Ic1c0e43fbf579539f459971625f69690c9ab8805 Reviewed-on: https://go-review.googlesource.com/38801 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-03-30 03:30:22 +00:00
cnt := map[string]int{}
varCount1(rr.Loc, rr.Match, cnt)
if rr.Cond != "" {
expr, err := parser.ParseExpr(rr.Cond)
if err != nil {
log.Fatalf("%s: failed to parse cond %q: %v", rr.Loc, rr.Cond, err)
}
ast.Inspect(expr, func(n ast.Node) bool {
if id, ok := n.(*ast.Ident); ok {
cnt[id.Name]++
}
return true
})
}
cmd/compile: automatically handle commuting ops in rewrite rules Note that this is a redo of an undo of the original buggy CL 38666. We have lots of rewrite rules that vary only in the fact that we have 2 versions for the 2 different orderings of various commuting ops. For example: (ADDL x (MOVLconst [c])) -> (ADDLconst [c] x) (ADDL (MOVLconst [c]) x) -> (ADDLconst [c] x) It can get unwieldly quickly, especially when there is more than one commuting op in a rule. Our existing "fix" for this problem is to have rules that canonicalize the operations first. For example: (Eq64 x (Const64 <t> [c])) && x.Op != OpConst64 -> (Eq64 (Const64 <t> [c]) x) Subsequent rules can then assume if there is a constant arg to Eq64, it will be the first one. This fix kinda works, but it is fragile and only works when we remember to include the required extra rules. The fundamental problem is that the rule matcher doesn't know anything about commuting ops. This CL fixes that fact. We already have information about which ops commute. (The register allocator takes advantage of commutivity.) The rule generator now automatically generates multiple rules for a single source rule when there are commutative ops in the rule. We can now drop all of our almost-duplicate source-level rules and the canonicalization rules. I have some CLs in progress that will be a lot less verbose when the rule generator handles commutivity for me. I had to reorganize the load-combining rules a bit. The 8-way OR rules generated 128 different reorderings, which was causing the generator to put too much code in the rewrite*.go files (the big ones were going from 25K lines to 132K lines). Instead I reorganized the rules to combine pairs of loads at a time. The generated rule files are now actually a bit (5%) smaller. Make.bash times are ~unchanged. Compiler benchmarks are not observably different. Probably because we don't spend much compiler time in rule matching anyway. I've also done a pass over all of our ops adding commutative markings for ops which hadn't had them previously. Fixes #18292 Change-Id: Ic1c0e43fbf579539f459971625f69690c9ab8805 Reviewed-on: https://go-review.googlesource.com/38801 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-03-30 03:30:22 +00:00
return cnt
}
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
func varCount1(loc, m string, cnt map[string]int) {
cmd/compile: automatically handle commuting ops in rewrite rules Note that this is a redo of an undo of the original buggy CL 38666. We have lots of rewrite rules that vary only in the fact that we have 2 versions for the 2 different orderings of various commuting ops. For example: (ADDL x (MOVLconst [c])) -> (ADDLconst [c] x) (ADDL (MOVLconst [c]) x) -> (ADDLconst [c] x) It can get unwieldly quickly, especially when there is more than one commuting op in a rule. Our existing "fix" for this problem is to have rules that canonicalize the operations first. For example: (Eq64 x (Const64 <t> [c])) && x.Op != OpConst64 -> (Eq64 (Const64 <t> [c]) x) Subsequent rules can then assume if there is a constant arg to Eq64, it will be the first one. This fix kinda works, but it is fragile and only works when we remember to include the required extra rules. The fundamental problem is that the rule matcher doesn't know anything about commuting ops. This CL fixes that fact. We already have information about which ops commute. (The register allocator takes advantage of commutivity.) The rule generator now automatically generates multiple rules for a single source rule when there are commutative ops in the rule. We can now drop all of our almost-duplicate source-level rules and the canonicalization rules. I have some CLs in progress that will be a lot less verbose when the rule generator handles commutivity for me. I had to reorganize the load-combining rules a bit. The 8-way OR rules generated 128 different reorderings, which was causing the generator to put too much code in the rewrite*.go files (the big ones were going from 25K lines to 132K lines). Instead I reorganized the rules to combine pairs of loads at a time. The generated rule files are now actually a bit (5%) smaller. Make.bash times are ~unchanged. Compiler benchmarks are not observably different. Probably because we don't spend much compiler time in rule matching anyway. I've also done a pass over all of our ops adding commutative markings for ops which hadn't had them previously. Fixes #18292 Change-Id: Ic1c0e43fbf579539f459971625f69690c9ab8805 Reviewed-on: https://go-review.googlesource.com/38801 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-03-30 03:30:22 +00:00
if m[0] == '<' || m[0] == '[' || m[0] == '{' {
return
}
cmd/compile: initial rulegen rewrite rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-16 11:26:40 +01:00
if token.IsIdentifier(m) {
cmd/compile: automatically handle commuting ops in rewrite rules Note that this is a redo of an undo of the original buggy CL 38666. We have lots of rewrite rules that vary only in the fact that we have 2 versions for the 2 different orderings of various commuting ops. For example: (ADDL x (MOVLconst [c])) -> (ADDLconst [c] x) (ADDL (MOVLconst [c]) x) -> (ADDLconst [c] x) It can get unwieldly quickly, especially when there is more than one commuting op in a rule. Our existing "fix" for this problem is to have rules that canonicalize the operations first. For example: (Eq64 x (Const64 <t> [c])) && x.Op != OpConst64 -> (Eq64 (Const64 <t> [c]) x) Subsequent rules can then assume if there is a constant arg to Eq64, it will be the first one. This fix kinda works, but it is fragile and only works when we remember to include the required extra rules. The fundamental problem is that the rule matcher doesn't know anything about commuting ops. This CL fixes that fact. We already have information about which ops commute. (The register allocator takes advantage of commutivity.) The rule generator now automatically generates multiple rules for a single source rule when there are commutative ops in the rule. We can now drop all of our almost-duplicate source-level rules and the canonicalization rules. I have some CLs in progress that will be a lot less verbose when the rule generator handles commutivity for me. I had to reorganize the load-combining rules a bit. The 8-way OR rules generated 128 different reorderings, which was causing the generator to put too much code in the rewrite*.go files (the big ones were going from 25K lines to 132K lines). Instead I reorganized the rules to combine pairs of loads at a time. The generated rule files are now actually a bit (5%) smaller. Make.bash times are ~unchanged. Compiler benchmarks are not observably different. Probably because we don't spend much compiler time in rule matching anyway. I've also done a pass over all of our ops adding commutative markings for ops which hadn't had them previously. Fixes #18292 Change-Id: Ic1c0e43fbf579539f459971625f69690c9ab8805 Reviewed-on: https://go-review.googlesource.com/38801 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-03-30 03:30:22 +00:00
cnt[m]++
return
}
// Split up input.
name, expr := splitNameExpr(m)
if name != "" {
cnt[name]++
cmd/compile: automatically handle commuting ops in rewrite rules Note that this is a redo of an undo of the original buggy CL 38666. We have lots of rewrite rules that vary only in the fact that we have 2 versions for the 2 different orderings of various commuting ops. For example: (ADDL x (MOVLconst [c])) -> (ADDLconst [c] x) (ADDL (MOVLconst [c]) x) -> (ADDLconst [c] x) It can get unwieldly quickly, especially when there is more than one commuting op in a rule. Our existing "fix" for this problem is to have rules that canonicalize the operations first. For example: (Eq64 x (Const64 <t> [c])) && x.Op != OpConst64 -> (Eq64 (Const64 <t> [c]) x) Subsequent rules can then assume if there is a constant arg to Eq64, it will be the first one. This fix kinda works, but it is fragile and only works when we remember to include the required extra rules. The fundamental problem is that the rule matcher doesn't know anything about commuting ops. This CL fixes that fact. We already have information about which ops commute. (The register allocator takes advantage of commutivity.) The rule generator now automatically generates multiple rules for a single source rule when there are commutative ops in the rule. We can now drop all of our almost-duplicate source-level rules and the canonicalization rules. I have some CLs in progress that will be a lot less verbose when the rule generator handles commutivity for me. I had to reorganize the load-combining rules a bit. The 8-way OR rules generated 128 different reorderings, which was causing the generator to put too much code in the rewrite*.go files (the big ones were going from 25K lines to 132K lines). Instead I reorganized the rules to combine pairs of loads at a time. The generated rule files are now actually a bit (5%) smaller. Make.bash times are ~unchanged. Compiler benchmarks are not observably different. Probably because we don't spend much compiler time in rule matching anyway. I've also done a pass over all of our ops adding commutative markings for ops which hadn't had them previously. Fixes #18292 Change-Id: Ic1c0e43fbf579539f459971625f69690c9ab8805 Reviewed-on: https://go-review.googlesource.com/38801 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-03-30 03:30:22 +00:00
}
if expr[0] != '(' || expr[len(expr)-1] != ')' {
log.Fatalf("%s: non-compound expr in varCount1: %q", loc, expr)
cmd/compile: automatically handle commuting ops in rewrite rules Note that this is a redo of an undo of the original buggy CL 38666. We have lots of rewrite rules that vary only in the fact that we have 2 versions for the 2 different orderings of various commuting ops. For example: (ADDL x (MOVLconst [c])) -> (ADDLconst [c] x) (ADDL (MOVLconst [c]) x) -> (ADDLconst [c] x) It can get unwieldly quickly, especially when there is more than one commuting op in a rule. Our existing "fix" for this problem is to have rules that canonicalize the operations first. For example: (Eq64 x (Const64 <t> [c])) && x.Op != OpConst64 -> (Eq64 (Const64 <t> [c]) x) Subsequent rules can then assume if there is a constant arg to Eq64, it will be the first one. This fix kinda works, but it is fragile and only works when we remember to include the required extra rules. The fundamental problem is that the rule matcher doesn't know anything about commuting ops. This CL fixes that fact. We already have information about which ops commute. (The register allocator takes advantage of commutivity.) The rule generator now automatically generates multiple rules for a single source rule when there are commutative ops in the rule. We can now drop all of our almost-duplicate source-level rules and the canonicalization rules. I have some CLs in progress that will be a lot less verbose when the rule generator handles commutivity for me. I had to reorganize the load-combining rules a bit. The 8-way OR rules generated 128 different reorderings, which was causing the generator to put too much code in the rewrite*.go files (the big ones were going from 25K lines to 132K lines). Instead I reorganized the rules to combine pairs of loads at a time. The generated rule files are now actually a bit (5%) smaller. Make.bash times are ~unchanged. Compiler benchmarks are not observably different. Probably because we don't spend much compiler time in rule matching anyway. I've also done a pass over all of our ops adding commutative markings for ops which hadn't had them previously. Fixes #18292 Change-Id: Ic1c0e43fbf579539f459971625f69690c9ab8805 Reviewed-on: https://go-review.googlesource.com/38801 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-03-30 03:30:22 +00:00
}
s := split(expr[1 : len(expr)-1])
cmd/compile: automatically handle commuting ops in rewrite rules Note that this is a redo of an undo of the original buggy CL 38666. We have lots of rewrite rules that vary only in the fact that we have 2 versions for the 2 different orderings of various commuting ops. For example: (ADDL x (MOVLconst [c])) -> (ADDLconst [c] x) (ADDL (MOVLconst [c]) x) -> (ADDLconst [c] x) It can get unwieldly quickly, especially when there is more than one commuting op in a rule. Our existing "fix" for this problem is to have rules that canonicalize the operations first. For example: (Eq64 x (Const64 <t> [c])) && x.Op != OpConst64 -> (Eq64 (Const64 <t> [c]) x) Subsequent rules can then assume if there is a constant arg to Eq64, it will be the first one. This fix kinda works, but it is fragile and only works when we remember to include the required extra rules. The fundamental problem is that the rule matcher doesn't know anything about commuting ops. This CL fixes that fact. We already have information about which ops commute. (The register allocator takes advantage of commutivity.) The rule generator now automatically generates multiple rules for a single source rule when there are commutative ops in the rule. We can now drop all of our almost-duplicate source-level rules and the canonicalization rules. I have some CLs in progress that will be a lot less verbose when the rule generator handles commutivity for me. I had to reorganize the load-combining rules a bit. The 8-way OR rules generated 128 different reorderings, which was causing the generator to put too much code in the rewrite*.go files (the big ones were going from 25K lines to 132K lines). Instead I reorganized the rules to combine pairs of loads at a time. The generated rule files are now actually a bit (5%) smaller. Make.bash times are ~unchanged. Compiler benchmarks are not observably different. Probably because we don't spend much compiler time in rule matching anyway. I've also done a pass over all of our ops adding commutative markings for ops which hadn't had them previously. Fixes #18292 Change-Id: Ic1c0e43fbf579539f459971625f69690c9ab8805 Reviewed-on: https://go-review.googlesource.com/38801 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-03-30 03:30:22 +00:00
for _, arg := range s[1:] {
varCount1(loc, arg, cnt)
cmd/compile: automatically handle commuting ops in rewrite rules Note that this is a redo of an undo of the original buggy CL 38666. We have lots of rewrite rules that vary only in the fact that we have 2 versions for the 2 different orderings of various commuting ops. For example: (ADDL x (MOVLconst [c])) -> (ADDLconst [c] x) (ADDL (MOVLconst [c]) x) -> (ADDLconst [c] x) It can get unwieldly quickly, especially when there is more than one commuting op in a rule. Our existing "fix" for this problem is to have rules that canonicalize the operations first. For example: (Eq64 x (Const64 <t> [c])) && x.Op != OpConst64 -> (Eq64 (Const64 <t> [c]) x) Subsequent rules can then assume if there is a constant arg to Eq64, it will be the first one. This fix kinda works, but it is fragile and only works when we remember to include the required extra rules. The fundamental problem is that the rule matcher doesn't know anything about commuting ops. This CL fixes that fact. We already have information about which ops commute. (The register allocator takes advantage of commutivity.) The rule generator now automatically generates multiple rules for a single source rule when there are commutative ops in the rule. We can now drop all of our almost-duplicate source-level rules and the canonicalization rules. I have some CLs in progress that will be a lot less verbose when the rule generator handles commutivity for me. I had to reorganize the load-combining rules a bit. The 8-way OR rules generated 128 different reorderings, which was causing the generator to put too much code in the rewrite*.go files (the big ones were going from 25K lines to 132K lines). Instead I reorganized the rules to combine pairs of loads at a time. The generated rule files are now actually a bit (5%) smaller. Make.bash times are ~unchanged. Compiler benchmarks are not observably different. Probably because we don't spend much compiler time in rule matching anyway. I've also done a pass over all of our ops adding commutative markings for ops which hadn't had them previously. Fixes #18292 Change-Id: Ic1c0e43fbf579539f459971625f69690c9ab8805 Reviewed-on: https://go-review.googlesource.com/38801 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-03-30 03:30:22 +00:00
}
}
// normalizeWhitespace replaces 2+ whitespace sequences with a single space.
func normalizeWhitespace(x string) string {
x = strings.Join(strings.Fields(x), " ")
x = strings.Replace(x, "( ", "(", -1)
x = strings.Replace(x, " )", ")", -1)
x = strings.Replace(x, "[ ", "[", -1)
x = strings.Replace(x, " ]", "]", -1)
cmd/compile: start implementing strongly typed aux and auxint fields Right now the Aux and AuxInt fields of ssa.Values are typed as interface{} and int64, respectively. Each rule that uses these values must cast them to the type they actually are (*obj.LSym, or int32, or ValAndOff, etc.), use them, and then cast them back to interface{} or int64. We know for each opcode what the types of the Aux and AuxInt fields should be. So let's modify the rule generator to declare the types to be what we know they should be, autoconverting to and from the generic types for us. That way we can make the rules more type safe. It's difficult to make a single CL for this, so I've coopted the "=>" token to indicate a rule that is strongly typed. "->" rules are processed as before. That will let us migrate a few rules at a time in separate CLs. Hopefully we can reach a state where all rules are strongly typed and we can drop the distinction. This CL changes just a few rules to get a feel for what this transition would look like. I've decided not to put explicit types in the rules. I think it makes the rules somewhat clearer, but definitely more verbose. In particular, the passthrough rules that don't modify the fields in question are verbose for no real reason. Change-Id: I63a1b789ac5702e7caf7934cd49f784235d1d73d Reviewed-on: https://go-review.googlesource.com/c/go/+/190197 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2020-03-19 16:25:08 -07:00
x = strings.Replace(x, ")=>", ") =>", -1)
cmd/compile: automatically handle commuting ops in rewrite rules Note that this is a redo of an undo of the original buggy CL 38666. We have lots of rewrite rules that vary only in the fact that we have 2 versions for the 2 different orderings of various commuting ops. For example: (ADDL x (MOVLconst [c])) -> (ADDLconst [c] x) (ADDL (MOVLconst [c]) x) -> (ADDLconst [c] x) It can get unwieldly quickly, especially when there is more than one commuting op in a rule. Our existing "fix" for this problem is to have rules that canonicalize the operations first. For example: (Eq64 x (Const64 <t> [c])) && x.Op != OpConst64 -> (Eq64 (Const64 <t> [c]) x) Subsequent rules can then assume if there is a constant arg to Eq64, it will be the first one. This fix kinda works, but it is fragile and only works when we remember to include the required extra rules. The fundamental problem is that the rule matcher doesn't know anything about commuting ops. This CL fixes that fact. We already have information about which ops commute. (The register allocator takes advantage of commutivity.) The rule generator now automatically generates multiple rules for a single source rule when there are commutative ops in the rule. We can now drop all of our almost-duplicate source-level rules and the canonicalization rules. I have some CLs in progress that will be a lot less verbose when the rule generator handles commutivity for me. I had to reorganize the load-combining rules a bit. The 8-way OR rules generated 128 different reorderings, which was causing the generator to put too much code in the rewrite*.go files (the big ones were going from 25K lines to 132K lines). Instead I reorganized the rules to combine pairs of loads at a time. The generated rule files are now actually a bit (5%) smaller. Make.bash times are ~unchanged. Compiler benchmarks are not observably different. Probably because we don't spend much compiler time in rule matching anyway. I've also done a pass over all of our ops adding commutative markings for ops which hadn't had them previously. Fixes #18292 Change-Id: Ic1c0e43fbf579539f459971625f69690c9ab8805 Reviewed-on: https://go-review.googlesource.com/38801 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-03-30 03:30:22 +00:00
return x
}
// opIsCommutative reports whether op s is commutative.
func opIsCommutative(op string, arch arch) bool {
for _, x := range genericOps {
if op == x.name {
if x.commutative {
return true
}
break
}
}
if arch.name != "generic" {
for _, x := range arch.ops {
if op == x.name {
if x.commutative {
return true
}
break
}
}
}
return false
}
cmd/compile: use loops to handle commutative ops in rules Prior to this change, we generated additional rules at rulegen time for all possible combinations of args to commutative ops. This is simple and works well, but leads to lots of generated rules. This in turn has increased the size of the compiler, made it hard to compile package ssa on small machines, and provided a disincentive to mark some ops as commutative. This change reworks how we handle commutative ops. Instead of generating a rule per argument permutation, we generate a series of nested loops, one for each commutative op. Each loop tries both possible argument orderings. I also considered attempting to canonicalize the inputs to the rewrite rules. However, because either or both arguments might be nothing more than an identifier, and because there can be arbitrary conditions to evaluate during matching, I did not see how to proceed. The duplicate rule detection now sorts arguments to commutative ops, so that it can detect commutative-only duplicates. There may be further optimizations to the new generated code. In particular, we may not be removing as many bounds checks as before; I have not investigated deeply. If more work here is needed, we could do it with more hints or with improvements to the prove pass. This change has almost no impact on the generated code. It does not pass toolstash-check, however. In a handful of functions, for reasons I do not understand, there are minor position changes. For the entire series ending at this change, there is negligible compiler performance impact. The compiler binary shrinks by about 15%, and package ssa shrinks by about 25%. Package ssa also compiles ~25% faster with ~25% less memory. Change-Id: Ia2ee9ceae7be08a17342319d4e31b0bb238a2ee4 Reviewed-on: https://go-review.googlesource.com/c/go/+/213703 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-06 22:24:02 -08:00
func normalizeMatch(m string, arch arch) string {
if token.IsIdentifier(m) {
return m
}
op, typ, auxint, aux, args := extract(m)
if opIsCommutative(op, arch) {
if args[1] < args[0] {
args[0], args[1] = args[1], args[0]
}
}
s := new(strings.Builder)
fmt.Fprintf(s, "%s <%s> [%s] {%s}", op, typ, auxint, aux)
for _, arg := range args {
prefix, expr := splitNameExpr(arg)
fmt.Fprint(s, " ", prefix, normalizeMatch(expr, arch))
cmd/compile: use loops to handle commutative ops in rules Prior to this change, we generated additional rules at rulegen time for all possible combinations of args to commutative ops. This is simple and works well, but leads to lots of generated rules. This in turn has increased the size of the compiler, made it hard to compile package ssa on small machines, and provided a disincentive to mark some ops as commutative. This change reworks how we handle commutative ops. Instead of generating a rule per argument permutation, we generate a series of nested loops, one for each commutative op. Each loop tries both possible argument orderings. I also considered attempting to canonicalize the inputs to the rewrite rules. However, because either or both arguments might be nothing more than an identifier, and because there can be arbitrary conditions to evaluate during matching, I did not see how to proceed. The duplicate rule detection now sorts arguments to commutative ops, so that it can detect commutative-only duplicates. There may be further optimizations to the new generated code. In particular, we may not be removing as many bounds checks as before; I have not investigated deeply. If more work here is needed, we could do it with more hints or with improvements to the prove pass. This change has almost no impact on the generated code. It does not pass toolstash-check, however. In a handful of functions, for reasons I do not understand, there are minor position changes. For the entire series ending at this change, there is negligible compiler performance impact. The compiler binary shrinks by about 15%, and package ssa shrinks by about 25%. Package ssa also compiles ~25% faster with ~25% less memory. Change-Id: Ia2ee9ceae7be08a17342319d4e31b0bb238a2ee4 Reviewed-on: https://go-review.googlesource.com/c/go/+/213703 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-06 22:24:02 -08:00
}
return s.String()
}
func parseEllipsisRules(rules []Rule, arch arch) (newop string, ok bool) {
if len(rules) != 1 {
2020-01-23 14:33:26 -08:00
for _, r := range rules {
if strings.Contains(r.Rule, "...") {
log.Fatalf("%s: found ellipsis in rule, but there are other rules with the same op", r.Loc)
2020-01-23 14:33:26 -08:00
}
}
return "", false
}
rule := rules[0]
match, cond, result := rule.parse()
if cond != "" || !isEllipsisValue(match) || !isEllipsisValue(result) {
if strings.Contains(rule.Rule, "...") {
log.Fatalf("%s: found ellipsis in non-ellipsis rule", rule.Loc)
2020-01-23 14:33:26 -08:00
}
checkEllipsisRuleCandidate(rule, arch)
return "", false
}
op, oparch, _, _, _, _ := parseValue(result, arch, rule.Loc)
return fmt.Sprintf("Op%s%s", oparch, op.name), true
}
// isEllipsisValue reports whether s is of the form (OpX ...).
func isEllipsisValue(s string) bool {
if len(s) < 2 || s[0] != '(' || s[len(s)-1] != ')' {
return false
}
c := split(s[1 : len(s)-1])
if len(c) != 2 || c[1] != "..." {
return false
}
return true
}
cmd/compile: reduce bounds checks in generated rewrite rules CL 213703 converted generated rewrite rules for commutative ops to use loops instead of duplicated code. However, it loaded args using expressions like v.Args[i] and v.Args[i^1], which the compiler could not eliminate bounds for (including with all outstanding prove CLs). Also, given a series of separate rewrite rules for the same op, we generated bounds checks for every rewrite rule, even though we were repeatedly loading the same set of args. This change reduces both sets of bounds checks. Instead of loading v.Args[i] and v.Args[i^1] for commutative loops, we now preload v.Args[0] and v.Args[1] into local variables, and then swap them (as needed) in the commutative loop post statement. And we now load all top level v.Args into local variables at the beginning of every rewrite rule function. The second optimization is the more significant, but the first helps a little, and they play together nicely from the perspective of generating the code. This does increase register pressure, but the reduced bounds checks more than compensate. Note that the vast majority of rewrite rules evaluated are not applied, so the prologue is the most important part of the rewrite rules. There is one subtle aspect to the new generated code. Because the top level v.Args are shared across rewrite rules, and rule evaluation can swap v_0 and v_1, v_0 and v_1 can end up being swapped from one rule to the next. That is OK, because any time a rule does not get applied, they will have been swapped exactly twice. Passes toolstash-check -all. name old time/op new time/op delta Template 213ms ± 2% 211ms ± 2% -0.85% (p=0.000 n=92+96) Unicode 83.5ms ± 2% 83.2ms ± 2% -0.41% (p=0.004 n=95+90) GoTypes 737ms ± 2% 733ms ± 2% -0.51% (p=0.000 n=91+94) Compiler 3.45s ± 2% 3.43s ± 2% -0.44% (p=0.000 n=99+100) SSA 8.54s ± 1% 8.32s ± 2% -2.56% (p=0.000 n=96+99) Flate 136ms ± 2% 135ms ± 1% -0.47% (p=0.000 n=96+96) GoParser 169ms ± 1% 168ms ± 1% -0.33% (p=0.000 n=96+93) Reflect 456ms ± 3% 455ms ± 3% ~ (p=0.261 n=95+94) Tar 186ms ± 2% 185ms ± 2% -0.48% (p=0.000 n=94+95) XML 251ms ± 1% 250ms ± 1% -0.51% (p=0.000 n=91+94) [Geo mean] 424ms 421ms -0.68% name old user-time/op new user-time/op delta Template 275ms ± 1% 274ms ± 2% -0.55% (p=0.000 n=95+98) Unicode 118ms ± 4% 118ms ± 4% ~ (p=0.642 n=98+90) GoTypes 983ms ± 1% 980ms ± 1% -0.30% (p=0.000 n=93+93) Compiler 4.56s ± 6% 4.52s ± 6% -0.72% (p=0.003 n=100+100) SSA 11.4s ± 1% 11.1s ± 1% -2.50% (p=0.000 n=96+97) Flate 168ms ± 1% 167ms ± 1% -0.49% (p=0.000 n=92+92) GoParser 204ms ± 1% 204ms ± 2% -0.27% (p=0.003 n=99+96) Reflect 599ms ± 2% 598ms ± 2% ~ (p=0.116 n=95+92) Tar 227ms ± 2% 225ms ± 2% -0.57% (p=0.000 n=95+98) XML 313ms ± 2% 312ms ± 1% -0.37% (p=0.000 n=89+95) [Geo mean] 547ms 544ms -0.61% file before after Δ % compile 21113112 21109016 -4096 -0.019% total 131704940 131700844 -4096 -0.003% Change-Id: Id6c39e0367e597c0c75b8a4b1eb14cc3cbd11956 Reviewed-on: https://go-review.googlesource.com/c/go/+/216218 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-20 20:09:41 -08:00
2020-01-23 14:33:26 -08:00
func checkEllipsisRuleCandidate(rule Rule, arch arch) {
match, cond, result := rule.parse()
2020-01-23 14:33:26 -08:00
if cond != "" {
return
}
op, _, _, auxint, aux, args := parseValue(match, arch, rule.Loc)
2020-01-23 14:33:26 -08:00
var auxint2, aux2 string
var args2 []string
var usingCopy string
var eop opData
2020-01-23 14:33:26 -08:00
if result[0] != '(' {
// Check for (Foo x) => x, which can be converted to (Foo ...) => (Copy ...).
2020-01-23 14:33:26 -08:00
args2 = []string{result}
usingCopy = " using Copy"
} else {
eop, _, _, auxint2, aux2, args2 = parseValue(result, arch, rule.Loc)
2020-01-23 14:33:26 -08:00
}
// Check that all restrictions in match are reproduced exactly in result.
if aux != aux2 || auxint != auxint2 || len(args) != len(args2) {
return
}
if strings.Contains(rule.Rule, "=>") && op.aux != eop.aux {
return
}
2020-01-23 14:33:26 -08:00
for i := range args {
if args[i] != args2[i] {
return
}
}
switch {
case opHasAux(op) && aux == "" && aux2 == "":
fmt.Printf("%s: rule silently zeros aux, either copy aux or explicitly zero\n", rule.Loc)
2020-01-23 14:33:26 -08:00
case opHasAuxInt(op) && auxint == "" && auxint2 == "":
fmt.Printf("%s: rule silently zeros auxint, either copy auxint or explicitly zero\n", rule.Loc)
2020-01-23 14:33:26 -08:00
default:
fmt.Printf("%s: possible ellipsis rule candidate%s: %q\n", rule.Loc, usingCopy, rule.Rule)
2020-01-23 14:33:26 -08:00
}
}
cmd/compile: reduce bounds checks in generated rewrite rules CL 213703 converted generated rewrite rules for commutative ops to use loops instead of duplicated code. However, it loaded args using expressions like v.Args[i] and v.Args[i^1], which the compiler could not eliminate bounds for (including with all outstanding prove CLs). Also, given a series of separate rewrite rules for the same op, we generated bounds checks for every rewrite rule, even though we were repeatedly loading the same set of args. This change reduces both sets of bounds checks. Instead of loading v.Args[i] and v.Args[i^1] for commutative loops, we now preload v.Args[0] and v.Args[1] into local variables, and then swap them (as needed) in the commutative loop post statement. And we now load all top level v.Args into local variables at the beginning of every rewrite rule function. The second optimization is the more significant, but the first helps a little, and they play together nicely from the perspective of generating the code. This does increase register pressure, but the reduced bounds checks more than compensate. Note that the vast majority of rewrite rules evaluated are not applied, so the prologue is the most important part of the rewrite rules. There is one subtle aspect to the new generated code. Because the top level v.Args are shared across rewrite rules, and rule evaluation can swap v_0 and v_1, v_0 and v_1 can end up being swapped from one rule to the next. That is OK, because any time a rule does not get applied, they will have been swapped exactly twice. Passes toolstash-check -all. name old time/op new time/op delta Template 213ms ± 2% 211ms ± 2% -0.85% (p=0.000 n=92+96) Unicode 83.5ms ± 2% 83.2ms ± 2% -0.41% (p=0.004 n=95+90) GoTypes 737ms ± 2% 733ms ± 2% -0.51% (p=0.000 n=91+94) Compiler 3.45s ± 2% 3.43s ± 2% -0.44% (p=0.000 n=99+100) SSA 8.54s ± 1% 8.32s ± 2% -2.56% (p=0.000 n=96+99) Flate 136ms ± 2% 135ms ± 1% -0.47% (p=0.000 n=96+96) GoParser 169ms ± 1% 168ms ± 1% -0.33% (p=0.000 n=96+93) Reflect 456ms ± 3% 455ms ± 3% ~ (p=0.261 n=95+94) Tar 186ms ± 2% 185ms ± 2% -0.48% (p=0.000 n=94+95) XML 251ms ± 1% 250ms ± 1% -0.51% (p=0.000 n=91+94) [Geo mean] 424ms 421ms -0.68% name old user-time/op new user-time/op delta Template 275ms ± 1% 274ms ± 2% -0.55% (p=0.000 n=95+98) Unicode 118ms ± 4% 118ms ± 4% ~ (p=0.642 n=98+90) GoTypes 983ms ± 1% 980ms ± 1% -0.30% (p=0.000 n=93+93) Compiler 4.56s ± 6% 4.52s ± 6% -0.72% (p=0.003 n=100+100) SSA 11.4s ± 1% 11.1s ± 1% -2.50% (p=0.000 n=96+97) Flate 168ms ± 1% 167ms ± 1% -0.49% (p=0.000 n=92+92) GoParser 204ms ± 1% 204ms ± 2% -0.27% (p=0.003 n=99+96) Reflect 599ms ± 2% 598ms ± 2% ~ (p=0.116 n=95+92) Tar 227ms ± 2% 225ms ± 2% -0.57% (p=0.000 n=95+98) XML 313ms ± 2% 312ms ± 1% -0.37% (p=0.000 n=89+95) [Geo mean] 547ms 544ms -0.61% file before after Δ % compile 21113112 21109016 -4096 -0.019% total 131704940 131700844 -4096 -0.003% Change-Id: Id6c39e0367e597c0c75b8a4b1eb14cc3cbd11956 Reviewed-on: https://go-review.googlesource.com/c/go/+/216218 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-01-20 20:09:41 -08:00
func opByName(arch arch, name string) opData {
name = name[2:]
for _, x := range genericOps {
if name == x.name {
return x
}
}
if arch.name != "generic" {
name = name[len(arch.name):]
for _, x := range arch.ops {
if name == x.name {
return x
}
}
}
log.Fatalf("failed to find op named %s in arch %s", name, arch.name)
panic("unreachable")
}
cmd/compile: start implementing strongly typed aux and auxint fields Right now the Aux and AuxInt fields of ssa.Values are typed as interface{} and int64, respectively. Each rule that uses these values must cast them to the type they actually are (*obj.LSym, or int32, or ValAndOff, etc.), use them, and then cast them back to interface{} or int64. We know for each opcode what the types of the Aux and AuxInt fields should be. So let's modify the rule generator to declare the types to be what we know they should be, autoconverting to and from the generic types for us. That way we can make the rules more type safe. It's difficult to make a single CL for this, so I've coopted the "=>" token to indicate a rule that is strongly typed. "->" rules are processed as before. That will let us migrate a few rules at a time in separate CLs. Hopefully we can reach a state where all rules are strongly typed and we can drop the distinction. This CL changes just a few rules to get a feel for what this transition would look like. I've decided not to put explicit types in the rules. I think it makes the rules somewhat clearer, but definitely more verbose. In particular, the passthrough rules that don't modify the fields in question are verbose for no real reason. Change-Id: I63a1b789ac5702e7caf7934cd49f784235d1d73d Reviewed-on: https://go-review.googlesource.com/c/go/+/190197 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2020-03-19 16:25:08 -07:00
// auxType returns the Go type that this operation should store in its aux field.
func (op opData) auxType() string {
switch op.aux {
case "String":
return "string"
case "Sym":
// Note: a Sym can be an *obj.LSym, a *gc.Node, or nil.
return "Sym"
cmd/compile: start implementing strongly typed aux and auxint fields Right now the Aux and AuxInt fields of ssa.Values are typed as interface{} and int64, respectively. Each rule that uses these values must cast them to the type they actually are (*obj.LSym, or int32, or ValAndOff, etc.), use them, and then cast them back to interface{} or int64. We know for each opcode what the types of the Aux and AuxInt fields should be. So let's modify the rule generator to declare the types to be what we know they should be, autoconverting to and from the generic types for us. That way we can make the rules more type safe. It's difficult to make a single CL for this, so I've coopted the "=>" token to indicate a rule that is strongly typed. "->" rules are processed as before. That will let us migrate a few rules at a time in separate CLs. Hopefully we can reach a state where all rules are strongly typed and we can drop the distinction. This CL changes just a few rules to get a feel for what this transition would look like. I've decided not to put explicit types in the rules. I think it makes the rules somewhat clearer, but definitely more verbose. In particular, the passthrough rules that don't modify the fields in question are verbose for no real reason. Change-Id: I63a1b789ac5702e7caf7934cd49f784235d1d73d Reviewed-on: https://go-review.googlesource.com/c/go/+/190197 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2020-03-19 16:25:08 -07:00
case "SymOff":
return "Sym"
case "Call":
return "Call"
case "CallOff":
return "Call"
cmd/compile: start implementing strongly typed aux and auxint fields Right now the Aux and AuxInt fields of ssa.Values are typed as interface{} and int64, respectively. Each rule that uses these values must cast them to the type they actually are (*obj.LSym, or int32, or ValAndOff, etc.), use them, and then cast them back to interface{} or int64. We know for each opcode what the types of the Aux and AuxInt fields should be. So let's modify the rule generator to declare the types to be what we know they should be, autoconverting to and from the generic types for us. That way we can make the rules more type safe. It's difficult to make a single CL for this, so I've coopted the "=>" token to indicate a rule that is strongly typed. "->" rules are processed as before. That will let us migrate a few rules at a time in separate CLs. Hopefully we can reach a state where all rules are strongly typed and we can drop the distinction. This CL changes just a few rules to get a feel for what this transition would look like. I've decided not to put explicit types in the rules. I think it makes the rules somewhat clearer, but definitely more verbose. In particular, the passthrough rules that don't modify the fields in question are verbose for no real reason. Change-Id: I63a1b789ac5702e7caf7934cd49f784235d1d73d Reviewed-on: https://go-review.googlesource.com/c/go/+/190197 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2020-03-19 16:25:08 -07:00
case "SymValAndOff":
return "Sym"
cmd/compile: start implementing strongly typed aux and auxint fields Right now the Aux and AuxInt fields of ssa.Values are typed as interface{} and int64, respectively. Each rule that uses these values must cast them to the type they actually are (*obj.LSym, or int32, or ValAndOff, etc.), use them, and then cast them back to interface{} or int64. We know for each opcode what the types of the Aux and AuxInt fields should be. So let's modify the rule generator to declare the types to be what we know they should be, autoconverting to and from the generic types for us. That way we can make the rules more type safe. It's difficult to make a single CL for this, so I've coopted the "=>" token to indicate a rule that is strongly typed. "->" rules are processed as before. That will let us migrate a few rules at a time in separate CLs. Hopefully we can reach a state where all rules are strongly typed and we can drop the distinction. This CL changes just a few rules to get a feel for what this transition would look like. I've decided not to put explicit types in the rules. I think it makes the rules somewhat clearer, but definitely more verbose. In particular, the passthrough rules that don't modify the fields in question are verbose for no real reason. Change-Id: I63a1b789ac5702e7caf7934cd49f784235d1d73d Reviewed-on: https://go-review.googlesource.com/c/go/+/190197 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2020-03-19 16:25:08 -07:00
case "Typ":
return "*types.Type"
case "TypSize":
return "*types.Type"
case "S390XCCMask":
return "s390x.CCMask"
case "S390XRotateParams":
return "s390x.RotateParams"
cmd/compile: start implementing strongly typed aux and auxint fields Right now the Aux and AuxInt fields of ssa.Values are typed as interface{} and int64, respectively. Each rule that uses these values must cast them to the type they actually are (*obj.LSym, or int32, or ValAndOff, etc.), use them, and then cast them back to interface{} or int64. We know for each opcode what the types of the Aux and AuxInt fields should be. So let's modify the rule generator to declare the types to be what we know they should be, autoconverting to and from the generic types for us. That way we can make the rules more type safe. It's difficult to make a single CL for this, so I've coopted the "=>" token to indicate a rule that is strongly typed. "->" rules are processed as before. That will let us migrate a few rules at a time in separate CLs. Hopefully we can reach a state where all rules are strongly typed and we can drop the distinction. This CL changes just a few rules to get a feel for what this transition would look like. I've decided not to put explicit types in the rules. I think it makes the rules somewhat clearer, but definitely more verbose. In particular, the passthrough rules that don't modify the fields in question are verbose for no real reason. Change-Id: I63a1b789ac5702e7caf7934cd49f784235d1d73d Reviewed-on: https://go-review.googlesource.com/c/go/+/190197 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2020-03-19 16:25:08 -07:00
default:
return "invalid"
}
}
// auxIntType returns the Go type that this operation should store in its auxInt field.
func (op opData) auxIntType() string {
switch op.aux {
case "Bool":
return "bool"
cmd/compile: start implementing strongly typed aux and auxint fields Right now the Aux and AuxInt fields of ssa.Values are typed as interface{} and int64, respectively. Each rule that uses these values must cast them to the type they actually are (*obj.LSym, or int32, or ValAndOff, etc.), use them, and then cast them back to interface{} or int64. We know for each opcode what the types of the Aux and AuxInt fields should be. So let's modify the rule generator to declare the types to be what we know they should be, autoconverting to and from the generic types for us. That way we can make the rules more type safe. It's difficult to make a single CL for this, so I've coopted the "=>" token to indicate a rule that is strongly typed. "->" rules are processed as before. That will let us migrate a few rules at a time in separate CLs. Hopefully we can reach a state where all rules are strongly typed and we can drop the distinction. This CL changes just a few rules to get a feel for what this transition would look like. I've decided not to put explicit types in the rules. I think it makes the rules somewhat clearer, but definitely more verbose. In particular, the passthrough rules that don't modify the fields in question are verbose for no real reason. Change-Id: I63a1b789ac5702e7caf7934cd49f784235d1d73d Reviewed-on: https://go-review.googlesource.com/c/go/+/190197 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2020-03-19 16:25:08 -07:00
case "Int8":
return "int8"
case "Int16":
return "int16"
case "Int32":
return "int32"
case "Int64":
return "int64"
case "Int128":
return "int128"
case "UInt8":
return "uint8"
case "Float32":
return "float32"
case "Float64":
return "float64"
case "CallOff":
return "int32"
cmd/compile: start implementing strongly typed aux and auxint fields Right now the Aux and AuxInt fields of ssa.Values are typed as interface{} and int64, respectively. Each rule that uses these values must cast them to the type they actually are (*obj.LSym, or int32, or ValAndOff, etc.), use them, and then cast them back to interface{} or int64. We know for each opcode what the types of the Aux and AuxInt fields should be. So let's modify the rule generator to declare the types to be what we know they should be, autoconverting to and from the generic types for us. That way we can make the rules more type safe. It's difficult to make a single CL for this, so I've coopted the "=>" token to indicate a rule that is strongly typed. "->" rules are processed as before. That will let us migrate a few rules at a time in separate CLs. Hopefully we can reach a state where all rules are strongly typed and we can drop the distinction. This CL changes just a few rules to get a feel for what this transition would look like. I've decided not to put explicit types in the rules. I think it makes the rules somewhat clearer, but definitely more verbose. In particular, the passthrough rules that don't modify the fields in question are verbose for no real reason. Change-Id: I63a1b789ac5702e7caf7934cd49f784235d1d73d Reviewed-on: https://go-review.googlesource.com/c/go/+/190197 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2020-03-19 16:25:08 -07:00
case "SymOff":
return "int32"
case "SymValAndOff":
return "ValAndOff"
case "TypSize":
return "int64"
case "CCop":
return "Op"
case "FlagConstant":
return "flagConstant"
case "ARM64BitField":
return "arm64BitField"
cmd/compile: start implementing strongly typed aux and auxint fields Right now the Aux and AuxInt fields of ssa.Values are typed as interface{} and int64, respectively. Each rule that uses these values must cast them to the type they actually are (*obj.LSym, or int32, or ValAndOff, etc.), use them, and then cast them back to interface{} or int64. We know for each opcode what the types of the Aux and AuxInt fields should be. So let's modify the rule generator to declare the types to be what we know they should be, autoconverting to and from the generic types for us. That way we can make the rules more type safe. It's difficult to make a single CL for this, so I've coopted the "=>" token to indicate a rule that is strongly typed. "->" rules are processed as before. That will let us migrate a few rules at a time in separate CLs. Hopefully we can reach a state where all rules are strongly typed and we can drop the distinction. This CL changes just a few rules to get a feel for what this transition would look like. I've decided not to put explicit types in the rules. I think it makes the rules somewhat clearer, but definitely more verbose. In particular, the passthrough rules that don't modify the fields in question are verbose for no real reason. Change-Id: I63a1b789ac5702e7caf7934cd49f784235d1d73d Reviewed-on: https://go-review.googlesource.com/c/go/+/190197 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2020-03-19 16:25:08 -07:00
default:
return "invalid"
}
}
// auxType returns the Go type that this block should store in its aux field.
func (b blockData) auxType() string {
switch b.aux {
cmd/compile: implement jump tables Performance is kind of hard to exactly quantify. One big difference between jump tables and the old binary search scheme is that there's only 1 branch statement instead of O(n) of them. That can be both a blessing and a curse, and can make evaluating jump tables very hard to do. The single branch can become a choke point for the hardware branch predictor. A branch table jump must fit all of its state in a single branch predictor entry (technically, a branch target predictor entry). With binary search that predictor state can be spread among lots of entries. In cases where the case selection is repetitive and thus predictable, binary search can perform better. The big win for a jump table is that it doesn't consume so much of the branch predictor's resources. But that benefit is essentially never observed in microbenchmarks, because the branch predictor can easily keep state for all the binary search branches in a microbenchmark. So that benefit is really hard to measure. So predictable switch microbenchmarks are ~useless - they will almost always favor the binary search scheme. Fully unpredictable switch microbenchmarks are better, as they aren't lying to us quite so much. In a perfectly unpredictable situation, a jump table will expect to incur 1-1/N branch mispredicts, where a binary search would incur lg(N)/2 of them. That makes the crossover point at about N=4. But of course switches in real programs are seldom fully unpredictable, so we'll use a higher crossover point. Beyond the branch predictor, jump tables tend to execute more instructions per switch but have no additional instructions per case, which also argues for a larger crossover. As far as code size goes, with this CL cmd/go has a slightly smaller code segment and a slightly larger overall size (from the jump tables themselves which live in the data segment). This is a case where some FDO (feedback-directed optimization) would be really nice to have. #28262 Some large-program benchmarks might help make the case for this CL. Especially if we can turn on branch mispredict counters so we can see how much using jump tables can free up branch prediction resources that can be gainfully used elsewhere in the program. name old time/op new time/op delta Switch8Predictable 1.89ns ± 2% 1.27ns ± 3% -32.58% (p=0.000 n=9+10) Switch8Unpredictable 9.33ns ± 1% 7.50ns ± 1% -19.60% (p=0.000 n=10+9) Switch32Predictable 2.20ns ± 2% 1.64ns ± 1% -25.39% (p=0.000 n=10+9) Switch32Unpredictable 10.0ns ± 2% 7.6ns ± 2% -24.04% (p=0.000 n=10+10) Fixes #5496 Update #34381 Change-Id: I3ff56011d02be53f605ca5fd3fb96b905517c34f Reviewed-on: https://go-review.googlesource.com/c/go/+/357330 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com>
2021-10-04 12:17:46 -07:00
case "Sym":
return "Sym"
case "S390XCCMask", "S390XCCMaskInt8", "S390XCCMaskUint8":
return "s390x.CCMask"
case "S390XRotateParams":
return "s390x.RotateParams"
default:
return "invalid"
}
}
// auxIntType returns the Go type that this block should store in its auxInt field.
func (b blockData) auxIntType() string {
switch b.aux {
case "S390XCCMaskInt8":
return "int8"
case "S390XCCMaskUint8":
return "uint8"
case "Int64":
return "int64"
default:
return "invalid"
}
}
func title(s string) string {
if i := strings.Index(s, "."); i >= 0 {
switch strings.ToLower(s[:i]) {
case "s390x": // keep arch prefix for clarity
s = s[:i] + s[i+1:]
default:
s = s[i+1:]
}
}
return strings.Title(s)
}
func unTitle(s string) string {
if i := strings.Index(s, "."); i >= 0 {
switch strings.ToLower(s[:i]) {
case "s390x": // keep arch prefix for clarity
s = s[:i] + s[i+1:]
default:
s = s[i+1:]
}
}
return strings.ToLower(s[:1]) + s[1:]
}