go/src/cmd/compile/internal/gc/ssa.go

4860 lines
151 KiB
Go
Raw Normal View History

// Copyright 2015 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
package gc
import (
"bytes"
"encoding/binary"
[dev.ssa] cmd/compile/ssa: separate logging, work in progress, and fatal errors The SSA implementation logs for three purposes: * debug logging * fatal errors * unimplemented features Separating these three uses lets us attempt an SSA implementation for all functions, not just _ssa functions. This turns the entire standard library into a compilation test, and makes it easy to figure out things like "how much coverage does SSA have now" and "what should we do next to get more coverage?". Functions called _ssa are still special. They log profusely by default and the output of the SSA implementation is used. For all other functions, logging is off, and the implementation is built and discarded, due to lack of support for the runtime. While we're here, fix a few minor bugs and add some extra Unimplementeds to allow all.bash to pass. As of now, SSA handles 20.79% of the functions in the standard library (689 of 3314). The top missing features are: 10.03% 2597 SSA unimplemented: zero for type error not implemented 7.79% 2016 SSA unimplemented: addr: bad op DOTPTR 7.33% 1898 SSA unimplemented: unhandled expr EQ 6.10% 1579 SSA unimplemented: unhandled expr OROR 4.91% 1271 SSA unimplemented: unhandled expr NE 4.49% 1163 SSA unimplemented: unhandled expr LROT 4.00% 1036 SSA unimplemented: unhandled expr LEN 3.56% 923 SSA unimplemented: unhandled stmt CALLFUNC 2.37% 615 SSA unimplemented: zero for type []byte not implemented 1.90% 492 SSA unimplemented: unhandled stmt CALLMETH 1.74% 450 SSA unimplemented: unhandled expr CALLINTER 1.74% 450 SSA unimplemented: unhandled expr DOT 1.71% 444 SSA unimplemented: unhandled expr ANDAND 1.65% 426 SSA unimplemented: unhandled expr CLOSUREVAR 1.54% 400 SSA unimplemented: unhandled expr CALLMETH 1.51% 390 SSA unimplemented: unhandled stmt SWITCH 1.47% 380 SSA unimplemented: unhandled expr CONV 1.33% 345 SSA unimplemented: addr: bad op * 1.30% 336 SSA unimplemented: unhandled OLITERAL 6 Change-Id: I4ca07951e276714dc13c31de28640aead17a1be7 Reviewed-on: https://go-review.googlesource.com/11160 Reviewed-by: Keith Randall <khr@golang.org>
2015-06-12 11:01:13 -07:00
"fmt"
"html"
[dev.ssa] cmd/compile: add GOSSAFUNC and GOSSAPKG These temporary environment variables make it possible to enable using SSA-generated code for a particular function or package without having to rebuild the compiler. This makes it possible to start bulk testing SSA generated code. First, bump up the default stack size (_StackMin in runtime/stack2.go) to something large like 32768, because without stackmaps we can't grow stacks. Then run something like: for pkg in `go list std` do GOGC=off GOSSAPKG=`basename $pkg` go test -a $pkg done When a test fails, you can re-run those tests, selectively enabling one function after another, until you find the one that is causing trouble. Doing this right now yields some interesting results: * There are several packages for which we generate some code and whose tests pass. Yay! * We can generate code for encoding/base64, but tests there fail, so there's a bug to fix. * Attempting to build the runtime yields a panic during codegen: panic: interface conversion: ssa.Location is nil, not *ssa.LocalSlot * The top unimplemented codegen items are (simplified): 59 genValue not implemented: REPMOVSB 18 genValue not implemented: REPSTOSQ 14 genValue not implemented: SUBQ 9 branch not implemented: If v -> b b. Control: XORQconst <bool> [1] 8 genValue not implemented: MOVQstoreidx8 4 branch not implemented: If v -> b b. Control: SETG <bool> 3 branch not implemented: If v -> b b. Control: SETLE <bool> 2 load flags not implemented: LoadReg8 <flags> 2 genValue not implemented: InvertFlags <flags> 1 store flags not implemented: StoreReg8 <flags> 1 branch not implemented: If v -> b b. Control: SETGE <bool> Change-Id: Ib64809ac0c917e25bcae27829ae634c70d290c7f Reviewed-on: https://go-review.googlesource.com/12547 Reviewed-by: Keith Randall <khr@golang.org>
2015-07-22 13:13:53 -07:00
"os"
"sort"
"cmd/compile/internal/ssa"
"cmd/internal/obj"
"cmd/internal/src"
"cmd/internal/sys"
)
var ssaConfig *ssa.Config
var ssaExp ssaExport
func initssa() *ssa.Config {
if ssaConfig == nil {
ssaConfig = ssa.NewConfig(Thearch.LinkArch.Name, &ssaExp, Ctxt, Debug['N'] == 0)
if Thearch.LinkArch.Name == "386" {
ssaConfig.Set387(Thearch.Use387)
}
}
ssaConfig.HTML = nil
return ssaConfig
}
// buildssa builds an SSA function.
func buildssa(fn *Node) *ssa.Func {
name := fn.Func.Nname.Sym.Name
printssa := name == os.Getenv("GOSSAFUNC")
if printssa {
fmt.Println("generating SSA for", name)
dumplist("buildssa-enter", fn.Func.Enter)
dumplist("buildssa-body", fn.Nbody)
dumplist("buildssa-exit", fn.Func.Exit)
[dev.ssa] cmd/compile/ssa: separate logging, work in progress, and fatal errors The SSA implementation logs for three purposes: * debug logging * fatal errors * unimplemented features Separating these three uses lets us attempt an SSA implementation for all functions, not just _ssa functions. This turns the entire standard library into a compilation test, and makes it easy to figure out things like "how much coverage does SSA have now" and "what should we do next to get more coverage?". Functions called _ssa are still special. They log profusely by default and the output of the SSA implementation is used. For all other functions, logging is off, and the implementation is built and discarded, due to lack of support for the runtime. While we're here, fix a few minor bugs and add some extra Unimplementeds to allow all.bash to pass. As of now, SSA handles 20.79% of the functions in the standard library (689 of 3314). The top missing features are: 10.03% 2597 SSA unimplemented: zero for type error not implemented 7.79% 2016 SSA unimplemented: addr: bad op DOTPTR 7.33% 1898 SSA unimplemented: unhandled expr EQ 6.10% 1579 SSA unimplemented: unhandled expr OROR 4.91% 1271 SSA unimplemented: unhandled expr NE 4.49% 1163 SSA unimplemented: unhandled expr LROT 4.00% 1036 SSA unimplemented: unhandled expr LEN 3.56% 923 SSA unimplemented: unhandled stmt CALLFUNC 2.37% 615 SSA unimplemented: zero for type []byte not implemented 1.90% 492 SSA unimplemented: unhandled stmt CALLMETH 1.74% 450 SSA unimplemented: unhandled expr CALLINTER 1.74% 450 SSA unimplemented: unhandled expr DOT 1.71% 444 SSA unimplemented: unhandled expr ANDAND 1.65% 426 SSA unimplemented: unhandled expr CLOSUREVAR 1.54% 400 SSA unimplemented: unhandled expr CALLMETH 1.51% 390 SSA unimplemented: unhandled stmt SWITCH 1.47% 380 SSA unimplemented: unhandled expr CONV 1.33% 345 SSA unimplemented: addr: bad op * 1.30% 336 SSA unimplemented: unhandled OLITERAL 6 Change-Id: I4ca07951e276714dc13c31de28640aead17a1be7 Reviewed-on: https://go-review.googlesource.com/11160 Reviewed-by: Keith Randall <khr@golang.org>
2015-06-12 11:01:13 -07:00
}
var s state
s.pushLine(fn.Pos)
defer s.popLine()
s.hasdefer = fn.Func.HasDefer()
if fn.Func.Pragma&CgoUnsafeArgs != 0 {
s.cgoUnsafeArgs = true
}
// TODO(khr): build config just once at the start of the compiler binary
[dev.ssa] cmd/compile/ssa: separate logging, work in progress, and fatal errors The SSA implementation logs for three purposes: * debug logging * fatal errors * unimplemented features Separating these three uses lets us attempt an SSA implementation for all functions, not just _ssa functions. This turns the entire standard library into a compilation test, and makes it easy to figure out things like "how much coverage does SSA have now" and "what should we do next to get more coverage?". Functions called _ssa are still special. They log profusely by default and the output of the SSA implementation is used. For all other functions, logging is off, and the implementation is built and discarded, due to lack of support for the runtime. While we're here, fix a few minor bugs and add some extra Unimplementeds to allow all.bash to pass. As of now, SSA handles 20.79% of the functions in the standard library (689 of 3314). The top missing features are: 10.03% 2597 SSA unimplemented: zero for type error not implemented 7.79% 2016 SSA unimplemented: addr: bad op DOTPTR 7.33% 1898 SSA unimplemented: unhandled expr EQ 6.10% 1579 SSA unimplemented: unhandled expr OROR 4.91% 1271 SSA unimplemented: unhandled expr NE 4.49% 1163 SSA unimplemented: unhandled expr LROT 4.00% 1036 SSA unimplemented: unhandled expr LEN 3.56% 923 SSA unimplemented: unhandled stmt CALLFUNC 2.37% 615 SSA unimplemented: zero for type []byte not implemented 1.90% 492 SSA unimplemented: unhandled stmt CALLMETH 1.74% 450 SSA unimplemented: unhandled expr CALLINTER 1.74% 450 SSA unimplemented: unhandled expr DOT 1.71% 444 SSA unimplemented: unhandled expr ANDAND 1.65% 426 SSA unimplemented: unhandled expr CLOSUREVAR 1.54% 400 SSA unimplemented: unhandled expr CALLMETH 1.51% 390 SSA unimplemented: unhandled stmt SWITCH 1.47% 380 SSA unimplemented: unhandled expr CONV 1.33% 345 SSA unimplemented: addr: bad op * 1.30% 336 SSA unimplemented: unhandled OLITERAL 6 Change-Id: I4ca07951e276714dc13c31de28640aead17a1be7 Reviewed-on: https://go-review.googlesource.com/11160 Reviewed-by: Keith Randall <khr@golang.org>
2015-06-12 11:01:13 -07:00
ssaExp.log = printssa
s.config = initssa()
s.f = s.config.NewFunc()
[dev.ssa] cmd/compile/ssa: separate logging, work in progress, and fatal errors The SSA implementation logs for three purposes: * debug logging * fatal errors * unimplemented features Separating these three uses lets us attempt an SSA implementation for all functions, not just _ssa functions. This turns the entire standard library into a compilation test, and makes it easy to figure out things like "how much coverage does SSA have now" and "what should we do next to get more coverage?". Functions called _ssa are still special. They log profusely by default and the output of the SSA implementation is used. For all other functions, logging is off, and the implementation is built and discarded, due to lack of support for the runtime. While we're here, fix a few minor bugs and add some extra Unimplementeds to allow all.bash to pass. As of now, SSA handles 20.79% of the functions in the standard library (689 of 3314). The top missing features are: 10.03% 2597 SSA unimplemented: zero for type error not implemented 7.79% 2016 SSA unimplemented: addr: bad op DOTPTR 7.33% 1898 SSA unimplemented: unhandled expr EQ 6.10% 1579 SSA unimplemented: unhandled expr OROR 4.91% 1271 SSA unimplemented: unhandled expr NE 4.49% 1163 SSA unimplemented: unhandled expr LROT 4.00% 1036 SSA unimplemented: unhandled expr LEN 3.56% 923 SSA unimplemented: unhandled stmt CALLFUNC 2.37% 615 SSA unimplemented: zero for type []byte not implemented 1.90% 492 SSA unimplemented: unhandled stmt CALLMETH 1.74% 450 SSA unimplemented: unhandled expr CALLINTER 1.74% 450 SSA unimplemented: unhandled expr DOT 1.71% 444 SSA unimplemented: unhandled expr ANDAND 1.65% 426 SSA unimplemented: unhandled expr CLOSUREVAR 1.54% 400 SSA unimplemented: unhandled expr CALLMETH 1.51% 390 SSA unimplemented: unhandled stmt SWITCH 1.47% 380 SSA unimplemented: unhandled expr CONV 1.33% 345 SSA unimplemented: addr: bad op * 1.30% 336 SSA unimplemented: unhandled OLITERAL 6 Change-Id: I4ca07951e276714dc13c31de28640aead17a1be7 Reviewed-on: https://go-review.googlesource.com/11160 Reviewed-by: Keith Randall <khr@golang.org>
2015-06-12 11:01:13 -07:00
s.f.Name = name
if fn.Func.Pragma&Nosplit != 0 {
s.f.NoSplit = true
}
if fn.Func.Pragma&Nowritebarrier != 0 {
s.f.NoWB = true
}
defer func() {
if s.f.WBPos.IsKnown() {
fn.Func.WBPos = s.f.WBPos
}
}()
s.exitCode = fn.Func.Exit
s.panics = map[funcLine]*ssa.Block{}
s.config.DebugTest = s.config.DebugHashMatch("GOSSAHASH", name)
[dev.ssa] cmd/compile/ssa: separate logging, work in progress, and fatal errors The SSA implementation logs for three purposes: * debug logging * fatal errors * unimplemented features Separating these three uses lets us attempt an SSA implementation for all functions, not just _ssa functions. This turns the entire standard library into a compilation test, and makes it easy to figure out things like "how much coverage does SSA have now" and "what should we do next to get more coverage?". Functions called _ssa are still special. They log profusely by default and the output of the SSA implementation is used. For all other functions, logging is off, and the implementation is built and discarded, due to lack of support for the runtime. While we're here, fix a few minor bugs and add some extra Unimplementeds to allow all.bash to pass. As of now, SSA handles 20.79% of the functions in the standard library (689 of 3314). The top missing features are: 10.03% 2597 SSA unimplemented: zero for type error not implemented 7.79% 2016 SSA unimplemented: addr: bad op DOTPTR 7.33% 1898 SSA unimplemented: unhandled expr EQ 6.10% 1579 SSA unimplemented: unhandled expr OROR 4.91% 1271 SSA unimplemented: unhandled expr NE 4.49% 1163 SSA unimplemented: unhandled expr LROT 4.00% 1036 SSA unimplemented: unhandled expr LEN 3.56% 923 SSA unimplemented: unhandled stmt CALLFUNC 2.37% 615 SSA unimplemented: zero for type []byte not implemented 1.90% 492 SSA unimplemented: unhandled stmt CALLMETH 1.74% 450 SSA unimplemented: unhandled expr CALLINTER 1.74% 450 SSA unimplemented: unhandled expr DOT 1.71% 444 SSA unimplemented: unhandled expr ANDAND 1.65% 426 SSA unimplemented: unhandled expr CLOSUREVAR 1.54% 400 SSA unimplemented: unhandled expr CALLMETH 1.51% 390 SSA unimplemented: unhandled stmt SWITCH 1.47% 380 SSA unimplemented: unhandled expr CONV 1.33% 345 SSA unimplemented: addr: bad op * 1.30% 336 SSA unimplemented: unhandled OLITERAL 6 Change-Id: I4ca07951e276714dc13c31de28640aead17a1be7 Reviewed-on: https://go-review.googlesource.com/11160 Reviewed-by: Keith Randall <khr@golang.org>
2015-06-12 11:01:13 -07:00
if name == os.Getenv("GOSSAFUNC") {
// TODO: tempfile? it is handy to have the location
// of this file be stable, so you can just reload in the browser.
s.config.HTML = ssa.NewHTMLWriter("ssa.html", s.config, name)
// TODO: generate and print a mapping from nodes to values and blocks
}
// Allocate starting block
s.f.Entry = s.f.NewBlock(ssa.BlockPlain)
// Allocate starting values
s.labels = map[string]*ssaLabel{}
s.labeledNodes = map[*Node]*ssaLabel{}
s.fwdVars = map[*Node]*ssa.Value{}
s.startmem = s.entryNewValue0(ssa.OpInitMem, ssa.TypeMem)
s.sp = s.entryNewValue0(ssa.OpSP, Types[TUINTPTR]) // TODO: use generic pointer type (unsafe.Pointer?) instead
s.sb = s.entryNewValue0(ssa.OpSB, Types[TUINTPTR])
s.startBlock(s.f.Entry)
s.vars[&memVar] = s.startmem
s.varsyms = map[*Node]interface{}{}
// Generate addresses of local declarations
s.decladdrs = map[*Node]*ssa.Value{}
for _, n := range fn.Func.Dcl {
switch n.Class {
case PPARAM, PPARAMOUT:
aux := s.lookupSymbol(n, &ssa.ArgSymbol{Typ: n.Type, Node: n})
s.decladdrs[n] = s.entryNewValue1A(ssa.OpAddr, ptrto(n.Type), aux, s.sp)
if n.Class == PPARAMOUT && s.canSSA(n) {
// Save ssa-able PPARAMOUT variables so we can
// store them back to the stack at the end of
// the function.
s.returns = append(s.returns, n)
}
case PAUTO:
// processed at each use, to prevent Addr coming
// before the decl.
cmd/compile: fix liveness computation for heap-escaped parameters The liveness computation of parameters generally was never correct, but forcing all parameters to be live throughout the function covered up that problem. The new SSA back end is too clever: even though it currently keeps the parameter values live throughout the function, it may find optimizations that mean the current values are not written back to the original parameter stack slots immediately or ever (for example if a parameter is set to nil, SSA constant propagation may replace all later uses of the parameter with a constant nil, eliminating the need to write the nil value back to the stack slot), so the liveness code must now track the actual operations on the stack slots, exposing these problems. One small problem in the handling of arguments is that nodarg can return ONAME PPARAM nodes with adjusted offsets, so that there are actually multiple *Node pointers for the same parameter in the instruction stream. This might be possible to correct, but not in this CL. For now, we fix this by using n.Orig instead of n when considering PPARAM and PPARAMOUT nodes. The major problem in the handling of arguments is general confusion in the liveness code about the meaning of PPARAM|PHEAP and PPARAMOUT|PHEAP nodes, especially as contrasted with PAUTO|PHEAP. The difference between these two is that when a local variable "moves" to the heap, it's really just allocated there to start with; in contrast, when an argument moves to the heap, the actual data has to be copied there from the stack at the beginning of the function, and when a result "moves" to the heap the value in the heap has to be copied back to the stack when the function returns This general confusion is also present in the SSA back end. The PHEAP bit worked decently when I first introduced it 7 years ago (!) in 391425ae. The back end did nothing sophisticated, and in particular there was no analysis at all: no escape analysis, no liveness analysis, and certainly no SSA back end. But the complications caused in the various downstream consumers suggest that this should be a detail kept mainly in the front end. This CL therefore eliminates both the PHEAP bit and even the idea of "heap variables" from the back ends. First, it replaces the PPARAM|PHEAP, PPARAMOUT|PHEAP, and PAUTO|PHEAP variable classes with the single PAUTOHEAP, a pseudo-class indicating a variable maintained on the heap and available by indirecting a local variable kept on the stack (a plain PAUTO). Second, walkexpr replaces all references to PAUTOHEAP variables with indirections of the corresponding PAUTO variable. The back ends and the liveness code now just see plain indirected variables. This may actually produce better code, but the real goal here is to eliminate these little-used and somewhat suspect code paths in the back end analyses. The OPARAM node type goes away too. A followup CL will do the same to PPARAMREF. I'm not sure that the back ends (SSA in particular) are handling those right either, and with the framework established in this CL that change is trivial and the result clearly more correct. Fixes #15747. Change-Id: I2770b1ce3cbc93981bfc7166be66a9da12013d74 Reviewed-on: https://go-review.googlesource.com/23393 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-05-25 01:33:24 -04:00
case PAUTOHEAP:
// moved to heap - already handled by frontend
case PFUNC:
// local function - already handled by frontend
default:
s.Fatalf("local variable with class %s unimplemented", classnames[n.Class])
}
}
// Populate SSAable arguments.
for _, n := range fn.Func.Dcl {
if n.Class == PPARAM && s.canSSA(n) {
s.vars[n] = s.newValue0A(ssa.OpArg, n.Type, n)
}
}
// Convert the AST-based IR to the SSA-based IR
s.stmtList(fn.Func.Enter)
s.stmtList(fn.Nbody)
// fallthrough to exit
if s.curBlock != nil {
s.pushLine(fn.Func.Endlineno)
s.exit()
s.popLine()
}
if nerrors > 0 {
s.f.Free()
return nil
}
s.insertPhis()
// Don't carry reference this around longer than necessary
s.exitCode = Nodes{}
// Main call to ssa package to compile function
ssa.Compile(s.f)
if nerrors > 0 {
s.f.Free()
return nil
}
return s.f
}
type state struct {
// configuration (arch) information
config *ssa.Config
// function we're building
f *ssa.Func
// labels and labeled control flow nodes (OFOR, OFORUNTIL, OSWITCH, OSELECT) in f
labels map[string]*ssaLabel
labeledNodes map[*Node]*ssaLabel
// Code that must precede any return
// (e.g., copying value of heap-escaped paramout back to true paramout)
exitCode Nodes
// unlabeled break and continue statement tracking
breakTo *ssa.Block // current target for plain break statement
continueTo *ssa.Block // current target for plain continue statement
// current location where we're interpreting the AST
curBlock *ssa.Block
// variable assignments in the current block (map from variable symbol to ssa value)
// *Node is the unique identifier (an ONAME Node) for the variable.
// TODO: keep a single varnum map, then make all of these maps slices instead?
vars map[*Node]*ssa.Value
// fwdVars are variables that are used before they are defined in the current block.
// This map exists just to coalesce multiple references into a single FwdRef op.
// *Node is the unique identifier (an ONAME Node) for the variable.
fwdVars map[*Node]*ssa.Value
// all defined variables at the end of each block. Indexed by block ID.
defvars []map[*Node]*ssa.Value
// addresses of PPARAM and PPARAMOUT variables.
decladdrs map[*Node]*ssa.Value
// symbols for PEXTERN, PAUTO and PPARAMOUT variables so they can be reused.
varsyms map[*Node]interface{}
// starting values. Memory, stack pointer, and globals pointer
startmem *ssa.Value
sp *ssa.Value
sb *ssa.Value
// line number stack. The current line number is top of stack
2016-12-15 17:17:01 -08:00
line []src.XPos
// list of panic calls by function name and line number.
// Used to deduplicate panic calls.
panics map[funcLine]*ssa.Block
cmd/compile: fix liveness computation for heap-escaped parameters The liveness computation of parameters generally was never correct, but forcing all parameters to be live throughout the function covered up that problem. The new SSA back end is too clever: even though it currently keeps the parameter values live throughout the function, it may find optimizations that mean the current values are not written back to the original parameter stack slots immediately or ever (for example if a parameter is set to nil, SSA constant propagation may replace all later uses of the parameter with a constant nil, eliminating the need to write the nil value back to the stack slot), so the liveness code must now track the actual operations on the stack slots, exposing these problems. One small problem in the handling of arguments is that nodarg can return ONAME PPARAM nodes with adjusted offsets, so that there are actually multiple *Node pointers for the same parameter in the instruction stream. This might be possible to correct, but not in this CL. For now, we fix this by using n.Orig instead of n when considering PPARAM and PPARAMOUT nodes. The major problem in the handling of arguments is general confusion in the liveness code about the meaning of PPARAM|PHEAP and PPARAMOUT|PHEAP nodes, especially as contrasted with PAUTO|PHEAP. The difference between these two is that when a local variable "moves" to the heap, it's really just allocated there to start with; in contrast, when an argument moves to the heap, the actual data has to be copied there from the stack at the beginning of the function, and when a result "moves" to the heap the value in the heap has to be copied back to the stack when the function returns This general confusion is also present in the SSA back end. The PHEAP bit worked decently when I first introduced it 7 years ago (!) in 391425ae. The back end did nothing sophisticated, and in particular there was no analysis at all: no escape analysis, no liveness analysis, and certainly no SSA back end. But the complications caused in the various downstream consumers suggest that this should be a detail kept mainly in the front end. This CL therefore eliminates both the PHEAP bit and even the idea of "heap variables" from the back ends. First, it replaces the PPARAM|PHEAP, PPARAMOUT|PHEAP, and PAUTO|PHEAP variable classes with the single PAUTOHEAP, a pseudo-class indicating a variable maintained on the heap and available by indirecting a local variable kept on the stack (a plain PAUTO). Second, walkexpr replaces all references to PAUTOHEAP variables with indirections of the corresponding PAUTO variable. The back ends and the liveness code now just see plain indirected variables. This may actually produce better code, but the real goal here is to eliminate these little-used and somewhat suspect code paths in the back end analyses. The OPARAM node type goes away too. A followup CL will do the same to PPARAMREF. I'm not sure that the back ends (SSA in particular) are handling those right either, and with the framework established in this CL that change is trivial and the result clearly more correct. Fixes #15747. Change-Id: I2770b1ce3cbc93981bfc7166be66a9da12013d74 Reviewed-on: https://go-review.googlesource.com/23393 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-05-25 01:33:24 -04:00
// list of PPARAMOUT (return) variables.
returns []*Node
// A dummy value used during phi construction.
placeholder *ssa.Value
cgoUnsafeArgs bool
hasdefer bool // whether the function contains a defer statement
}
type funcLine struct {
f *obj.LSym
2016-12-15 17:17:01 -08:00
line src.XPos
}
type ssaLabel struct {
target *ssa.Block // block identified by this label
breakTarget *ssa.Block // block to break to in control flow node identified by this label
continueTarget *ssa.Block // block to continue to in control flow node identified by this label
}
// label returns the label associated with sym, creating it if necessary.
func (s *state) label(sym *Sym) *ssaLabel {
lab := s.labels[sym.Name]
if lab == nil {
lab = new(ssaLabel)
s.labels[sym.Name] = lab
}
return lab
}
func (s *state) Logf(msg string, args ...interface{}) { s.config.Logf(msg, args...) }
func (s *state) Log() bool { return s.config.Log() }
func (s *state) Fatalf(msg string, args ...interface{}) { s.config.Fatalf(s.peekPos(), msg, args...) }
2016-12-15 17:17:01 -08:00
func (s *state) Warnl(pos src.XPos, msg string, args ...interface{}) {
[dev.inline] cmd/internal/src: replace src.Pos with syntax.Pos This replaces the src.Pos LineHist-based position tracking with the syntax.Pos implementation and updates all uses. The LineHist table is not used anymore - the respective code is still there but should be removed eventually. CL forthcoming. Passes toolstash -cmp when comparing to the master repo (with the exception of a couple of swapped assembly instructions, likely due to different instruction scheduling because the line-based sorting has changed; though this is won't affect correctness). The sizes of various important compiler data structures have increased significantly (see the various sizes_test.go files); this is probably the reason for an increase of compilation times (to be addressed). Here are the results of compilebench -count 5, run on a "quiet" machine (no apps running besides a terminal): name old time/op new time/op delta Template 256ms ± 1% 280ms ±15% +9.54% (p=0.008 n=5+5) Unicode 132ms ± 1% 132ms ± 1% ~ (p=0.690 n=5+5) GoTypes 891ms ± 1% 917ms ± 2% +2.88% (p=0.008 n=5+5) Compiler 3.84s ± 2% 3.99s ± 2% +3.95% (p=0.016 n=5+5) MakeBash 47.1s ± 1% 47.2s ± 2% ~ (p=0.841 n=5+5) name old user-ns/op new user-ns/op delta Template 309M ± 1% 326M ± 2% +5.18% (p=0.008 n=5+5) Unicode 165M ± 1% 168M ± 4% ~ (p=0.421 n=5+5) GoTypes 1.14G ± 2% 1.18G ± 1% +3.47% (p=0.008 n=5+5) Compiler 5.00G ± 1% 5.16G ± 1% +3.12% (p=0.008 n=5+5) Change-Id: I241c4246cdff627d7ecb95cac23060b38f9775ec Reviewed-on: https://go-review.googlesource.com/34273 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-12-09 17:15:05 -08:00
s.config.Warnl(pos, msg, args...)
}
func (s *state) Debug_checknil() bool { return s.config.Debug_checknil() }
[dev.ssa] cmd/compile/ssa: separate logging, work in progress, and fatal errors The SSA implementation logs for three purposes: * debug logging * fatal errors * unimplemented features Separating these three uses lets us attempt an SSA implementation for all functions, not just _ssa functions. This turns the entire standard library into a compilation test, and makes it easy to figure out things like "how much coverage does SSA have now" and "what should we do next to get more coverage?". Functions called _ssa are still special. They log profusely by default and the output of the SSA implementation is used. For all other functions, logging is off, and the implementation is built and discarded, due to lack of support for the runtime. While we're here, fix a few minor bugs and add some extra Unimplementeds to allow all.bash to pass. As of now, SSA handles 20.79% of the functions in the standard library (689 of 3314). The top missing features are: 10.03% 2597 SSA unimplemented: zero for type error not implemented 7.79% 2016 SSA unimplemented: addr: bad op DOTPTR 7.33% 1898 SSA unimplemented: unhandled expr EQ 6.10% 1579 SSA unimplemented: unhandled expr OROR 4.91% 1271 SSA unimplemented: unhandled expr NE 4.49% 1163 SSA unimplemented: unhandled expr LROT 4.00% 1036 SSA unimplemented: unhandled expr LEN 3.56% 923 SSA unimplemented: unhandled stmt CALLFUNC 2.37% 615 SSA unimplemented: zero for type []byte not implemented 1.90% 492 SSA unimplemented: unhandled stmt CALLMETH 1.74% 450 SSA unimplemented: unhandled expr CALLINTER 1.74% 450 SSA unimplemented: unhandled expr DOT 1.71% 444 SSA unimplemented: unhandled expr ANDAND 1.65% 426 SSA unimplemented: unhandled expr CLOSUREVAR 1.54% 400 SSA unimplemented: unhandled expr CALLMETH 1.51% 390 SSA unimplemented: unhandled stmt SWITCH 1.47% 380 SSA unimplemented: unhandled expr CONV 1.33% 345 SSA unimplemented: addr: bad op * 1.30% 336 SSA unimplemented: unhandled OLITERAL 6 Change-Id: I4ca07951e276714dc13c31de28640aead17a1be7 Reviewed-on: https://go-review.googlesource.com/11160 Reviewed-by: Keith Randall <khr@golang.org>
2015-06-12 11:01:13 -07:00
var (
// dummy node for the memory variable
memVar = Node{Op: ONAME, Class: Pxxx, Sym: &Sym{Name: "mem"}}
// dummy nodes for temporary variables
cmd/compile: avoid a spill in append fast path Instead of spilling newlen, recalculate it. This removes a spill from the fast path, at the cost of a cheap recalculation on the (rare) growth path. This uses 8 bytes less of stack space. It generates two more bytes of code, but that is due to suboptimal register allocation; see far below. Runtime append microbenchmarks are all over the map, presumably due to incidental code movement. Sample code: func s(b []byte) []byte { b = append(b, 1, 2, 3) return b } Before: "".s t=1 size=160 args=0x30 locals=0x48 0x0000 00000 (append.go:8) TEXT "".s(SB), $72-48 0x0000 00000 (append.go:8) MOVQ (TLS), CX 0x0009 00009 (append.go:8) CMPQ SP, 16(CX) 0x000d 00013 (append.go:8) JLS 149 0x0013 00019 (append.go:8) SUBQ $72, SP 0x0017 00023 (append.go:8) FUNCDATA $0, gclocals·6432f8c6a0d23fa7bee6c5d96f21a92a(SB) 0x0017 00023 (append.go:8) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:9) MOVQ "".b+88(FP), CX 0x001c 00028 (append.go:9) LEAQ 3(CX), DX 0x0020 00032 (append.go:9) MOVQ DX, "".autotmp_0+64(SP) 0x0025 00037 (append.go:9) MOVQ "".b+96(FP), BX 0x002a 00042 (append.go:9) CMPQ DX, BX 0x002d 00045 (append.go:9) JGT $0, 86 0x002f 00047 (append.go:8) MOVQ "".b+80(FP), AX 0x0034 00052 (append.go:9) MOVB $1, (AX)(CX*1) 0x0038 00056 (append.go:9) MOVB $2, 1(AX)(CX*1) 0x003d 00061 (append.go:9) MOVB $3, 2(AX)(CX*1) 0x0042 00066 (append.go:10) MOVQ AX, "".~r1+104(FP) 0x0047 00071 (append.go:10) MOVQ DX, "".~r1+112(FP) 0x004c 00076 (append.go:10) MOVQ BX, "".~r1+120(FP) 0x0051 00081 (append.go:10) ADDQ $72, SP 0x0055 00085 (append.go:10) RET 0x0056 00086 (append.go:9) LEAQ type.[]uint8(SB), AX 0x005d 00093 (append.go:9) MOVQ AX, (SP) 0x0061 00097 (append.go:9) MOVQ "".b+80(FP), BP 0x0066 00102 (append.go:9) MOVQ BP, 8(SP) 0x006b 00107 (append.go:9) MOVQ CX, 16(SP) 0x0070 00112 (append.go:9) MOVQ BX, 24(SP) 0x0075 00117 (append.go:9) MOVQ DX, 32(SP) 0x007a 00122 (append.go:9) PCDATA $0, $0 0x007a 00122 (append.go:9) CALL runtime.growslice(SB) 0x007f 00127 (append.go:9) MOVQ 40(SP), AX 0x0084 00132 (append.go:9) MOVQ 56(SP), BX 0x0089 00137 (append.go:8) MOVQ "".b+88(FP), CX 0x008e 00142 (append.go:9) MOVQ "".autotmp_0+64(SP), DX 0x0093 00147 (append.go:9) JMP 52 0x0095 00149 (append.go:9) NOP 0x0095 00149 (append.go:8) CALL runtime.morestack_noctxt(SB) 0x009a 00154 (append.go:8) JMP 0 After: "".s t=1 size=176 args=0x30 locals=0x40 0x0000 00000 (append.go:8) TEXT "".s(SB), $64-48 0x0000 00000 (append.go:8) MOVQ (TLS), CX 0x0009 00009 (append.go:8) CMPQ SP, 16(CX) 0x000d 00013 (append.go:8) JLS 151 0x0013 00019 (append.go:8) SUBQ $64, SP 0x0017 00023 (append.go:8) FUNCDATA $0, gclocals·6432f8c6a0d23fa7bee6c5d96f21a92a(SB) 0x0017 00023 (append.go:8) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:9) MOVQ "".b+80(FP), CX 0x001c 00028 (append.go:9) LEAQ 3(CX), DX 0x0020 00032 (append.go:9) MOVQ "".b+88(FP), BX 0x0025 00037 (append.go:9) CMPQ DX, BX 0x0028 00040 (append.go:9) JGT $0, 81 0x002a 00042 (append.go:8) MOVQ "".b+72(FP), AX 0x002f 00047 (append.go:9) MOVB $1, (AX)(CX*1) 0x0033 00051 (append.go:9) MOVB $2, 1(AX)(CX*1) 0x0038 00056 (append.go:9) MOVB $3, 2(AX)(CX*1) 0x003d 00061 (append.go:10) MOVQ AX, "".~r1+96(FP) 0x0042 00066 (append.go:10) MOVQ DX, "".~r1+104(FP) 0x0047 00071 (append.go:10) MOVQ BX, "".~r1+112(FP) 0x004c 00076 (append.go:10) ADDQ $64, SP 0x0050 00080 (append.go:10) RET 0x0051 00081 (append.go:9) LEAQ type.[]uint8(SB), AX 0x0058 00088 (append.go:9) MOVQ AX, (SP) 0x005c 00092 (append.go:9) MOVQ "".b+72(FP), BP 0x0061 00097 (append.go:9) MOVQ BP, 8(SP) 0x0066 00102 (append.go:9) MOVQ CX, 16(SP) 0x006b 00107 (append.go:9) MOVQ BX, 24(SP) 0x0070 00112 (append.go:9) MOVQ DX, 32(SP) 0x0075 00117 (append.go:9) PCDATA $0, $0 0x0075 00117 (append.go:9) CALL runtime.growslice(SB) 0x007a 00122 (append.go:9) MOVQ 40(SP), AX 0x007f 00127 (append.go:9) MOVQ 48(SP), CX 0x0084 00132 (append.go:9) MOVQ 56(SP), BX 0x0089 00137 (append.go:9) ADDQ $3, CX 0x008d 00141 (append.go:9) MOVQ CX, DX 0x0090 00144 (append.go:8) MOVQ "".b+80(FP), CX 0x0095 00149 (append.go:9) JMP 47 0x0097 00151 (append.go:9) NOP 0x0097 00151 (append.go:8) CALL runtime.morestack_noctxt(SB) 0x009c 00156 (append.go:8) JMP 0 Observe that in the following sequence, we should use DX directly instead of using CX as a temporary register, which would make the new code a strict improvement on the old: 0x007f 00127 (append.go:9) MOVQ 48(SP), CX 0x0084 00132 (append.go:9) MOVQ 56(SP), BX 0x0089 00137 (append.go:9) ADDQ $3, CX 0x008d 00141 (append.go:9) MOVQ CX, DX 0x0090 00144 (append.go:8) MOVQ "".b+80(FP), CX Change-Id: I4ee50b18fa53865901d2d7f86c2cbb54c6fa6924 Reviewed-on: https://go-review.googlesource.com/21812 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:08:00 -07:00
ptrVar = Node{Op: ONAME, Class: Pxxx, Sym: &Sym{Name: "ptr"}}
cmd/compile: re-enable in-place append optimization CL 21891 was too clever in its attempts to avoid spills. Storing newlen too early caused uses of append in the runtime itself to receive an inconsistent view of a slice, leading to corruption. This CL makes the generate code much more similar to the old backend. It spills more than before, but those spills have been contained to the grow path. It recalculates newlen unnecessarily on the fast path, but that's measurably cheaper than spilling it. CL 21891 caused runtime failures in 6 of 2000 runs of net/http and crypto/x509 in my test setup. This CL has gone 6000 runs without a failure. Benchmarks going from master to this CL: name old time/op new time/op delta AppendInPlace/NoGrow/Byte-8 439ns ± 2% 436ns ± 2% -0.72% (p=0.001 n=28+27) AppendInPlace/NoGrow/1Ptr-8 901ns ± 0% 856ns ± 0% -4.95% (p=0.000 n=26+29) AppendInPlace/NoGrow/2Ptr-8 2.15µs ± 1% 1.95µs ± 0% -9.07% (p=0.000 n=28+30) AppendInPlace/NoGrow/3Ptr-8 2.66µs ± 0% 2.45µs ± 0% -7.93% (p=0.000 n=29+26) AppendInPlace/NoGrow/4Ptr-8 3.24µs ± 1% 3.02µs ± 1% -6.75% (p=0.000 n=28+30) AppendInPlace/Grow/Byte-8 269ns ± 1% 271ns ± 1% +0.84% (p=0.000 n=30+29) AppendInPlace/Grow/1Ptr-8 275ns ± 1% 280ns ± 1% +1.75% (p=0.000 n=30+30) AppendInPlace/Grow/2Ptr-8 384ns ± 0% 391ns ± 0% +1.94% (p=0.000 n=27+30) AppendInPlace/Grow/3Ptr-8 455ns ± 0% 462ns ± 0% +1.43% (p=0.000 n=29+29) AppendInPlace/Grow/4Ptr-8 478ns ± 0% 479ns ± 0% +0.23% (p=0.000 n=30+27) However, for the large no-grow cases, there is still more work to be done. Going from this CL to the non-SSA backend: name old time/op new time/op delta AppendInPlace/NoGrow/Byte-8 436ns ± 2% 436ns ± 2% ~ (p=0.967 n=27+29) AppendInPlace/NoGrow/1Ptr-8 856ns ± 0% 884ns ± 0% +3.28% (p=0.000 n=29+26) AppendInPlace/NoGrow/2Ptr-8 1.95µs ± 0% 1.56µs ± 0% -20.28% (p=0.000 n=30+29) AppendInPlace/NoGrow/3Ptr-8 2.45µs ± 0% 1.89µs ± 0% -22.88% (p=0.000 n=26+28) AppendInPlace/NoGrow/4Ptr-8 3.02µs ± 1% 2.56µs ± 1% -15.35% (p=0.000 n=30+28) AppendInPlace/Grow/Byte-8 271ns ± 1% 283ns ± 1% +4.56% (p=0.000 n=29+29) AppendInPlace/Grow/1Ptr-8 280ns ± 1% 288ns ± 1% +2.99% (p=0.000 n=30+30) AppendInPlace/Grow/2Ptr-8 391ns ± 0% 409ns ± 0% +4.66% (p=0.000 n=30+29) AppendInPlace/Grow/3Ptr-8 462ns ± 0% 481ns ± 0% +4.13% (p=0.000 n=29+30) AppendInPlace/Grow/4Ptr-8 479ns ± 0% 502ns ± 0% +4.81% (p=0.000 n=27+26) New generated code: var x []byte func a() { x = append(x, 1) } "".a t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (a.go:5) TEXT "".a(SB), $72-0 0x0000 00000 (a.go:5) MOVQ (TLS), CX 0x0009 00009 (a.go:5) CMPQ SP, 16(CX) 0x000d 00013 (a.go:5) JLS 190 0x0013 00019 (a.go:5) SUBQ $72, SP 0x0017 00023 (a.go:5) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (a.go:5) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (a.go:6) MOVQ "".x+16(SB), CX 0x001e 00030 (a.go:6) MOVQ "".x+8(SB), DX 0x0025 00037 (a.go:6) MOVQ "".x(SB), BX 0x002c 00044 (a.go:6) LEAQ 1(DX), BP 0x0030 00048 (a.go:6) CMPQ BP, CX 0x0033 00051 (a.go:6) JGT $0, 73 0x0035 00053 (a.go:6) LEAQ 1(DX), AX 0x0039 00057 (a.go:6) MOVQ AX, "".x+8(SB) 0x0040 00064 (a.go:6) MOVB $1, (BX)(DX*1) 0x0044 00068 (a.go:7) ADDQ $72, SP 0x0048 00072 (a.go:7) RET 0x0049 00073 (a.go:6) LEAQ type.[]uint8(SB), AX 0x0050 00080 (a.go:6) MOVQ AX, (SP) 0x0054 00084 (a.go:6) MOVQ BX, 8(SP) 0x0059 00089 (a.go:6) MOVQ DX, 16(SP) 0x005e 00094 (a.go:6) MOVQ CX, 24(SP) 0x0063 00099 (a.go:6) MOVQ BP, 32(SP) 0x0068 00104 (a.go:6) PCDATA $0, $0 0x0068 00104 (a.go:6) CALL runtime.growslice(SB) 0x006d 00109 (a.go:6) MOVQ 40(SP), CX 0x0072 00114 (a.go:6) MOVQ 48(SP), DX 0x0077 00119 (a.go:6) MOVQ DX, "".autotmp_0+64(SP) 0x007c 00124 (a.go:6) MOVQ 56(SP), BX 0x0081 00129 (a.go:6) MOVQ BX, "".x+16(SB) 0x0088 00136 (a.go:6) MOVL runtime.writeBarrier(SB), AX 0x008e 00142 (a.go:6) TESTB AL, AL 0x0090 00144 (a.go:6) JNE $0, 162 0x0092 00146 (a.go:6) MOVQ CX, "".x(SB) 0x0099 00153 (a.go:6) MOVQ "".x(SB), BX 0x00a0 00160 (a.go:6) JMP 53 0x00a2 00162 (a.go:6) LEAQ "".x(SB), BX 0x00a9 00169 (a.go:6) MOVQ BX, (SP) 0x00ad 00173 (a.go:6) MOVQ CX, 8(SP) 0x00b2 00178 (a.go:6) PCDATA $0, $0 0x00b2 00178 (a.go:6) CALL runtime.writebarrierptr(SB) 0x00b7 00183 (a.go:6) MOVQ "".autotmp_0+64(SP), DX 0x00bc 00188 (a.go:6) JMP 153 0x00be 00190 (a.go:6) NOP 0x00be 00190 (a.go:5) CALL runtime.morestack_noctxt(SB) 0x00c3 00195 (a.go:5) JMP 0 Fixes #14969 again Change-Id: Ia50463b1f506011aad0718a4fef1d4738e43c32d Reviewed-on: https://go-review.googlesource.com/22197 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-18 09:40:30 -07:00
lenVar = Node{Op: ONAME, Class: Pxxx, Sym: &Sym{Name: "len"}}
cmd/compile: avoid a spill in append fast path Instead of spilling newlen, recalculate it. This removes a spill from the fast path, at the cost of a cheap recalculation on the (rare) growth path. This uses 8 bytes less of stack space. It generates two more bytes of code, but that is due to suboptimal register allocation; see far below. Runtime append microbenchmarks are all over the map, presumably due to incidental code movement. Sample code: func s(b []byte) []byte { b = append(b, 1, 2, 3) return b } Before: "".s t=1 size=160 args=0x30 locals=0x48 0x0000 00000 (append.go:8) TEXT "".s(SB), $72-48 0x0000 00000 (append.go:8) MOVQ (TLS), CX 0x0009 00009 (append.go:8) CMPQ SP, 16(CX) 0x000d 00013 (append.go:8) JLS 149 0x0013 00019 (append.go:8) SUBQ $72, SP 0x0017 00023 (append.go:8) FUNCDATA $0, gclocals·6432f8c6a0d23fa7bee6c5d96f21a92a(SB) 0x0017 00023 (append.go:8) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:9) MOVQ "".b+88(FP), CX 0x001c 00028 (append.go:9) LEAQ 3(CX), DX 0x0020 00032 (append.go:9) MOVQ DX, "".autotmp_0+64(SP) 0x0025 00037 (append.go:9) MOVQ "".b+96(FP), BX 0x002a 00042 (append.go:9) CMPQ DX, BX 0x002d 00045 (append.go:9) JGT $0, 86 0x002f 00047 (append.go:8) MOVQ "".b+80(FP), AX 0x0034 00052 (append.go:9) MOVB $1, (AX)(CX*1) 0x0038 00056 (append.go:9) MOVB $2, 1(AX)(CX*1) 0x003d 00061 (append.go:9) MOVB $3, 2(AX)(CX*1) 0x0042 00066 (append.go:10) MOVQ AX, "".~r1+104(FP) 0x0047 00071 (append.go:10) MOVQ DX, "".~r1+112(FP) 0x004c 00076 (append.go:10) MOVQ BX, "".~r1+120(FP) 0x0051 00081 (append.go:10) ADDQ $72, SP 0x0055 00085 (append.go:10) RET 0x0056 00086 (append.go:9) LEAQ type.[]uint8(SB), AX 0x005d 00093 (append.go:9) MOVQ AX, (SP) 0x0061 00097 (append.go:9) MOVQ "".b+80(FP), BP 0x0066 00102 (append.go:9) MOVQ BP, 8(SP) 0x006b 00107 (append.go:9) MOVQ CX, 16(SP) 0x0070 00112 (append.go:9) MOVQ BX, 24(SP) 0x0075 00117 (append.go:9) MOVQ DX, 32(SP) 0x007a 00122 (append.go:9) PCDATA $0, $0 0x007a 00122 (append.go:9) CALL runtime.growslice(SB) 0x007f 00127 (append.go:9) MOVQ 40(SP), AX 0x0084 00132 (append.go:9) MOVQ 56(SP), BX 0x0089 00137 (append.go:8) MOVQ "".b+88(FP), CX 0x008e 00142 (append.go:9) MOVQ "".autotmp_0+64(SP), DX 0x0093 00147 (append.go:9) JMP 52 0x0095 00149 (append.go:9) NOP 0x0095 00149 (append.go:8) CALL runtime.morestack_noctxt(SB) 0x009a 00154 (append.go:8) JMP 0 After: "".s t=1 size=176 args=0x30 locals=0x40 0x0000 00000 (append.go:8) TEXT "".s(SB), $64-48 0x0000 00000 (append.go:8) MOVQ (TLS), CX 0x0009 00009 (append.go:8) CMPQ SP, 16(CX) 0x000d 00013 (append.go:8) JLS 151 0x0013 00019 (append.go:8) SUBQ $64, SP 0x0017 00023 (append.go:8) FUNCDATA $0, gclocals·6432f8c6a0d23fa7bee6c5d96f21a92a(SB) 0x0017 00023 (append.go:8) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:9) MOVQ "".b+80(FP), CX 0x001c 00028 (append.go:9) LEAQ 3(CX), DX 0x0020 00032 (append.go:9) MOVQ "".b+88(FP), BX 0x0025 00037 (append.go:9) CMPQ DX, BX 0x0028 00040 (append.go:9) JGT $0, 81 0x002a 00042 (append.go:8) MOVQ "".b+72(FP), AX 0x002f 00047 (append.go:9) MOVB $1, (AX)(CX*1) 0x0033 00051 (append.go:9) MOVB $2, 1(AX)(CX*1) 0x0038 00056 (append.go:9) MOVB $3, 2(AX)(CX*1) 0x003d 00061 (append.go:10) MOVQ AX, "".~r1+96(FP) 0x0042 00066 (append.go:10) MOVQ DX, "".~r1+104(FP) 0x0047 00071 (append.go:10) MOVQ BX, "".~r1+112(FP) 0x004c 00076 (append.go:10) ADDQ $64, SP 0x0050 00080 (append.go:10) RET 0x0051 00081 (append.go:9) LEAQ type.[]uint8(SB), AX 0x0058 00088 (append.go:9) MOVQ AX, (SP) 0x005c 00092 (append.go:9) MOVQ "".b+72(FP), BP 0x0061 00097 (append.go:9) MOVQ BP, 8(SP) 0x0066 00102 (append.go:9) MOVQ CX, 16(SP) 0x006b 00107 (append.go:9) MOVQ BX, 24(SP) 0x0070 00112 (append.go:9) MOVQ DX, 32(SP) 0x0075 00117 (append.go:9) PCDATA $0, $0 0x0075 00117 (append.go:9) CALL runtime.growslice(SB) 0x007a 00122 (append.go:9) MOVQ 40(SP), AX 0x007f 00127 (append.go:9) MOVQ 48(SP), CX 0x0084 00132 (append.go:9) MOVQ 56(SP), BX 0x0089 00137 (append.go:9) ADDQ $3, CX 0x008d 00141 (append.go:9) MOVQ CX, DX 0x0090 00144 (append.go:8) MOVQ "".b+80(FP), CX 0x0095 00149 (append.go:9) JMP 47 0x0097 00151 (append.go:9) NOP 0x0097 00151 (append.go:8) CALL runtime.morestack_noctxt(SB) 0x009c 00156 (append.go:8) JMP 0 Observe that in the following sequence, we should use DX directly instead of using CX as a temporary register, which would make the new code a strict improvement on the old: 0x007f 00127 (append.go:9) MOVQ 48(SP), CX 0x0084 00132 (append.go:9) MOVQ 56(SP), BX 0x0089 00137 (append.go:9) ADDQ $3, CX 0x008d 00141 (append.go:9) MOVQ CX, DX 0x0090 00144 (append.go:8) MOVQ "".b+80(FP), CX Change-Id: I4ee50b18fa53865901d2d7f86c2cbb54c6fa6924 Reviewed-on: https://go-review.googlesource.com/21812 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:08:00 -07:00
newlenVar = Node{Op: ONAME, Class: Pxxx, Sym: &Sym{Name: "newlen"}}
capVar = Node{Op: ONAME, Class: Pxxx, Sym: &Sym{Name: "cap"}}
typVar = Node{Op: ONAME, Class: Pxxx, Sym: &Sym{Name: "typ"}}
okVar = Node{Op: ONAME, Class: Pxxx, Sym: &Sym{Name: "ok"}}
)
// startBlock sets the current block we're generating code in to b.
func (s *state) startBlock(b *ssa.Block) {
if s.curBlock != nil {
s.Fatalf("starting block %v when block %v has not ended", b, s.curBlock)
}
s.curBlock = b
s.vars = map[*Node]*ssa.Value{}
for n := range s.fwdVars {
delete(s.fwdVars, n)
}
}
// endBlock marks the end of generating code for the current block.
// Returns the (former) current block. Returns nil if there is no current
// block, i.e. if no code flows to the current execution point.
func (s *state) endBlock() *ssa.Block {
b := s.curBlock
if b == nil {
return nil
}
for len(s.defvars) <= int(b.ID) {
s.defvars = append(s.defvars, nil)
}
s.defvars[b.ID] = s.vars
s.curBlock = nil
s.vars = nil
b.Pos = s.peekPos()
return b
}
// pushLine pushes a line number on the line number stack.
2016-12-15 17:17:01 -08:00
func (s *state) pushLine(line src.XPos) {
if !line.IsKnown() {
// the frontend may emit node with line number missing,
// use the parent line number in this case.
line = s.peekPos()
if Debug['K'] != 0 {
Warn("buildssa: unknown position (line 0)")
}
}
s.line = append(s.line, line)
}
// popLine pops the top of the line number stack.
func (s *state) popLine() {
s.line = s.line[:len(s.line)-1]
}
// peekPos peeks the top of the line number stack.
2016-12-15 17:17:01 -08:00
func (s *state) peekPos() src.XPos {
return s.line[len(s.line)-1]
}
func (s *state) Error(msg string, args ...interface{}) {
yyerrorl(s.peekPos(), msg, args...)
}
// newValue0 adds a new value with no arguments to the current block.
func (s *state) newValue0(op ssa.Op, t ssa.Type) *ssa.Value {
return s.curBlock.NewValue0(s.peekPos(), op, t)
}
// newValue0A adds a new value with no arguments and an aux value to the current block.
func (s *state) newValue0A(op ssa.Op, t ssa.Type, aux interface{}) *ssa.Value {
return s.curBlock.NewValue0A(s.peekPos(), op, t, aux)
}
// newValue0I adds a new value with no arguments and an auxint value to the current block.
func (s *state) newValue0I(op ssa.Op, t ssa.Type, auxint int64) *ssa.Value {
return s.curBlock.NewValue0I(s.peekPos(), op, t, auxint)
}
// newValue1 adds a new value with one argument to the current block.
func (s *state) newValue1(op ssa.Op, t ssa.Type, arg *ssa.Value) *ssa.Value {
return s.curBlock.NewValue1(s.peekPos(), op, t, arg)
}
// newValue1A adds a new value with one argument and an aux value to the current block.
func (s *state) newValue1A(op ssa.Op, t ssa.Type, aux interface{}, arg *ssa.Value) *ssa.Value {
return s.curBlock.NewValue1A(s.peekPos(), op, t, aux, arg)
}
// newValue1I adds a new value with one argument and an auxint value to the current block.
func (s *state) newValue1I(op ssa.Op, t ssa.Type, aux int64, arg *ssa.Value) *ssa.Value {
return s.curBlock.NewValue1I(s.peekPos(), op, t, aux, arg)
}
// newValue2 adds a new value with two arguments to the current block.
func (s *state) newValue2(op ssa.Op, t ssa.Type, arg0, arg1 *ssa.Value) *ssa.Value {
return s.curBlock.NewValue2(s.peekPos(), op, t, arg0, arg1)
}
// newValue2I adds a new value with two arguments and an auxint value to the current block.
func (s *state) newValue2I(op ssa.Op, t ssa.Type, aux int64, arg0, arg1 *ssa.Value) *ssa.Value {
return s.curBlock.NewValue2I(s.peekPos(), op, t, aux, arg0, arg1)
}
// newValue3 adds a new value with three arguments to the current block.
func (s *state) newValue3(op ssa.Op, t ssa.Type, arg0, arg1, arg2 *ssa.Value) *ssa.Value {
return s.curBlock.NewValue3(s.peekPos(), op, t, arg0, arg1, arg2)
}
// newValue3I adds a new value with three arguments and an auxint value to the current block.
func (s *state) newValue3I(op ssa.Op, t ssa.Type, aux int64, arg0, arg1, arg2 *ssa.Value) *ssa.Value {
return s.curBlock.NewValue3I(s.peekPos(), op, t, aux, arg0, arg1, arg2)
}
// newValue3A adds a new value with three arguments and an aux value to the current block.
func (s *state) newValue3A(op ssa.Op, t ssa.Type, aux interface{}, arg0, arg1, arg2 *ssa.Value) *ssa.Value {
return s.curBlock.NewValue3A(s.peekPos(), op, t, aux, arg0, arg1, arg2)
}
// newValue4 adds a new value with four arguments to the current block.
func (s *state) newValue4(op ssa.Op, t ssa.Type, arg0, arg1, arg2, arg3 *ssa.Value) *ssa.Value {
return s.curBlock.NewValue4(s.peekPos(), op, t, arg0, arg1, arg2, arg3)
}
// entryNewValue0 adds a new value with no arguments to the entry block.
func (s *state) entryNewValue0(op ssa.Op, t ssa.Type) *ssa.Value {
return s.f.Entry.NewValue0(s.peekPos(), op, t)
}
// entryNewValue0A adds a new value with no arguments and an aux value to the entry block.
func (s *state) entryNewValue0A(op ssa.Op, t ssa.Type, aux interface{}) *ssa.Value {
return s.f.Entry.NewValue0A(s.peekPos(), op, t, aux)
}
// entryNewValue0I adds a new value with no arguments and an auxint value to the entry block.
func (s *state) entryNewValue0I(op ssa.Op, t ssa.Type, auxint int64) *ssa.Value {
return s.f.Entry.NewValue0I(s.peekPos(), op, t, auxint)
}
// entryNewValue1 adds a new value with one argument to the entry block.
func (s *state) entryNewValue1(op ssa.Op, t ssa.Type, arg *ssa.Value) *ssa.Value {
return s.f.Entry.NewValue1(s.peekPos(), op, t, arg)
}
// entryNewValue1 adds a new value with one argument and an auxint value to the entry block.
func (s *state) entryNewValue1I(op ssa.Op, t ssa.Type, auxint int64, arg *ssa.Value) *ssa.Value {
return s.f.Entry.NewValue1I(s.peekPos(), op, t, auxint, arg)
}
// entryNewValue1A adds a new value with one argument and an aux value to the entry block.
func (s *state) entryNewValue1A(op ssa.Op, t ssa.Type, aux interface{}, arg *ssa.Value) *ssa.Value {
return s.f.Entry.NewValue1A(s.peekPos(), op, t, aux, arg)
}
// entryNewValue2 adds a new value with two arguments to the entry block.
func (s *state) entryNewValue2(op ssa.Op, t ssa.Type, arg0, arg1 *ssa.Value) *ssa.Value {
return s.f.Entry.NewValue2(s.peekPos(), op, t, arg0, arg1)
}
// const* routines add a new const value to the entry block.
func (s *state) constSlice(t ssa.Type) *ssa.Value { return s.f.ConstSlice(s.peekPos(), t) }
func (s *state) constInterface(t ssa.Type) *ssa.Value { return s.f.ConstInterface(s.peekPos(), t) }
func (s *state) constNil(t ssa.Type) *ssa.Value { return s.f.ConstNil(s.peekPos(), t) }
func (s *state) constEmptyString(t ssa.Type) *ssa.Value { return s.f.ConstEmptyString(s.peekPos(), t) }
func (s *state) constBool(c bool) *ssa.Value {
return s.f.ConstBool(s.peekPos(), Types[TBOOL], c)
}
func (s *state) constInt8(t ssa.Type, c int8) *ssa.Value {
return s.f.ConstInt8(s.peekPos(), t, c)
}
func (s *state) constInt16(t ssa.Type, c int16) *ssa.Value {
return s.f.ConstInt16(s.peekPos(), t, c)
}
func (s *state) constInt32(t ssa.Type, c int32) *ssa.Value {
return s.f.ConstInt32(s.peekPos(), t, c)
}
func (s *state) constInt64(t ssa.Type, c int64) *ssa.Value {
return s.f.ConstInt64(s.peekPos(), t, c)
}
func (s *state) constFloat32(t ssa.Type, c float64) *ssa.Value {
return s.f.ConstFloat32(s.peekPos(), t, c)
}
func (s *state) constFloat64(t ssa.Type, c float64) *ssa.Value {
return s.f.ConstFloat64(s.peekPos(), t, c)
}
func (s *state) constInt(t ssa.Type, c int64) *ssa.Value {
if s.config.IntSize == 8 {
return s.constInt64(t, c)
}
if int64(int32(c)) != c {
s.Fatalf("integer constant too big %d", c)
}
return s.constInt32(t, int32(c))
}
cmd/compile: add OpOffPtr [c] SP to constant cache They accounted for almost 30% of all CSE'd values. By never creating the duplicates in the first place, we reduce the high water mark of Value IDs, which in turn makes all SSA phases cheaper, particularly regalloc. name old time/op new time/op delta Template 200ms ± 3% 198ms ± 4% -0.87% (p=0.016 n=50+49) Unicode 86.9ms ± 2% 85.5ms ± 3% -1.56% (p=0.000 n=49+50) GoTypes 553ms ± 4% 551ms ± 4% ~ (p=0.183 n=50+49) SSA 3.97s ± 3% 3.93s ± 2% -1.06% (p=0.000 n=48+48) Flate 124ms ± 4% 124ms ± 3% ~ (p=0.545 n=48+50) GoParser 146ms ± 4% 146ms ± 4% ~ (p=0.810 n=49+49) Reflect 357ms ± 3% 355ms ± 3% -0.59% (p=0.049 n=50+48) Tar 106ms ± 4% 107ms ± 5% ~ (p=0.454 n=49+50) XML 203ms ± 4% 203ms ± 4% ~ (p=0.726 n=48+50) name old user-ns/op new user-ns/op delta Template 237M ± 3% 235M ± 4% ~ (p=0.208 n=47+48) Unicode 111M ± 4% 108M ± 9% -2.50% (p=0.000 n=47+50) GoTypes 736M ± 5% 729M ± 4% -0.95% (p=0.017 n=50+46) SSA 5.73G ± 4% 5.74G ± 4% ~ (p=0.765 n=50+50) Flate 150M ± 5% 148M ± 6% -0.89% (p=0.045 n=48+47) GoParser 180M ± 5% 178M ± 7% -1.34% (p=0.012 n=50+50) Reflect 450M ± 4% 444M ± 4% -1.40% (p=0.000 n=50+49) Tar 124M ± 7% 123M ± 7% ~ (p=0.092 n=50+50) XML 248M ± 6% 245M ± 5% ~ (p=0.057 n=50+50) name old alloc/op new alloc/op delta Template 39.4MB ± 0% 39.3MB ± 0% -0.37% (p=0.000 n=50+50) Unicode 30.9MB ± 0% 30.9MB ± 0% -0.27% (p=0.000 n=48+50) GoTypes 114MB ± 0% 113MB ± 0% -1.03% (p=0.000 n=50+49) SSA 882MB ± 0% 865MB ± 0% -1.95% (p=0.000 n=49+49) Flate 25.8MB ± 0% 25.7MB ± 0% -0.21% (p=0.000 n=50+50) GoParser 31.7MB ± 0% 31.6MB ± 0% -0.33% (p=0.000 n=50+50) Reflect 79.7MB ± 0% 79.3MB ± 0% -0.49% (p=0.000 n=44+49) Tar 27.2MB ± 0% 27.1MB ± 0% -0.31% (p=0.000 n=50+50) XML 42.7MB ± 0% 42.3MB ± 0% -1.05% (p=0.000 n=48+49) name old allocs/op new allocs/op delta Template 379k ± 1% 380k ± 1% +0.26% (p=0.000 n=50+50) Unicode 324k ± 1% 324k ± 1% ~ (p=0.964 n=49+50) GoTypes 1.14M ± 0% 1.15M ± 0% +0.14% (p=0.000 n=50+49) SSA 7.89M ± 0% 7.89M ± 0% -0.05% (p=0.000 n=49+49) Flate 240k ± 1% 241k ± 1% +0.27% (p=0.001 n=50+50) GoParser 310k ± 1% 311k ± 1% +0.48% (p=0.000 n=50+49) Reflect 1.00M ± 0% 1.00M ± 0% +0.17% (p=0.000 n=48+50) Tar 254k ± 1% 255k ± 1% +0.23% (p=0.005 n=50+50) XML 395k ± 1% 395k ± 1% +0.19% (p=0.002 n=49+47) Change-Id: Iaa8f5f37e23bd81983409f7359f9dcd4dfe2961f Reviewed-on: https://go-review.googlesource.com/38003 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-03-08 12:50:00 -08:00
func (s *state) constOffPtrSP(t ssa.Type, c int64) *ssa.Value {
return s.f.ConstOffPtrSP(s.peekPos(), t, c, s.sp)
}
// stmtList converts the statement list n to SSA and adds it to s.
func (s *state) stmtList(l Nodes) {
for _, n := range l.Slice() {
s.stmt(n)
}
}
// stmt converts the statement n to SSA and adds it to s.
func (s *state) stmt(n *Node) {
s.pushLine(n.Pos)
defer s.popLine()
cmd/compile: check labels and gotos before building SSA This CL introduces yet another compiler pass, which checks for correct control flow constructs prior to converting from AST to SSA form. It cannot be integrated with walk, since walk rewrites switch and select statements on the fly. To reduce code duplication, this CL also does some minor refactoring. With this pass in place, the AST to SSA converter can now stop generating SSA for any known-dead code. This minor savings pays for the minor cost of the new pass. Performance is almost a wash: name old time/op new time/op delta Template 206ms ± 4% 205ms ± 4% ~ (p=0.108 n=43+43) Unicode 84.0ms ± 4% 84.0ms ± 4% ~ (p=0.979 n=43+43) GoTypes 550ms ± 3% 553ms ± 3% ~ (p=0.065 n=40+41) Compiler 2.57s ± 4% 2.58s ± 2% ~ (p=0.103 n=44+41) SSA 3.94s ± 3% 3.93s ± 2% ~ (p=0.833 n=44+42) Flate 126ms ± 6% 125ms ± 4% ~ (p=0.941 n=43+39) GoParser 147ms ± 4% 148ms ± 3% ~ (p=0.164 n=42+39) Reflect 359ms ± 3% 357ms ± 5% ~ (p=0.241 n=43+44) Tar 106ms ± 5% 106ms ± 7% ~ (p=0.853 n=40+43) XML 202ms ± 3% 203ms ± 3% ~ (p=0.488 n=42+41) name old user-ns/op new user-ns/op delta Template 240M ± 4% 239M ± 4% ~ (p=0.844 n=42+43) Unicode 107M ± 5% 107M ± 4% ~ (p=0.332 n=40+43) GoTypes 735M ± 3% 731M ± 4% ~ (p=0.141 n=43+44) Compiler 3.51G ± 3% 3.52G ± 3% ~ (p=0.208 n=42+43) SSA 5.72G ± 4% 5.72G ± 3% ~ (p=0.928 n=44+42) Flate 151M ± 7% 150M ± 8% ~ (p=0.662 n=44+43) GoParser 181M ± 5% 181M ± 4% ~ (p=0.379 n=41+44) Reflect 447M ± 4% 445M ± 4% ~ (p=0.344 n=43+43) Tar 125M ± 7% 124M ± 6% ~ (p=0.353 n=43+43) XML 248M ± 4% 250M ± 6% ~ (p=0.158 n=44+44) name old alloc/op new alloc/op delta Template 40.3MB ± 0% 40.2MB ± 0% -0.27% (p=0.000 n=10+10) Unicode 30.3MB ± 0% 30.2MB ± 0% -0.10% (p=0.015 n=10+10) GoTypes 114MB ± 0% 114MB ± 0% -0.06% (p=0.000 n=7+9) Compiler 480MB ± 0% 481MB ± 0% +0.07% (p=0.000 n=10+10) SSA 864MB ± 0% 862MB ± 0% -0.25% (p=0.000 n=9+10) Flate 25.9MB ± 0% 25.9MB ± 0% ~ (p=0.123 n=10+10) GoParser 32.1MB ± 0% 32.1MB ± 0% ~ (p=0.631 n=10+10) Reflect 79.9MB ± 0% 79.6MB ± 0% -0.39% (p=0.000 n=10+9) Tar 27.1MB ± 0% 27.0MB ± 0% -0.18% (p=0.003 n=10+10) XML 42.6MB ± 0% 42.6MB ± 0% ~ (p=0.143 n=10+10) name old allocs/op new allocs/op delta Template 401k ± 0% 401k ± 1% ~ (p=0.353 n=10+10) Unicode 322k ± 0% 322k ± 0% ~ (p=0.739 n=10+10) GoTypes 1.18M ± 0% 1.18M ± 0% +0.25% (p=0.001 n=7+8) Compiler 4.51M ± 0% 4.53M ± 0% +0.37% (p=0.000 n=10+10) SSA 7.91M ± 0% 7.93M ± 0% +0.20% (p=0.000 n=9+10) Flate 244k ± 0% 245k ± 0% ~ (p=0.123 n=10+10) GoParser 323k ± 1% 324k ± 1% +0.40% (p=0.035 n=10+10) Reflect 1.01M ± 0% 1.02M ± 0% +0.37% (p=0.000 n=10+9) Tar 258k ± 1% 258k ± 1% ~ (p=0.661 n=10+9) XML 403k ± 0% 405k ± 0% +0.47% (p=0.004 n=10+10) Updates #15756 Updates #19250 Change-Id: I647bfbb745c35630447eb79dfcaa994b490ce942 Reviewed-on: https://go-review.googlesource.com/38159 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-14 11:05:03 -07:00
// If s.curBlock is nil, and n isn't a label (which might have an associated goto somewhere),
// then this code is dead. Stop here.
if s.curBlock == nil && n.Op != OLABEL {
return
}
s.stmtList(n.Ninit)
switch n.Op {
case OBLOCK:
s.stmtList(n.List)
// No-ops
case OEMPTY, ODCLCONST, ODCLTYPE, OFALL:
// Expression statements
case OCALLFUNC:
if isIntrinsicCall(n) {
s.intrinsicCall(n)
return
}
fallthrough
case OCALLMETH, OCALLINTER:
s.call(n, callNormal)
if n.Op == OCALLFUNC && n.Left.Op == ONAME && n.Left.Class == PFUNC {
if fn := n.Left.Sym.Name; compiling_runtime && fn == "throw" ||
n.Left.Sym.Pkg == Runtimepkg && (fn == "throwinit" || fn == "gopanic" || fn == "panicwrap" || fn == "block") {
m := s.mem()
b := s.endBlock()
b.Kind = ssa.BlockExit
b.SetControl(m)
// TODO: never rewrite OPANIC to OCALLFUNC in the
// first place. Need to wait until all backends
// go through SSA.
}
}
case ODEFER:
s.call(n.Left, callDefer)
case OPROC:
s.call(n.Left, callGo)
case OAS2DOTTYPE:
res, resok := s.dottype(n.Rlist.First(), true)
deref := false
if !canSSAType(n.Rlist.First().Type) {
if res.Op != ssa.OpLoad {
s.Fatalf("dottype of non-load")
}
mem := s.mem()
if mem.Op == ssa.OpVarKill {
mem = mem.Args[0]
}
if res.Args[1] != mem {
s.Fatalf("memory no longer live from 2-result dottype load")
}
deref = true
res = res.Args[0]
}
s.assign(n.List.First(), res, deref, 0)
s.assign(n.List.Second(), resok, false, 0)
return
case OAS2FUNC:
// We come here only when it is an intrinsic call returning two values.
if !isIntrinsicCall(n.Rlist.First()) {
s.Fatalf("non-intrinsic AS2FUNC not expanded %v", n.Rlist.First())
}
v := s.intrinsicCall(n.Rlist.First())
v1 := s.newValue1(ssa.OpSelect0, n.List.First().Type, v)
v2 := s.newValue1(ssa.OpSelect1, n.List.Second().Type, v)
s.assign(n.List.First(), v1, false, 0)
s.assign(n.List.Second(), v2, false, 0)
return
case ODCL:
cmd/compile: fix liveness computation for heap-escaped parameters The liveness computation of parameters generally was never correct, but forcing all parameters to be live throughout the function covered up that problem. The new SSA back end is too clever: even though it currently keeps the parameter values live throughout the function, it may find optimizations that mean the current values are not written back to the original parameter stack slots immediately or ever (for example if a parameter is set to nil, SSA constant propagation may replace all later uses of the parameter with a constant nil, eliminating the need to write the nil value back to the stack slot), so the liveness code must now track the actual operations on the stack slots, exposing these problems. One small problem in the handling of arguments is that nodarg can return ONAME PPARAM nodes with adjusted offsets, so that there are actually multiple *Node pointers for the same parameter in the instruction stream. This might be possible to correct, but not in this CL. For now, we fix this by using n.Orig instead of n when considering PPARAM and PPARAMOUT nodes. The major problem in the handling of arguments is general confusion in the liveness code about the meaning of PPARAM|PHEAP and PPARAMOUT|PHEAP nodes, especially as contrasted with PAUTO|PHEAP. The difference between these two is that when a local variable "moves" to the heap, it's really just allocated there to start with; in contrast, when an argument moves to the heap, the actual data has to be copied there from the stack at the beginning of the function, and when a result "moves" to the heap the value in the heap has to be copied back to the stack when the function returns This general confusion is also present in the SSA back end. The PHEAP bit worked decently when I first introduced it 7 years ago (!) in 391425ae. The back end did nothing sophisticated, and in particular there was no analysis at all: no escape analysis, no liveness analysis, and certainly no SSA back end. But the complications caused in the various downstream consumers suggest that this should be a detail kept mainly in the front end. This CL therefore eliminates both the PHEAP bit and even the idea of "heap variables" from the back ends. First, it replaces the PPARAM|PHEAP, PPARAMOUT|PHEAP, and PAUTO|PHEAP variable classes with the single PAUTOHEAP, a pseudo-class indicating a variable maintained on the heap and available by indirecting a local variable kept on the stack (a plain PAUTO). Second, walkexpr replaces all references to PAUTOHEAP variables with indirections of the corresponding PAUTO variable. The back ends and the liveness code now just see plain indirected variables. This may actually produce better code, but the real goal here is to eliminate these little-used and somewhat suspect code paths in the back end analyses. The OPARAM node type goes away too. A followup CL will do the same to PPARAMREF. I'm not sure that the back ends (SSA in particular) are handling those right either, and with the framework established in this CL that change is trivial and the result clearly more correct. Fixes #15747. Change-Id: I2770b1ce3cbc93981bfc7166be66a9da12013d74 Reviewed-on: https://go-review.googlesource.com/23393 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-05-25 01:33:24 -04:00
if n.Left.Class == PAUTOHEAP {
Fatalf("DCL %v", n)
}
case OLABEL:
sym := n.Left.Sym
lab := s.label(sym)
// Associate label with its control flow node, if any
cmd/compile: check labels and gotos before building SSA This CL introduces yet another compiler pass, which checks for correct control flow constructs prior to converting from AST to SSA form. It cannot be integrated with walk, since walk rewrites switch and select statements on the fly. To reduce code duplication, this CL also does some minor refactoring. With this pass in place, the AST to SSA converter can now stop generating SSA for any known-dead code. This minor savings pays for the minor cost of the new pass. Performance is almost a wash: name old time/op new time/op delta Template 206ms ± 4% 205ms ± 4% ~ (p=0.108 n=43+43) Unicode 84.0ms ± 4% 84.0ms ± 4% ~ (p=0.979 n=43+43) GoTypes 550ms ± 3% 553ms ± 3% ~ (p=0.065 n=40+41) Compiler 2.57s ± 4% 2.58s ± 2% ~ (p=0.103 n=44+41) SSA 3.94s ± 3% 3.93s ± 2% ~ (p=0.833 n=44+42) Flate 126ms ± 6% 125ms ± 4% ~ (p=0.941 n=43+39) GoParser 147ms ± 4% 148ms ± 3% ~ (p=0.164 n=42+39) Reflect 359ms ± 3% 357ms ± 5% ~ (p=0.241 n=43+44) Tar 106ms ± 5% 106ms ± 7% ~ (p=0.853 n=40+43) XML 202ms ± 3% 203ms ± 3% ~ (p=0.488 n=42+41) name old user-ns/op new user-ns/op delta Template 240M ± 4% 239M ± 4% ~ (p=0.844 n=42+43) Unicode 107M ± 5% 107M ± 4% ~ (p=0.332 n=40+43) GoTypes 735M ± 3% 731M ± 4% ~ (p=0.141 n=43+44) Compiler 3.51G ± 3% 3.52G ± 3% ~ (p=0.208 n=42+43) SSA 5.72G ± 4% 5.72G ± 3% ~ (p=0.928 n=44+42) Flate 151M ± 7% 150M ± 8% ~ (p=0.662 n=44+43) GoParser 181M ± 5% 181M ± 4% ~ (p=0.379 n=41+44) Reflect 447M ± 4% 445M ± 4% ~ (p=0.344 n=43+43) Tar 125M ± 7% 124M ± 6% ~ (p=0.353 n=43+43) XML 248M ± 4% 250M ± 6% ~ (p=0.158 n=44+44) name old alloc/op new alloc/op delta Template 40.3MB ± 0% 40.2MB ± 0% -0.27% (p=0.000 n=10+10) Unicode 30.3MB ± 0% 30.2MB ± 0% -0.10% (p=0.015 n=10+10) GoTypes 114MB ± 0% 114MB ± 0% -0.06% (p=0.000 n=7+9) Compiler 480MB ± 0% 481MB ± 0% +0.07% (p=0.000 n=10+10) SSA 864MB ± 0% 862MB ± 0% -0.25% (p=0.000 n=9+10) Flate 25.9MB ± 0% 25.9MB ± 0% ~ (p=0.123 n=10+10) GoParser 32.1MB ± 0% 32.1MB ± 0% ~ (p=0.631 n=10+10) Reflect 79.9MB ± 0% 79.6MB ± 0% -0.39% (p=0.000 n=10+9) Tar 27.1MB ± 0% 27.0MB ± 0% -0.18% (p=0.003 n=10+10) XML 42.6MB ± 0% 42.6MB ± 0% ~ (p=0.143 n=10+10) name old allocs/op new allocs/op delta Template 401k ± 0% 401k ± 1% ~ (p=0.353 n=10+10) Unicode 322k ± 0% 322k ± 0% ~ (p=0.739 n=10+10) GoTypes 1.18M ± 0% 1.18M ± 0% +0.25% (p=0.001 n=7+8) Compiler 4.51M ± 0% 4.53M ± 0% +0.37% (p=0.000 n=10+10) SSA 7.91M ± 0% 7.93M ± 0% +0.20% (p=0.000 n=9+10) Flate 244k ± 0% 245k ± 0% ~ (p=0.123 n=10+10) GoParser 323k ± 1% 324k ± 1% +0.40% (p=0.035 n=10+10) Reflect 1.01M ± 0% 1.02M ± 0% +0.37% (p=0.000 n=10+9) Tar 258k ± 1% 258k ± 1% ~ (p=0.661 n=10+9) XML 403k ± 0% 405k ± 0% +0.47% (p=0.004 n=10+10) Updates #15756 Updates #19250 Change-Id: I647bfbb745c35630447eb79dfcaa994b490ce942 Reviewed-on: https://go-review.googlesource.com/38159 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-14 11:05:03 -07:00
if ctl := n.labeledControl(); ctl != nil {
s.labeledNodes[ctl] = lab
}
// The label might already have a target block via a goto.
if lab.target == nil {
lab.target = s.f.NewBlock(ssa.BlockPlain)
}
cmd/compile: avoid generating some dead blocks We generate a lot of pointless dead blocks during the AST to SSA conversion. There are a few commonly occurring kinds of statements that contain neither variables nor code and that switch to a new block themselves. Stop making dead blocks for them. For the code in #19379, this reduces compilation wall time by 36% and max rss by 28%. This also helps a little for regular code, particularly code heavy on switch statements. name old time/op new time/op delta Template 231ms ± 3% 230ms ± 5% ~ (p=0.402 n=17+16) Unicode 101ms ± 4% 103ms ± 5% ~ (p=0.221 n=19+18) GoTypes 635ms ± 5% 625ms ± 4% ~ (p=0.063 n=20+18) Compiler 2.93s ± 2% 2.89s ± 2% -1.22% (p=0.003 n=20+19) SSA 4.53s ± 3% 4.52s ± 3% ~ (p=0.380 n=20+19) Flate 132ms ± 4% 133ms ± 5% ~ (p=0.647 n=20+19) GoParser 161ms ± 3% 161ms ± 4% ~ (p=0.749 n=20+19) Reflect 403ms ± 4% 397ms ± 3% -1.53% (p=0.030 n=20+19) Tar 121ms ± 2% 121ms ± 8% ~ (p=0.544 n=19+19) XML 225ms ± 3% 224ms ± 4% ~ (p=0.396 n=20+19) name old user-ns/op new user-ns/op delta Template 302user-ms ± 1% 297user-ms ± 7% -1.49% (p=0.048 n=15+18) Unicode 142user-ms ± 3% 143user-ms ± 5% ~ (p=0.363 n=19+17) GoTypes 852user-ms ± 5% 851user-ms ± 3% ~ (p=0.851 n=20+18) Compiler 4.11user-s ± 6% 3.98user-s ± 3% -3.08% (p=0.000 n=20+19) SSA 6.91user-s ± 5% 6.82user-s ± 7% ~ (p=0.113 n=20+19) Flate 164user-ms ± 4% 168user-ms ± 4% +2.42% (p=0.001 n=18+19) GoParser 207user-ms ± 4% 206user-ms ± 4% ~ (p=0.176 n=20+18) Reflect 509user-ms ± 4% 505user-ms ± 4% ~ (p=0.113 n=20+19) Tar 153user-ms ± 7% 151user-ms ± 9% ~ (p=0.283 n=20+19) XML 284user-ms ± 4% 282user-ms ± 4% ~ (p=0.270 n=20+19) name old alloc/op new alloc/op delta Template 42.6MB ± 0% 41.9MB ± 0% -1.55% (p=0.000 n=19+19) Unicode 31.7MB ± 0% 31.7MB ± 0% ~ (p=0.828 n=20+18) GoTypes 124MB ± 0% 121MB ± 0% -2.11% (p=0.000 n=20+17) Compiler 534MB ± 0% 523MB ± 0% -2.06% (p=0.000 n=20+19) SSA 989MB ± 0% 977MB ± 0% -1.28% (p=0.000 n=20+19) Flate 27.8MB ± 0% 27.5MB ± 0% -0.98% (p=0.000 n=20+19) GoParser 34.3MB ± 0% 34.0MB ± 0% -0.81% (p=0.000 n=20+19) Reflect 84.6MB ± 0% 82.9MB ± 0% -2.00% (p=0.000 n=17+18) Tar 28.8MB ± 0% 28.3MB ± 0% -1.52% (p=0.000 n=16+18) XML 47.2MB ± 0% 45.8MB ± 0% -2.99% (p=0.000 n=20+19) name old allocs/op new allocs/op delta Template 421k ± 1% 419k ± 1% -0.41% (p=0.001 n=20+19) Unicode 338k ± 1% 338k ± 1% ~ (p=0.478 n=20+19) GoTypes 1.28M ± 0% 1.28M ± 0% -0.36% (p=0.000 n=20+18) Compiler 5.06M ± 0% 5.03M ± 0% -0.63% (p=0.000 n=20+19) SSA 9.14M ± 0% 9.11M ± 0% -0.34% (p=0.000 n=20+19) Flate 267k ± 1% 266k ± 1% ~ (p=0.149 n=20+19) GoParser 347k ± 0% 347k ± 1% ~ (p=0.103 n=19+19) Reflect 1.07M ± 0% 1.07M ± 0% -0.42% (p=0.000 n=16+18) Tar 274k ± 0% 273k ± 1% ~ (p=0.116 n=19+19) XML 449k ± 0% 446k ± 1% -0.60% (p=0.000 n=20+19) Updates #19379 Change-Id: Ie798c347a0c081f5e349e1529880bebaae290967 Reviewed-on: https://go-review.googlesource.com/37760 Reviewed-by: David Chase <drchase@google.com>
2017-03-03 16:15:35 -08:00
// Go to that label.
// (We pretend "label:" is preceded by "goto label", unless the predecessor is unreachable.)
if s.curBlock != nil {
b := s.endBlock()
b.AddEdgeTo(lab.target)
}
s.startBlock(lab.target)
case OGOTO:
sym := n.Left.Sym
lab := s.label(sym)
if lab.target == nil {
lab.target = s.f.NewBlock(ssa.BlockPlain)
}
b := s.endBlock()
b.AddEdgeTo(lab.target)
case OAS:
if n.Left == n.Right && n.Left.Op == ONAME {
// An x=x assignment. No point in doing anything
// here. In addition, skipping this assignment
// prevents generating:
// VARDEF x
// COPY x -> x
// which is bad because x is incorrectly considered
// dead before the vardef. See issue #14904.
return
}
var t *Type
if n.Right != nil {
t = n.Right.Type
} else {
t = n.Left.Type
}
// Evaluate RHS.
rhs := n.Right
cmd/compile: avoid write barrier in append fast path When we are writing the result of an append back to the same slice, we don’t need a write barrier on the fast path. This re-implements an optimization that was present in the old backend. Updates #14921 Fixes #14969 Sample code: var x []byte func p() { x = append(x, 1, 2, 3) } Before: "".p t=1 size=224 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 199 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x(SB), DX 0x0025 00037 (append.go:19) MOVQ "".x+8(SB), BX 0x002c 00044 (append.go:19) MOVQ BX, "".autotmp_0+64(SP) 0x0031 00049 (append.go:22) LEAQ 3(BX), BP 0x0035 00053 (append.go:22) CMPQ BP, CX 0x0038 00056 (append.go:22) JGT $0, 131 0x003a 00058 (append.go:22) MOVB $1, (DX)(BX*1) 0x003e 00062 (append.go:22) MOVB $2, 1(DX)(BX*1) 0x0043 00067 (append.go:22) MOVB $3, 2(DX)(BX*1) 0x0048 00072 (append.go:22) MOVQ BP, "".x+8(SB) 0x004f 00079 (append.go:22) MOVQ CX, "".x+16(SB) 0x0056 00086 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x005c 00092 (append.go:22) TESTB AL, AL 0x005e 00094 (append.go:22) JNE $0, 108 0x0060 00096 (append.go:22) MOVQ DX, "".x(SB) 0x0067 00103 (append.go:23) ADDQ $72, SP 0x006b 00107 (append.go:23) RET 0x006c 00108 (append.go:22) LEAQ "".x(SB), CX 0x0073 00115 (append.go:22) MOVQ CX, (SP) 0x0077 00119 (append.go:22) MOVQ DX, 8(SP) 0x007c 00124 (append.go:22) PCDATA $0, $0 0x007c 00124 (append.go:22) CALL runtime.writebarrierptr(SB) 0x0081 00129 (append.go:23) JMP 103 0x0083 00131 (append.go:22) LEAQ type.[]uint8(SB), AX 0x008a 00138 (append.go:22) MOVQ AX, (SP) 0x008e 00142 (append.go:22) MOVQ DX, 8(SP) 0x0093 00147 (append.go:22) MOVQ BX, 16(SP) 0x0098 00152 (append.go:22) MOVQ CX, 24(SP) 0x009d 00157 (append.go:22) MOVQ BP, 32(SP) 0x00a2 00162 (append.go:22) PCDATA $0, $0 0x00a2 00162 (append.go:22) CALL runtime.growslice(SB) 0x00a7 00167 (append.go:22) MOVQ 40(SP), DX 0x00ac 00172 (append.go:22) MOVQ 48(SP), AX 0x00b1 00177 (append.go:22) MOVQ 56(SP), CX 0x00b6 00182 (append.go:22) ADDQ $3, AX 0x00ba 00186 (append.go:19) MOVQ "".autotmp_0+64(SP), BX 0x00bf 00191 (append.go:22) MOVQ AX, BP 0x00c2 00194 (append.go:22) JMP 58 0x00c7 00199 (append.go:22) NOP 0x00c7 00199 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00cc 00204 (append.go:21) JMP 0 After: "".p t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 191 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x+8(SB), DX 0x0025 00037 (append.go:19) MOVQ DX, "".autotmp_0+64(SP) 0x002a 00042 (append.go:19) MOVQ "".x(SB), BX 0x0031 00049 (append.go:22) LEAQ 3(DX), BP 0x0035 00053 (append.go:22) MOVQ BP, "".x+8(SB) 0x003c 00060 (append.go:22) CMPQ BP, CX 0x003f 00063 (append.go:22) JGT $0, 84 0x0041 00065 (append.go:22) MOVB $1, (BX)(DX*1) 0x0045 00069 (append.go:22) MOVB $2, 1(BX)(DX*1) 0x004a 00074 (append.go:22) MOVB $3, 2(BX)(DX*1) 0x004f 00079 (append.go:23) ADDQ $72, SP 0x0053 00083 (append.go:23) RET 0x0054 00084 (append.go:22) LEAQ type.[]uint8(SB), AX 0x005b 00091 (append.go:22) MOVQ AX, (SP) 0x005f 00095 (append.go:22) MOVQ BX, 8(SP) 0x0064 00100 (append.go:22) MOVQ DX, 16(SP) 0x0069 00105 (append.go:22) MOVQ CX, 24(SP) 0x006e 00110 (append.go:22) MOVQ BP, 32(SP) 0x0073 00115 (append.go:22) PCDATA $0, $0 0x0073 00115 (append.go:22) CALL runtime.growslice(SB) 0x0078 00120 (append.go:22) MOVQ 40(SP), CX 0x007d 00125 (append.go:22) MOVQ 56(SP), AX 0x0082 00130 (append.go:22) MOVQ AX, "".x+16(SB) 0x0089 00137 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x008f 00143 (append.go:22) TESTB AL, AL 0x0091 00145 (append.go:22) JNE $0, 168 0x0093 00147 (append.go:22) MOVQ CX, "".x(SB) 0x009a 00154 (append.go:22) MOVQ "".x(SB), BX 0x00a1 00161 (append.go:19) MOVQ "".autotmp_0+64(SP), DX 0x00a6 00166 (append.go:22) JMP 65 0x00a8 00168 (append.go:22) LEAQ "".x(SB), DX 0x00af 00175 (append.go:22) MOVQ DX, (SP) 0x00b3 00179 (append.go:22) MOVQ CX, 8(SP) 0x00b8 00184 (append.go:22) PCDATA $0, $0 0x00b8 00184 (append.go:22) CALL runtime.writebarrierptr(SB) 0x00bd 00189 (append.go:22) JMP 154 0x00bf 00191 (append.go:22) NOP 0x00bf 00191 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00c4 00196 (append.go:21) JMP 0 Change-Id: I77a41ad3a22557a4bb4654de7d6d24a029efe34a Reviewed-on: https://go-review.googlesource.com/21813 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:44:17 -07:00
if rhs != nil {
switch rhs.Op {
case OSTRUCTLIT, OARRAYLIT, OSLICELIT:
cmd/compile: avoid write barrier in append fast path When we are writing the result of an append back to the same slice, we don’t need a write barrier on the fast path. This re-implements an optimization that was present in the old backend. Updates #14921 Fixes #14969 Sample code: var x []byte func p() { x = append(x, 1, 2, 3) } Before: "".p t=1 size=224 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 199 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x(SB), DX 0x0025 00037 (append.go:19) MOVQ "".x+8(SB), BX 0x002c 00044 (append.go:19) MOVQ BX, "".autotmp_0+64(SP) 0x0031 00049 (append.go:22) LEAQ 3(BX), BP 0x0035 00053 (append.go:22) CMPQ BP, CX 0x0038 00056 (append.go:22) JGT $0, 131 0x003a 00058 (append.go:22) MOVB $1, (DX)(BX*1) 0x003e 00062 (append.go:22) MOVB $2, 1(DX)(BX*1) 0x0043 00067 (append.go:22) MOVB $3, 2(DX)(BX*1) 0x0048 00072 (append.go:22) MOVQ BP, "".x+8(SB) 0x004f 00079 (append.go:22) MOVQ CX, "".x+16(SB) 0x0056 00086 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x005c 00092 (append.go:22) TESTB AL, AL 0x005e 00094 (append.go:22) JNE $0, 108 0x0060 00096 (append.go:22) MOVQ DX, "".x(SB) 0x0067 00103 (append.go:23) ADDQ $72, SP 0x006b 00107 (append.go:23) RET 0x006c 00108 (append.go:22) LEAQ "".x(SB), CX 0x0073 00115 (append.go:22) MOVQ CX, (SP) 0x0077 00119 (append.go:22) MOVQ DX, 8(SP) 0x007c 00124 (append.go:22) PCDATA $0, $0 0x007c 00124 (append.go:22) CALL runtime.writebarrierptr(SB) 0x0081 00129 (append.go:23) JMP 103 0x0083 00131 (append.go:22) LEAQ type.[]uint8(SB), AX 0x008a 00138 (append.go:22) MOVQ AX, (SP) 0x008e 00142 (append.go:22) MOVQ DX, 8(SP) 0x0093 00147 (append.go:22) MOVQ BX, 16(SP) 0x0098 00152 (append.go:22) MOVQ CX, 24(SP) 0x009d 00157 (append.go:22) MOVQ BP, 32(SP) 0x00a2 00162 (append.go:22) PCDATA $0, $0 0x00a2 00162 (append.go:22) CALL runtime.growslice(SB) 0x00a7 00167 (append.go:22) MOVQ 40(SP), DX 0x00ac 00172 (append.go:22) MOVQ 48(SP), AX 0x00b1 00177 (append.go:22) MOVQ 56(SP), CX 0x00b6 00182 (append.go:22) ADDQ $3, AX 0x00ba 00186 (append.go:19) MOVQ "".autotmp_0+64(SP), BX 0x00bf 00191 (append.go:22) MOVQ AX, BP 0x00c2 00194 (append.go:22) JMP 58 0x00c7 00199 (append.go:22) NOP 0x00c7 00199 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00cc 00204 (append.go:21) JMP 0 After: "".p t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 191 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x+8(SB), DX 0x0025 00037 (append.go:19) MOVQ DX, "".autotmp_0+64(SP) 0x002a 00042 (append.go:19) MOVQ "".x(SB), BX 0x0031 00049 (append.go:22) LEAQ 3(DX), BP 0x0035 00053 (append.go:22) MOVQ BP, "".x+8(SB) 0x003c 00060 (append.go:22) CMPQ BP, CX 0x003f 00063 (append.go:22) JGT $0, 84 0x0041 00065 (append.go:22) MOVB $1, (BX)(DX*1) 0x0045 00069 (append.go:22) MOVB $2, 1(BX)(DX*1) 0x004a 00074 (append.go:22) MOVB $3, 2(BX)(DX*1) 0x004f 00079 (append.go:23) ADDQ $72, SP 0x0053 00083 (append.go:23) RET 0x0054 00084 (append.go:22) LEAQ type.[]uint8(SB), AX 0x005b 00091 (append.go:22) MOVQ AX, (SP) 0x005f 00095 (append.go:22) MOVQ BX, 8(SP) 0x0064 00100 (append.go:22) MOVQ DX, 16(SP) 0x0069 00105 (append.go:22) MOVQ CX, 24(SP) 0x006e 00110 (append.go:22) MOVQ BP, 32(SP) 0x0073 00115 (append.go:22) PCDATA $0, $0 0x0073 00115 (append.go:22) CALL runtime.growslice(SB) 0x0078 00120 (append.go:22) MOVQ 40(SP), CX 0x007d 00125 (append.go:22) MOVQ 56(SP), AX 0x0082 00130 (append.go:22) MOVQ AX, "".x+16(SB) 0x0089 00137 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x008f 00143 (append.go:22) TESTB AL, AL 0x0091 00145 (append.go:22) JNE $0, 168 0x0093 00147 (append.go:22) MOVQ CX, "".x(SB) 0x009a 00154 (append.go:22) MOVQ "".x(SB), BX 0x00a1 00161 (append.go:19) MOVQ "".autotmp_0+64(SP), DX 0x00a6 00166 (append.go:22) JMP 65 0x00a8 00168 (append.go:22) LEAQ "".x(SB), DX 0x00af 00175 (append.go:22) MOVQ DX, (SP) 0x00b3 00179 (append.go:22) MOVQ CX, 8(SP) 0x00b8 00184 (append.go:22) PCDATA $0, $0 0x00b8 00184 (append.go:22) CALL runtime.writebarrierptr(SB) 0x00bd 00189 (append.go:22) JMP 154 0x00bf 00191 (append.go:22) NOP 0x00bf 00191 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00c4 00196 (append.go:21) JMP 0 Change-Id: I77a41ad3a22557a4bb4654de7d6d24a029efe34a Reviewed-on: https://go-review.googlesource.com/21813 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:44:17 -07:00
// All literals with nonzero fields have already been
// rewritten during walk. Any that remain are just T{}
// or equivalents. Use the zero value.
if !iszero(rhs) {
Fatalf("literal with nonzero value in SSA: %v", rhs)
}
rhs = nil
case OAPPEND:
// If we're writing the result of an append back to the same slice,
// handle it specially to avoid write barriers on the fast (non-growth) path.
// If the slice can be SSA'd, it'll be on the stack,
// so there will be no write barriers,
// so there's no need to attempt to prevent them.
if samesafeexpr(n.Left, rhs.List.First()) {
if !s.canSSA(n.Left) {
if Debug_append > 0 {
Warnl(n.Pos, "append: len-only update")
}
s.append(rhs, true)
return
} else {
if Debug_append > 0 { // replicating old diagnostic message
Warnl(n.Pos, "append: len-only update (in local slice)")
}
}
cmd/compile: avoid write barrier in append fast path When we are writing the result of an append back to the same slice, we don’t need a write barrier on the fast path. This re-implements an optimization that was present in the old backend. Updates #14921 Fixes #14969 Sample code: var x []byte func p() { x = append(x, 1, 2, 3) } Before: "".p t=1 size=224 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 199 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x(SB), DX 0x0025 00037 (append.go:19) MOVQ "".x+8(SB), BX 0x002c 00044 (append.go:19) MOVQ BX, "".autotmp_0+64(SP) 0x0031 00049 (append.go:22) LEAQ 3(BX), BP 0x0035 00053 (append.go:22) CMPQ BP, CX 0x0038 00056 (append.go:22) JGT $0, 131 0x003a 00058 (append.go:22) MOVB $1, (DX)(BX*1) 0x003e 00062 (append.go:22) MOVB $2, 1(DX)(BX*1) 0x0043 00067 (append.go:22) MOVB $3, 2(DX)(BX*1) 0x0048 00072 (append.go:22) MOVQ BP, "".x+8(SB) 0x004f 00079 (append.go:22) MOVQ CX, "".x+16(SB) 0x0056 00086 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x005c 00092 (append.go:22) TESTB AL, AL 0x005e 00094 (append.go:22) JNE $0, 108 0x0060 00096 (append.go:22) MOVQ DX, "".x(SB) 0x0067 00103 (append.go:23) ADDQ $72, SP 0x006b 00107 (append.go:23) RET 0x006c 00108 (append.go:22) LEAQ "".x(SB), CX 0x0073 00115 (append.go:22) MOVQ CX, (SP) 0x0077 00119 (append.go:22) MOVQ DX, 8(SP) 0x007c 00124 (append.go:22) PCDATA $0, $0 0x007c 00124 (append.go:22) CALL runtime.writebarrierptr(SB) 0x0081 00129 (append.go:23) JMP 103 0x0083 00131 (append.go:22) LEAQ type.[]uint8(SB), AX 0x008a 00138 (append.go:22) MOVQ AX, (SP) 0x008e 00142 (append.go:22) MOVQ DX, 8(SP) 0x0093 00147 (append.go:22) MOVQ BX, 16(SP) 0x0098 00152 (append.go:22) MOVQ CX, 24(SP) 0x009d 00157 (append.go:22) MOVQ BP, 32(SP) 0x00a2 00162 (append.go:22) PCDATA $0, $0 0x00a2 00162 (append.go:22) CALL runtime.growslice(SB) 0x00a7 00167 (append.go:22) MOVQ 40(SP), DX 0x00ac 00172 (append.go:22) MOVQ 48(SP), AX 0x00b1 00177 (append.go:22) MOVQ 56(SP), CX 0x00b6 00182 (append.go:22) ADDQ $3, AX 0x00ba 00186 (append.go:19) MOVQ "".autotmp_0+64(SP), BX 0x00bf 00191 (append.go:22) MOVQ AX, BP 0x00c2 00194 (append.go:22) JMP 58 0x00c7 00199 (append.go:22) NOP 0x00c7 00199 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00cc 00204 (append.go:21) JMP 0 After: "".p t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 191 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x+8(SB), DX 0x0025 00037 (append.go:19) MOVQ DX, "".autotmp_0+64(SP) 0x002a 00042 (append.go:19) MOVQ "".x(SB), BX 0x0031 00049 (append.go:22) LEAQ 3(DX), BP 0x0035 00053 (append.go:22) MOVQ BP, "".x+8(SB) 0x003c 00060 (append.go:22) CMPQ BP, CX 0x003f 00063 (append.go:22) JGT $0, 84 0x0041 00065 (append.go:22) MOVB $1, (BX)(DX*1) 0x0045 00069 (append.go:22) MOVB $2, 1(BX)(DX*1) 0x004a 00074 (append.go:22) MOVB $3, 2(BX)(DX*1) 0x004f 00079 (append.go:23) ADDQ $72, SP 0x0053 00083 (append.go:23) RET 0x0054 00084 (append.go:22) LEAQ type.[]uint8(SB), AX 0x005b 00091 (append.go:22) MOVQ AX, (SP) 0x005f 00095 (append.go:22) MOVQ BX, 8(SP) 0x0064 00100 (append.go:22) MOVQ DX, 16(SP) 0x0069 00105 (append.go:22) MOVQ CX, 24(SP) 0x006e 00110 (append.go:22) MOVQ BP, 32(SP) 0x0073 00115 (append.go:22) PCDATA $0, $0 0x0073 00115 (append.go:22) CALL runtime.growslice(SB) 0x0078 00120 (append.go:22) MOVQ 40(SP), CX 0x007d 00125 (append.go:22) MOVQ 56(SP), AX 0x0082 00130 (append.go:22) MOVQ AX, "".x+16(SB) 0x0089 00137 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x008f 00143 (append.go:22) TESTB AL, AL 0x0091 00145 (append.go:22) JNE $0, 168 0x0093 00147 (append.go:22) MOVQ CX, "".x(SB) 0x009a 00154 (append.go:22) MOVQ "".x(SB), BX 0x00a1 00161 (append.go:19) MOVQ "".autotmp_0+64(SP), DX 0x00a6 00166 (append.go:22) JMP 65 0x00a8 00168 (append.go:22) LEAQ "".x(SB), DX 0x00af 00175 (append.go:22) MOVQ DX, (SP) 0x00b3 00179 (append.go:22) MOVQ CX, 8(SP) 0x00b8 00184 (append.go:22) PCDATA $0, $0 0x00b8 00184 (append.go:22) CALL runtime.writebarrierptr(SB) 0x00bd 00189 (append.go:22) JMP 154 0x00bf 00191 (append.go:22) NOP 0x00bf 00191 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00c4 00196 (append.go:21) JMP 0 Change-Id: I77a41ad3a22557a4bb4654de7d6d24a029efe34a Reviewed-on: https://go-review.googlesource.com/21813 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:44:17 -07:00
}
}
}
var r *ssa.Value
deref := !canSSAType(t)
if deref {
if rhs == nil {
r = nil // Signal assign to use OpZero.
} else {
r = s.addr(rhs, false)
}
} else {
if rhs == nil {
r = s.zeroVal(t)
} else {
r = s.expr(rhs)
}
}
var skip skipMask
if rhs != nil && (rhs.Op == OSLICE || rhs.Op == OSLICE3 || rhs.Op == OSLICESTR) && samesafeexpr(rhs.Left, n.Left) {
// We're assigning a slicing operation back to its source.
// Don't write back fields we aren't changing. See issue #14855.
i, j, k := rhs.SliceBounds()
if i != nil && (i.Op == OLITERAL && i.Val().Ctype() == CTINT && i.Int64() == 0) {
// [0:...] is the same as [:...]
i = nil
}
// TODO: detect defaults for len/cap also.
// Currently doesn't really work because (*p)[:len(*p)] appears here as:
// tmp = len(*p)
// (*p)[:tmp]
//if j != nil && (j.Op == OLEN && samesafeexpr(j.Left, n.Left)) {
// j = nil
//}
//if k != nil && (k.Op == OCAP && samesafeexpr(k.Left, n.Left)) {
// k = nil
//}
if i == nil {
skip |= skipPtr
if j == nil {
skip |= skipLen
}
if k == nil {
skip |= skipCap
}
}
}
s.assign(n.Left, r, deref, skip)
case OIF:
bThen := s.f.NewBlock(ssa.BlockPlain)
bEnd := s.f.NewBlock(ssa.BlockPlain)
var bElse *ssa.Block
if n.Rlist.Len() != 0 {
bElse = s.f.NewBlock(ssa.BlockPlain)
s.condBranch(n.Left, bThen, bElse, n.Likely)
} else {
s.condBranch(n.Left, bThen, bEnd, n.Likely)
}
s.startBlock(bThen)
s.stmtList(n.Nbody)
if b := s.endBlock(); b != nil {
b.AddEdgeTo(bEnd)
}
if n.Rlist.Len() != 0 {
s.startBlock(bElse)
s.stmtList(n.Rlist)
if b := s.endBlock(); b != nil {
b.AddEdgeTo(bEnd)
}
}
s.startBlock(bEnd)
case ORETURN:
s.stmtList(n.List)
s.exit()
case ORETJMP:
s.stmtList(n.List)
b := s.exit()
b.Kind = ssa.BlockRetJmp // override BlockRet
b.Aux = Linksym(n.Left.Sym)
case OCONTINUE, OBREAK:
var to *ssa.Block
if n.Left == nil {
// plain break/continue
cmd/compile: check labels and gotos before building SSA This CL introduces yet another compiler pass, which checks for correct control flow constructs prior to converting from AST to SSA form. It cannot be integrated with walk, since walk rewrites switch and select statements on the fly. To reduce code duplication, this CL also does some minor refactoring. With this pass in place, the AST to SSA converter can now stop generating SSA for any known-dead code. This minor savings pays for the minor cost of the new pass. Performance is almost a wash: name old time/op new time/op delta Template 206ms ± 4% 205ms ± 4% ~ (p=0.108 n=43+43) Unicode 84.0ms ± 4% 84.0ms ± 4% ~ (p=0.979 n=43+43) GoTypes 550ms ± 3% 553ms ± 3% ~ (p=0.065 n=40+41) Compiler 2.57s ± 4% 2.58s ± 2% ~ (p=0.103 n=44+41) SSA 3.94s ± 3% 3.93s ± 2% ~ (p=0.833 n=44+42) Flate 126ms ± 6% 125ms ± 4% ~ (p=0.941 n=43+39) GoParser 147ms ± 4% 148ms ± 3% ~ (p=0.164 n=42+39) Reflect 359ms ± 3% 357ms ± 5% ~ (p=0.241 n=43+44) Tar 106ms ± 5% 106ms ± 7% ~ (p=0.853 n=40+43) XML 202ms ± 3% 203ms ± 3% ~ (p=0.488 n=42+41) name old user-ns/op new user-ns/op delta Template 240M ± 4% 239M ± 4% ~ (p=0.844 n=42+43) Unicode 107M ± 5% 107M ± 4% ~ (p=0.332 n=40+43) GoTypes 735M ± 3% 731M ± 4% ~ (p=0.141 n=43+44) Compiler 3.51G ± 3% 3.52G ± 3% ~ (p=0.208 n=42+43) SSA 5.72G ± 4% 5.72G ± 3% ~ (p=0.928 n=44+42) Flate 151M ± 7% 150M ± 8% ~ (p=0.662 n=44+43) GoParser 181M ± 5% 181M ± 4% ~ (p=0.379 n=41+44) Reflect 447M ± 4% 445M ± 4% ~ (p=0.344 n=43+43) Tar 125M ± 7% 124M ± 6% ~ (p=0.353 n=43+43) XML 248M ± 4% 250M ± 6% ~ (p=0.158 n=44+44) name old alloc/op new alloc/op delta Template 40.3MB ± 0% 40.2MB ± 0% -0.27% (p=0.000 n=10+10) Unicode 30.3MB ± 0% 30.2MB ± 0% -0.10% (p=0.015 n=10+10) GoTypes 114MB ± 0% 114MB ± 0% -0.06% (p=0.000 n=7+9) Compiler 480MB ± 0% 481MB ± 0% +0.07% (p=0.000 n=10+10) SSA 864MB ± 0% 862MB ± 0% -0.25% (p=0.000 n=9+10) Flate 25.9MB ± 0% 25.9MB ± 0% ~ (p=0.123 n=10+10) GoParser 32.1MB ± 0% 32.1MB ± 0% ~ (p=0.631 n=10+10) Reflect 79.9MB ± 0% 79.6MB ± 0% -0.39% (p=0.000 n=10+9) Tar 27.1MB ± 0% 27.0MB ± 0% -0.18% (p=0.003 n=10+10) XML 42.6MB ± 0% 42.6MB ± 0% ~ (p=0.143 n=10+10) name old allocs/op new allocs/op delta Template 401k ± 0% 401k ± 1% ~ (p=0.353 n=10+10) Unicode 322k ± 0% 322k ± 0% ~ (p=0.739 n=10+10) GoTypes 1.18M ± 0% 1.18M ± 0% +0.25% (p=0.001 n=7+8) Compiler 4.51M ± 0% 4.53M ± 0% +0.37% (p=0.000 n=10+10) SSA 7.91M ± 0% 7.93M ± 0% +0.20% (p=0.000 n=9+10) Flate 244k ± 0% 245k ± 0% ~ (p=0.123 n=10+10) GoParser 323k ± 1% 324k ± 1% +0.40% (p=0.035 n=10+10) Reflect 1.01M ± 0% 1.02M ± 0% +0.37% (p=0.000 n=10+9) Tar 258k ± 1% 258k ± 1% ~ (p=0.661 n=10+9) XML 403k ± 0% 405k ± 0% +0.47% (p=0.004 n=10+10) Updates #15756 Updates #19250 Change-Id: I647bfbb745c35630447eb79dfcaa994b490ce942 Reviewed-on: https://go-review.googlesource.com/38159 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-14 11:05:03 -07:00
switch n.Op {
case OCONTINUE:
to = s.continueTo
case OBREAK:
to = s.breakTo
}
} else {
// labeled break/continue; look up the target
sym := n.Left.Sym
lab := s.label(sym)
switch n.Op {
case OCONTINUE:
to = lab.continueTarget
case OBREAK:
to = lab.breakTarget
}
}
cmd/compile: check labels and gotos before building SSA This CL introduces yet another compiler pass, which checks for correct control flow constructs prior to converting from AST to SSA form. It cannot be integrated with walk, since walk rewrites switch and select statements on the fly. To reduce code duplication, this CL also does some minor refactoring. With this pass in place, the AST to SSA converter can now stop generating SSA for any known-dead code. This minor savings pays for the minor cost of the new pass. Performance is almost a wash: name old time/op new time/op delta Template 206ms ± 4% 205ms ± 4% ~ (p=0.108 n=43+43) Unicode 84.0ms ± 4% 84.0ms ± 4% ~ (p=0.979 n=43+43) GoTypes 550ms ± 3% 553ms ± 3% ~ (p=0.065 n=40+41) Compiler 2.57s ± 4% 2.58s ± 2% ~ (p=0.103 n=44+41) SSA 3.94s ± 3% 3.93s ± 2% ~ (p=0.833 n=44+42) Flate 126ms ± 6% 125ms ± 4% ~ (p=0.941 n=43+39) GoParser 147ms ± 4% 148ms ± 3% ~ (p=0.164 n=42+39) Reflect 359ms ± 3% 357ms ± 5% ~ (p=0.241 n=43+44) Tar 106ms ± 5% 106ms ± 7% ~ (p=0.853 n=40+43) XML 202ms ± 3% 203ms ± 3% ~ (p=0.488 n=42+41) name old user-ns/op new user-ns/op delta Template 240M ± 4% 239M ± 4% ~ (p=0.844 n=42+43) Unicode 107M ± 5% 107M ± 4% ~ (p=0.332 n=40+43) GoTypes 735M ± 3% 731M ± 4% ~ (p=0.141 n=43+44) Compiler 3.51G ± 3% 3.52G ± 3% ~ (p=0.208 n=42+43) SSA 5.72G ± 4% 5.72G ± 3% ~ (p=0.928 n=44+42) Flate 151M ± 7% 150M ± 8% ~ (p=0.662 n=44+43) GoParser 181M ± 5% 181M ± 4% ~ (p=0.379 n=41+44) Reflect 447M ± 4% 445M ± 4% ~ (p=0.344 n=43+43) Tar 125M ± 7% 124M ± 6% ~ (p=0.353 n=43+43) XML 248M ± 4% 250M ± 6% ~ (p=0.158 n=44+44) name old alloc/op new alloc/op delta Template 40.3MB ± 0% 40.2MB ± 0% -0.27% (p=0.000 n=10+10) Unicode 30.3MB ± 0% 30.2MB ± 0% -0.10% (p=0.015 n=10+10) GoTypes 114MB ± 0% 114MB ± 0% -0.06% (p=0.000 n=7+9) Compiler 480MB ± 0% 481MB ± 0% +0.07% (p=0.000 n=10+10) SSA 864MB ± 0% 862MB ± 0% -0.25% (p=0.000 n=9+10) Flate 25.9MB ± 0% 25.9MB ± 0% ~ (p=0.123 n=10+10) GoParser 32.1MB ± 0% 32.1MB ± 0% ~ (p=0.631 n=10+10) Reflect 79.9MB ± 0% 79.6MB ± 0% -0.39% (p=0.000 n=10+9) Tar 27.1MB ± 0% 27.0MB ± 0% -0.18% (p=0.003 n=10+10) XML 42.6MB ± 0% 42.6MB ± 0% ~ (p=0.143 n=10+10) name old allocs/op new allocs/op delta Template 401k ± 0% 401k ± 1% ~ (p=0.353 n=10+10) Unicode 322k ± 0% 322k ± 0% ~ (p=0.739 n=10+10) GoTypes 1.18M ± 0% 1.18M ± 0% +0.25% (p=0.001 n=7+8) Compiler 4.51M ± 0% 4.53M ± 0% +0.37% (p=0.000 n=10+10) SSA 7.91M ± 0% 7.93M ± 0% +0.20% (p=0.000 n=9+10) Flate 244k ± 0% 245k ± 0% ~ (p=0.123 n=10+10) GoParser 323k ± 1% 324k ± 1% +0.40% (p=0.035 n=10+10) Reflect 1.01M ± 0% 1.02M ± 0% +0.37% (p=0.000 n=10+9) Tar 258k ± 1% 258k ± 1% ~ (p=0.661 n=10+9) XML 403k ± 0% 405k ± 0% +0.47% (p=0.004 n=10+10) Updates #15756 Updates #19250 Change-Id: I647bfbb745c35630447eb79dfcaa994b490ce942 Reviewed-on: https://go-review.googlesource.com/38159 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-14 11:05:03 -07:00
b := s.endBlock()
b.AddEdgeTo(to)
case OFOR, OFORUNTIL:
// OFOR: for Ninit; Left; Right { Nbody }
// For = cond; body; incr
// Foruntil = body; incr; cond
bCond := s.f.NewBlock(ssa.BlockPlain)
bBody := s.f.NewBlock(ssa.BlockPlain)
bIncr := s.f.NewBlock(ssa.BlockPlain)
bEnd := s.f.NewBlock(ssa.BlockPlain)
// first, jump to condition test (OFOR) or body (OFORUNTIL)
b := s.endBlock()
if n.Op == OFOR {
b.AddEdgeTo(bCond)
// generate code to test condition
s.startBlock(bCond)
if n.Left != nil {
s.condBranch(n.Left, bBody, bEnd, 1)
} else {
b := s.endBlock()
b.Kind = ssa.BlockPlain
b.AddEdgeTo(bBody)
}
} else {
b.AddEdgeTo(bBody)
}
// set up for continue/break in body
prevContinue := s.continueTo
prevBreak := s.breakTo
s.continueTo = bIncr
s.breakTo = bEnd
lab := s.labeledNodes[n]
if lab != nil {
// labeled for loop
lab.continueTarget = bIncr
lab.breakTarget = bEnd
}
// generate body
s.startBlock(bBody)
s.stmtList(n.Nbody)
// tear down continue/break
s.continueTo = prevContinue
s.breakTo = prevBreak
if lab != nil {
lab.continueTarget = nil
lab.breakTarget = nil
}
// done with body, goto incr
if b := s.endBlock(); b != nil {
b.AddEdgeTo(bIncr)
}
// generate incr
s.startBlock(bIncr)
if n.Right != nil {
s.stmt(n.Right)
}
if b := s.endBlock(); b != nil {
b.AddEdgeTo(bCond)
}
if n.Op == OFORUNTIL {
// generate code to test condition
s.startBlock(bCond)
if n.Left != nil {
s.condBranch(n.Left, bBody, bEnd, 1)
} else {
b := s.endBlock()
b.Kind = ssa.BlockPlain
b.AddEdgeTo(bBody)
}
}
s.startBlock(bEnd)
case OSWITCH, OSELECT:
// These have been mostly rewritten by the front end into their Nbody fields.
// Our main task is to correctly hook up any break statements.
bEnd := s.f.NewBlock(ssa.BlockPlain)
prevBreak := s.breakTo
s.breakTo = bEnd
lab := s.labeledNodes[n]
if lab != nil {
// labeled
lab.breakTarget = bEnd
}
// generate body code
s.stmtList(n.Nbody)
s.breakTo = prevBreak
if lab != nil {
lab.breakTarget = nil
}
// walk adds explicit OBREAK nodes to the end of all reachable code paths.
// If we still have a current block here, then mark it unreachable.
if s.curBlock != nil {
m := s.mem()
b := s.endBlock()
b.Kind = ssa.BlockExit
b.SetControl(m)
}
s.startBlock(bEnd)
case OVARKILL:
// Insert a varkill op to record that a variable is no longer live.
// We only care about liveness info at call sites, so putting the
// varkill in the store chain is enough to keep it correctly ordered
// with respect to call ops.
if !s.canSSA(n.Left) {
s.vars[&memVar] = s.newValue1A(ssa.OpVarKill, ssa.TypeMem, n.Left, s.mem())
}
case OVARLIVE:
// Insert a varlive op to record that a variable is still live.
cmd/compile: pack bool fields in Node, Name, Func and Type structs to bitsets This reduces compiler memory usage by up to 4% - see compilebench results below. name old time/op new time/op delta Template 245ms ± 4% 241ms ± 2% -1.88% (p=0.029 n=10+10) Unicode 126ms ± 3% 124ms ± 3% ~ (p=0.105 n=10+10) GoTypes 805ms ± 2% 813ms ± 3% ~ (p=0.515 n=8+10) Compiler 3.95s ± 2% 3.83s ± 1% -2.96% (p=0.000 n=9+10) MakeBash 47.4s ± 4% 46.6s ± 1% -1.59% (p=0.028 n=9+10) name old user-ns/op new user-ns/op delta Template 324M ± 5% 326M ± 3% ~ (p=0.935 n=10+10) Unicode 186M ± 5% 178M ±10% ~ (p=0.067 n=9+10) GoTypes 1.08G ± 7% 1.09G ± 4% ~ (p=0.956 n=10+10) Compiler 5.34G ± 4% 5.31G ± 1% ~ (p=0.501 n=10+8) name old alloc/op new alloc/op delta Template 41.0MB ± 0% 39.8MB ± 0% -3.03% (p=0.000 n=10+10) Unicode 32.3MB ± 0% 31.0MB ± 0% -4.13% (p=0.000 n=10+10) GoTypes 119MB ± 0% 116MB ± 0% -2.39% (p=0.000 n=10+10) Compiler 499MB ± 0% 487MB ± 0% -2.48% (p=0.000 n=10+10) name old allocs/op new allocs/op delta Template 380k ± 1% 379k ± 1% ~ (p=0.436 n=10+10) Unicode 324k ± 1% 324k ± 0% ~ (p=0.853 n=10+10) GoTypes 1.15M ± 0% 1.15M ± 0% ~ (p=0.481 n=10+10) Compiler 4.41M ± 0% 4.41M ± 0% -0.12% (p=0.007 n=10+10) name old text-bytes new text-bytes delta HelloSize 623k ± 0% 623k ± 0% ~ (all equal) CmdGoSize 6.64M ± 0% 6.64M ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 5.81k ± 0% 5.81k ± 0% ~ (all equal) CmdGoSize 238k ± 0% 238k ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 134k ± 0% 134k ± 0% ~ (all equal) CmdGoSize 152k ± 0% 152k ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 967k ± 0% 967k ± 0% ~ (all equal) CmdGoSize 10.2M ± 0% 10.2M ± 0% ~ (all equal) Change-Id: I1f40af738254892bd6c8ba2eb43390b175753d52 Reviewed-on: https://go-review.googlesource.com/37445 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-27 19:56:38 +02:00
if !n.Left.Addrtaken() {
s.Fatalf("VARLIVE variable %v must have Addrtaken set", n.Left)
}
s.vars[&memVar] = s.newValue1A(ssa.OpVarLive, ssa.TypeMem, n.Left, s.mem())
case OCHECKNIL:
p := s.expr(n.Left)
s.nilCheck(p)
default:
s.Fatalf("unhandled stmt %v", n.Op)
}
}
// exit processes any code that needs to be generated just before returning.
// It returns a BlockRet block that ends the control flow. Its control value
// will be set to the final memory state.
func (s *state) exit() *ssa.Block {
if s.hasdefer {
s.rtcall(Deferreturn, true, nil)
}
// Run exit code. Typically, this code copies heap-allocated PPARAMOUT
// variables back to the stack.
s.stmtList(s.exitCode)
// Store SSAable PPARAMOUT variables back to stack locations.
for _, n := range s.returns {
addr := s.decladdrs[n]
val := s.variable(n, n.Type)
s.vars[&memVar] = s.newValue1A(ssa.OpVarDef, ssa.TypeMem, n, s.mem())
s.vars[&memVar] = s.newValue3A(ssa.OpStore, ssa.TypeMem, n.Type, addr, val, s.mem())
// TODO: if val is ever spilled, we'd like to use the
// PPARAMOUT slot for spilling it. That won't happen
// currently.
}
// Do actual return.
m := s.mem()
b := s.endBlock()
b.Kind = ssa.BlockRet
b.SetControl(m)
return b
}
type opAndType struct {
op Op
etype EType
}
var opToSSA = map[opAndType]ssa.Op{
opAndType{OADD, TINT8}: ssa.OpAdd8,
opAndType{OADD, TUINT8}: ssa.OpAdd8,
opAndType{OADD, TINT16}: ssa.OpAdd16,
opAndType{OADD, TUINT16}: ssa.OpAdd16,
opAndType{OADD, TINT32}: ssa.OpAdd32,
opAndType{OADD, TUINT32}: ssa.OpAdd32,
opAndType{OADD, TPTR32}: ssa.OpAdd32,
opAndType{OADD, TINT64}: ssa.OpAdd64,
opAndType{OADD, TUINT64}: ssa.OpAdd64,
opAndType{OADD, TPTR64}: ssa.OpAdd64,
opAndType{OADD, TFLOAT32}: ssa.OpAdd32F,
opAndType{OADD, TFLOAT64}: ssa.OpAdd64F,
opAndType{OSUB, TINT8}: ssa.OpSub8,
opAndType{OSUB, TUINT8}: ssa.OpSub8,
opAndType{OSUB, TINT16}: ssa.OpSub16,
opAndType{OSUB, TUINT16}: ssa.OpSub16,
opAndType{OSUB, TINT32}: ssa.OpSub32,
opAndType{OSUB, TUINT32}: ssa.OpSub32,
opAndType{OSUB, TINT64}: ssa.OpSub64,
opAndType{OSUB, TUINT64}: ssa.OpSub64,
opAndType{OSUB, TFLOAT32}: ssa.OpSub32F,
opAndType{OSUB, TFLOAT64}: ssa.OpSub64F,
opAndType{ONOT, TBOOL}: ssa.OpNot,
opAndType{OMINUS, TINT8}: ssa.OpNeg8,
opAndType{OMINUS, TUINT8}: ssa.OpNeg8,
opAndType{OMINUS, TINT16}: ssa.OpNeg16,
opAndType{OMINUS, TUINT16}: ssa.OpNeg16,
opAndType{OMINUS, TINT32}: ssa.OpNeg32,
opAndType{OMINUS, TUINT32}: ssa.OpNeg32,
opAndType{OMINUS, TINT64}: ssa.OpNeg64,
opAndType{OMINUS, TUINT64}: ssa.OpNeg64,
opAndType{OMINUS, TFLOAT32}: ssa.OpNeg32F,
opAndType{OMINUS, TFLOAT64}: ssa.OpNeg64F,
opAndType{OCOM, TINT8}: ssa.OpCom8,
opAndType{OCOM, TUINT8}: ssa.OpCom8,
opAndType{OCOM, TINT16}: ssa.OpCom16,
opAndType{OCOM, TUINT16}: ssa.OpCom16,
opAndType{OCOM, TINT32}: ssa.OpCom32,
opAndType{OCOM, TUINT32}: ssa.OpCom32,
opAndType{OCOM, TINT64}: ssa.OpCom64,
opAndType{OCOM, TUINT64}: ssa.OpCom64,
opAndType{OIMAG, TCOMPLEX64}: ssa.OpComplexImag,
opAndType{OIMAG, TCOMPLEX128}: ssa.OpComplexImag,
opAndType{OREAL, TCOMPLEX64}: ssa.OpComplexReal,
opAndType{OREAL, TCOMPLEX128}: ssa.OpComplexReal,
opAndType{OMUL, TINT8}: ssa.OpMul8,
opAndType{OMUL, TUINT8}: ssa.OpMul8,
opAndType{OMUL, TINT16}: ssa.OpMul16,
opAndType{OMUL, TUINT16}: ssa.OpMul16,
opAndType{OMUL, TINT32}: ssa.OpMul32,
opAndType{OMUL, TUINT32}: ssa.OpMul32,
opAndType{OMUL, TINT64}: ssa.OpMul64,
opAndType{OMUL, TUINT64}: ssa.OpMul64,
opAndType{OMUL, TFLOAT32}: ssa.OpMul32F,
opAndType{OMUL, TFLOAT64}: ssa.OpMul64F,
opAndType{ODIV, TFLOAT32}: ssa.OpDiv32F,
opAndType{ODIV, TFLOAT64}: ssa.OpDiv64F,
opAndType{ODIV, TINT8}: ssa.OpDiv8,
opAndType{ODIV, TUINT8}: ssa.OpDiv8u,
opAndType{ODIV, TINT16}: ssa.OpDiv16,
opAndType{ODIV, TUINT16}: ssa.OpDiv16u,
opAndType{ODIV, TINT32}: ssa.OpDiv32,
opAndType{ODIV, TUINT32}: ssa.OpDiv32u,
opAndType{ODIV, TINT64}: ssa.OpDiv64,
opAndType{ODIV, TUINT64}: ssa.OpDiv64u,
opAndType{OMOD, TINT8}: ssa.OpMod8,
opAndType{OMOD, TUINT8}: ssa.OpMod8u,
opAndType{OMOD, TINT16}: ssa.OpMod16,
opAndType{OMOD, TUINT16}: ssa.OpMod16u,
opAndType{OMOD, TINT32}: ssa.OpMod32,
opAndType{OMOD, TUINT32}: ssa.OpMod32u,
opAndType{OMOD, TINT64}: ssa.OpMod64,
opAndType{OMOD, TUINT64}: ssa.OpMod64u,
opAndType{OAND, TINT8}: ssa.OpAnd8,
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
opAndType{OAND, TUINT8}: ssa.OpAnd8,
opAndType{OAND, TINT16}: ssa.OpAnd16,
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
opAndType{OAND, TUINT16}: ssa.OpAnd16,
opAndType{OAND, TINT32}: ssa.OpAnd32,
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
opAndType{OAND, TUINT32}: ssa.OpAnd32,
opAndType{OAND, TINT64}: ssa.OpAnd64,
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
opAndType{OAND, TUINT64}: ssa.OpAnd64,
opAndType{OOR, TINT8}: ssa.OpOr8,
opAndType{OOR, TUINT8}: ssa.OpOr8,
opAndType{OOR, TINT16}: ssa.OpOr16,
opAndType{OOR, TUINT16}: ssa.OpOr16,
opAndType{OOR, TINT32}: ssa.OpOr32,
opAndType{OOR, TUINT32}: ssa.OpOr32,
opAndType{OOR, TINT64}: ssa.OpOr64,
opAndType{OOR, TUINT64}: ssa.OpOr64,
opAndType{OXOR, TINT8}: ssa.OpXor8,
opAndType{OXOR, TUINT8}: ssa.OpXor8,
opAndType{OXOR, TINT16}: ssa.OpXor16,
opAndType{OXOR, TUINT16}: ssa.OpXor16,
opAndType{OXOR, TINT32}: ssa.OpXor32,
opAndType{OXOR, TUINT32}: ssa.OpXor32,
opAndType{OXOR, TINT64}: ssa.OpXor64,
opAndType{OXOR, TUINT64}: ssa.OpXor64,
opAndType{OEQ, TBOOL}: ssa.OpEqB,
opAndType{OEQ, TINT8}: ssa.OpEq8,
opAndType{OEQ, TUINT8}: ssa.OpEq8,
opAndType{OEQ, TINT16}: ssa.OpEq16,
opAndType{OEQ, TUINT16}: ssa.OpEq16,
opAndType{OEQ, TINT32}: ssa.OpEq32,
opAndType{OEQ, TUINT32}: ssa.OpEq32,
opAndType{OEQ, TINT64}: ssa.OpEq64,
opAndType{OEQ, TUINT64}: ssa.OpEq64,
opAndType{OEQ, TINTER}: ssa.OpEqInter,
opAndType{OEQ, TSLICE}: ssa.OpEqSlice,
opAndType{OEQ, TFUNC}: ssa.OpEqPtr,
opAndType{OEQ, TMAP}: ssa.OpEqPtr,
opAndType{OEQ, TCHAN}: ssa.OpEqPtr,
opAndType{OEQ, TPTR32}: ssa.OpEqPtr,
opAndType{OEQ, TPTR64}: ssa.OpEqPtr,
opAndType{OEQ, TUINTPTR}: ssa.OpEqPtr,
opAndType{OEQ, TUNSAFEPTR}: ssa.OpEqPtr,
opAndType{OEQ, TFLOAT64}: ssa.OpEq64F,
opAndType{OEQ, TFLOAT32}: ssa.OpEq32F,
opAndType{ONE, TBOOL}: ssa.OpNeqB,
opAndType{ONE, TINT8}: ssa.OpNeq8,
opAndType{ONE, TUINT8}: ssa.OpNeq8,
opAndType{ONE, TINT16}: ssa.OpNeq16,
opAndType{ONE, TUINT16}: ssa.OpNeq16,
opAndType{ONE, TINT32}: ssa.OpNeq32,
opAndType{ONE, TUINT32}: ssa.OpNeq32,
opAndType{ONE, TINT64}: ssa.OpNeq64,
opAndType{ONE, TUINT64}: ssa.OpNeq64,
opAndType{ONE, TINTER}: ssa.OpNeqInter,
opAndType{ONE, TSLICE}: ssa.OpNeqSlice,
opAndType{ONE, TFUNC}: ssa.OpNeqPtr,
opAndType{ONE, TMAP}: ssa.OpNeqPtr,
opAndType{ONE, TCHAN}: ssa.OpNeqPtr,
opAndType{ONE, TPTR32}: ssa.OpNeqPtr,
opAndType{ONE, TPTR64}: ssa.OpNeqPtr,
opAndType{ONE, TUINTPTR}: ssa.OpNeqPtr,
opAndType{ONE, TUNSAFEPTR}: ssa.OpNeqPtr,
opAndType{ONE, TFLOAT64}: ssa.OpNeq64F,
opAndType{ONE, TFLOAT32}: ssa.OpNeq32F,
opAndType{OLT, TINT8}: ssa.OpLess8,
opAndType{OLT, TUINT8}: ssa.OpLess8U,
opAndType{OLT, TINT16}: ssa.OpLess16,
opAndType{OLT, TUINT16}: ssa.OpLess16U,
opAndType{OLT, TINT32}: ssa.OpLess32,
opAndType{OLT, TUINT32}: ssa.OpLess32U,
opAndType{OLT, TINT64}: ssa.OpLess64,
opAndType{OLT, TUINT64}: ssa.OpLess64U,
opAndType{OLT, TFLOAT64}: ssa.OpLess64F,
opAndType{OLT, TFLOAT32}: ssa.OpLess32F,
opAndType{OGT, TINT8}: ssa.OpGreater8,
opAndType{OGT, TUINT8}: ssa.OpGreater8U,
opAndType{OGT, TINT16}: ssa.OpGreater16,
opAndType{OGT, TUINT16}: ssa.OpGreater16U,
opAndType{OGT, TINT32}: ssa.OpGreater32,
opAndType{OGT, TUINT32}: ssa.OpGreater32U,
opAndType{OGT, TINT64}: ssa.OpGreater64,
opAndType{OGT, TUINT64}: ssa.OpGreater64U,
opAndType{OGT, TFLOAT64}: ssa.OpGreater64F,
opAndType{OGT, TFLOAT32}: ssa.OpGreater32F,
opAndType{OLE, TINT8}: ssa.OpLeq8,
opAndType{OLE, TUINT8}: ssa.OpLeq8U,
opAndType{OLE, TINT16}: ssa.OpLeq16,
opAndType{OLE, TUINT16}: ssa.OpLeq16U,
opAndType{OLE, TINT32}: ssa.OpLeq32,
opAndType{OLE, TUINT32}: ssa.OpLeq32U,
opAndType{OLE, TINT64}: ssa.OpLeq64,
opAndType{OLE, TUINT64}: ssa.OpLeq64U,
opAndType{OLE, TFLOAT64}: ssa.OpLeq64F,
opAndType{OLE, TFLOAT32}: ssa.OpLeq32F,
opAndType{OGE, TINT8}: ssa.OpGeq8,
opAndType{OGE, TUINT8}: ssa.OpGeq8U,
opAndType{OGE, TINT16}: ssa.OpGeq16,
opAndType{OGE, TUINT16}: ssa.OpGeq16U,
opAndType{OGE, TINT32}: ssa.OpGeq32,
opAndType{OGE, TUINT32}: ssa.OpGeq32U,
opAndType{OGE, TINT64}: ssa.OpGeq64,
opAndType{OGE, TUINT64}: ssa.OpGeq64U,
opAndType{OGE, TFLOAT64}: ssa.OpGeq64F,
opAndType{OGE, TFLOAT32}: ssa.OpGeq32F,
}
func (s *state) concreteEtype(t *Type) EType {
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
e := t.Etype
switch e {
default:
return e
case TINT:
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
if s.config.IntSize == 8 {
return TINT64
}
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
return TINT32
case TUINT:
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
if s.config.IntSize == 8 {
return TUINT64
}
return TUINT32
case TUINTPTR:
if s.config.PtrSize == 8 {
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
return TUINT64
}
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
return TUINT32
}
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
}
func (s *state) ssaOp(op Op, t *Type) ssa.Op {
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
etype := s.concreteEtype(t)
x, ok := opToSSA[opAndType{op, etype}]
if !ok {
s.Fatalf("unhandled binary op %v %s", op, etype)
}
return x
}
func floatForComplex(t *Type) *Type {
if t.Size() == 8 {
return Types[TFLOAT32]
} else {
return Types[TFLOAT64]
}
}
type opAndTwoTypes struct {
op Op
etype1 EType
etype2 EType
}
type twoTypes struct {
etype1 EType
etype2 EType
}
type twoOpsAndType struct {
op1 ssa.Op
op2 ssa.Op
intermediateType EType
}
var fpConvOpToSSA = map[twoTypes]twoOpsAndType{
twoTypes{TINT8, TFLOAT32}: twoOpsAndType{ssa.OpSignExt8to32, ssa.OpCvt32to32F, TINT32},
twoTypes{TINT16, TFLOAT32}: twoOpsAndType{ssa.OpSignExt16to32, ssa.OpCvt32to32F, TINT32},
twoTypes{TINT32, TFLOAT32}: twoOpsAndType{ssa.OpCopy, ssa.OpCvt32to32F, TINT32},
twoTypes{TINT64, TFLOAT32}: twoOpsAndType{ssa.OpCopy, ssa.OpCvt64to32F, TINT64},
twoTypes{TINT8, TFLOAT64}: twoOpsAndType{ssa.OpSignExt8to32, ssa.OpCvt32to64F, TINT32},
twoTypes{TINT16, TFLOAT64}: twoOpsAndType{ssa.OpSignExt16to32, ssa.OpCvt32to64F, TINT32},
twoTypes{TINT32, TFLOAT64}: twoOpsAndType{ssa.OpCopy, ssa.OpCvt32to64F, TINT32},
twoTypes{TINT64, TFLOAT64}: twoOpsAndType{ssa.OpCopy, ssa.OpCvt64to64F, TINT64},
twoTypes{TFLOAT32, TINT8}: twoOpsAndType{ssa.OpCvt32Fto32, ssa.OpTrunc32to8, TINT32},
twoTypes{TFLOAT32, TINT16}: twoOpsAndType{ssa.OpCvt32Fto32, ssa.OpTrunc32to16, TINT32},
twoTypes{TFLOAT32, TINT32}: twoOpsAndType{ssa.OpCvt32Fto32, ssa.OpCopy, TINT32},
twoTypes{TFLOAT32, TINT64}: twoOpsAndType{ssa.OpCvt32Fto64, ssa.OpCopy, TINT64},
twoTypes{TFLOAT64, TINT8}: twoOpsAndType{ssa.OpCvt64Fto32, ssa.OpTrunc32to8, TINT32},
twoTypes{TFLOAT64, TINT16}: twoOpsAndType{ssa.OpCvt64Fto32, ssa.OpTrunc32to16, TINT32},
twoTypes{TFLOAT64, TINT32}: twoOpsAndType{ssa.OpCvt64Fto32, ssa.OpCopy, TINT32},
twoTypes{TFLOAT64, TINT64}: twoOpsAndType{ssa.OpCvt64Fto64, ssa.OpCopy, TINT64},
// unsigned
twoTypes{TUINT8, TFLOAT32}: twoOpsAndType{ssa.OpZeroExt8to32, ssa.OpCvt32to32F, TINT32},
twoTypes{TUINT16, TFLOAT32}: twoOpsAndType{ssa.OpZeroExt16to32, ssa.OpCvt32to32F, TINT32},
twoTypes{TUINT32, TFLOAT32}: twoOpsAndType{ssa.OpZeroExt32to64, ssa.OpCvt64to32F, TINT64}, // go wide to dodge unsigned
twoTypes{TUINT64, TFLOAT32}: twoOpsAndType{ssa.OpCopy, ssa.OpInvalid, TUINT64}, // Cvt64Uto32F, branchy code expansion instead
twoTypes{TUINT8, TFLOAT64}: twoOpsAndType{ssa.OpZeroExt8to32, ssa.OpCvt32to64F, TINT32},
twoTypes{TUINT16, TFLOAT64}: twoOpsAndType{ssa.OpZeroExt16to32, ssa.OpCvt32to64F, TINT32},
twoTypes{TUINT32, TFLOAT64}: twoOpsAndType{ssa.OpZeroExt32to64, ssa.OpCvt64to64F, TINT64}, // go wide to dodge unsigned
twoTypes{TUINT64, TFLOAT64}: twoOpsAndType{ssa.OpCopy, ssa.OpInvalid, TUINT64}, // Cvt64Uto64F, branchy code expansion instead
twoTypes{TFLOAT32, TUINT8}: twoOpsAndType{ssa.OpCvt32Fto32, ssa.OpTrunc32to8, TINT32},
twoTypes{TFLOAT32, TUINT16}: twoOpsAndType{ssa.OpCvt32Fto32, ssa.OpTrunc32to16, TINT32},
twoTypes{TFLOAT32, TUINT32}: twoOpsAndType{ssa.OpCvt32Fto64, ssa.OpTrunc64to32, TINT64}, // go wide to dodge unsigned
twoTypes{TFLOAT32, TUINT64}: twoOpsAndType{ssa.OpInvalid, ssa.OpCopy, TUINT64}, // Cvt32Fto64U, branchy code expansion instead
twoTypes{TFLOAT64, TUINT8}: twoOpsAndType{ssa.OpCvt64Fto32, ssa.OpTrunc32to8, TINT32},
twoTypes{TFLOAT64, TUINT16}: twoOpsAndType{ssa.OpCvt64Fto32, ssa.OpTrunc32to16, TINT32},
twoTypes{TFLOAT64, TUINT32}: twoOpsAndType{ssa.OpCvt64Fto64, ssa.OpTrunc64to32, TINT64}, // go wide to dodge unsigned
twoTypes{TFLOAT64, TUINT64}: twoOpsAndType{ssa.OpInvalid, ssa.OpCopy, TUINT64}, // Cvt64Fto64U, branchy code expansion instead
// float
twoTypes{TFLOAT64, TFLOAT32}: twoOpsAndType{ssa.OpCvt64Fto32F, ssa.OpCopy, TFLOAT32},
twoTypes{TFLOAT64, TFLOAT64}: twoOpsAndType{ssa.OpRound64F, ssa.OpCopy, TFLOAT64},
twoTypes{TFLOAT32, TFLOAT32}: twoOpsAndType{ssa.OpRound32F, ssa.OpCopy, TFLOAT32},
twoTypes{TFLOAT32, TFLOAT64}: twoOpsAndType{ssa.OpCvt32Fto64F, ssa.OpCopy, TFLOAT64},
}
// this map is used only for 32-bit arch, and only includes the difference
// on 32-bit arch, don't use int64<->float conversion for uint32
var fpConvOpToSSA32 = map[twoTypes]twoOpsAndType{
twoTypes{TUINT32, TFLOAT32}: twoOpsAndType{ssa.OpCopy, ssa.OpCvt32Uto32F, TUINT32},
twoTypes{TUINT32, TFLOAT64}: twoOpsAndType{ssa.OpCopy, ssa.OpCvt32Uto64F, TUINT32},
twoTypes{TFLOAT32, TUINT32}: twoOpsAndType{ssa.OpCvt32Fto32U, ssa.OpCopy, TUINT32},
twoTypes{TFLOAT64, TUINT32}: twoOpsAndType{ssa.OpCvt64Fto32U, ssa.OpCopy, TUINT32},
}
// uint64<->float conversions, only on machines that have intructions for that
var uint64fpConvOpToSSA = map[twoTypes]twoOpsAndType{
twoTypes{TUINT64, TFLOAT32}: twoOpsAndType{ssa.OpCopy, ssa.OpCvt64Uto32F, TUINT64},
twoTypes{TUINT64, TFLOAT64}: twoOpsAndType{ssa.OpCopy, ssa.OpCvt64Uto64F, TUINT64},
twoTypes{TFLOAT32, TUINT64}: twoOpsAndType{ssa.OpCvt32Fto64U, ssa.OpCopy, TUINT64},
twoTypes{TFLOAT64, TUINT64}: twoOpsAndType{ssa.OpCvt64Fto64U, ssa.OpCopy, TUINT64},
}
var shiftOpToSSA = map[opAndTwoTypes]ssa.Op{
opAndTwoTypes{OLSH, TINT8, TUINT8}: ssa.OpLsh8x8,
opAndTwoTypes{OLSH, TUINT8, TUINT8}: ssa.OpLsh8x8,
opAndTwoTypes{OLSH, TINT8, TUINT16}: ssa.OpLsh8x16,
opAndTwoTypes{OLSH, TUINT8, TUINT16}: ssa.OpLsh8x16,
opAndTwoTypes{OLSH, TINT8, TUINT32}: ssa.OpLsh8x32,
opAndTwoTypes{OLSH, TUINT8, TUINT32}: ssa.OpLsh8x32,
opAndTwoTypes{OLSH, TINT8, TUINT64}: ssa.OpLsh8x64,
opAndTwoTypes{OLSH, TUINT8, TUINT64}: ssa.OpLsh8x64,
opAndTwoTypes{OLSH, TINT16, TUINT8}: ssa.OpLsh16x8,
opAndTwoTypes{OLSH, TUINT16, TUINT8}: ssa.OpLsh16x8,
opAndTwoTypes{OLSH, TINT16, TUINT16}: ssa.OpLsh16x16,
opAndTwoTypes{OLSH, TUINT16, TUINT16}: ssa.OpLsh16x16,
opAndTwoTypes{OLSH, TINT16, TUINT32}: ssa.OpLsh16x32,
opAndTwoTypes{OLSH, TUINT16, TUINT32}: ssa.OpLsh16x32,
opAndTwoTypes{OLSH, TINT16, TUINT64}: ssa.OpLsh16x64,
opAndTwoTypes{OLSH, TUINT16, TUINT64}: ssa.OpLsh16x64,
opAndTwoTypes{OLSH, TINT32, TUINT8}: ssa.OpLsh32x8,
opAndTwoTypes{OLSH, TUINT32, TUINT8}: ssa.OpLsh32x8,
opAndTwoTypes{OLSH, TINT32, TUINT16}: ssa.OpLsh32x16,
opAndTwoTypes{OLSH, TUINT32, TUINT16}: ssa.OpLsh32x16,
opAndTwoTypes{OLSH, TINT32, TUINT32}: ssa.OpLsh32x32,
opAndTwoTypes{OLSH, TUINT32, TUINT32}: ssa.OpLsh32x32,
opAndTwoTypes{OLSH, TINT32, TUINT64}: ssa.OpLsh32x64,
opAndTwoTypes{OLSH, TUINT32, TUINT64}: ssa.OpLsh32x64,
opAndTwoTypes{OLSH, TINT64, TUINT8}: ssa.OpLsh64x8,
opAndTwoTypes{OLSH, TUINT64, TUINT8}: ssa.OpLsh64x8,
opAndTwoTypes{OLSH, TINT64, TUINT16}: ssa.OpLsh64x16,
opAndTwoTypes{OLSH, TUINT64, TUINT16}: ssa.OpLsh64x16,
opAndTwoTypes{OLSH, TINT64, TUINT32}: ssa.OpLsh64x32,
opAndTwoTypes{OLSH, TUINT64, TUINT32}: ssa.OpLsh64x32,
opAndTwoTypes{OLSH, TINT64, TUINT64}: ssa.OpLsh64x64,
opAndTwoTypes{OLSH, TUINT64, TUINT64}: ssa.OpLsh64x64,
opAndTwoTypes{ORSH, TINT8, TUINT8}: ssa.OpRsh8x8,
opAndTwoTypes{ORSH, TUINT8, TUINT8}: ssa.OpRsh8Ux8,
opAndTwoTypes{ORSH, TINT8, TUINT16}: ssa.OpRsh8x16,
opAndTwoTypes{ORSH, TUINT8, TUINT16}: ssa.OpRsh8Ux16,
opAndTwoTypes{ORSH, TINT8, TUINT32}: ssa.OpRsh8x32,
opAndTwoTypes{ORSH, TUINT8, TUINT32}: ssa.OpRsh8Ux32,
opAndTwoTypes{ORSH, TINT8, TUINT64}: ssa.OpRsh8x64,
opAndTwoTypes{ORSH, TUINT8, TUINT64}: ssa.OpRsh8Ux64,
opAndTwoTypes{ORSH, TINT16, TUINT8}: ssa.OpRsh16x8,
opAndTwoTypes{ORSH, TUINT16, TUINT8}: ssa.OpRsh16Ux8,
opAndTwoTypes{ORSH, TINT16, TUINT16}: ssa.OpRsh16x16,
opAndTwoTypes{ORSH, TUINT16, TUINT16}: ssa.OpRsh16Ux16,
opAndTwoTypes{ORSH, TINT16, TUINT32}: ssa.OpRsh16x32,
opAndTwoTypes{ORSH, TUINT16, TUINT32}: ssa.OpRsh16Ux32,
opAndTwoTypes{ORSH, TINT16, TUINT64}: ssa.OpRsh16x64,
opAndTwoTypes{ORSH, TUINT16, TUINT64}: ssa.OpRsh16Ux64,
opAndTwoTypes{ORSH, TINT32, TUINT8}: ssa.OpRsh32x8,
opAndTwoTypes{ORSH, TUINT32, TUINT8}: ssa.OpRsh32Ux8,
opAndTwoTypes{ORSH, TINT32, TUINT16}: ssa.OpRsh32x16,
opAndTwoTypes{ORSH, TUINT32, TUINT16}: ssa.OpRsh32Ux16,
opAndTwoTypes{ORSH, TINT32, TUINT32}: ssa.OpRsh32x32,
opAndTwoTypes{ORSH, TUINT32, TUINT32}: ssa.OpRsh32Ux32,
opAndTwoTypes{ORSH, TINT32, TUINT64}: ssa.OpRsh32x64,
opAndTwoTypes{ORSH, TUINT32, TUINT64}: ssa.OpRsh32Ux64,
opAndTwoTypes{ORSH, TINT64, TUINT8}: ssa.OpRsh64x8,
opAndTwoTypes{ORSH, TUINT64, TUINT8}: ssa.OpRsh64Ux8,
opAndTwoTypes{ORSH, TINT64, TUINT16}: ssa.OpRsh64x16,
opAndTwoTypes{ORSH, TUINT64, TUINT16}: ssa.OpRsh64Ux16,
opAndTwoTypes{ORSH, TINT64, TUINT32}: ssa.OpRsh64x32,
opAndTwoTypes{ORSH, TUINT64, TUINT32}: ssa.OpRsh64Ux32,
opAndTwoTypes{ORSH, TINT64, TUINT64}: ssa.OpRsh64x64,
opAndTwoTypes{ORSH, TUINT64, TUINT64}: ssa.OpRsh64Ux64,
}
func (s *state) ssaShiftOp(op Op, t *Type, u *Type) ssa.Op {
etype1 := s.concreteEtype(t)
etype2 := s.concreteEtype(u)
x, ok := shiftOpToSSA[opAndTwoTypes{op, etype1, etype2}]
if !ok {
s.Fatalf("unhandled shift op %v etype=%s/%s", op, etype1, etype2)
}
return x
}
// expr converts the expression n to ssa, adds it to s and returns the ssa result.
func (s *state) expr(n *Node) *ssa.Value {
if !(n.Op == ONAME || n.Op == OLITERAL && n.Sym != nil) {
// ONAMEs and named OLITERALs have the line number
// of the decl, not the use. See issue 14742.
s.pushLine(n.Pos)
defer s.popLine()
}
s.stmtList(n.Ninit)
switch n.Op {
case OARRAYBYTESTRTMP:
slice := s.expr(n.Left)
ptr := s.newValue1(ssa.OpSlicePtr, ptrto(Types[TUINT8]), slice)
len := s.newValue1(ssa.OpSliceLen, Types[TINT], slice)
return s.newValue2(ssa.OpStringMake, n.Type, ptr, len)
case OSTRARRAYBYTETMP:
str := s.expr(n.Left)
ptr := s.newValue1(ssa.OpStringPtr, ptrto(Types[TUINT8]), str)
len := s.newValue1(ssa.OpStringLen, Types[TINT], str)
return s.newValue3(ssa.OpSliceMake, n.Type, ptr, len, len)
case OCFUNC:
aux := s.lookupSymbol(n, &ssa.ExternSymbol{Typ: n.Type, Sym: Linksym(n.Left.Sym)})
return s.entryNewValue1A(ssa.OpAddr, n.Type, aux, s.sb)
case ONAME:
if n.Class == PFUNC {
// "value" of a function is the address of the function's closure
sym := Linksym(funcsym(n.Sym))
aux := &ssa.ExternSymbol{Typ: n.Type, Sym: sym}
return s.entryNewValue1A(ssa.OpAddr, ptrto(n.Type), aux, s.sb)
}
if s.canSSA(n) {
return s.variable(n, n.Type)
}
addr := s.addr(n, false)
return s.newValue2(ssa.OpLoad, n.Type, addr, s.mem())
case OCLOSUREVAR:
addr := s.addr(n, false)
return s.newValue2(ssa.OpLoad, n.Type, addr, s.mem())
case OLITERAL:
switch u := n.Val().U.(type) {
case *Mpint:
i := u.Int64()
switch n.Type.Size() {
case 1:
return s.constInt8(n.Type, int8(i))
case 2:
return s.constInt16(n.Type, int16(i))
case 4:
return s.constInt32(n.Type, int32(i))
case 8:
return s.constInt64(n.Type, i)
default:
s.Fatalf("bad integer size %d", n.Type.Size())
return nil
}
case string:
if u == "" {
return s.constEmptyString(n.Type)
}
return s.entryNewValue0A(ssa.OpConstString, n.Type, u)
case bool:
return s.constBool(u)
case *NilVal:
t := n.Type
switch {
case t.IsSlice():
return s.constSlice(t)
case t.IsInterface():
return s.constInterface(t)
default:
return s.constNil(t)
}
case *Mpflt:
switch n.Type.Size() {
case 4:
return s.constFloat32(n.Type, u.Float32())
case 8:
return s.constFloat64(n.Type, u.Float64())
default:
s.Fatalf("bad float size %d", n.Type.Size())
return nil
}
case *Mpcplx:
r := &u.Real
i := &u.Imag
switch n.Type.Size() {
case 8:
pt := Types[TFLOAT32]
return s.newValue2(ssa.OpComplexMake, n.Type,
s.constFloat32(pt, r.Float32()),
s.constFloat32(pt, i.Float32()))
case 16:
pt := Types[TFLOAT64]
return s.newValue2(ssa.OpComplexMake, n.Type,
s.constFloat64(pt, r.Float64()),
s.constFloat64(pt, i.Float64()))
default:
s.Fatalf("bad float size %d", n.Type.Size())
return nil
}
default:
s.Fatalf("unhandled OLITERAL %v", n.Val().Ctype())
return nil
}
case OCONVNOP:
to := n.Type
from := n.Left.Type
// Assume everything will work out, so set up our return value.
// Anything interesting that happens from here is a fatal.
x := s.expr(n.Left)
// Special case for not confusing GC and liveness.
// We don't want pointers accidentally classified
// as not-pointers or vice-versa because of copy
// elision.
if to.IsPtrShaped() != from.IsPtrShaped() {
return s.newValue2(ssa.OpConvert, to, x, s.mem())
}
v := s.newValue1(ssa.OpCopy, to, x) // ensure that v has the right type
// CONVNOP closure
if to.Etype == TFUNC && from.IsPtrShaped() {
return v
}
// named <--> unnamed type or typed <--> untyped const
if from.Etype == to.Etype {
return v
}
// unsafe.Pointer <--> *T
if to.Etype == TUNSAFEPTR && from.IsPtr() || from.Etype == TUNSAFEPTR && to.IsPtr() {
return v
}
dowidth(from)
dowidth(to)
if from.Width != to.Width {
s.Fatalf("CONVNOP width mismatch %v (%d) -> %v (%d)\n", from, from.Width, to, to.Width)
return nil
}
if etypesign(from.Etype) != etypesign(to.Etype) {
s.Fatalf("CONVNOP sign mismatch %v (%s) -> %v (%s)\n", from, from.Etype, to, to.Etype)
return nil
}
if instrumenting {
// These appear to be fine, but they fail the
// integer constraint below, so okay them here.
// Sample non-integer conversion: map[string]string -> *uint8
return v
}
if etypesign(from.Etype) == 0 {
s.Fatalf("CONVNOP unrecognized non-integer %v -> %v\n", from, to)
return nil
}
// integer, same width, same sign
return v
case OCONV:
x := s.expr(n.Left)
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
ft := n.Left.Type // from type
tt := n.Type // to type
if ft.IsBoolean() && tt.IsKind(TUINT8) {
// Bool -> uint8 is generated internally when indexing into runtime.staticbyte.
return s.newValue1(ssa.OpCopy, n.Type, x)
}
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
if ft.IsInteger() && tt.IsInteger() {
var op ssa.Op
if tt.Size() == ft.Size() {
op = ssa.OpCopy
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
} else if tt.Size() < ft.Size() {
// truncation
switch 10*ft.Size() + tt.Size() {
case 21:
op = ssa.OpTrunc16to8
case 41:
op = ssa.OpTrunc32to8
case 42:
op = ssa.OpTrunc32to16
case 81:
op = ssa.OpTrunc64to8
case 82:
op = ssa.OpTrunc64to16
case 84:
op = ssa.OpTrunc64to32
default:
s.Fatalf("weird integer truncation %v -> %v", ft, tt)
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
}
} else if ft.IsSigned() {
// sign extension
switch 10*ft.Size() + tt.Size() {
case 12:
op = ssa.OpSignExt8to16
case 14:
op = ssa.OpSignExt8to32
case 18:
op = ssa.OpSignExt8to64
case 24:
op = ssa.OpSignExt16to32
case 28:
op = ssa.OpSignExt16to64
case 48:
op = ssa.OpSignExt32to64
default:
s.Fatalf("bad integer sign extension %v -> %v", ft, tt)
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
}
} else {
// zero extension
switch 10*ft.Size() + tt.Size() {
case 12:
op = ssa.OpZeroExt8to16
case 14:
op = ssa.OpZeroExt8to32
case 18:
op = ssa.OpZeroExt8to64
case 24:
op = ssa.OpZeroExt16to32
case 28:
op = ssa.OpZeroExt16to64
case 48:
op = ssa.OpZeroExt32to64
default:
s.Fatalf("weird integer sign extension %v -> %v", ft, tt)
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
}
}
return s.newValue1(op, n.Type, x)
}
if ft.IsFloat() || tt.IsFloat() {
conv, ok := fpConvOpToSSA[twoTypes{s.concreteEtype(ft), s.concreteEtype(tt)}]
if s.config.IntSize == 4 && Thearch.LinkArch.Name != "amd64p32" && Thearch.LinkArch.Family != sys.MIPS {
if conv1, ok1 := fpConvOpToSSA32[twoTypes{s.concreteEtype(ft), s.concreteEtype(tt)}]; ok1 {
conv = conv1
}
}
if Thearch.LinkArch.Name == "arm64" {
if conv1, ok1 := uint64fpConvOpToSSA[twoTypes{s.concreteEtype(ft), s.concreteEtype(tt)}]; ok1 {
conv = conv1
}
}
if Thearch.LinkArch.Family == sys.MIPS {
if ft.Size() == 4 && ft.IsInteger() && !ft.IsSigned() {
// tt is float32 or float64, and ft is also unsigned
if tt.Size() == 4 {
return s.uint32Tofloat32(n, x, ft, tt)
}
if tt.Size() == 8 {
return s.uint32Tofloat64(n, x, ft, tt)
}
} else if tt.Size() == 4 && tt.IsInteger() && !tt.IsSigned() {
// ft is float32 or float64, and tt is unsigned integer
if ft.Size() == 4 {
return s.float32ToUint32(n, x, ft, tt)
}
if ft.Size() == 8 {
return s.float64ToUint32(n, x, ft, tt)
}
}
}
if !ok {
s.Fatalf("weird float conversion %v -> %v", ft, tt)
}
op1, op2, it := conv.op1, conv.op2, conv.intermediateType
if op1 != ssa.OpInvalid && op2 != ssa.OpInvalid {
// normal case, not tripping over unsigned 64
if op1 == ssa.OpCopy {
if op2 == ssa.OpCopy {
return x
}
return s.newValue1(op2, n.Type, x)
}
if op2 == ssa.OpCopy {
return s.newValue1(op1, n.Type, x)
}
return s.newValue1(op2, n.Type, s.newValue1(op1, Types[it], x))
}
// Tricky 64-bit unsigned cases.
if ft.IsInteger() {
// tt is float32 or float64, and ft is also unsigned
if tt.Size() == 4 {
return s.uint64Tofloat32(n, x, ft, tt)
}
if tt.Size() == 8 {
return s.uint64Tofloat64(n, x, ft, tt)
}
s.Fatalf("weird unsigned integer to float conversion %v -> %v", ft, tt)
}
// ft is float32 or float64, and tt is unsigned integer
if ft.Size() == 4 {
return s.float32ToUint64(n, x, ft, tt)
}
if ft.Size() == 8 {
return s.float64ToUint64(n, x, ft, tt)
}
s.Fatalf("weird float to unsigned integer conversion %v -> %v", ft, tt)
return nil
}
if ft.IsComplex() && tt.IsComplex() {
var op ssa.Op
if ft.Size() == tt.Size() {
switch ft.Size() {
case 8:
op = ssa.OpRound32F
case 16:
op = ssa.OpRound64F
default:
s.Fatalf("weird complex conversion %v -> %v", ft, tt)
}
} else if ft.Size() == 8 && tt.Size() == 16 {
op = ssa.OpCvt32Fto64F
} else if ft.Size() == 16 && tt.Size() == 8 {
op = ssa.OpCvt64Fto32F
} else {
s.Fatalf("weird complex conversion %v -> %v", ft, tt)
}
ftp := floatForComplex(ft)
ttp := floatForComplex(tt)
return s.newValue2(ssa.OpComplexMake, tt,
s.newValue1(op, ttp, s.newValue1(ssa.OpComplexReal, ftp, x)),
s.newValue1(op, ttp, s.newValue1(ssa.OpComplexImag, ftp, x)))
}
s.Fatalf("unhandled OCONV %s -> %s", n.Left.Type.Etype, n.Type.Etype)
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
return nil
case ODOTTYPE:
res, _ := s.dottype(n, false)
return res
// binary ops
case OLT, OEQ, ONE, OLE, OGE, OGT:
a := s.expr(n.Left)
b := s.expr(n.Right)
if n.Left.Type.IsComplex() {
pt := floatForComplex(n.Left.Type)
op := s.ssaOp(OEQ, pt)
r := s.newValue2(op, Types[TBOOL], s.newValue1(ssa.OpComplexReal, pt, a), s.newValue1(ssa.OpComplexReal, pt, b))
i := s.newValue2(op, Types[TBOOL], s.newValue1(ssa.OpComplexImag, pt, a), s.newValue1(ssa.OpComplexImag, pt, b))
c := s.newValue2(ssa.OpAndB, Types[TBOOL], r, i)
switch n.Op {
case OEQ:
return c
case ONE:
return s.newValue1(ssa.OpNot, Types[TBOOL], c)
default:
s.Fatalf("ordered complex compare %v", n.Op)
}
}
return s.newValue2(s.ssaOp(n.Op, n.Left.Type), Types[TBOOL], a, b)
case OMUL:
a := s.expr(n.Left)
b := s.expr(n.Right)
if n.Type.IsComplex() {
mulop := ssa.OpMul64F
addop := ssa.OpAdd64F
subop := ssa.OpSub64F
pt := floatForComplex(n.Type) // Could be Float32 or Float64
wt := Types[TFLOAT64] // Compute in Float64 to minimize cancelation error
areal := s.newValue1(ssa.OpComplexReal, pt, a)
breal := s.newValue1(ssa.OpComplexReal, pt, b)
aimag := s.newValue1(ssa.OpComplexImag, pt, a)
bimag := s.newValue1(ssa.OpComplexImag, pt, b)
if pt != wt { // Widen for calculation
areal = s.newValue1(ssa.OpCvt32Fto64F, wt, areal)
breal = s.newValue1(ssa.OpCvt32Fto64F, wt, breal)
aimag = s.newValue1(ssa.OpCvt32Fto64F, wt, aimag)
bimag = s.newValue1(ssa.OpCvt32Fto64F, wt, bimag)
}
xreal := s.newValue2(subop, wt, s.newValue2(mulop, wt, areal, breal), s.newValue2(mulop, wt, aimag, bimag))
ximag := s.newValue2(addop, wt, s.newValue2(mulop, wt, areal, bimag), s.newValue2(mulop, wt, aimag, breal))
if pt != wt { // Narrow to store back
xreal = s.newValue1(ssa.OpCvt64Fto32F, pt, xreal)
ximag = s.newValue1(ssa.OpCvt64Fto32F, pt, ximag)
}
return s.newValue2(ssa.OpComplexMake, n.Type, xreal, ximag)
}
return s.newValue2(s.ssaOp(n.Op, n.Type), a.Type, a, b)
case ODIV:
a := s.expr(n.Left)
b := s.expr(n.Right)
if n.Type.IsComplex() {
// TODO this is not executed because the front-end substitutes a runtime call.
// That probably ought to change; with modest optimization the widen/narrow
// conversions could all be elided in larger expression trees.
mulop := ssa.OpMul64F
addop := ssa.OpAdd64F
subop := ssa.OpSub64F
divop := ssa.OpDiv64F
pt := floatForComplex(n.Type) // Could be Float32 or Float64
wt := Types[TFLOAT64] // Compute in Float64 to minimize cancelation error
areal := s.newValue1(ssa.OpComplexReal, pt, a)
breal := s.newValue1(ssa.OpComplexReal, pt, b)
aimag := s.newValue1(ssa.OpComplexImag, pt, a)
bimag := s.newValue1(ssa.OpComplexImag, pt, b)
if pt != wt { // Widen for calculation
areal = s.newValue1(ssa.OpCvt32Fto64F, wt, areal)
breal = s.newValue1(ssa.OpCvt32Fto64F, wt, breal)
aimag = s.newValue1(ssa.OpCvt32Fto64F, wt, aimag)
bimag = s.newValue1(ssa.OpCvt32Fto64F, wt, bimag)
}
denom := s.newValue2(addop, wt, s.newValue2(mulop, wt, breal, breal), s.newValue2(mulop, wt, bimag, bimag))
xreal := s.newValue2(addop, wt, s.newValue2(mulop, wt, areal, breal), s.newValue2(mulop, wt, aimag, bimag))
ximag := s.newValue2(subop, wt, s.newValue2(mulop, wt, aimag, breal), s.newValue2(mulop, wt, areal, bimag))
// TODO not sure if this is best done in wide precision or narrow
// Double-rounding might be an issue.
// Note that the pre-SSA implementation does the entire calculation
// in wide format, so wide is compatible.
xreal = s.newValue2(divop, wt, xreal, denom)
ximag = s.newValue2(divop, wt, ximag, denom)
if pt != wt { // Narrow to store back
xreal = s.newValue1(ssa.OpCvt64Fto32F, pt, xreal)
ximag = s.newValue1(ssa.OpCvt64Fto32F, pt, ximag)
}
return s.newValue2(ssa.OpComplexMake, n.Type, xreal, ximag)
}
if n.Type.IsFloat() {
return s.newValue2(s.ssaOp(n.Op, n.Type), a.Type, a, b)
}
return s.intDivide(n, a, b)
case OMOD:
a := s.expr(n.Left)
b := s.expr(n.Right)
return s.intDivide(n, a, b)
case OADD, OSUB:
a := s.expr(n.Left)
b := s.expr(n.Right)
if n.Type.IsComplex() {
pt := floatForComplex(n.Type)
op := s.ssaOp(n.Op, pt)
return s.newValue2(ssa.OpComplexMake, n.Type,
s.newValue2(op, pt, s.newValue1(ssa.OpComplexReal, pt, a), s.newValue1(ssa.OpComplexReal, pt, b)),
s.newValue2(op, pt, s.newValue1(ssa.OpComplexImag, pt, a), s.newValue1(ssa.OpComplexImag, pt, b)))
}
return s.newValue2(s.ssaOp(n.Op, n.Type), a.Type, a, b)
case OAND, OOR, OXOR:
a := s.expr(n.Left)
b := s.expr(n.Right)
return s.newValue2(s.ssaOp(n.Op, n.Type), a.Type, a, b)
case OLSH, ORSH:
a := s.expr(n.Left)
b := s.expr(n.Right)
return s.newValue2(s.ssaShiftOp(n.Op, n.Type, n.Right.Type), a.Type, a, b)
case OANDAND, OOROR:
// To implement OANDAND (and OOROR), we introduce a
// new temporary variable to hold the result. The
// variable is associated with the OANDAND node in the
// s.vars table (normally variables are only
// associated with ONAME nodes). We convert
// A && B
// to
// var = A
// if var {
// var = B
// }
// Using var in the subsequent block introduces the
// necessary phi variable.
el := s.expr(n.Left)
s.vars[n] = el
b := s.endBlock()
b.Kind = ssa.BlockIf
b.SetControl(el)
// In theory, we should set b.Likely here based on context.
// However, gc only gives us likeliness hints
// in a single place, for plain OIF statements,
// and passing around context is finnicky, so don't bother for now.
bRight := s.f.NewBlock(ssa.BlockPlain)
bResult := s.f.NewBlock(ssa.BlockPlain)
if n.Op == OANDAND {
b.AddEdgeTo(bRight)
b.AddEdgeTo(bResult)
} else if n.Op == OOROR {
b.AddEdgeTo(bResult)
b.AddEdgeTo(bRight)
}
s.startBlock(bRight)
er := s.expr(n.Right)
s.vars[n] = er
b = s.endBlock()
b.AddEdgeTo(bResult)
s.startBlock(bResult)
return s.variable(n, Types[TBOOL])
case OCOMPLEX:
r := s.expr(n.Left)
i := s.expr(n.Right)
return s.newValue2(ssa.OpComplexMake, n.Type, r, i)
// unary ops
case OMINUS:
a := s.expr(n.Left)
if n.Type.IsComplex() {
tp := floatForComplex(n.Type)
negop := s.ssaOp(n.Op, tp)
return s.newValue2(ssa.OpComplexMake, n.Type,
s.newValue1(negop, tp, s.newValue1(ssa.OpComplexReal, tp, a)),
s.newValue1(negop, tp, s.newValue1(ssa.OpComplexImag, tp, a)))
}
return s.newValue1(s.ssaOp(n.Op, n.Type), a.Type, a)
case ONOT, OCOM:
a := s.expr(n.Left)
return s.newValue1(s.ssaOp(n.Op, n.Type), a.Type, a)
case OIMAG, OREAL:
a := s.expr(n.Left)
return s.newValue1(s.ssaOp(n.Op, n.Left.Type), n.Type, a)
case OPLUS:
return s.expr(n.Left)
case OADDR:
cmd/compile: pack bool fields in Node, Name, Func and Type structs to bitsets This reduces compiler memory usage by up to 4% - see compilebench results below. name old time/op new time/op delta Template 245ms ± 4% 241ms ± 2% -1.88% (p=0.029 n=10+10) Unicode 126ms ± 3% 124ms ± 3% ~ (p=0.105 n=10+10) GoTypes 805ms ± 2% 813ms ± 3% ~ (p=0.515 n=8+10) Compiler 3.95s ± 2% 3.83s ± 1% -2.96% (p=0.000 n=9+10) MakeBash 47.4s ± 4% 46.6s ± 1% -1.59% (p=0.028 n=9+10) name old user-ns/op new user-ns/op delta Template 324M ± 5% 326M ± 3% ~ (p=0.935 n=10+10) Unicode 186M ± 5% 178M ±10% ~ (p=0.067 n=9+10) GoTypes 1.08G ± 7% 1.09G ± 4% ~ (p=0.956 n=10+10) Compiler 5.34G ± 4% 5.31G ± 1% ~ (p=0.501 n=10+8) name old alloc/op new alloc/op delta Template 41.0MB ± 0% 39.8MB ± 0% -3.03% (p=0.000 n=10+10) Unicode 32.3MB ± 0% 31.0MB ± 0% -4.13% (p=0.000 n=10+10) GoTypes 119MB ± 0% 116MB ± 0% -2.39% (p=0.000 n=10+10) Compiler 499MB ± 0% 487MB ± 0% -2.48% (p=0.000 n=10+10) name old allocs/op new allocs/op delta Template 380k ± 1% 379k ± 1% ~ (p=0.436 n=10+10) Unicode 324k ± 1% 324k ± 0% ~ (p=0.853 n=10+10) GoTypes 1.15M ± 0% 1.15M ± 0% ~ (p=0.481 n=10+10) Compiler 4.41M ± 0% 4.41M ± 0% -0.12% (p=0.007 n=10+10) name old text-bytes new text-bytes delta HelloSize 623k ± 0% 623k ± 0% ~ (all equal) CmdGoSize 6.64M ± 0% 6.64M ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 5.81k ± 0% 5.81k ± 0% ~ (all equal) CmdGoSize 238k ± 0% 238k ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 134k ± 0% 134k ± 0% ~ (all equal) CmdGoSize 152k ± 0% 152k ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 967k ± 0% 967k ± 0% ~ (all equal) CmdGoSize 10.2M ± 0% 10.2M ± 0% ~ (all equal) Change-Id: I1f40af738254892bd6c8ba2eb43390b175753d52 Reviewed-on: https://go-review.googlesource.com/37445 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-27 19:56:38 +02:00
return s.addr(n.Left, n.Bounded())
case OINDREGSP:
cmd/compile: add OpOffPtr [c] SP to constant cache They accounted for almost 30% of all CSE'd values. By never creating the duplicates in the first place, we reduce the high water mark of Value IDs, which in turn makes all SSA phases cheaper, particularly regalloc. name old time/op new time/op delta Template 200ms ± 3% 198ms ± 4% -0.87% (p=0.016 n=50+49) Unicode 86.9ms ± 2% 85.5ms ± 3% -1.56% (p=0.000 n=49+50) GoTypes 553ms ± 4% 551ms ± 4% ~ (p=0.183 n=50+49) SSA 3.97s ± 3% 3.93s ± 2% -1.06% (p=0.000 n=48+48) Flate 124ms ± 4% 124ms ± 3% ~ (p=0.545 n=48+50) GoParser 146ms ± 4% 146ms ± 4% ~ (p=0.810 n=49+49) Reflect 357ms ± 3% 355ms ± 3% -0.59% (p=0.049 n=50+48) Tar 106ms ± 4% 107ms ± 5% ~ (p=0.454 n=49+50) XML 203ms ± 4% 203ms ± 4% ~ (p=0.726 n=48+50) name old user-ns/op new user-ns/op delta Template 237M ± 3% 235M ± 4% ~ (p=0.208 n=47+48) Unicode 111M ± 4% 108M ± 9% -2.50% (p=0.000 n=47+50) GoTypes 736M ± 5% 729M ± 4% -0.95% (p=0.017 n=50+46) SSA 5.73G ± 4% 5.74G ± 4% ~ (p=0.765 n=50+50) Flate 150M ± 5% 148M ± 6% -0.89% (p=0.045 n=48+47) GoParser 180M ± 5% 178M ± 7% -1.34% (p=0.012 n=50+50) Reflect 450M ± 4% 444M ± 4% -1.40% (p=0.000 n=50+49) Tar 124M ± 7% 123M ± 7% ~ (p=0.092 n=50+50) XML 248M ± 6% 245M ± 5% ~ (p=0.057 n=50+50) name old alloc/op new alloc/op delta Template 39.4MB ± 0% 39.3MB ± 0% -0.37% (p=0.000 n=50+50) Unicode 30.9MB ± 0% 30.9MB ± 0% -0.27% (p=0.000 n=48+50) GoTypes 114MB ± 0% 113MB ± 0% -1.03% (p=0.000 n=50+49) SSA 882MB ± 0% 865MB ± 0% -1.95% (p=0.000 n=49+49) Flate 25.8MB ± 0% 25.7MB ± 0% -0.21% (p=0.000 n=50+50) GoParser 31.7MB ± 0% 31.6MB ± 0% -0.33% (p=0.000 n=50+50) Reflect 79.7MB ± 0% 79.3MB ± 0% -0.49% (p=0.000 n=44+49) Tar 27.2MB ± 0% 27.1MB ± 0% -0.31% (p=0.000 n=50+50) XML 42.7MB ± 0% 42.3MB ± 0% -1.05% (p=0.000 n=48+49) name old allocs/op new allocs/op delta Template 379k ± 1% 380k ± 1% +0.26% (p=0.000 n=50+50) Unicode 324k ± 1% 324k ± 1% ~ (p=0.964 n=49+50) GoTypes 1.14M ± 0% 1.15M ± 0% +0.14% (p=0.000 n=50+49) SSA 7.89M ± 0% 7.89M ± 0% -0.05% (p=0.000 n=49+49) Flate 240k ± 1% 241k ± 1% +0.27% (p=0.001 n=50+50) GoParser 310k ± 1% 311k ± 1% +0.48% (p=0.000 n=50+49) Reflect 1.00M ± 0% 1.00M ± 0% +0.17% (p=0.000 n=48+50) Tar 254k ± 1% 255k ± 1% +0.23% (p=0.005 n=50+50) XML 395k ± 1% 395k ± 1% +0.19% (p=0.002 n=49+47) Change-Id: Iaa8f5f37e23bd81983409f7359f9dcd4dfe2961f Reviewed-on: https://go-review.googlesource.com/38003 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-03-08 12:50:00 -08:00
addr := s.constOffPtrSP(ptrto(n.Type), n.Xoffset)
return s.newValue2(ssa.OpLoad, n.Type, addr, s.mem())
case OIND:
p := s.exprPtr(n.Left, false, n.Pos)
return s.newValue2(ssa.OpLoad, n.Type, p, s.mem())
case ODOT:
t := n.Left.Type
if canSSAType(t) {
v := s.expr(n.Left)
return s.newValue1I(ssa.OpStructSelect, n.Type, int64(fieldIdx(n)), v)
}
if n.Left.Op == OSTRUCTLIT {
// All literals with nonzero fields have already been
// rewritten during walk. Any that remain are just T{}
// or equivalents. Use the zero value.
if !iszero(n.Left) {
Fatalf("literal with nonzero value in SSA: %v", n.Left)
}
return s.zeroVal(n.Type)
}
p := s.addr(n, false)
return s.newValue2(ssa.OpLoad, n.Type, p, s.mem())
case ODOTPTR:
p := s.exprPtr(n.Left, false, n.Pos)
p = s.newValue1I(ssa.OpOffPtr, ptrto(n.Type), n.Xoffset, p)
return s.newValue2(ssa.OpLoad, n.Type, p, s.mem())
case OINDEX:
switch {
case n.Left.Type.IsString():
cmd/compile: pack bool fields in Node, Name, Func and Type structs to bitsets This reduces compiler memory usage by up to 4% - see compilebench results below. name old time/op new time/op delta Template 245ms ± 4% 241ms ± 2% -1.88% (p=0.029 n=10+10) Unicode 126ms ± 3% 124ms ± 3% ~ (p=0.105 n=10+10) GoTypes 805ms ± 2% 813ms ± 3% ~ (p=0.515 n=8+10) Compiler 3.95s ± 2% 3.83s ± 1% -2.96% (p=0.000 n=9+10) MakeBash 47.4s ± 4% 46.6s ± 1% -1.59% (p=0.028 n=9+10) name old user-ns/op new user-ns/op delta Template 324M ± 5% 326M ± 3% ~ (p=0.935 n=10+10) Unicode 186M ± 5% 178M ±10% ~ (p=0.067 n=9+10) GoTypes 1.08G ± 7% 1.09G ± 4% ~ (p=0.956 n=10+10) Compiler 5.34G ± 4% 5.31G ± 1% ~ (p=0.501 n=10+8) name old alloc/op new alloc/op delta Template 41.0MB ± 0% 39.8MB ± 0% -3.03% (p=0.000 n=10+10) Unicode 32.3MB ± 0% 31.0MB ± 0% -4.13% (p=0.000 n=10+10) GoTypes 119MB ± 0% 116MB ± 0% -2.39% (p=0.000 n=10+10) Compiler 499MB ± 0% 487MB ± 0% -2.48% (p=0.000 n=10+10) name old allocs/op new allocs/op delta Template 380k ± 1% 379k ± 1% ~ (p=0.436 n=10+10) Unicode 324k ± 1% 324k ± 0% ~ (p=0.853 n=10+10) GoTypes 1.15M ± 0% 1.15M ± 0% ~ (p=0.481 n=10+10) Compiler 4.41M ± 0% 4.41M ± 0% -0.12% (p=0.007 n=10+10) name old text-bytes new text-bytes delta HelloSize 623k ± 0% 623k ± 0% ~ (all equal) CmdGoSize 6.64M ± 0% 6.64M ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 5.81k ± 0% 5.81k ± 0% ~ (all equal) CmdGoSize 238k ± 0% 238k ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 134k ± 0% 134k ± 0% ~ (all equal) CmdGoSize 152k ± 0% 152k ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 967k ± 0% 967k ± 0% ~ (all equal) CmdGoSize 10.2M ± 0% 10.2M ± 0% ~ (all equal) Change-Id: I1f40af738254892bd6c8ba2eb43390b175753d52 Reviewed-on: https://go-review.googlesource.com/37445 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-27 19:56:38 +02:00
if n.Bounded() && Isconst(n.Left, CTSTR) && Isconst(n.Right, CTINT) {
// Replace "abc"[1] with 'b'.
// Delayed until now because "abc"[1] is not an ideal constant.
// See test/fixedbugs/issue11370.go.
return s.newValue0I(ssa.OpConst8, Types[TUINT8], int64(int8(n.Left.Val().U.(string)[n.Right.Int64()])))
}
a := s.expr(n.Left)
i := s.expr(n.Right)
i = s.extendIndex(i, panicindex)
cmd/compile: pack bool fields in Node, Name, Func and Type structs to bitsets This reduces compiler memory usage by up to 4% - see compilebench results below. name old time/op new time/op delta Template 245ms ± 4% 241ms ± 2% -1.88% (p=0.029 n=10+10) Unicode 126ms ± 3% 124ms ± 3% ~ (p=0.105 n=10+10) GoTypes 805ms ± 2% 813ms ± 3% ~ (p=0.515 n=8+10) Compiler 3.95s ± 2% 3.83s ± 1% -2.96% (p=0.000 n=9+10) MakeBash 47.4s ± 4% 46.6s ± 1% -1.59% (p=0.028 n=9+10) name old user-ns/op new user-ns/op delta Template 324M ± 5% 326M ± 3% ~ (p=0.935 n=10+10) Unicode 186M ± 5% 178M ±10% ~ (p=0.067 n=9+10) GoTypes 1.08G ± 7% 1.09G ± 4% ~ (p=0.956 n=10+10) Compiler 5.34G ± 4% 5.31G ± 1% ~ (p=0.501 n=10+8) name old alloc/op new alloc/op delta Template 41.0MB ± 0% 39.8MB ± 0% -3.03% (p=0.000 n=10+10) Unicode 32.3MB ± 0% 31.0MB ± 0% -4.13% (p=0.000 n=10+10) GoTypes 119MB ± 0% 116MB ± 0% -2.39% (p=0.000 n=10+10) Compiler 499MB ± 0% 487MB ± 0% -2.48% (p=0.000 n=10+10) name old allocs/op new allocs/op delta Template 380k ± 1% 379k ± 1% ~ (p=0.436 n=10+10) Unicode 324k ± 1% 324k ± 0% ~ (p=0.853 n=10+10) GoTypes 1.15M ± 0% 1.15M ± 0% ~ (p=0.481 n=10+10) Compiler 4.41M ± 0% 4.41M ± 0% -0.12% (p=0.007 n=10+10) name old text-bytes new text-bytes delta HelloSize 623k ± 0% 623k ± 0% ~ (all equal) CmdGoSize 6.64M ± 0% 6.64M ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 5.81k ± 0% 5.81k ± 0% ~ (all equal) CmdGoSize 238k ± 0% 238k ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 134k ± 0% 134k ± 0% ~ (all equal) CmdGoSize 152k ± 0% 152k ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 967k ± 0% 967k ± 0% ~ (all equal) CmdGoSize 10.2M ± 0% 10.2M ± 0% ~ (all equal) Change-Id: I1f40af738254892bd6c8ba2eb43390b175753d52 Reviewed-on: https://go-review.googlesource.com/37445 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-27 19:56:38 +02:00
if !n.Bounded() {
len := s.newValue1(ssa.OpStringLen, Types[TINT], a)
s.boundsCheck(i, len)
}
ptrtyp := ptrto(Types[TUINT8])
ptr := s.newValue1(ssa.OpStringPtr, ptrtyp, a)
if Isconst(n.Right, CTINT) {
ptr = s.newValue1I(ssa.OpOffPtr, ptrtyp, n.Right.Int64(), ptr)
} else {
ptr = s.newValue2(ssa.OpAddPtr, ptrtyp, ptr, i)
}
return s.newValue2(ssa.OpLoad, Types[TUINT8], ptr, s.mem())
case n.Left.Type.IsSlice():
p := s.addr(n, false)
return s.newValue2(ssa.OpLoad, n.Left.Type.Elem(), p, s.mem())
case n.Left.Type.IsArray():
if bound := n.Left.Type.NumElem(); bound <= 1 {
// SSA can handle arrays of length at most 1.
a := s.expr(n.Left)
i := s.expr(n.Right)
if bound == 0 {
// Bounds check will never succeed. Might as well
// use constants for the bounds check.
z := s.constInt(Types[TINT], 0)
s.boundsCheck(z, z)
// The return value won't be live, return junk.
return s.newValue0(ssa.OpUnknown, n.Type)
}
i = s.extendIndex(i, panicindex)
s.boundsCheck(i, s.constInt(Types[TINT], bound))
return s.newValue1I(ssa.OpArraySelect, n.Type, 0, a)
}
p := s.addr(n, false)
return s.newValue2(ssa.OpLoad, n.Left.Type.Elem(), p, s.mem())
default:
s.Fatalf("bad type for index %v", n.Left.Type)
return nil
}
case OLEN, OCAP:
switch {
case n.Left.Type.IsSlice():
op := ssa.OpSliceLen
if n.Op == OCAP {
op = ssa.OpSliceCap
}
return s.newValue1(op, Types[TINT], s.expr(n.Left))
case n.Left.Type.IsString(): // string; not reachable for OCAP
return s.newValue1(ssa.OpStringLen, Types[TINT], s.expr(n.Left))
case n.Left.Type.IsMap(), n.Left.Type.IsChan():
return s.referenceTypeBuiltin(n, s.expr(n.Left))
default: // array
return s.constInt(Types[TINT], n.Left.Type.NumElem())
}
case OSPTR:
a := s.expr(n.Left)
if n.Left.Type.IsSlice() {
return s.newValue1(ssa.OpSlicePtr, n.Type, a)
} else {
return s.newValue1(ssa.OpStringPtr, n.Type, a)
}
case OITAB:
a := s.expr(n.Left)
return s.newValue1(ssa.OpITab, n.Type, a)
case OIDATA:
a := s.expr(n.Left)
return s.newValue1(ssa.OpIData, n.Type, a)
case OEFACE:
tab := s.expr(n.Left)
data := s.expr(n.Right)
return s.newValue2(ssa.OpIMake, n.Type, tab, data)
case OSLICE, OSLICEARR, OSLICE3, OSLICE3ARR:
v := s.expr(n.Left)
var i, j, k *ssa.Value
low, high, max := n.SliceBounds()
if low != nil {
i = s.extendIndex(s.expr(low), panicslice)
}
if high != nil {
j = s.extendIndex(s.expr(high), panicslice)
}
if max != nil {
k = s.extendIndex(s.expr(max), panicslice)
}
p, l, c := s.slice(n.Left.Type, v, i, j, k)
return s.newValue3(ssa.OpSliceMake, n.Type, p, l, c)
case OSLICESTR:
v := s.expr(n.Left)
var i, j *ssa.Value
low, high, _ := n.SliceBounds()
if low != nil {
i = s.extendIndex(s.expr(low), panicslice)
}
if high != nil {
j = s.extendIndex(s.expr(high), panicslice)
}
p, l, _ := s.slice(n.Left.Type, v, i, j, nil)
return s.newValue2(ssa.OpStringMake, n.Type, p, l)
case OCALLFUNC:
if isIntrinsicCall(n) {
return s.intrinsicCall(n)
}
fallthrough
case OCALLINTER, OCALLMETH:
a := s.call(n, callNormal)
return s.newValue2(ssa.OpLoad, n.Type, a, s.mem())
case OGETG:
return s.newValue1(ssa.OpGetG, n.Type, s.mem())
case OAPPEND:
cmd/compile: avoid write barrier in append fast path When we are writing the result of an append back to the same slice, we don’t need a write barrier on the fast path. This re-implements an optimization that was present in the old backend. Updates #14921 Fixes #14969 Sample code: var x []byte func p() { x = append(x, 1, 2, 3) } Before: "".p t=1 size=224 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 199 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x(SB), DX 0x0025 00037 (append.go:19) MOVQ "".x+8(SB), BX 0x002c 00044 (append.go:19) MOVQ BX, "".autotmp_0+64(SP) 0x0031 00049 (append.go:22) LEAQ 3(BX), BP 0x0035 00053 (append.go:22) CMPQ BP, CX 0x0038 00056 (append.go:22) JGT $0, 131 0x003a 00058 (append.go:22) MOVB $1, (DX)(BX*1) 0x003e 00062 (append.go:22) MOVB $2, 1(DX)(BX*1) 0x0043 00067 (append.go:22) MOVB $3, 2(DX)(BX*1) 0x0048 00072 (append.go:22) MOVQ BP, "".x+8(SB) 0x004f 00079 (append.go:22) MOVQ CX, "".x+16(SB) 0x0056 00086 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x005c 00092 (append.go:22) TESTB AL, AL 0x005e 00094 (append.go:22) JNE $0, 108 0x0060 00096 (append.go:22) MOVQ DX, "".x(SB) 0x0067 00103 (append.go:23) ADDQ $72, SP 0x006b 00107 (append.go:23) RET 0x006c 00108 (append.go:22) LEAQ "".x(SB), CX 0x0073 00115 (append.go:22) MOVQ CX, (SP) 0x0077 00119 (append.go:22) MOVQ DX, 8(SP) 0x007c 00124 (append.go:22) PCDATA $0, $0 0x007c 00124 (append.go:22) CALL runtime.writebarrierptr(SB) 0x0081 00129 (append.go:23) JMP 103 0x0083 00131 (append.go:22) LEAQ type.[]uint8(SB), AX 0x008a 00138 (append.go:22) MOVQ AX, (SP) 0x008e 00142 (append.go:22) MOVQ DX, 8(SP) 0x0093 00147 (append.go:22) MOVQ BX, 16(SP) 0x0098 00152 (append.go:22) MOVQ CX, 24(SP) 0x009d 00157 (append.go:22) MOVQ BP, 32(SP) 0x00a2 00162 (append.go:22) PCDATA $0, $0 0x00a2 00162 (append.go:22) CALL runtime.growslice(SB) 0x00a7 00167 (append.go:22) MOVQ 40(SP), DX 0x00ac 00172 (append.go:22) MOVQ 48(SP), AX 0x00b1 00177 (append.go:22) MOVQ 56(SP), CX 0x00b6 00182 (append.go:22) ADDQ $3, AX 0x00ba 00186 (append.go:19) MOVQ "".autotmp_0+64(SP), BX 0x00bf 00191 (append.go:22) MOVQ AX, BP 0x00c2 00194 (append.go:22) JMP 58 0x00c7 00199 (append.go:22) NOP 0x00c7 00199 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00cc 00204 (append.go:21) JMP 0 After: "".p t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 191 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x+8(SB), DX 0x0025 00037 (append.go:19) MOVQ DX, "".autotmp_0+64(SP) 0x002a 00042 (append.go:19) MOVQ "".x(SB), BX 0x0031 00049 (append.go:22) LEAQ 3(DX), BP 0x0035 00053 (append.go:22) MOVQ BP, "".x+8(SB) 0x003c 00060 (append.go:22) CMPQ BP, CX 0x003f 00063 (append.go:22) JGT $0, 84 0x0041 00065 (append.go:22) MOVB $1, (BX)(DX*1) 0x0045 00069 (append.go:22) MOVB $2, 1(BX)(DX*1) 0x004a 00074 (append.go:22) MOVB $3, 2(BX)(DX*1) 0x004f 00079 (append.go:23) ADDQ $72, SP 0x0053 00083 (append.go:23) RET 0x0054 00084 (append.go:22) LEAQ type.[]uint8(SB), AX 0x005b 00091 (append.go:22) MOVQ AX, (SP) 0x005f 00095 (append.go:22) MOVQ BX, 8(SP) 0x0064 00100 (append.go:22) MOVQ DX, 16(SP) 0x0069 00105 (append.go:22) MOVQ CX, 24(SP) 0x006e 00110 (append.go:22) MOVQ BP, 32(SP) 0x0073 00115 (append.go:22) PCDATA $0, $0 0x0073 00115 (append.go:22) CALL runtime.growslice(SB) 0x0078 00120 (append.go:22) MOVQ 40(SP), CX 0x007d 00125 (append.go:22) MOVQ 56(SP), AX 0x0082 00130 (append.go:22) MOVQ AX, "".x+16(SB) 0x0089 00137 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x008f 00143 (append.go:22) TESTB AL, AL 0x0091 00145 (append.go:22) JNE $0, 168 0x0093 00147 (append.go:22) MOVQ CX, "".x(SB) 0x009a 00154 (append.go:22) MOVQ "".x(SB), BX 0x00a1 00161 (append.go:19) MOVQ "".autotmp_0+64(SP), DX 0x00a6 00166 (append.go:22) JMP 65 0x00a8 00168 (append.go:22) LEAQ "".x(SB), DX 0x00af 00175 (append.go:22) MOVQ DX, (SP) 0x00b3 00179 (append.go:22) MOVQ CX, 8(SP) 0x00b8 00184 (append.go:22) PCDATA $0, $0 0x00b8 00184 (append.go:22) CALL runtime.writebarrierptr(SB) 0x00bd 00189 (append.go:22) JMP 154 0x00bf 00191 (append.go:22) NOP 0x00bf 00191 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00c4 00196 (append.go:21) JMP 0 Change-Id: I77a41ad3a22557a4bb4654de7d6d24a029efe34a Reviewed-on: https://go-review.googlesource.com/21813 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:44:17 -07:00
return s.append(n, false)
case OSTRUCTLIT, OARRAYLIT:
// All literals with nonzero fields have already been
// rewritten during walk. Any that remain are just T{}
// or equivalents. Use the zero value.
if !iszero(n) {
Fatalf("literal with nonzero value in SSA: %v", n)
}
return s.zeroVal(n.Type)
default:
s.Fatalf("unhandled expr %v", n.Op)
return nil
}
}
cmd/compile: avoid write barrier in append fast path When we are writing the result of an append back to the same slice, we don’t need a write barrier on the fast path. This re-implements an optimization that was present in the old backend. Updates #14921 Fixes #14969 Sample code: var x []byte func p() { x = append(x, 1, 2, 3) } Before: "".p t=1 size=224 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 199 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x(SB), DX 0x0025 00037 (append.go:19) MOVQ "".x+8(SB), BX 0x002c 00044 (append.go:19) MOVQ BX, "".autotmp_0+64(SP) 0x0031 00049 (append.go:22) LEAQ 3(BX), BP 0x0035 00053 (append.go:22) CMPQ BP, CX 0x0038 00056 (append.go:22) JGT $0, 131 0x003a 00058 (append.go:22) MOVB $1, (DX)(BX*1) 0x003e 00062 (append.go:22) MOVB $2, 1(DX)(BX*1) 0x0043 00067 (append.go:22) MOVB $3, 2(DX)(BX*1) 0x0048 00072 (append.go:22) MOVQ BP, "".x+8(SB) 0x004f 00079 (append.go:22) MOVQ CX, "".x+16(SB) 0x0056 00086 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x005c 00092 (append.go:22) TESTB AL, AL 0x005e 00094 (append.go:22) JNE $0, 108 0x0060 00096 (append.go:22) MOVQ DX, "".x(SB) 0x0067 00103 (append.go:23) ADDQ $72, SP 0x006b 00107 (append.go:23) RET 0x006c 00108 (append.go:22) LEAQ "".x(SB), CX 0x0073 00115 (append.go:22) MOVQ CX, (SP) 0x0077 00119 (append.go:22) MOVQ DX, 8(SP) 0x007c 00124 (append.go:22) PCDATA $0, $0 0x007c 00124 (append.go:22) CALL runtime.writebarrierptr(SB) 0x0081 00129 (append.go:23) JMP 103 0x0083 00131 (append.go:22) LEAQ type.[]uint8(SB), AX 0x008a 00138 (append.go:22) MOVQ AX, (SP) 0x008e 00142 (append.go:22) MOVQ DX, 8(SP) 0x0093 00147 (append.go:22) MOVQ BX, 16(SP) 0x0098 00152 (append.go:22) MOVQ CX, 24(SP) 0x009d 00157 (append.go:22) MOVQ BP, 32(SP) 0x00a2 00162 (append.go:22) PCDATA $0, $0 0x00a2 00162 (append.go:22) CALL runtime.growslice(SB) 0x00a7 00167 (append.go:22) MOVQ 40(SP), DX 0x00ac 00172 (append.go:22) MOVQ 48(SP), AX 0x00b1 00177 (append.go:22) MOVQ 56(SP), CX 0x00b6 00182 (append.go:22) ADDQ $3, AX 0x00ba 00186 (append.go:19) MOVQ "".autotmp_0+64(SP), BX 0x00bf 00191 (append.go:22) MOVQ AX, BP 0x00c2 00194 (append.go:22) JMP 58 0x00c7 00199 (append.go:22) NOP 0x00c7 00199 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00cc 00204 (append.go:21) JMP 0 After: "".p t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 191 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x+8(SB), DX 0x0025 00037 (append.go:19) MOVQ DX, "".autotmp_0+64(SP) 0x002a 00042 (append.go:19) MOVQ "".x(SB), BX 0x0031 00049 (append.go:22) LEAQ 3(DX), BP 0x0035 00053 (append.go:22) MOVQ BP, "".x+8(SB) 0x003c 00060 (append.go:22) CMPQ BP, CX 0x003f 00063 (append.go:22) JGT $0, 84 0x0041 00065 (append.go:22) MOVB $1, (BX)(DX*1) 0x0045 00069 (append.go:22) MOVB $2, 1(BX)(DX*1) 0x004a 00074 (append.go:22) MOVB $3, 2(BX)(DX*1) 0x004f 00079 (append.go:23) ADDQ $72, SP 0x0053 00083 (append.go:23) RET 0x0054 00084 (append.go:22) LEAQ type.[]uint8(SB), AX 0x005b 00091 (append.go:22) MOVQ AX, (SP) 0x005f 00095 (append.go:22) MOVQ BX, 8(SP) 0x0064 00100 (append.go:22) MOVQ DX, 16(SP) 0x0069 00105 (append.go:22) MOVQ CX, 24(SP) 0x006e 00110 (append.go:22) MOVQ BP, 32(SP) 0x0073 00115 (append.go:22) PCDATA $0, $0 0x0073 00115 (append.go:22) CALL runtime.growslice(SB) 0x0078 00120 (append.go:22) MOVQ 40(SP), CX 0x007d 00125 (append.go:22) MOVQ 56(SP), AX 0x0082 00130 (append.go:22) MOVQ AX, "".x+16(SB) 0x0089 00137 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x008f 00143 (append.go:22) TESTB AL, AL 0x0091 00145 (append.go:22) JNE $0, 168 0x0093 00147 (append.go:22) MOVQ CX, "".x(SB) 0x009a 00154 (append.go:22) MOVQ "".x(SB), BX 0x00a1 00161 (append.go:19) MOVQ "".autotmp_0+64(SP), DX 0x00a6 00166 (append.go:22) JMP 65 0x00a8 00168 (append.go:22) LEAQ "".x(SB), DX 0x00af 00175 (append.go:22) MOVQ DX, (SP) 0x00b3 00179 (append.go:22) MOVQ CX, 8(SP) 0x00b8 00184 (append.go:22) PCDATA $0, $0 0x00b8 00184 (append.go:22) CALL runtime.writebarrierptr(SB) 0x00bd 00189 (append.go:22) JMP 154 0x00bf 00191 (append.go:22) NOP 0x00bf 00191 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00c4 00196 (append.go:21) JMP 0 Change-Id: I77a41ad3a22557a4bb4654de7d6d24a029efe34a Reviewed-on: https://go-review.googlesource.com/21813 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:44:17 -07:00
// append converts an OAPPEND node to SSA.
// If inplace is false, it converts the OAPPEND expression n to an ssa.Value,
// adds it to s, and returns the Value.
// If inplace is true, it writes the result of the OAPPEND expression n
// back to the slice being appended to, and returns nil.
// inplace MUST be set to false if the slice can be SSA'd.
func (s *state) append(n *Node, inplace bool) *ssa.Value {
// If inplace is false, process as expression "append(s, e1, e2, e3)":
//
cmd/compile: avoid a spill in append fast path Instead of spilling newlen, recalculate it. This removes a spill from the fast path, at the cost of a cheap recalculation on the (rare) growth path. This uses 8 bytes less of stack space. It generates two more bytes of code, but that is due to suboptimal register allocation; see far below. Runtime append microbenchmarks are all over the map, presumably due to incidental code movement. Sample code: func s(b []byte) []byte { b = append(b, 1, 2, 3) return b } Before: "".s t=1 size=160 args=0x30 locals=0x48 0x0000 00000 (append.go:8) TEXT "".s(SB), $72-48 0x0000 00000 (append.go:8) MOVQ (TLS), CX 0x0009 00009 (append.go:8) CMPQ SP, 16(CX) 0x000d 00013 (append.go:8) JLS 149 0x0013 00019 (append.go:8) SUBQ $72, SP 0x0017 00023 (append.go:8) FUNCDATA $0, gclocals·6432f8c6a0d23fa7bee6c5d96f21a92a(SB) 0x0017 00023 (append.go:8) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:9) MOVQ "".b+88(FP), CX 0x001c 00028 (append.go:9) LEAQ 3(CX), DX 0x0020 00032 (append.go:9) MOVQ DX, "".autotmp_0+64(SP) 0x0025 00037 (append.go:9) MOVQ "".b+96(FP), BX 0x002a 00042 (append.go:9) CMPQ DX, BX 0x002d 00045 (append.go:9) JGT $0, 86 0x002f 00047 (append.go:8) MOVQ "".b+80(FP), AX 0x0034 00052 (append.go:9) MOVB $1, (AX)(CX*1) 0x0038 00056 (append.go:9) MOVB $2, 1(AX)(CX*1) 0x003d 00061 (append.go:9) MOVB $3, 2(AX)(CX*1) 0x0042 00066 (append.go:10) MOVQ AX, "".~r1+104(FP) 0x0047 00071 (append.go:10) MOVQ DX, "".~r1+112(FP) 0x004c 00076 (append.go:10) MOVQ BX, "".~r1+120(FP) 0x0051 00081 (append.go:10) ADDQ $72, SP 0x0055 00085 (append.go:10) RET 0x0056 00086 (append.go:9) LEAQ type.[]uint8(SB), AX 0x005d 00093 (append.go:9) MOVQ AX, (SP) 0x0061 00097 (append.go:9) MOVQ "".b+80(FP), BP 0x0066 00102 (append.go:9) MOVQ BP, 8(SP) 0x006b 00107 (append.go:9) MOVQ CX, 16(SP) 0x0070 00112 (append.go:9) MOVQ BX, 24(SP) 0x0075 00117 (append.go:9) MOVQ DX, 32(SP) 0x007a 00122 (append.go:9) PCDATA $0, $0 0x007a 00122 (append.go:9) CALL runtime.growslice(SB) 0x007f 00127 (append.go:9) MOVQ 40(SP), AX 0x0084 00132 (append.go:9) MOVQ 56(SP), BX 0x0089 00137 (append.go:8) MOVQ "".b+88(FP), CX 0x008e 00142 (append.go:9) MOVQ "".autotmp_0+64(SP), DX 0x0093 00147 (append.go:9) JMP 52 0x0095 00149 (append.go:9) NOP 0x0095 00149 (append.go:8) CALL runtime.morestack_noctxt(SB) 0x009a 00154 (append.go:8) JMP 0 After: "".s t=1 size=176 args=0x30 locals=0x40 0x0000 00000 (append.go:8) TEXT "".s(SB), $64-48 0x0000 00000 (append.go:8) MOVQ (TLS), CX 0x0009 00009 (append.go:8) CMPQ SP, 16(CX) 0x000d 00013 (append.go:8) JLS 151 0x0013 00019 (append.go:8) SUBQ $64, SP 0x0017 00023 (append.go:8) FUNCDATA $0, gclocals·6432f8c6a0d23fa7bee6c5d96f21a92a(SB) 0x0017 00023 (append.go:8) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:9) MOVQ "".b+80(FP), CX 0x001c 00028 (append.go:9) LEAQ 3(CX), DX 0x0020 00032 (append.go:9) MOVQ "".b+88(FP), BX 0x0025 00037 (append.go:9) CMPQ DX, BX 0x0028 00040 (append.go:9) JGT $0, 81 0x002a 00042 (append.go:8) MOVQ "".b+72(FP), AX 0x002f 00047 (append.go:9) MOVB $1, (AX)(CX*1) 0x0033 00051 (append.go:9) MOVB $2, 1(AX)(CX*1) 0x0038 00056 (append.go:9) MOVB $3, 2(AX)(CX*1) 0x003d 00061 (append.go:10) MOVQ AX, "".~r1+96(FP) 0x0042 00066 (append.go:10) MOVQ DX, "".~r1+104(FP) 0x0047 00071 (append.go:10) MOVQ BX, "".~r1+112(FP) 0x004c 00076 (append.go:10) ADDQ $64, SP 0x0050 00080 (append.go:10) RET 0x0051 00081 (append.go:9) LEAQ type.[]uint8(SB), AX 0x0058 00088 (append.go:9) MOVQ AX, (SP) 0x005c 00092 (append.go:9) MOVQ "".b+72(FP), BP 0x0061 00097 (append.go:9) MOVQ BP, 8(SP) 0x0066 00102 (append.go:9) MOVQ CX, 16(SP) 0x006b 00107 (append.go:9) MOVQ BX, 24(SP) 0x0070 00112 (append.go:9) MOVQ DX, 32(SP) 0x0075 00117 (append.go:9) PCDATA $0, $0 0x0075 00117 (append.go:9) CALL runtime.growslice(SB) 0x007a 00122 (append.go:9) MOVQ 40(SP), AX 0x007f 00127 (append.go:9) MOVQ 48(SP), CX 0x0084 00132 (append.go:9) MOVQ 56(SP), BX 0x0089 00137 (append.go:9) ADDQ $3, CX 0x008d 00141 (append.go:9) MOVQ CX, DX 0x0090 00144 (append.go:8) MOVQ "".b+80(FP), CX 0x0095 00149 (append.go:9) JMP 47 0x0097 00151 (append.go:9) NOP 0x0097 00151 (append.go:8) CALL runtime.morestack_noctxt(SB) 0x009c 00156 (append.go:8) JMP 0 Observe that in the following sequence, we should use DX directly instead of using CX as a temporary register, which would make the new code a strict improvement on the old: 0x007f 00127 (append.go:9) MOVQ 48(SP), CX 0x0084 00132 (append.go:9) MOVQ 56(SP), BX 0x0089 00137 (append.go:9) ADDQ $3, CX 0x008d 00141 (append.go:9) MOVQ CX, DX 0x0090 00144 (append.go:8) MOVQ "".b+80(FP), CX Change-Id: I4ee50b18fa53865901d2d7f86c2cbb54c6fa6924 Reviewed-on: https://go-review.googlesource.com/21812 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:08:00 -07:00
// ptr, len, cap := s
// newlen := len + 3
cmd/compile: avoid write barrier in append fast path When we are writing the result of an append back to the same slice, we don’t need a write barrier on the fast path. This re-implements an optimization that was present in the old backend. Updates #14921 Fixes #14969 Sample code: var x []byte func p() { x = append(x, 1, 2, 3) } Before: "".p t=1 size=224 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 199 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x(SB), DX 0x0025 00037 (append.go:19) MOVQ "".x+8(SB), BX 0x002c 00044 (append.go:19) MOVQ BX, "".autotmp_0+64(SP) 0x0031 00049 (append.go:22) LEAQ 3(BX), BP 0x0035 00053 (append.go:22) CMPQ BP, CX 0x0038 00056 (append.go:22) JGT $0, 131 0x003a 00058 (append.go:22) MOVB $1, (DX)(BX*1) 0x003e 00062 (append.go:22) MOVB $2, 1(DX)(BX*1) 0x0043 00067 (append.go:22) MOVB $3, 2(DX)(BX*1) 0x0048 00072 (append.go:22) MOVQ BP, "".x+8(SB) 0x004f 00079 (append.go:22) MOVQ CX, "".x+16(SB) 0x0056 00086 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x005c 00092 (append.go:22) TESTB AL, AL 0x005e 00094 (append.go:22) JNE $0, 108 0x0060 00096 (append.go:22) MOVQ DX, "".x(SB) 0x0067 00103 (append.go:23) ADDQ $72, SP 0x006b 00107 (append.go:23) RET 0x006c 00108 (append.go:22) LEAQ "".x(SB), CX 0x0073 00115 (append.go:22) MOVQ CX, (SP) 0x0077 00119 (append.go:22) MOVQ DX, 8(SP) 0x007c 00124 (append.go:22) PCDATA $0, $0 0x007c 00124 (append.go:22) CALL runtime.writebarrierptr(SB) 0x0081 00129 (append.go:23) JMP 103 0x0083 00131 (append.go:22) LEAQ type.[]uint8(SB), AX 0x008a 00138 (append.go:22) MOVQ AX, (SP) 0x008e 00142 (append.go:22) MOVQ DX, 8(SP) 0x0093 00147 (append.go:22) MOVQ BX, 16(SP) 0x0098 00152 (append.go:22) MOVQ CX, 24(SP) 0x009d 00157 (append.go:22) MOVQ BP, 32(SP) 0x00a2 00162 (append.go:22) PCDATA $0, $0 0x00a2 00162 (append.go:22) CALL runtime.growslice(SB) 0x00a7 00167 (append.go:22) MOVQ 40(SP), DX 0x00ac 00172 (append.go:22) MOVQ 48(SP), AX 0x00b1 00177 (append.go:22) MOVQ 56(SP), CX 0x00b6 00182 (append.go:22) ADDQ $3, AX 0x00ba 00186 (append.go:19) MOVQ "".autotmp_0+64(SP), BX 0x00bf 00191 (append.go:22) MOVQ AX, BP 0x00c2 00194 (append.go:22) JMP 58 0x00c7 00199 (append.go:22) NOP 0x00c7 00199 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00cc 00204 (append.go:21) JMP 0 After: "".p t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 191 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x+8(SB), DX 0x0025 00037 (append.go:19) MOVQ DX, "".autotmp_0+64(SP) 0x002a 00042 (append.go:19) MOVQ "".x(SB), BX 0x0031 00049 (append.go:22) LEAQ 3(DX), BP 0x0035 00053 (append.go:22) MOVQ BP, "".x+8(SB) 0x003c 00060 (append.go:22) CMPQ BP, CX 0x003f 00063 (append.go:22) JGT $0, 84 0x0041 00065 (append.go:22) MOVB $1, (BX)(DX*1) 0x0045 00069 (append.go:22) MOVB $2, 1(BX)(DX*1) 0x004a 00074 (append.go:22) MOVB $3, 2(BX)(DX*1) 0x004f 00079 (append.go:23) ADDQ $72, SP 0x0053 00083 (append.go:23) RET 0x0054 00084 (append.go:22) LEAQ type.[]uint8(SB), AX 0x005b 00091 (append.go:22) MOVQ AX, (SP) 0x005f 00095 (append.go:22) MOVQ BX, 8(SP) 0x0064 00100 (append.go:22) MOVQ DX, 16(SP) 0x0069 00105 (append.go:22) MOVQ CX, 24(SP) 0x006e 00110 (append.go:22) MOVQ BP, 32(SP) 0x0073 00115 (append.go:22) PCDATA $0, $0 0x0073 00115 (append.go:22) CALL runtime.growslice(SB) 0x0078 00120 (append.go:22) MOVQ 40(SP), CX 0x007d 00125 (append.go:22) MOVQ 56(SP), AX 0x0082 00130 (append.go:22) MOVQ AX, "".x+16(SB) 0x0089 00137 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x008f 00143 (append.go:22) TESTB AL, AL 0x0091 00145 (append.go:22) JNE $0, 168 0x0093 00147 (append.go:22) MOVQ CX, "".x(SB) 0x009a 00154 (append.go:22) MOVQ "".x(SB), BX 0x00a1 00161 (append.go:19) MOVQ "".autotmp_0+64(SP), DX 0x00a6 00166 (append.go:22) JMP 65 0x00a8 00168 (append.go:22) LEAQ "".x(SB), DX 0x00af 00175 (append.go:22) MOVQ DX, (SP) 0x00b3 00179 (append.go:22) MOVQ CX, 8(SP) 0x00b8 00184 (append.go:22) PCDATA $0, $0 0x00b8 00184 (append.go:22) CALL runtime.writebarrierptr(SB) 0x00bd 00189 (append.go:22) JMP 154 0x00bf 00191 (append.go:22) NOP 0x00bf 00191 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00c4 00196 (append.go:21) JMP 0 Change-Id: I77a41ad3a22557a4bb4654de7d6d24a029efe34a Reviewed-on: https://go-review.googlesource.com/21813 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:44:17 -07:00
// if newlen > cap {
cmd/compile: avoid a spill in append fast path Instead of spilling newlen, recalculate it. This removes a spill from the fast path, at the cost of a cheap recalculation on the (rare) growth path. This uses 8 bytes less of stack space. It generates two more bytes of code, but that is due to suboptimal register allocation; see far below. Runtime append microbenchmarks are all over the map, presumably due to incidental code movement. Sample code: func s(b []byte) []byte { b = append(b, 1, 2, 3) return b } Before: "".s t=1 size=160 args=0x30 locals=0x48 0x0000 00000 (append.go:8) TEXT "".s(SB), $72-48 0x0000 00000 (append.go:8) MOVQ (TLS), CX 0x0009 00009 (append.go:8) CMPQ SP, 16(CX) 0x000d 00013 (append.go:8) JLS 149 0x0013 00019 (append.go:8) SUBQ $72, SP 0x0017 00023 (append.go:8) FUNCDATA $0, gclocals·6432f8c6a0d23fa7bee6c5d96f21a92a(SB) 0x0017 00023 (append.go:8) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:9) MOVQ "".b+88(FP), CX 0x001c 00028 (append.go:9) LEAQ 3(CX), DX 0x0020 00032 (append.go:9) MOVQ DX, "".autotmp_0+64(SP) 0x0025 00037 (append.go:9) MOVQ "".b+96(FP), BX 0x002a 00042 (append.go:9) CMPQ DX, BX 0x002d 00045 (append.go:9) JGT $0, 86 0x002f 00047 (append.go:8) MOVQ "".b+80(FP), AX 0x0034 00052 (append.go:9) MOVB $1, (AX)(CX*1) 0x0038 00056 (append.go:9) MOVB $2, 1(AX)(CX*1) 0x003d 00061 (append.go:9) MOVB $3, 2(AX)(CX*1) 0x0042 00066 (append.go:10) MOVQ AX, "".~r1+104(FP) 0x0047 00071 (append.go:10) MOVQ DX, "".~r1+112(FP) 0x004c 00076 (append.go:10) MOVQ BX, "".~r1+120(FP) 0x0051 00081 (append.go:10) ADDQ $72, SP 0x0055 00085 (append.go:10) RET 0x0056 00086 (append.go:9) LEAQ type.[]uint8(SB), AX 0x005d 00093 (append.go:9) MOVQ AX, (SP) 0x0061 00097 (append.go:9) MOVQ "".b+80(FP), BP 0x0066 00102 (append.go:9) MOVQ BP, 8(SP) 0x006b 00107 (append.go:9) MOVQ CX, 16(SP) 0x0070 00112 (append.go:9) MOVQ BX, 24(SP) 0x0075 00117 (append.go:9) MOVQ DX, 32(SP) 0x007a 00122 (append.go:9) PCDATA $0, $0 0x007a 00122 (append.go:9) CALL runtime.growslice(SB) 0x007f 00127 (append.go:9) MOVQ 40(SP), AX 0x0084 00132 (append.go:9) MOVQ 56(SP), BX 0x0089 00137 (append.go:8) MOVQ "".b+88(FP), CX 0x008e 00142 (append.go:9) MOVQ "".autotmp_0+64(SP), DX 0x0093 00147 (append.go:9) JMP 52 0x0095 00149 (append.go:9) NOP 0x0095 00149 (append.go:8) CALL runtime.morestack_noctxt(SB) 0x009a 00154 (append.go:8) JMP 0 After: "".s t=1 size=176 args=0x30 locals=0x40 0x0000 00000 (append.go:8) TEXT "".s(SB), $64-48 0x0000 00000 (append.go:8) MOVQ (TLS), CX 0x0009 00009 (append.go:8) CMPQ SP, 16(CX) 0x000d 00013 (append.go:8) JLS 151 0x0013 00019 (append.go:8) SUBQ $64, SP 0x0017 00023 (append.go:8) FUNCDATA $0, gclocals·6432f8c6a0d23fa7bee6c5d96f21a92a(SB) 0x0017 00023 (append.go:8) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:9) MOVQ "".b+80(FP), CX 0x001c 00028 (append.go:9) LEAQ 3(CX), DX 0x0020 00032 (append.go:9) MOVQ "".b+88(FP), BX 0x0025 00037 (append.go:9) CMPQ DX, BX 0x0028 00040 (append.go:9) JGT $0, 81 0x002a 00042 (append.go:8) MOVQ "".b+72(FP), AX 0x002f 00047 (append.go:9) MOVB $1, (AX)(CX*1) 0x0033 00051 (append.go:9) MOVB $2, 1(AX)(CX*1) 0x0038 00056 (append.go:9) MOVB $3, 2(AX)(CX*1) 0x003d 00061 (append.go:10) MOVQ AX, "".~r1+96(FP) 0x0042 00066 (append.go:10) MOVQ DX, "".~r1+104(FP) 0x0047 00071 (append.go:10) MOVQ BX, "".~r1+112(FP) 0x004c 00076 (append.go:10) ADDQ $64, SP 0x0050 00080 (append.go:10) RET 0x0051 00081 (append.go:9) LEAQ type.[]uint8(SB), AX 0x0058 00088 (append.go:9) MOVQ AX, (SP) 0x005c 00092 (append.go:9) MOVQ "".b+72(FP), BP 0x0061 00097 (append.go:9) MOVQ BP, 8(SP) 0x0066 00102 (append.go:9) MOVQ CX, 16(SP) 0x006b 00107 (append.go:9) MOVQ BX, 24(SP) 0x0070 00112 (append.go:9) MOVQ DX, 32(SP) 0x0075 00117 (append.go:9) PCDATA $0, $0 0x0075 00117 (append.go:9) CALL runtime.growslice(SB) 0x007a 00122 (append.go:9) MOVQ 40(SP), AX 0x007f 00127 (append.go:9) MOVQ 48(SP), CX 0x0084 00132 (append.go:9) MOVQ 56(SP), BX 0x0089 00137 (append.go:9) ADDQ $3, CX 0x008d 00141 (append.go:9) MOVQ CX, DX 0x0090 00144 (append.go:8) MOVQ "".b+80(FP), CX 0x0095 00149 (append.go:9) JMP 47 0x0097 00151 (append.go:9) NOP 0x0097 00151 (append.go:8) CALL runtime.morestack_noctxt(SB) 0x009c 00156 (append.go:8) JMP 0 Observe that in the following sequence, we should use DX directly instead of using CX as a temporary register, which would make the new code a strict improvement on the old: 0x007f 00127 (append.go:9) MOVQ 48(SP), CX 0x0084 00132 (append.go:9) MOVQ 56(SP), BX 0x0089 00137 (append.go:9) ADDQ $3, CX 0x008d 00141 (append.go:9) MOVQ CX, DX 0x0090 00144 (append.go:8) MOVQ "".b+80(FP), CX Change-Id: I4ee50b18fa53865901d2d7f86c2cbb54c6fa6924 Reviewed-on: https://go-review.googlesource.com/21812 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:08:00 -07:00
// ptr, len, cap = growslice(s, newlen)
// newlen = len + 3 // recalculate to avoid a spill
// }
cmd/compile: avoid write barrier in append fast path When we are writing the result of an append back to the same slice, we don’t need a write barrier on the fast path. This re-implements an optimization that was present in the old backend. Updates #14921 Fixes #14969 Sample code: var x []byte func p() { x = append(x, 1, 2, 3) } Before: "".p t=1 size=224 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 199 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x(SB), DX 0x0025 00037 (append.go:19) MOVQ "".x+8(SB), BX 0x002c 00044 (append.go:19) MOVQ BX, "".autotmp_0+64(SP) 0x0031 00049 (append.go:22) LEAQ 3(BX), BP 0x0035 00053 (append.go:22) CMPQ BP, CX 0x0038 00056 (append.go:22) JGT $0, 131 0x003a 00058 (append.go:22) MOVB $1, (DX)(BX*1) 0x003e 00062 (append.go:22) MOVB $2, 1(DX)(BX*1) 0x0043 00067 (append.go:22) MOVB $3, 2(DX)(BX*1) 0x0048 00072 (append.go:22) MOVQ BP, "".x+8(SB) 0x004f 00079 (append.go:22) MOVQ CX, "".x+16(SB) 0x0056 00086 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x005c 00092 (append.go:22) TESTB AL, AL 0x005e 00094 (append.go:22) JNE $0, 108 0x0060 00096 (append.go:22) MOVQ DX, "".x(SB) 0x0067 00103 (append.go:23) ADDQ $72, SP 0x006b 00107 (append.go:23) RET 0x006c 00108 (append.go:22) LEAQ "".x(SB), CX 0x0073 00115 (append.go:22) MOVQ CX, (SP) 0x0077 00119 (append.go:22) MOVQ DX, 8(SP) 0x007c 00124 (append.go:22) PCDATA $0, $0 0x007c 00124 (append.go:22) CALL runtime.writebarrierptr(SB) 0x0081 00129 (append.go:23) JMP 103 0x0083 00131 (append.go:22) LEAQ type.[]uint8(SB), AX 0x008a 00138 (append.go:22) MOVQ AX, (SP) 0x008e 00142 (append.go:22) MOVQ DX, 8(SP) 0x0093 00147 (append.go:22) MOVQ BX, 16(SP) 0x0098 00152 (append.go:22) MOVQ CX, 24(SP) 0x009d 00157 (append.go:22) MOVQ BP, 32(SP) 0x00a2 00162 (append.go:22) PCDATA $0, $0 0x00a2 00162 (append.go:22) CALL runtime.growslice(SB) 0x00a7 00167 (append.go:22) MOVQ 40(SP), DX 0x00ac 00172 (append.go:22) MOVQ 48(SP), AX 0x00b1 00177 (append.go:22) MOVQ 56(SP), CX 0x00b6 00182 (append.go:22) ADDQ $3, AX 0x00ba 00186 (append.go:19) MOVQ "".autotmp_0+64(SP), BX 0x00bf 00191 (append.go:22) MOVQ AX, BP 0x00c2 00194 (append.go:22) JMP 58 0x00c7 00199 (append.go:22) NOP 0x00c7 00199 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00cc 00204 (append.go:21) JMP 0 After: "".p t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 191 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x+8(SB), DX 0x0025 00037 (append.go:19) MOVQ DX, "".autotmp_0+64(SP) 0x002a 00042 (append.go:19) MOVQ "".x(SB), BX 0x0031 00049 (append.go:22) LEAQ 3(DX), BP 0x0035 00053 (append.go:22) MOVQ BP, "".x+8(SB) 0x003c 00060 (append.go:22) CMPQ BP, CX 0x003f 00063 (append.go:22) JGT $0, 84 0x0041 00065 (append.go:22) MOVB $1, (BX)(DX*1) 0x0045 00069 (append.go:22) MOVB $2, 1(BX)(DX*1) 0x004a 00074 (append.go:22) MOVB $3, 2(BX)(DX*1) 0x004f 00079 (append.go:23) ADDQ $72, SP 0x0053 00083 (append.go:23) RET 0x0054 00084 (append.go:22) LEAQ type.[]uint8(SB), AX 0x005b 00091 (append.go:22) MOVQ AX, (SP) 0x005f 00095 (append.go:22) MOVQ BX, 8(SP) 0x0064 00100 (append.go:22) MOVQ DX, 16(SP) 0x0069 00105 (append.go:22) MOVQ CX, 24(SP) 0x006e 00110 (append.go:22) MOVQ BP, 32(SP) 0x0073 00115 (append.go:22) PCDATA $0, $0 0x0073 00115 (append.go:22) CALL runtime.growslice(SB) 0x0078 00120 (append.go:22) MOVQ 40(SP), CX 0x007d 00125 (append.go:22) MOVQ 56(SP), AX 0x0082 00130 (append.go:22) MOVQ AX, "".x+16(SB) 0x0089 00137 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x008f 00143 (append.go:22) TESTB AL, AL 0x0091 00145 (append.go:22) JNE $0, 168 0x0093 00147 (append.go:22) MOVQ CX, "".x(SB) 0x009a 00154 (append.go:22) MOVQ "".x(SB), BX 0x00a1 00161 (append.go:19) MOVQ "".autotmp_0+64(SP), DX 0x00a6 00166 (append.go:22) JMP 65 0x00a8 00168 (append.go:22) LEAQ "".x(SB), DX 0x00af 00175 (append.go:22) MOVQ DX, (SP) 0x00b3 00179 (append.go:22) MOVQ CX, 8(SP) 0x00b8 00184 (append.go:22) PCDATA $0, $0 0x00b8 00184 (append.go:22) CALL runtime.writebarrierptr(SB) 0x00bd 00189 (append.go:22) JMP 154 0x00bf 00191 (append.go:22) NOP 0x00bf 00191 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00c4 00196 (append.go:21) JMP 0 Change-Id: I77a41ad3a22557a4bb4654de7d6d24a029efe34a Reviewed-on: https://go-review.googlesource.com/21813 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:44:17 -07:00
// // with write barriers, if needed:
// *(ptr+len) = e1
// *(ptr+len+1) = e2
// *(ptr+len+2) = e3
// return makeslice(ptr, newlen, cap)
//
//
// If inplace is true, process as statement "s = append(s, e1, e2, e3)":
//
// a := &s
// ptr, len, cap := s
// newlen := len + 3
// if newlen > cap {
cmd/compile: re-enable in-place append optimization CL 21891 was too clever in its attempts to avoid spills. Storing newlen too early caused uses of append in the runtime itself to receive an inconsistent view of a slice, leading to corruption. This CL makes the generate code much more similar to the old backend. It spills more than before, but those spills have been contained to the grow path. It recalculates newlen unnecessarily on the fast path, but that's measurably cheaper than spilling it. CL 21891 caused runtime failures in 6 of 2000 runs of net/http and crypto/x509 in my test setup. This CL has gone 6000 runs without a failure. Benchmarks going from master to this CL: name old time/op new time/op delta AppendInPlace/NoGrow/Byte-8 439ns ± 2% 436ns ± 2% -0.72% (p=0.001 n=28+27) AppendInPlace/NoGrow/1Ptr-8 901ns ± 0% 856ns ± 0% -4.95% (p=0.000 n=26+29) AppendInPlace/NoGrow/2Ptr-8 2.15µs ± 1% 1.95µs ± 0% -9.07% (p=0.000 n=28+30) AppendInPlace/NoGrow/3Ptr-8 2.66µs ± 0% 2.45µs ± 0% -7.93% (p=0.000 n=29+26) AppendInPlace/NoGrow/4Ptr-8 3.24µs ± 1% 3.02µs ± 1% -6.75% (p=0.000 n=28+30) AppendInPlace/Grow/Byte-8 269ns ± 1% 271ns ± 1% +0.84% (p=0.000 n=30+29) AppendInPlace/Grow/1Ptr-8 275ns ± 1% 280ns ± 1% +1.75% (p=0.000 n=30+30) AppendInPlace/Grow/2Ptr-8 384ns ± 0% 391ns ± 0% +1.94% (p=0.000 n=27+30) AppendInPlace/Grow/3Ptr-8 455ns ± 0% 462ns ± 0% +1.43% (p=0.000 n=29+29) AppendInPlace/Grow/4Ptr-8 478ns ± 0% 479ns ± 0% +0.23% (p=0.000 n=30+27) However, for the large no-grow cases, there is still more work to be done. Going from this CL to the non-SSA backend: name old time/op new time/op delta AppendInPlace/NoGrow/Byte-8 436ns ± 2% 436ns ± 2% ~ (p=0.967 n=27+29) AppendInPlace/NoGrow/1Ptr-8 856ns ± 0% 884ns ± 0% +3.28% (p=0.000 n=29+26) AppendInPlace/NoGrow/2Ptr-8 1.95µs ± 0% 1.56µs ± 0% -20.28% (p=0.000 n=30+29) AppendInPlace/NoGrow/3Ptr-8 2.45µs ± 0% 1.89µs ± 0% -22.88% (p=0.000 n=26+28) AppendInPlace/NoGrow/4Ptr-8 3.02µs ± 1% 2.56µs ± 1% -15.35% (p=0.000 n=30+28) AppendInPlace/Grow/Byte-8 271ns ± 1% 283ns ± 1% +4.56% (p=0.000 n=29+29) AppendInPlace/Grow/1Ptr-8 280ns ± 1% 288ns ± 1% +2.99% (p=0.000 n=30+30) AppendInPlace/Grow/2Ptr-8 391ns ± 0% 409ns ± 0% +4.66% (p=0.000 n=30+29) AppendInPlace/Grow/3Ptr-8 462ns ± 0% 481ns ± 0% +4.13% (p=0.000 n=29+30) AppendInPlace/Grow/4Ptr-8 479ns ± 0% 502ns ± 0% +4.81% (p=0.000 n=27+26) New generated code: var x []byte func a() { x = append(x, 1) } "".a t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (a.go:5) TEXT "".a(SB), $72-0 0x0000 00000 (a.go:5) MOVQ (TLS), CX 0x0009 00009 (a.go:5) CMPQ SP, 16(CX) 0x000d 00013 (a.go:5) JLS 190 0x0013 00019 (a.go:5) SUBQ $72, SP 0x0017 00023 (a.go:5) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (a.go:5) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (a.go:6) MOVQ "".x+16(SB), CX 0x001e 00030 (a.go:6) MOVQ "".x+8(SB), DX 0x0025 00037 (a.go:6) MOVQ "".x(SB), BX 0x002c 00044 (a.go:6) LEAQ 1(DX), BP 0x0030 00048 (a.go:6) CMPQ BP, CX 0x0033 00051 (a.go:6) JGT $0, 73 0x0035 00053 (a.go:6) LEAQ 1(DX), AX 0x0039 00057 (a.go:6) MOVQ AX, "".x+8(SB) 0x0040 00064 (a.go:6) MOVB $1, (BX)(DX*1) 0x0044 00068 (a.go:7) ADDQ $72, SP 0x0048 00072 (a.go:7) RET 0x0049 00073 (a.go:6) LEAQ type.[]uint8(SB), AX 0x0050 00080 (a.go:6) MOVQ AX, (SP) 0x0054 00084 (a.go:6) MOVQ BX, 8(SP) 0x0059 00089 (a.go:6) MOVQ DX, 16(SP) 0x005e 00094 (a.go:6) MOVQ CX, 24(SP) 0x0063 00099 (a.go:6) MOVQ BP, 32(SP) 0x0068 00104 (a.go:6) PCDATA $0, $0 0x0068 00104 (a.go:6) CALL runtime.growslice(SB) 0x006d 00109 (a.go:6) MOVQ 40(SP), CX 0x0072 00114 (a.go:6) MOVQ 48(SP), DX 0x0077 00119 (a.go:6) MOVQ DX, "".autotmp_0+64(SP) 0x007c 00124 (a.go:6) MOVQ 56(SP), BX 0x0081 00129 (a.go:6) MOVQ BX, "".x+16(SB) 0x0088 00136 (a.go:6) MOVL runtime.writeBarrier(SB), AX 0x008e 00142 (a.go:6) TESTB AL, AL 0x0090 00144 (a.go:6) JNE $0, 162 0x0092 00146 (a.go:6) MOVQ CX, "".x(SB) 0x0099 00153 (a.go:6) MOVQ "".x(SB), BX 0x00a0 00160 (a.go:6) JMP 53 0x00a2 00162 (a.go:6) LEAQ "".x(SB), BX 0x00a9 00169 (a.go:6) MOVQ BX, (SP) 0x00ad 00173 (a.go:6) MOVQ CX, 8(SP) 0x00b2 00178 (a.go:6) PCDATA $0, $0 0x00b2 00178 (a.go:6) CALL runtime.writebarrierptr(SB) 0x00b7 00183 (a.go:6) MOVQ "".autotmp_0+64(SP), DX 0x00bc 00188 (a.go:6) JMP 153 0x00be 00190 (a.go:6) NOP 0x00be 00190 (a.go:5) CALL runtime.morestack_noctxt(SB) 0x00c3 00195 (a.go:5) JMP 0 Fixes #14969 again Change-Id: Ia50463b1f506011aad0718a4fef1d4738e43c32d Reviewed-on: https://go-review.googlesource.com/22197 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-18 09:40:30 -07:00
// newptr, len, newcap = growslice(ptr, len, cap, newlen)
// vardef(a) // if necessary, advise liveness we are writing a new a
cmd/compile: avoid write barrier in append fast path When we are writing the result of an append back to the same slice, we don’t need a write barrier on the fast path. This re-implements an optimization that was present in the old backend. Updates #14921 Fixes #14969 Sample code: var x []byte func p() { x = append(x, 1, 2, 3) } Before: "".p t=1 size=224 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 199 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x(SB), DX 0x0025 00037 (append.go:19) MOVQ "".x+8(SB), BX 0x002c 00044 (append.go:19) MOVQ BX, "".autotmp_0+64(SP) 0x0031 00049 (append.go:22) LEAQ 3(BX), BP 0x0035 00053 (append.go:22) CMPQ BP, CX 0x0038 00056 (append.go:22) JGT $0, 131 0x003a 00058 (append.go:22) MOVB $1, (DX)(BX*1) 0x003e 00062 (append.go:22) MOVB $2, 1(DX)(BX*1) 0x0043 00067 (append.go:22) MOVB $3, 2(DX)(BX*1) 0x0048 00072 (append.go:22) MOVQ BP, "".x+8(SB) 0x004f 00079 (append.go:22) MOVQ CX, "".x+16(SB) 0x0056 00086 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x005c 00092 (append.go:22) TESTB AL, AL 0x005e 00094 (append.go:22) JNE $0, 108 0x0060 00096 (append.go:22) MOVQ DX, "".x(SB) 0x0067 00103 (append.go:23) ADDQ $72, SP 0x006b 00107 (append.go:23) RET 0x006c 00108 (append.go:22) LEAQ "".x(SB), CX 0x0073 00115 (append.go:22) MOVQ CX, (SP) 0x0077 00119 (append.go:22) MOVQ DX, 8(SP) 0x007c 00124 (append.go:22) PCDATA $0, $0 0x007c 00124 (append.go:22) CALL runtime.writebarrierptr(SB) 0x0081 00129 (append.go:23) JMP 103 0x0083 00131 (append.go:22) LEAQ type.[]uint8(SB), AX 0x008a 00138 (append.go:22) MOVQ AX, (SP) 0x008e 00142 (append.go:22) MOVQ DX, 8(SP) 0x0093 00147 (append.go:22) MOVQ BX, 16(SP) 0x0098 00152 (append.go:22) MOVQ CX, 24(SP) 0x009d 00157 (append.go:22) MOVQ BP, 32(SP) 0x00a2 00162 (append.go:22) PCDATA $0, $0 0x00a2 00162 (append.go:22) CALL runtime.growslice(SB) 0x00a7 00167 (append.go:22) MOVQ 40(SP), DX 0x00ac 00172 (append.go:22) MOVQ 48(SP), AX 0x00b1 00177 (append.go:22) MOVQ 56(SP), CX 0x00b6 00182 (append.go:22) ADDQ $3, AX 0x00ba 00186 (append.go:19) MOVQ "".autotmp_0+64(SP), BX 0x00bf 00191 (append.go:22) MOVQ AX, BP 0x00c2 00194 (append.go:22) JMP 58 0x00c7 00199 (append.go:22) NOP 0x00c7 00199 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00cc 00204 (append.go:21) JMP 0 After: "".p t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 191 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x+8(SB), DX 0x0025 00037 (append.go:19) MOVQ DX, "".autotmp_0+64(SP) 0x002a 00042 (append.go:19) MOVQ "".x(SB), BX 0x0031 00049 (append.go:22) LEAQ 3(DX), BP 0x0035 00053 (append.go:22) MOVQ BP, "".x+8(SB) 0x003c 00060 (append.go:22) CMPQ BP, CX 0x003f 00063 (append.go:22) JGT $0, 84 0x0041 00065 (append.go:22) MOVB $1, (BX)(DX*1) 0x0045 00069 (append.go:22) MOVB $2, 1(BX)(DX*1) 0x004a 00074 (append.go:22) MOVB $3, 2(BX)(DX*1) 0x004f 00079 (append.go:23) ADDQ $72, SP 0x0053 00083 (append.go:23) RET 0x0054 00084 (append.go:22) LEAQ type.[]uint8(SB), AX 0x005b 00091 (append.go:22) MOVQ AX, (SP) 0x005f 00095 (append.go:22) MOVQ BX, 8(SP) 0x0064 00100 (append.go:22) MOVQ DX, 16(SP) 0x0069 00105 (append.go:22) MOVQ CX, 24(SP) 0x006e 00110 (append.go:22) MOVQ BP, 32(SP) 0x0073 00115 (append.go:22) PCDATA $0, $0 0x0073 00115 (append.go:22) CALL runtime.growslice(SB) 0x0078 00120 (append.go:22) MOVQ 40(SP), CX 0x007d 00125 (append.go:22) MOVQ 56(SP), AX 0x0082 00130 (append.go:22) MOVQ AX, "".x+16(SB) 0x0089 00137 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x008f 00143 (append.go:22) TESTB AL, AL 0x0091 00145 (append.go:22) JNE $0, 168 0x0093 00147 (append.go:22) MOVQ CX, "".x(SB) 0x009a 00154 (append.go:22) MOVQ "".x(SB), BX 0x00a1 00161 (append.go:19) MOVQ "".autotmp_0+64(SP), DX 0x00a6 00166 (append.go:22) JMP 65 0x00a8 00168 (append.go:22) LEAQ "".x(SB), DX 0x00af 00175 (append.go:22) MOVQ DX, (SP) 0x00b3 00179 (append.go:22) MOVQ CX, 8(SP) 0x00b8 00184 (append.go:22) PCDATA $0, $0 0x00b8 00184 (append.go:22) CALL runtime.writebarrierptr(SB) 0x00bd 00189 (append.go:22) JMP 154 0x00bf 00191 (append.go:22) NOP 0x00bf 00191 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00c4 00196 (append.go:21) JMP 0 Change-Id: I77a41ad3a22557a4bb4654de7d6d24a029efe34a Reviewed-on: https://go-review.googlesource.com/21813 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:44:17 -07:00
// *a.cap = newcap // write before ptr to avoid a spill
// *a.ptr = newptr // with write barrier
// }
cmd/compile: re-enable in-place append optimization CL 21891 was too clever in its attempts to avoid spills. Storing newlen too early caused uses of append in the runtime itself to receive an inconsistent view of a slice, leading to corruption. This CL makes the generate code much more similar to the old backend. It spills more than before, but those spills have been contained to the grow path. It recalculates newlen unnecessarily on the fast path, but that's measurably cheaper than spilling it. CL 21891 caused runtime failures in 6 of 2000 runs of net/http and crypto/x509 in my test setup. This CL has gone 6000 runs without a failure. Benchmarks going from master to this CL: name old time/op new time/op delta AppendInPlace/NoGrow/Byte-8 439ns ± 2% 436ns ± 2% -0.72% (p=0.001 n=28+27) AppendInPlace/NoGrow/1Ptr-8 901ns ± 0% 856ns ± 0% -4.95% (p=0.000 n=26+29) AppendInPlace/NoGrow/2Ptr-8 2.15µs ± 1% 1.95µs ± 0% -9.07% (p=0.000 n=28+30) AppendInPlace/NoGrow/3Ptr-8 2.66µs ± 0% 2.45µs ± 0% -7.93% (p=0.000 n=29+26) AppendInPlace/NoGrow/4Ptr-8 3.24µs ± 1% 3.02µs ± 1% -6.75% (p=0.000 n=28+30) AppendInPlace/Grow/Byte-8 269ns ± 1% 271ns ± 1% +0.84% (p=0.000 n=30+29) AppendInPlace/Grow/1Ptr-8 275ns ± 1% 280ns ± 1% +1.75% (p=0.000 n=30+30) AppendInPlace/Grow/2Ptr-8 384ns ± 0% 391ns ± 0% +1.94% (p=0.000 n=27+30) AppendInPlace/Grow/3Ptr-8 455ns ± 0% 462ns ± 0% +1.43% (p=0.000 n=29+29) AppendInPlace/Grow/4Ptr-8 478ns ± 0% 479ns ± 0% +0.23% (p=0.000 n=30+27) However, for the large no-grow cases, there is still more work to be done. Going from this CL to the non-SSA backend: name old time/op new time/op delta AppendInPlace/NoGrow/Byte-8 436ns ± 2% 436ns ± 2% ~ (p=0.967 n=27+29) AppendInPlace/NoGrow/1Ptr-8 856ns ± 0% 884ns ± 0% +3.28% (p=0.000 n=29+26) AppendInPlace/NoGrow/2Ptr-8 1.95µs ± 0% 1.56µs ± 0% -20.28% (p=0.000 n=30+29) AppendInPlace/NoGrow/3Ptr-8 2.45µs ± 0% 1.89µs ± 0% -22.88% (p=0.000 n=26+28) AppendInPlace/NoGrow/4Ptr-8 3.02µs ± 1% 2.56µs ± 1% -15.35% (p=0.000 n=30+28) AppendInPlace/Grow/Byte-8 271ns ± 1% 283ns ± 1% +4.56% (p=0.000 n=29+29) AppendInPlace/Grow/1Ptr-8 280ns ± 1% 288ns ± 1% +2.99% (p=0.000 n=30+30) AppendInPlace/Grow/2Ptr-8 391ns ± 0% 409ns ± 0% +4.66% (p=0.000 n=30+29) AppendInPlace/Grow/3Ptr-8 462ns ± 0% 481ns ± 0% +4.13% (p=0.000 n=29+30) AppendInPlace/Grow/4Ptr-8 479ns ± 0% 502ns ± 0% +4.81% (p=0.000 n=27+26) New generated code: var x []byte func a() { x = append(x, 1) } "".a t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (a.go:5) TEXT "".a(SB), $72-0 0x0000 00000 (a.go:5) MOVQ (TLS), CX 0x0009 00009 (a.go:5) CMPQ SP, 16(CX) 0x000d 00013 (a.go:5) JLS 190 0x0013 00019 (a.go:5) SUBQ $72, SP 0x0017 00023 (a.go:5) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (a.go:5) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (a.go:6) MOVQ "".x+16(SB), CX 0x001e 00030 (a.go:6) MOVQ "".x+8(SB), DX 0x0025 00037 (a.go:6) MOVQ "".x(SB), BX 0x002c 00044 (a.go:6) LEAQ 1(DX), BP 0x0030 00048 (a.go:6) CMPQ BP, CX 0x0033 00051 (a.go:6) JGT $0, 73 0x0035 00053 (a.go:6) LEAQ 1(DX), AX 0x0039 00057 (a.go:6) MOVQ AX, "".x+8(SB) 0x0040 00064 (a.go:6) MOVB $1, (BX)(DX*1) 0x0044 00068 (a.go:7) ADDQ $72, SP 0x0048 00072 (a.go:7) RET 0x0049 00073 (a.go:6) LEAQ type.[]uint8(SB), AX 0x0050 00080 (a.go:6) MOVQ AX, (SP) 0x0054 00084 (a.go:6) MOVQ BX, 8(SP) 0x0059 00089 (a.go:6) MOVQ DX, 16(SP) 0x005e 00094 (a.go:6) MOVQ CX, 24(SP) 0x0063 00099 (a.go:6) MOVQ BP, 32(SP) 0x0068 00104 (a.go:6) PCDATA $0, $0 0x0068 00104 (a.go:6) CALL runtime.growslice(SB) 0x006d 00109 (a.go:6) MOVQ 40(SP), CX 0x0072 00114 (a.go:6) MOVQ 48(SP), DX 0x0077 00119 (a.go:6) MOVQ DX, "".autotmp_0+64(SP) 0x007c 00124 (a.go:6) MOVQ 56(SP), BX 0x0081 00129 (a.go:6) MOVQ BX, "".x+16(SB) 0x0088 00136 (a.go:6) MOVL runtime.writeBarrier(SB), AX 0x008e 00142 (a.go:6) TESTB AL, AL 0x0090 00144 (a.go:6) JNE $0, 162 0x0092 00146 (a.go:6) MOVQ CX, "".x(SB) 0x0099 00153 (a.go:6) MOVQ "".x(SB), BX 0x00a0 00160 (a.go:6) JMP 53 0x00a2 00162 (a.go:6) LEAQ "".x(SB), BX 0x00a9 00169 (a.go:6) MOVQ BX, (SP) 0x00ad 00173 (a.go:6) MOVQ CX, 8(SP) 0x00b2 00178 (a.go:6) PCDATA $0, $0 0x00b2 00178 (a.go:6) CALL runtime.writebarrierptr(SB) 0x00b7 00183 (a.go:6) MOVQ "".autotmp_0+64(SP), DX 0x00bc 00188 (a.go:6) JMP 153 0x00be 00190 (a.go:6) NOP 0x00be 00190 (a.go:5) CALL runtime.morestack_noctxt(SB) 0x00c3 00195 (a.go:5) JMP 0 Fixes #14969 again Change-Id: Ia50463b1f506011aad0718a4fef1d4738e43c32d Reviewed-on: https://go-review.googlesource.com/22197 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-18 09:40:30 -07:00
// newlen = len + 3 // recalculate to avoid a spill
// *a.len = newlen
cmd/compile: avoid write barrier in append fast path When we are writing the result of an append back to the same slice, we don’t need a write barrier on the fast path. This re-implements an optimization that was present in the old backend. Updates #14921 Fixes #14969 Sample code: var x []byte func p() { x = append(x, 1, 2, 3) } Before: "".p t=1 size=224 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 199 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x(SB), DX 0x0025 00037 (append.go:19) MOVQ "".x+8(SB), BX 0x002c 00044 (append.go:19) MOVQ BX, "".autotmp_0+64(SP) 0x0031 00049 (append.go:22) LEAQ 3(BX), BP 0x0035 00053 (append.go:22) CMPQ BP, CX 0x0038 00056 (append.go:22) JGT $0, 131 0x003a 00058 (append.go:22) MOVB $1, (DX)(BX*1) 0x003e 00062 (append.go:22) MOVB $2, 1(DX)(BX*1) 0x0043 00067 (append.go:22) MOVB $3, 2(DX)(BX*1) 0x0048 00072 (append.go:22) MOVQ BP, "".x+8(SB) 0x004f 00079 (append.go:22) MOVQ CX, "".x+16(SB) 0x0056 00086 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x005c 00092 (append.go:22) TESTB AL, AL 0x005e 00094 (append.go:22) JNE $0, 108 0x0060 00096 (append.go:22) MOVQ DX, "".x(SB) 0x0067 00103 (append.go:23) ADDQ $72, SP 0x006b 00107 (append.go:23) RET 0x006c 00108 (append.go:22) LEAQ "".x(SB), CX 0x0073 00115 (append.go:22) MOVQ CX, (SP) 0x0077 00119 (append.go:22) MOVQ DX, 8(SP) 0x007c 00124 (append.go:22) PCDATA $0, $0 0x007c 00124 (append.go:22) CALL runtime.writebarrierptr(SB) 0x0081 00129 (append.go:23) JMP 103 0x0083 00131 (append.go:22) LEAQ type.[]uint8(SB), AX 0x008a 00138 (append.go:22) MOVQ AX, (SP) 0x008e 00142 (append.go:22) MOVQ DX, 8(SP) 0x0093 00147 (append.go:22) MOVQ BX, 16(SP) 0x0098 00152 (append.go:22) MOVQ CX, 24(SP) 0x009d 00157 (append.go:22) MOVQ BP, 32(SP) 0x00a2 00162 (append.go:22) PCDATA $0, $0 0x00a2 00162 (append.go:22) CALL runtime.growslice(SB) 0x00a7 00167 (append.go:22) MOVQ 40(SP), DX 0x00ac 00172 (append.go:22) MOVQ 48(SP), AX 0x00b1 00177 (append.go:22) MOVQ 56(SP), CX 0x00b6 00182 (append.go:22) ADDQ $3, AX 0x00ba 00186 (append.go:19) MOVQ "".autotmp_0+64(SP), BX 0x00bf 00191 (append.go:22) MOVQ AX, BP 0x00c2 00194 (append.go:22) JMP 58 0x00c7 00199 (append.go:22) NOP 0x00c7 00199 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00cc 00204 (append.go:21) JMP 0 After: "".p t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 191 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x+8(SB), DX 0x0025 00037 (append.go:19) MOVQ DX, "".autotmp_0+64(SP) 0x002a 00042 (append.go:19) MOVQ "".x(SB), BX 0x0031 00049 (append.go:22) LEAQ 3(DX), BP 0x0035 00053 (append.go:22) MOVQ BP, "".x+8(SB) 0x003c 00060 (append.go:22) CMPQ BP, CX 0x003f 00063 (append.go:22) JGT $0, 84 0x0041 00065 (append.go:22) MOVB $1, (BX)(DX*1) 0x0045 00069 (append.go:22) MOVB $2, 1(BX)(DX*1) 0x004a 00074 (append.go:22) MOVB $3, 2(BX)(DX*1) 0x004f 00079 (append.go:23) ADDQ $72, SP 0x0053 00083 (append.go:23) RET 0x0054 00084 (append.go:22) LEAQ type.[]uint8(SB), AX 0x005b 00091 (append.go:22) MOVQ AX, (SP) 0x005f 00095 (append.go:22) MOVQ BX, 8(SP) 0x0064 00100 (append.go:22) MOVQ DX, 16(SP) 0x0069 00105 (append.go:22) MOVQ CX, 24(SP) 0x006e 00110 (append.go:22) MOVQ BP, 32(SP) 0x0073 00115 (append.go:22) PCDATA $0, $0 0x0073 00115 (append.go:22) CALL runtime.growslice(SB) 0x0078 00120 (append.go:22) MOVQ 40(SP), CX 0x007d 00125 (append.go:22) MOVQ 56(SP), AX 0x0082 00130 (append.go:22) MOVQ AX, "".x+16(SB) 0x0089 00137 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x008f 00143 (append.go:22) TESTB AL, AL 0x0091 00145 (append.go:22) JNE $0, 168 0x0093 00147 (append.go:22) MOVQ CX, "".x(SB) 0x009a 00154 (append.go:22) MOVQ "".x(SB), BX 0x00a1 00161 (append.go:19) MOVQ "".autotmp_0+64(SP), DX 0x00a6 00166 (append.go:22) JMP 65 0x00a8 00168 (append.go:22) LEAQ "".x(SB), DX 0x00af 00175 (append.go:22) MOVQ DX, (SP) 0x00b3 00179 (append.go:22) MOVQ CX, 8(SP) 0x00b8 00184 (append.go:22) PCDATA $0, $0 0x00b8 00184 (append.go:22) CALL runtime.writebarrierptr(SB) 0x00bd 00189 (append.go:22) JMP 154 0x00bf 00191 (append.go:22) NOP 0x00bf 00191 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00c4 00196 (append.go:21) JMP 0 Change-Id: I77a41ad3a22557a4bb4654de7d6d24a029efe34a Reviewed-on: https://go-review.googlesource.com/21813 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:44:17 -07:00
// // with write barriers, if needed:
// *(ptr+len) = e1
// *(ptr+len+1) = e2
// *(ptr+len+2) = e3
et := n.Type.Elem()
pt := ptrto(et)
// Evaluate slice
cmd/compile: avoid write barrier in append fast path When we are writing the result of an append back to the same slice, we don’t need a write barrier on the fast path. This re-implements an optimization that was present in the old backend. Updates #14921 Fixes #14969 Sample code: var x []byte func p() { x = append(x, 1, 2, 3) } Before: "".p t=1 size=224 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 199 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x(SB), DX 0x0025 00037 (append.go:19) MOVQ "".x+8(SB), BX 0x002c 00044 (append.go:19) MOVQ BX, "".autotmp_0+64(SP) 0x0031 00049 (append.go:22) LEAQ 3(BX), BP 0x0035 00053 (append.go:22) CMPQ BP, CX 0x0038 00056 (append.go:22) JGT $0, 131 0x003a 00058 (append.go:22) MOVB $1, (DX)(BX*1) 0x003e 00062 (append.go:22) MOVB $2, 1(DX)(BX*1) 0x0043 00067 (append.go:22) MOVB $3, 2(DX)(BX*1) 0x0048 00072 (append.go:22) MOVQ BP, "".x+8(SB) 0x004f 00079 (append.go:22) MOVQ CX, "".x+16(SB) 0x0056 00086 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x005c 00092 (append.go:22) TESTB AL, AL 0x005e 00094 (append.go:22) JNE $0, 108 0x0060 00096 (append.go:22) MOVQ DX, "".x(SB) 0x0067 00103 (append.go:23) ADDQ $72, SP 0x006b 00107 (append.go:23) RET 0x006c 00108 (append.go:22) LEAQ "".x(SB), CX 0x0073 00115 (append.go:22) MOVQ CX, (SP) 0x0077 00119 (append.go:22) MOVQ DX, 8(SP) 0x007c 00124 (append.go:22) PCDATA $0, $0 0x007c 00124 (append.go:22) CALL runtime.writebarrierptr(SB) 0x0081 00129 (append.go:23) JMP 103 0x0083 00131 (append.go:22) LEAQ type.[]uint8(SB), AX 0x008a 00138 (append.go:22) MOVQ AX, (SP) 0x008e 00142 (append.go:22) MOVQ DX, 8(SP) 0x0093 00147 (append.go:22) MOVQ BX, 16(SP) 0x0098 00152 (append.go:22) MOVQ CX, 24(SP) 0x009d 00157 (append.go:22) MOVQ BP, 32(SP) 0x00a2 00162 (append.go:22) PCDATA $0, $0 0x00a2 00162 (append.go:22) CALL runtime.growslice(SB) 0x00a7 00167 (append.go:22) MOVQ 40(SP), DX 0x00ac 00172 (append.go:22) MOVQ 48(SP), AX 0x00b1 00177 (append.go:22) MOVQ 56(SP), CX 0x00b6 00182 (append.go:22) ADDQ $3, AX 0x00ba 00186 (append.go:19) MOVQ "".autotmp_0+64(SP), BX 0x00bf 00191 (append.go:22) MOVQ AX, BP 0x00c2 00194 (append.go:22) JMP 58 0x00c7 00199 (append.go:22) NOP 0x00c7 00199 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00cc 00204 (append.go:21) JMP 0 After: "".p t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 191 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x+8(SB), DX 0x0025 00037 (append.go:19) MOVQ DX, "".autotmp_0+64(SP) 0x002a 00042 (append.go:19) MOVQ "".x(SB), BX 0x0031 00049 (append.go:22) LEAQ 3(DX), BP 0x0035 00053 (append.go:22) MOVQ BP, "".x+8(SB) 0x003c 00060 (append.go:22) CMPQ BP, CX 0x003f 00063 (append.go:22) JGT $0, 84 0x0041 00065 (append.go:22) MOVB $1, (BX)(DX*1) 0x0045 00069 (append.go:22) MOVB $2, 1(BX)(DX*1) 0x004a 00074 (append.go:22) MOVB $3, 2(BX)(DX*1) 0x004f 00079 (append.go:23) ADDQ $72, SP 0x0053 00083 (append.go:23) RET 0x0054 00084 (append.go:22) LEAQ type.[]uint8(SB), AX 0x005b 00091 (append.go:22) MOVQ AX, (SP) 0x005f 00095 (append.go:22) MOVQ BX, 8(SP) 0x0064 00100 (append.go:22) MOVQ DX, 16(SP) 0x0069 00105 (append.go:22) MOVQ CX, 24(SP) 0x006e 00110 (append.go:22) MOVQ BP, 32(SP) 0x0073 00115 (append.go:22) PCDATA $0, $0 0x0073 00115 (append.go:22) CALL runtime.growslice(SB) 0x0078 00120 (append.go:22) MOVQ 40(SP), CX 0x007d 00125 (append.go:22) MOVQ 56(SP), AX 0x0082 00130 (append.go:22) MOVQ AX, "".x+16(SB) 0x0089 00137 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x008f 00143 (append.go:22) TESTB AL, AL 0x0091 00145 (append.go:22) JNE $0, 168 0x0093 00147 (append.go:22) MOVQ CX, "".x(SB) 0x009a 00154 (append.go:22) MOVQ "".x(SB), BX 0x00a1 00161 (append.go:19) MOVQ "".autotmp_0+64(SP), DX 0x00a6 00166 (append.go:22) JMP 65 0x00a8 00168 (append.go:22) LEAQ "".x(SB), DX 0x00af 00175 (append.go:22) MOVQ DX, (SP) 0x00b3 00179 (append.go:22) MOVQ CX, 8(SP) 0x00b8 00184 (append.go:22) PCDATA $0, $0 0x00b8 00184 (append.go:22) CALL runtime.writebarrierptr(SB) 0x00bd 00189 (append.go:22) JMP 154 0x00bf 00191 (append.go:22) NOP 0x00bf 00191 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00c4 00196 (append.go:21) JMP 0 Change-Id: I77a41ad3a22557a4bb4654de7d6d24a029efe34a Reviewed-on: https://go-review.googlesource.com/21813 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:44:17 -07:00
sn := n.List.First() // the slice node is the first in the list
var slice, addr *ssa.Value
if inplace {
addr = s.addr(sn, false)
cmd/compile: avoid write barrier in append fast path When we are writing the result of an append back to the same slice, we don’t need a write barrier on the fast path. This re-implements an optimization that was present in the old backend. Updates #14921 Fixes #14969 Sample code: var x []byte func p() { x = append(x, 1, 2, 3) } Before: "".p t=1 size=224 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 199 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x(SB), DX 0x0025 00037 (append.go:19) MOVQ "".x+8(SB), BX 0x002c 00044 (append.go:19) MOVQ BX, "".autotmp_0+64(SP) 0x0031 00049 (append.go:22) LEAQ 3(BX), BP 0x0035 00053 (append.go:22) CMPQ BP, CX 0x0038 00056 (append.go:22) JGT $0, 131 0x003a 00058 (append.go:22) MOVB $1, (DX)(BX*1) 0x003e 00062 (append.go:22) MOVB $2, 1(DX)(BX*1) 0x0043 00067 (append.go:22) MOVB $3, 2(DX)(BX*1) 0x0048 00072 (append.go:22) MOVQ BP, "".x+8(SB) 0x004f 00079 (append.go:22) MOVQ CX, "".x+16(SB) 0x0056 00086 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x005c 00092 (append.go:22) TESTB AL, AL 0x005e 00094 (append.go:22) JNE $0, 108 0x0060 00096 (append.go:22) MOVQ DX, "".x(SB) 0x0067 00103 (append.go:23) ADDQ $72, SP 0x006b 00107 (append.go:23) RET 0x006c 00108 (append.go:22) LEAQ "".x(SB), CX 0x0073 00115 (append.go:22) MOVQ CX, (SP) 0x0077 00119 (append.go:22) MOVQ DX, 8(SP) 0x007c 00124 (append.go:22) PCDATA $0, $0 0x007c 00124 (append.go:22) CALL runtime.writebarrierptr(SB) 0x0081 00129 (append.go:23) JMP 103 0x0083 00131 (append.go:22) LEAQ type.[]uint8(SB), AX 0x008a 00138 (append.go:22) MOVQ AX, (SP) 0x008e 00142 (append.go:22) MOVQ DX, 8(SP) 0x0093 00147 (append.go:22) MOVQ BX, 16(SP) 0x0098 00152 (append.go:22) MOVQ CX, 24(SP) 0x009d 00157 (append.go:22) MOVQ BP, 32(SP) 0x00a2 00162 (append.go:22) PCDATA $0, $0 0x00a2 00162 (append.go:22) CALL runtime.growslice(SB) 0x00a7 00167 (append.go:22) MOVQ 40(SP), DX 0x00ac 00172 (append.go:22) MOVQ 48(SP), AX 0x00b1 00177 (append.go:22) MOVQ 56(SP), CX 0x00b6 00182 (append.go:22) ADDQ $3, AX 0x00ba 00186 (append.go:19) MOVQ "".autotmp_0+64(SP), BX 0x00bf 00191 (append.go:22) MOVQ AX, BP 0x00c2 00194 (append.go:22) JMP 58 0x00c7 00199 (append.go:22) NOP 0x00c7 00199 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00cc 00204 (append.go:21) JMP 0 After: "".p t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 191 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x+8(SB), DX 0x0025 00037 (append.go:19) MOVQ DX, "".autotmp_0+64(SP) 0x002a 00042 (append.go:19) MOVQ "".x(SB), BX 0x0031 00049 (append.go:22) LEAQ 3(DX), BP 0x0035 00053 (append.go:22) MOVQ BP, "".x+8(SB) 0x003c 00060 (append.go:22) CMPQ BP, CX 0x003f 00063 (append.go:22) JGT $0, 84 0x0041 00065 (append.go:22) MOVB $1, (BX)(DX*1) 0x0045 00069 (append.go:22) MOVB $2, 1(BX)(DX*1) 0x004a 00074 (append.go:22) MOVB $3, 2(BX)(DX*1) 0x004f 00079 (append.go:23) ADDQ $72, SP 0x0053 00083 (append.go:23) RET 0x0054 00084 (append.go:22) LEAQ type.[]uint8(SB), AX 0x005b 00091 (append.go:22) MOVQ AX, (SP) 0x005f 00095 (append.go:22) MOVQ BX, 8(SP) 0x0064 00100 (append.go:22) MOVQ DX, 16(SP) 0x0069 00105 (append.go:22) MOVQ CX, 24(SP) 0x006e 00110 (append.go:22) MOVQ BP, 32(SP) 0x0073 00115 (append.go:22) PCDATA $0, $0 0x0073 00115 (append.go:22) CALL runtime.growslice(SB) 0x0078 00120 (append.go:22) MOVQ 40(SP), CX 0x007d 00125 (append.go:22) MOVQ 56(SP), AX 0x0082 00130 (append.go:22) MOVQ AX, "".x+16(SB) 0x0089 00137 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x008f 00143 (append.go:22) TESTB AL, AL 0x0091 00145 (append.go:22) JNE $0, 168 0x0093 00147 (append.go:22) MOVQ CX, "".x(SB) 0x009a 00154 (append.go:22) MOVQ "".x(SB), BX 0x00a1 00161 (append.go:19) MOVQ "".autotmp_0+64(SP), DX 0x00a6 00166 (append.go:22) JMP 65 0x00a8 00168 (append.go:22) LEAQ "".x(SB), DX 0x00af 00175 (append.go:22) MOVQ DX, (SP) 0x00b3 00179 (append.go:22) MOVQ CX, 8(SP) 0x00b8 00184 (append.go:22) PCDATA $0, $0 0x00b8 00184 (append.go:22) CALL runtime.writebarrierptr(SB) 0x00bd 00189 (append.go:22) JMP 154 0x00bf 00191 (append.go:22) NOP 0x00bf 00191 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00c4 00196 (append.go:21) JMP 0 Change-Id: I77a41ad3a22557a4bb4654de7d6d24a029efe34a Reviewed-on: https://go-review.googlesource.com/21813 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:44:17 -07:00
slice = s.newValue2(ssa.OpLoad, n.Type, addr, s.mem())
} else {
slice = s.expr(sn)
}
// Allocate new blocks
grow := s.f.NewBlock(ssa.BlockPlain)
assign := s.f.NewBlock(ssa.BlockPlain)
// Decide if we need to grow
nargs := int64(n.List.Len() - 1)
p := s.newValue1(ssa.OpSlicePtr, pt, slice)
l := s.newValue1(ssa.OpSliceLen, Types[TINT], slice)
c := s.newValue1(ssa.OpSliceCap, Types[TINT], slice)
nl := s.newValue2(s.ssaOp(OADD, Types[TINT]), Types[TINT], l, s.constInt(Types[TINT], nargs))
cmd/compile: avoid write barrier in append fast path When we are writing the result of an append back to the same slice, we don’t need a write barrier on the fast path. This re-implements an optimization that was present in the old backend. Updates #14921 Fixes #14969 Sample code: var x []byte func p() { x = append(x, 1, 2, 3) } Before: "".p t=1 size=224 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 199 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x(SB), DX 0x0025 00037 (append.go:19) MOVQ "".x+8(SB), BX 0x002c 00044 (append.go:19) MOVQ BX, "".autotmp_0+64(SP) 0x0031 00049 (append.go:22) LEAQ 3(BX), BP 0x0035 00053 (append.go:22) CMPQ BP, CX 0x0038 00056 (append.go:22) JGT $0, 131 0x003a 00058 (append.go:22) MOVB $1, (DX)(BX*1) 0x003e 00062 (append.go:22) MOVB $2, 1(DX)(BX*1) 0x0043 00067 (append.go:22) MOVB $3, 2(DX)(BX*1) 0x0048 00072 (append.go:22) MOVQ BP, "".x+8(SB) 0x004f 00079 (append.go:22) MOVQ CX, "".x+16(SB) 0x0056 00086 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x005c 00092 (append.go:22) TESTB AL, AL 0x005e 00094 (append.go:22) JNE $0, 108 0x0060 00096 (append.go:22) MOVQ DX, "".x(SB) 0x0067 00103 (append.go:23) ADDQ $72, SP 0x006b 00107 (append.go:23) RET 0x006c 00108 (append.go:22) LEAQ "".x(SB), CX 0x0073 00115 (append.go:22) MOVQ CX, (SP) 0x0077 00119 (append.go:22) MOVQ DX, 8(SP) 0x007c 00124 (append.go:22) PCDATA $0, $0 0x007c 00124 (append.go:22) CALL runtime.writebarrierptr(SB) 0x0081 00129 (append.go:23) JMP 103 0x0083 00131 (append.go:22) LEAQ type.[]uint8(SB), AX 0x008a 00138 (append.go:22) MOVQ AX, (SP) 0x008e 00142 (append.go:22) MOVQ DX, 8(SP) 0x0093 00147 (append.go:22) MOVQ BX, 16(SP) 0x0098 00152 (append.go:22) MOVQ CX, 24(SP) 0x009d 00157 (append.go:22) MOVQ BP, 32(SP) 0x00a2 00162 (append.go:22) PCDATA $0, $0 0x00a2 00162 (append.go:22) CALL runtime.growslice(SB) 0x00a7 00167 (append.go:22) MOVQ 40(SP), DX 0x00ac 00172 (append.go:22) MOVQ 48(SP), AX 0x00b1 00177 (append.go:22) MOVQ 56(SP), CX 0x00b6 00182 (append.go:22) ADDQ $3, AX 0x00ba 00186 (append.go:19) MOVQ "".autotmp_0+64(SP), BX 0x00bf 00191 (append.go:22) MOVQ AX, BP 0x00c2 00194 (append.go:22) JMP 58 0x00c7 00199 (append.go:22) NOP 0x00c7 00199 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00cc 00204 (append.go:21) JMP 0 After: "".p t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 191 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x+8(SB), DX 0x0025 00037 (append.go:19) MOVQ DX, "".autotmp_0+64(SP) 0x002a 00042 (append.go:19) MOVQ "".x(SB), BX 0x0031 00049 (append.go:22) LEAQ 3(DX), BP 0x0035 00053 (append.go:22) MOVQ BP, "".x+8(SB) 0x003c 00060 (append.go:22) CMPQ BP, CX 0x003f 00063 (append.go:22) JGT $0, 84 0x0041 00065 (append.go:22) MOVB $1, (BX)(DX*1) 0x0045 00069 (append.go:22) MOVB $2, 1(BX)(DX*1) 0x004a 00074 (append.go:22) MOVB $3, 2(BX)(DX*1) 0x004f 00079 (append.go:23) ADDQ $72, SP 0x0053 00083 (append.go:23) RET 0x0054 00084 (append.go:22) LEAQ type.[]uint8(SB), AX 0x005b 00091 (append.go:22) MOVQ AX, (SP) 0x005f 00095 (append.go:22) MOVQ BX, 8(SP) 0x0064 00100 (append.go:22) MOVQ DX, 16(SP) 0x0069 00105 (append.go:22) MOVQ CX, 24(SP) 0x006e 00110 (append.go:22) MOVQ BP, 32(SP) 0x0073 00115 (append.go:22) PCDATA $0, $0 0x0073 00115 (append.go:22) CALL runtime.growslice(SB) 0x0078 00120 (append.go:22) MOVQ 40(SP), CX 0x007d 00125 (append.go:22) MOVQ 56(SP), AX 0x0082 00130 (append.go:22) MOVQ AX, "".x+16(SB) 0x0089 00137 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x008f 00143 (append.go:22) TESTB AL, AL 0x0091 00145 (append.go:22) JNE $0, 168 0x0093 00147 (append.go:22) MOVQ CX, "".x(SB) 0x009a 00154 (append.go:22) MOVQ "".x(SB), BX 0x00a1 00161 (append.go:19) MOVQ "".autotmp_0+64(SP), DX 0x00a6 00166 (append.go:22) JMP 65 0x00a8 00168 (append.go:22) LEAQ "".x(SB), DX 0x00af 00175 (append.go:22) MOVQ DX, (SP) 0x00b3 00179 (append.go:22) MOVQ CX, 8(SP) 0x00b8 00184 (append.go:22) PCDATA $0, $0 0x00b8 00184 (append.go:22) CALL runtime.writebarrierptr(SB) 0x00bd 00189 (append.go:22) JMP 154 0x00bf 00191 (append.go:22) NOP 0x00bf 00191 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00c4 00196 (append.go:21) JMP 0 Change-Id: I77a41ad3a22557a4bb4654de7d6d24a029efe34a Reviewed-on: https://go-review.googlesource.com/21813 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:44:17 -07:00
cmp := s.newValue2(s.ssaOp(OGT, Types[TINT]), Types[TBOOL], nl, c)
s.vars[&ptrVar] = p
cmd/compile: avoid write barrier in append fast path When we are writing the result of an append back to the same slice, we don’t need a write barrier on the fast path. This re-implements an optimization that was present in the old backend. Updates #14921 Fixes #14969 Sample code: var x []byte func p() { x = append(x, 1, 2, 3) } Before: "".p t=1 size=224 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 199 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x(SB), DX 0x0025 00037 (append.go:19) MOVQ "".x+8(SB), BX 0x002c 00044 (append.go:19) MOVQ BX, "".autotmp_0+64(SP) 0x0031 00049 (append.go:22) LEAQ 3(BX), BP 0x0035 00053 (append.go:22) CMPQ BP, CX 0x0038 00056 (append.go:22) JGT $0, 131 0x003a 00058 (append.go:22) MOVB $1, (DX)(BX*1) 0x003e 00062 (append.go:22) MOVB $2, 1(DX)(BX*1) 0x0043 00067 (append.go:22) MOVB $3, 2(DX)(BX*1) 0x0048 00072 (append.go:22) MOVQ BP, "".x+8(SB) 0x004f 00079 (append.go:22) MOVQ CX, "".x+16(SB) 0x0056 00086 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x005c 00092 (append.go:22) TESTB AL, AL 0x005e 00094 (append.go:22) JNE $0, 108 0x0060 00096 (append.go:22) MOVQ DX, "".x(SB) 0x0067 00103 (append.go:23) ADDQ $72, SP 0x006b 00107 (append.go:23) RET 0x006c 00108 (append.go:22) LEAQ "".x(SB), CX 0x0073 00115 (append.go:22) MOVQ CX, (SP) 0x0077 00119 (append.go:22) MOVQ DX, 8(SP) 0x007c 00124 (append.go:22) PCDATA $0, $0 0x007c 00124 (append.go:22) CALL runtime.writebarrierptr(SB) 0x0081 00129 (append.go:23) JMP 103 0x0083 00131 (append.go:22) LEAQ type.[]uint8(SB), AX 0x008a 00138 (append.go:22) MOVQ AX, (SP) 0x008e 00142 (append.go:22) MOVQ DX, 8(SP) 0x0093 00147 (append.go:22) MOVQ BX, 16(SP) 0x0098 00152 (append.go:22) MOVQ CX, 24(SP) 0x009d 00157 (append.go:22) MOVQ BP, 32(SP) 0x00a2 00162 (append.go:22) PCDATA $0, $0 0x00a2 00162 (append.go:22) CALL runtime.growslice(SB) 0x00a7 00167 (append.go:22) MOVQ 40(SP), DX 0x00ac 00172 (append.go:22) MOVQ 48(SP), AX 0x00b1 00177 (append.go:22) MOVQ 56(SP), CX 0x00b6 00182 (append.go:22) ADDQ $3, AX 0x00ba 00186 (append.go:19) MOVQ "".autotmp_0+64(SP), BX 0x00bf 00191 (append.go:22) MOVQ AX, BP 0x00c2 00194 (append.go:22) JMP 58 0x00c7 00199 (append.go:22) NOP 0x00c7 00199 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00cc 00204 (append.go:21) JMP 0 After: "".p t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 191 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x+8(SB), DX 0x0025 00037 (append.go:19) MOVQ DX, "".autotmp_0+64(SP) 0x002a 00042 (append.go:19) MOVQ "".x(SB), BX 0x0031 00049 (append.go:22) LEAQ 3(DX), BP 0x0035 00053 (append.go:22) MOVQ BP, "".x+8(SB) 0x003c 00060 (append.go:22) CMPQ BP, CX 0x003f 00063 (append.go:22) JGT $0, 84 0x0041 00065 (append.go:22) MOVB $1, (BX)(DX*1) 0x0045 00069 (append.go:22) MOVB $2, 1(BX)(DX*1) 0x004a 00074 (append.go:22) MOVB $3, 2(BX)(DX*1) 0x004f 00079 (append.go:23) ADDQ $72, SP 0x0053 00083 (append.go:23) RET 0x0054 00084 (append.go:22) LEAQ type.[]uint8(SB), AX 0x005b 00091 (append.go:22) MOVQ AX, (SP) 0x005f 00095 (append.go:22) MOVQ BX, 8(SP) 0x0064 00100 (append.go:22) MOVQ DX, 16(SP) 0x0069 00105 (append.go:22) MOVQ CX, 24(SP) 0x006e 00110 (append.go:22) MOVQ BP, 32(SP) 0x0073 00115 (append.go:22) PCDATA $0, $0 0x0073 00115 (append.go:22) CALL runtime.growslice(SB) 0x0078 00120 (append.go:22) MOVQ 40(SP), CX 0x007d 00125 (append.go:22) MOVQ 56(SP), AX 0x0082 00130 (append.go:22) MOVQ AX, "".x+16(SB) 0x0089 00137 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x008f 00143 (append.go:22) TESTB AL, AL 0x0091 00145 (append.go:22) JNE $0, 168 0x0093 00147 (append.go:22) MOVQ CX, "".x(SB) 0x009a 00154 (append.go:22) MOVQ "".x(SB), BX 0x00a1 00161 (append.go:19) MOVQ "".autotmp_0+64(SP), DX 0x00a6 00166 (append.go:22) JMP 65 0x00a8 00168 (append.go:22) LEAQ "".x(SB), DX 0x00af 00175 (append.go:22) MOVQ DX, (SP) 0x00b3 00179 (append.go:22) MOVQ CX, 8(SP) 0x00b8 00184 (append.go:22) PCDATA $0, $0 0x00b8 00184 (append.go:22) CALL runtime.writebarrierptr(SB) 0x00bd 00189 (append.go:22) JMP 154 0x00bf 00191 (append.go:22) NOP 0x00bf 00191 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00c4 00196 (append.go:21) JMP 0 Change-Id: I77a41ad3a22557a4bb4654de7d6d24a029efe34a Reviewed-on: https://go-review.googlesource.com/21813 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:44:17 -07:00
if !inplace {
s.vars[&newlenVar] = nl
s.vars[&capVar] = c
cmd/compile: re-enable in-place append optimization CL 21891 was too clever in its attempts to avoid spills. Storing newlen too early caused uses of append in the runtime itself to receive an inconsistent view of a slice, leading to corruption. This CL makes the generate code much more similar to the old backend. It spills more than before, but those spills have been contained to the grow path. It recalculates newlen unnecessarily on the fast path, but that's measurably cheaper than spilling it. CL 21891 caused runtime failures in 6 of 2000 runs of net/http and crypto/x509 in my test setup. This CL has gone 6000 runs without a failure. Benchmarks going from master to this CL: name old time/op new time/op delta AppendInPlace/NoGrow/Byte-8 439ns ± 2% 436ns ± 2% -0.72% (p=0.001 n=28+27) AppendInPlace/NoGrow/1Ptr-8 901ns ± 0% 856ns ± 0% -4.95% (p=0.000 n=26+29) AppendInPlace/NoGrow/2Ptr-8 2.15µs ± 1% 1.95µs ± 0% -9.07% (p=0.000 n=28+30) AppendInPlace/NoGrow/3Ptr-8 2.66µs ± 0% 2.45µs ± 0% -7.93% (p=0.000 n=29+26) AppendInPlace/NoGrow/4Ptr-8 3.24µs ± 1% 3.02µs ± 1% -6.75% (p=0.000 n=28+30) AppendInPlace/Grow/Byte-8 269ns ± 1% 271ns ± 1% +0.84% (p=0.000 n=30+29) AppendInPlace/Grow/1Ptr-8 275ns ± 1% 280ns ± 1% +1.75% (p=0.000 n=30+30) AppendInPlace/Grow/2Ptr-8 384ns ± 0% 391ns ± 0% +1.94% (p=0.000 n=27+30) AppendInPlace/Grow/3Ptr-8 455ns ± 0% 462ns ± 0% +1.43% (p=0.000 n=29+29) AppendInPlace/Grow/4Ptr-8 478ns ± 0% 479ns ± 0% +0.23% (p=0.000 n=30+27) However, for the large no-grow cases, there is still more work to be done. Going from this CL to the non-SSA backend: name old time/op new time/op delta AppendInPlace/NoGrow/Byte-8 436ns ± 2% 436ns ± 2% ~ (p=0.967 n=27+29) AppendInPlace/NoGrow/1Ptr-8 856ns ± 0% 884ns ± 0% +3.28% (p=0.000 n=29+26) AppendInPlace/NoGrow/2Ptr-8 1.95µs ± 0% 1.56µs ± 0% -20.28% (p=0.000 n=30+29) AppendInPlace/NoGrow/3Ptr-8 2.45µs ± 0% 1.89µs ± 0% -22.88% (p=0.000 n=26+28) AppendInPlace/NoGrow/4Ptr-8 3.02µs ± 1% 2.56µs ± 1% -15.35% (p=0.000 n=30+28) AppendInPlace/Grow/Byte-8 271ns ± 1% 283ns ± 1% +4.56% (p=0.000 n=29+29) AppendInPlace/Grow/1Ptr-8 280ns ± 1% 288ns ± 1% +2.99% (p=0.000 n=30+30) AppendInPlace/Grow/2Ptr-8 391ns ± 0% 409ns ± 0% +4.66% (p=0.000 n=30+29) AppendInPlace/Grow/3Ptr-8 462ns ± 0% 481ns ± 0% +4.13% (p=0.000 n=29+30) AppendInPlace/Grow/4Ptr-8 479ns ± 0% 502ns ± 0% +4.81% (p=0.000 n=27+26) New generated code: var x []byte func a() { x = append(x, 1) } "".a t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (a.go:5) TEXT "".a(SB), $72-0 0x0000 00000 (a.go:5) MOVQ (TLS), CX 0x0009 00009 (a.go:5) CMPQ SP, 16(CX) 0x000d 00013 (a.go:5) JLS 190 0x0013 00019 (a.go:5) SUBQ $72, SP 0x0017 00023 (a.go:5) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (a.go:5) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (a.go:6) MOVQ "".x+16(SB), CX 0x001e 00030 (a.go:6) MOVQ "".x+8(SB), DX 0x0025 00037 (a.go:6) MOVQ "".x(SB), BX 0x002c 00044 (a.go:6) LEAQ 1(DX), BP 0x0030 00048 (a.go:6) CMPQ BP, CX 0x0033 00051 (a.go:6) JGT $0, 73 0x0035 00053 (a.go:6) LEAQ 1(DX), AX 0x0039 00057 (a.go:6) MOVQ AX, "".x+8(SB) 0x0040 00064 (a.go:6) MOVB $1, (BX)(DX*1) 0x0044 00068 (a.go:7) ADDQ $72, SP 0x0048 00072 (a.go:7) RET 0x0049 00073 (a.go:6) LEAQ type.[]uint8(SB), AX 0x0050 00080 (a.go:6) MOVQ AX, (SP) 0x0054 00084 (a.go:6) MOVQ BX, 8(SP) 0x0059 00089 (a.go:6) MOVQ DX, 16(SP) 0x005e 00094 (a.go:6) MOVQ CX, 24(SP) 0x0063 00099 (a.go:6) MOVQ BP, 32(SP) 0x0068 00104 (a.go:6) PCDATA $0, $0 0x0068 00104 (a.go:6) CALL runtime.growslice(SB) 0x006d 00109 (a.go:6) MOVQ 40(SP), CX 0x0072 00114 (a.go:6) MOVQ 48(SP), DX 0x0077 00119 (a.go:6) MOVQ DX, "".autotmp_0+64(SP) 0x007c 00124 (a.go:6) MOVQ 56(SP), BX 0x0081 00129 (a.go:6) MOVQ BX, "".x+16(SB) 0x0088 00136 (a.go:6) MOVL runtime.writeBarrier(SB), AX 0x008e 00142 (a.go:6) TESTB AL, AL 0x0090 00144 (a.go:6) JNE $0, 162 0x0092 00146 (a.go:6) MOVQ CX, "".x(SB) 0x0099 00153 (a.go:6) MOVQ "".x(SB), BX 0x00a0 00160 (a.go:6) JMP 53 0x00a2 00162 (a.go:6) LEAQ "".x(SB), BX 0x00a9 00169 (a.go:6) MOVQ BX, (SP) 0x00ad 00173 (a.go:6) MOVQ CX, 8(SP) 0x00b2 00178 (a.go:6) PCDATA $0, $0 0x00b2 00178 (a.go:6) CALL runtime.writebarrierptr(SB) 0x00b7 00183 (a.go:6) MOVQ "".autotmp_0+64(SP), DX 0x00bc 00188 (a.go:6) JMP 153 0x00be 00190 (a.go:6) NOP 0x00be 00190 (a.go:5) CALL runtime.morestack_noctxt(SB) 0x00c3 00195 (a.go:5) JMP 0 Fixes #14969 again Change-Id: Ia50463b1f506011aad0718a4fef1d4738e43c32d Reviewed-on: https://go-review.googlesource.com/22197 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-18 09:40:30 -07:00
} else {
s.vars[&lenVar] = l
cmd/compile: avoid write barrier in append fast path When we are writing the result of an append back to the same slice, we don’t need a write barrier on the fast path. This re-implements an optimization that was present in the old backend. Updates #14921 Fixes #14969 Sample code: var x []byte func p() { x = append(x, 1, 2, 3) } Before: "".p t=1 size=224 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 199 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x(SB), DX 0x0025 00037 (append.go:19) MOVQ "".x+8(SB), BX 0x002c 00044 (append.go:19) MOVQ BX, "".autotmp_0+64(SP) 0x0031 00049 (append.go:22) LEAQ 3(BX), BP 0x0035 00053 (append.go:22) CMPQ BP, CX 0x0038 00056 (append.go:22) JGT $0, 131 0x003a 00058 (append.go:22) MOVB $1, (DX)(BX*1) 0x003e 00062 (append.go:22) MOVB $2, 1(DX)(BX*1) 0x0043 00067 (append.go:22) MOVB $3, 2(DX)(BX*1) 0x0048 00072 (append.go:22) MOVQ BP, "".x+8(SB) 0x004f 00079 (append.go:22) MOVQ CX, "".x+16(SB) 0x0056 00086 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x005c 00092 (append.go:22) TESTB AL, AL 0x005e 00094 (append.go:22) JNE $0, 108 0x0060 00096 (append.go:22) MOVQ DX, "".x(SB) 0x0067 00103 (append.go:23) ADDQ $72, SP 0x006b 00107 (append.go:23) RET 0x006c 00108 (append.go:22) LEAQ "".x(SB), CX 0x0073 00115 (append.go:22) MOVQ CX, (SP) 0x0077 00119 (append.go:22) MOVQ DX, 8(SP) 0x007c 00124 (append.go:22) PCDATA $0, $0 0x007c 00124 (append.go:22) CALL runtime.writebarrierptr(SB) 0x0081 00129 (append.go:23) JMP 103 0x0083 00131 (append.go:22) LEAQ type.[]uint8(SB), AX 0x008a 00138 (append.go:22) MOVQ AX, (SP) 0x008e 00142 (append.go:22) MOVQ DX, 8(SP) 0x0093 00147 (append.go:22) MOVQ BX, 16(SP) 0x0098 00152 (append.go:22) MOVQ CX, 24(SP) 0x009d 00157 (append.go:22) MOVQ BP, 32(SP) 0x00a2 00162 (append.go:22) PCDATA $0, $0 0x00a2 00162 (append.go:22) CALL runtime.growslice(SB) 0x00a7 00167 (append.go:22) MOVQ 40(SP), DX 0x00ac 00172 (append.go:22) MOVQ 48(SP), AX 0x00b1 00177 (append.go:22) MOVQ 56(SP), CX 0x00b6 00182 (append.go:22) ADDQ $3, AX 0x00ba 00186 (append.go:19) MOVQ "".autotmp_0+64(SP), BX 0x00bf 00191 (append.go:22) MOVQ AX, BP 0x00c2 00194 (append.go:22) JMP 58 0x00c7 00199 (append.go:22) NOP 0x00c7 00199 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00cc 00204 (append.go:21) JMP 0 After: "".p t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 191 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x+8(SB), DX 0x0025 00037 (append.go:19) MOVQ DX, "".autotmp_0+64(SP) 0x002a 00042 (append.go:19) MOVQ "".x(SB), BX 0x0031 00049 (append.go:22) LEAQ 3(DX), BP 0x0035 00053 (append.go:22) MOVQ BP, "".x+8(SB) 0x003c 00060 (append.go:22) CMPQ BP, CX 0x003f 00063 (append.go:22) JGT $0, 84 0x0041 00065 (append.go:22) MOVB $1, (BX)(DX*1) 0x0045 00069 (append.go:22) MOVB $2, 1(BX)(DX*1) 0x004a 00074 (append.go:22) MOVB $3, 2(BX)(DX*1) 0x004f 00079 (append.go:23) ADDQ $72, SP 0x0053 00083 (append.go:23) RET 0x0054 00084 (append.go:22) LEAQ type.[]uint8(SB), AX 0x005b 00091 (append.go:22) MOVQ AX, (SP) 0x005f 00095 (append.go:22) MOVQ BX, 8(SP) 0x0064 00100 (append.go:22) MOVQ DX, 16(SP) 0x0069 00105 (append.go:22) MOVQ CX, 24(SP) 0x006e 00110 (append.go:22) MOVQ BP, 32(SP) 0x0073 00115 (append.go:22) PCDATA $0, $0 0x0073 00115 (append.go:22) CALL runtime.growslice(SB) 0x0078 00120 (append.go:22) MOVQ 40(SP), CX 0x007d 00125 (append.go:22) MOVQ 56(SP), AX 0x0082 00130 (append.go:22) MOVQ AX, "".x+16(SB) 0x0089 00137 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x008f 00143 (append.go:22) TESTB AL, AL 0x0091 00145 (append.go:22) JNE $0, 168 0x0093 00147 (append.go:22) MOVQ CX, "".x(SB) 0x009a 00154 (append.go:22) MOVQ "".x(SB), BX 0x00a1 00161 (append.go:19) MOVQ "".autotmp_0+64(SP), DX 0x00a6 00166 (append.go:22) JMP 65 0x00a8 00168 (append.go:22) LEAQ "".x(SB), DX 0x00af 00175 (append.go:22) MOVQ DX, (SP) 0x00b3 00179 (append.go:22) MOVQ CX, 8(SP) 0x00b8 00184 (append.go:22) PCDATA $0, $0 0x00b8 00184 (append.go:22) CALL runtime.writebarrierptr(SB) 0x00bd 00189 (append.go:22) JMP 154 0x00bf 00191 (append.go:22) NOP 0x00bf 00191 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00c4 00196 (append.go:21) JMP 0 Change-Id: I77a41ad3a22557a4bb4654de7d6d24a029efe34a Reviewed-on: https://go-review.googlesource.com/21813 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:44:17 -07:00
}
b := s.endBlock()
b.Kind = ssa.BlockIf
b.Likely = ssa.BranchUnlikely
b.SetControl(cmp)
b.AddEdgeTo(grow)
b.AddEdgeTo(assign)
// Call growslice
s.startBlock(grow)
taddr := s.newValue1A(ssa.OpAddr, Types[TUINTPTR], &ssa.ExternSymbol{Typ: Types[TUINTPTR], Sym: Linksym(typenamesym(n.Type.Elem()))}, s.sb)
r := s.rtcall(growslice, true, []*Type{pt, Types[TINT], Types[TINT]}, taddr, p, l, c, nl)
cmd/compile: avoid write barrier in append fast path When we are writing the result of an append back to the same slice, we don’t need a write barrier on the fast path. This re-implements an optimization that was present in the old backend. Updates #14921 Fixes #14969 Sample code: var x []byte func p() { x = append(x, 1, 2, 3) } Before: "".p t=1 size=224 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 199 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x(SB), DX 0x0025 00037 (append.go:19) MOVQ "".x+8(SB), BX 0x002c 00044 (append.go:19) MOVQ BX, "".autotmp_0+64(SP) 0x0031 00049 (append.go:22) LEAQ 3(BX), BP 0x0035 00053 (append.go:22) CMPQ BP, CX 0x0038 00056 (append.go:22) JGT $0, 131 0x003a 00058 (append.go:22) MOVB $1, (DX)(BX*1) 0x003e 00062 (append.go:22) MOVB $2, 1(DX)(BX*1) 0x0043 00067 (append.go:22) MOVB $3, 2(DX)(BX*1) 0x0048 00072 (append.go:22) MOVQ BP, "".x+8(SB) 0x004f 00079 (append.go:22) MOVQ CX, "".x+16(SB) 0x0056 00086 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x005c 00092 (append.go:22) TESTB AL, AL 0x005e 00094 (append.go:22) JNE $0, 108 0x0060 00096 (append.go:22) MOVQ DX, "".x(SB) 0x0067 00103 (append.go:23) ADDQ $72, SP 0x006b 00107 (append.go:23) RET 0x006c 00108 (append.go:22) LEAQ "".x(SB), CX 0x0073 00115 (append.go:22) MOVQ CX, (SP) 0x0077 00119 (append.go:22) MOVQ DX, 8(SP) 0x007c 00124 (append.go:22) PCDATA $0, $0 0x007c 00124 (append.go:22) CALL runtime.writebarrierptr(SB) 0x0081 00129 (append.go:23) JMP 103 0x0083 00131 (append.go:22) LEAQ type.[]uint8(SB), AX 0x008a 00138 (append.go:22) MOVQ AX, (SP) 0x008e 00142 (append.go:22) MOVQ DX, 8(SP) 0x0093 00147 (append.go:22) MOVQ BX, 16(SP) 0x0098 00152 (append.go:22) MOVQ CX, 24(SP) 0x009d 00157 (append.go:22) MOVQ BP, 32(SP) 0x00a2 00162 (append.go:22) PCDATA $0, $0 0x00a2 00162 (append.go:22) CALL runtime.growslice(SB) 0x00a7 00167 (append.go:22) MOVQ 40(SP), DX 0x00ac 00172 (append.go:22) MOVQ 48(SP), AX 0x00b1 00177 (append.go:22) MOVQ 56(SP), CX 0x00b6 00182 (append.go:22) ADDQ $3, AX 0x00ba 00186 (append.go:19) MOVQ "".autotmp_0+64(SP), BX 0x00bf 00191 (append.go:22) MOVQ AX, BP 0x00c2 00194 (append.go:22) JMP 58 0x00c7 00199 (append.go:22) NOP 0x00c7 00199 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00cc 00204 (append.go:21) JMP 0 After: "".p t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 191 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x+8(SB), DX 0x0025 00037 (append.go:19) MOVQ DX, "".autotmp_0+64(SP) 0x002a 00042 (append.go:19) MOVQ "".x(SB), BX 0x0031 00049 (append.go:22) LEAQ 3(DX), BP 0x0035 00053 (append.go:22) MOVQ BP, "".x+8(SB) 0x003c 00060 (append.go:22) CMPQ BP, CX 0x003f 00063 (append.go:22) JGT $0, 84 0x0041 00065 (append.go:22) MOVB $1, (BX)(DX*1) 0x0045 00069 (append.go:22) MOVB $2, 1(BX)(DX*1) 0x004a 00074 (append.go:22) MOVB $3, 2(BX)(DX*1) 0x004f 00079 (append.go:23) ADDQ $72, SP 0x0053 00083 (append.go:23) RET 0x0054 00084 (append.go:22) LEAQ type.[]uint8(SB), AX 0x005b 00091 (append.go:22) MOVQ AX, (SP) 0x005f 00095 (append.go:22) MOVQ BX, 8(SP) 0x0064 00100 (append.go:22) MOVQ DX, 16(SP) 0x0069 00105 (append.go:22) MOVQ CX, 24(SP) 0x006e 00110 (append.go:22) MOVQ BP, 32(SP) 0x0073 00115 (append.go:22) PCDATA $0, $0 0x0073 00115 (append.go:22) CALL runtime.growslice(SB) 0x0078 00120 (append.go:22) MOVQ 40(SP), CX 0x007d 00125 (append.go:22) MOVQ 56(SP), AX 0x0082 00130 (append.go:22) MOVQ AX, "".x+16(SB) 0x0089 00137 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x008f 00143 (append.go:22) TESTB AL, AL 0x0091 00145 (append.go:22) JNE $0, 168 0x0093 00147 (append.go:22) MOVQ CX, "".x(SB) 0x009a 00154 (append.go:22) MOVQ "".x(SB), BX 0x00a1 00161 (append.go:19) MOVQ "".autotmp_0+64(SP), DX 0x00a6 00166 (append.go:22) JMP 65 0x00a8 00168 (append.go:22) LEAQ "".x(SB), DX 0x00af 00175 (append.go:22) MOVQ DX, (SP) 0x00b3 00179 (append.go:22) MOVQ CX, 8(SP) 0x00b8 00184 (append.go:22) PCDATA $0, $0 0x00b8 00184 (append.go:22) CALL runtime.writebarrierptr(SB) 0x00bd 00189 (append.go:22) JMP 154 0x00bf 00191 (append.go:22) NOP 0x00bf 00191 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00c4 00196 (append.go:21) JMP 0 Change-Id: I77a41ad3a22557a4bb4654de7d6d24a029efe34a Reviewed-on: https://go-review.googlesource.com/21813 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:44:17 -07:00
if inplace {
cmd/compile: re-enable in-place append optimization CL 21891 was too clever in its attempts to avoid spills. Storing newlen too early caused uses of append in the runtime itself to receive an inconsistent view of a slice, leading to corruption. This CL makes the generate code much more similar to the old backend. It spills more than before, but those spills have been contained to the grow path. It recalculates newlen unnecessarily on the fast path, but that's measurably cheaper than spilling it. CL 21891 caused runtime failures in 6 of 2000 runs of net/http and crypto/x509 in my test setup. This CL has gone 6000 runs without a failure. Benchmarks going from master to this CL: name old time/op new time/op delta AppendInPlace/NoGrow/Byte-8 439ns ± 2% 436ns ± 2% -0.72% (p=0.001 n=28+27) AppendInPlace/NoGrow/1Ptr-8 901ns ± 0% 856ns ± 0% -4.95% (p=0.000 n=26+29) AppendInPlace/NoGrow/2Ptr-8 2.15µs ± 1% 1.95µs ± 0% -9.07% (p=0.000 n=28+30) AppendInPlace/NoGrow/3Ptr-8 2.66µs ± 0% 2.45µs ± 0% -7.93% (p=0.000 n=29+26) AppendInPlace/NoGrow/4Ptr-8 3.24µs ± 1% 3.02µs ± 1% -6.75% (p=0.000 n=28+30) AppendInPlace/Grow/Byte-8 269ns ± 1% 271ns ± 1% +0.84% (p=0.000 n=30+29) AppendInPlace/Grow/1Ptr-8 275ns ± 1% 280ns ± 1% +1.75% (p=0.000 n=30+30) AppendInPlace/Grow/2Ptr-8 384ns ± 0% 391ns ± 0% +1.94% (p=0.000 n=27+30) AppendInPlace/Grow/3Ptr-8 455ns ± 0% 462ns ± 0% +1.43% (p=0.000 n=29+29) AppendInPlace/Grow/4Ptr-8 478ns ± 0% 479ns ± 0% +0.23% (p=0.000 n=30+27) However, for the large no-grow cases, there is still more work to be done. Going from this CL to the non-SSA backend: name old time/op new time/op delta AppendInPlace/NoGrow/Byte-8 436ns ± 2% 436ns ± 2% ~ (p=0.967 n=27+29) AppendInPlace/NoGrow/1Ptr-8 856ns ± 0% 884ns ± 0% +3.28% (p=0.000 n=29+26) AppendInPlace/NoGrow/2Ptr-8 1.95µs ± 0% 1.56µs ± 0% -20.28% (p=0.000 n=30+29) AppendInPlace/NoGrow/3Ptr-8 2.45µs ± 0% 1.89µs ± 0% -22.88% (p=0.000 n=26+28) AppendInPlace/NoGrow/4Ptr-8 3.02µs ± 1% 2.56µs ± 1% -15.35% (p=0.000 n=30+28) AppendInPlace/Grow/Byte-8 271ns ± 1% 283ns ± 1% +4.56% (p=0.000 n=29+29) AppendInPlace/Grow/1Ptr-8 280ns ± 1% 288ns ± 1% +2.99% (p=0.000 n=30+30) AppendInPlace/Grow/2Ptr-8 391ns ± 0% 409ns ± 0% +4.66% (p=0.000 n=30+29) AppendInPlace/Grow/3Ptr-8 462ns ± 0% 481ns ± 0% +4.13% (p=0.000 n=29+30) AppendInPlace/Grow/4Ptr-8 479ns ± 0% 502ns ± 0% +4.81% (p=0.000 n=27+26) New generated code: var x []byte func a() { x = append(x, 1) } "".a t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (a.go:5) TEXT "".a(SB), $72-0 0x0000 00000 (a.go:5) MOVQ (TLS), CX 0x0009 00009 (a.go:5) CMPQ SP, 16(CX) 0x000d 00013 (a.go:5) JLS 190 0x0013 00019 (a.go:5) SUBQ $72, SP 0x0017 00023 (a.go:5) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (a.go:5) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (a.go:6) MOVQ "".x+16(SB), CX 0x001e 00030 (a.go:6) MOVQ "".x+8(SB), DX 0x0025 00037 (a.go:6) MOVQ "".x(SB), BX 0x002c 00044 (a.go:6) LEAQ 1(DX), BP 0x0030 00048 (a.go:6) CMPQ BP, CX 0x0033 00051 (a.go:6) JGT $0, 73 0x0035 00053 (a.go:6) LEAQ 1(DX), AX 0x0039 00057 (a.go:6) MOVQ AX, "".x+8(SB) 0x0040 00064 (a.go:6) MOVB $1, (BX)(DX*1) 0x0044 00068 (a.go:7) ADDQ $72, SP 0x0048 00072 (a.go:7) RET 0x0049 00073 (a.go:6) LEAQ type.[]uint8(SB), AX 0x0050 00080 (a.go:6) MOVQ AX, (SP) 0x0054 00084 (a.go:6) MOVQ BX, 8(SP) 0x0059 00089 (a.go:6) MOVQ DX, 16(SP) 0x005e 00094 (a.go:6) MOVQ CX, 24(SP) 0x0063 00099 (a.go:6) MOVQ BP, 32(SP) 0x0068 00104 (a.go:6) PCDATA $0, $0 0x0068 00104 (a.go:6) CALL runtime.growslice(SB) 0x006d 00109 (a.go:6) MOVQ 40(SP), CX 0x0072 00114 (a.go:6) MOVQ 48(SP), DX 0x0077 00119 (a.go:6) MOVQ DX, "".autotmp_0+64(SP) 0x007c 00124 (a.go:6) MOVQ 56(SP), BX 0x0081 00129 (a.go:6) MOVQ BX, "".x+16(SB) 0x0088 00136 (a.go:6) MOVL runtime.writeBarrier(SB), AX 0x008e 00142 (a.go:6) TESTB AL, AL 0x0090 00144 (a.go:6) JNE $0, 162 0x0092 00146 (a.go:6) MOVQ CX, "".x(SB) 0x0099 00153 (a.go:6) MOVQ "".x(SB), BX 0x00a0 00160 (a.go:6) JMP 53 0x00a2 00162 (a.go:6) LEAQ "".x(SB), BX 0x00a9 00169 (a.go:6) MOVQ BX, (SP) 0x00ad 00173 (a.go:6) MOVQ CX, 8(SP) 0x00b2 00178 (a.go:6) PCDATA $0, $0 0x00b2 00178 (a.go:6) CALL runtime.writebarrierptr(SB) 0x00b7 00183 (a.go:6) MOVQ "".autotmp_0+64(SP), DX 0x00bc 00188 (a.go:6) JMP 153 0x00be 00190 (a.go:6) NOP 0x00be 00190 (a.go:5) CALL runtime.morestack_noctxt(SB) 0x00c3 00195 (a.go:5) JMP 0 Fixes #14969 again Change-Id: Ia50463b1f506011aad0718a4fef1d4738e43c32d Reviewed-on: https://go-review.googlesource.com/22197 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-18 09:40:30 -07:00
if sn.Op == ONAME {
// Tell liveness we're about to build a new slice
s.vars[&memVar] = s.newValue1A(ssa.OpVarDef, ssa.TypeMem, sn, s.mem())
}
capaddr := s.newValue1I(ssa.OpOffPtr, ptrto(Types[TINT]), int64(array_cap), addr)
s.vars[&memVar] = s.newValue3A(ssa.OpStore, ssa.TypeMem, Types[TINT], capaddr, r[2], s.mem())
s.vars[&memVar] = s.newValue3A(ssa.OpStore, ssa.TypeMem, pt, addr, r[0], s.mem())
cmd/compile: avoid write barrier in append fast path When we are writing the result of an append back to the same slice, we don’t need a write barrier on the fast path. This re-implements an optimization that was present in the old backend. Updates #14921 Fixes #14969 Sample code: var x []byte func p() { x = append(x, 1, 2, 3) } Before: "".p t=1 size=224 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 199 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x(SB), DX 0x0025 00037 (append.go:19) MOVQ "".x+8(SB), BX 0x002c 00044 (append.go:19) MOVQ BX, "".autotmp_0+64(SP) 0x0031 00049 (append.go:22) LEAQ 3(BX), BP 0x0035 00053 (append.go:22) CMPQ BP, CX 0x0038 00056 (append.go:22) JGT $0, 131 0x003a 00058 (append.go:22) MOVB $1, (DX)(BX*1) 0x003e 00062 (append.go:22) MOVB $2, 1(DX)(BX*1) 0x0043 00067 (append.go:22) MOVB $3, 2(DX)(BX*1) 0x0048 00072 (append.go:22) MOVQ BP, "".x+8(SB) 0x004f 00079 (append.go:22) MOVQ CX, "".x+16(SB) 0x0056 00086 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x005c 00092 (append.go:22) TESTB AL, AL 0x005e 00094 (append.go:22) JNE $0, 108 0x0060 00096 (append.go:22) MOVQ DX, "".x(SB) 0x0067 00103 (append.go:23) ADDQ $72, SP 0x006b 00107 (append.go:23) RET 0x006c 00108 (append.go:22) LEAQ "".x(SB), CX 0x0073 00115 (append.go:22) MOVQ CX, (SP) 0x0077 00119 (append.go:22) MOVQ DX, 8(SP) 0x007c 00124 (append.go:22) PCDATA $0, $0 0x007c 00124 (append.go:22) CALL runtime.writebarrierptr(SB) 0x0081 00129 (append.go:23) JMP 103 0x0083 00131 (append.go:22) LEAQ type.[]uint8(SB), AX 0x008a 00138 (append.go:22) MOVQ AX, (SP) 0x008e 00142 (append.go:22) MOVQ DX, 8(SP) 0x0093 00147 (append.go:22) MOVQ BX, 16(SP) 0x0098 00152 (append.go:22) MOVQ CX, 24(SP) 0x009d 00157 (append.go:22) MOVQ BP, 32(SP) 0x00a2 00162 (append.go:22) PCDATA $0, $0 0x00a2 00162 (append.go:22) CALL runtime.growslice(SB) 0x00a7 00167 (append.go:22) MOVQ 40(SP), DX 0x00ac 00172 (append.go:22) MOVQ 48(SP), AX 0x00b1 00177 (append.go:22) MOVQ 56(SP), CX 0x00b6 00182 (append.go:22) ADDQ $3, AX 0x00ba 00186 (append.go:19) MOVQ "".autotmp_0+64(SP), BX 0x00bf 00191 (append.go:22) MOVQ AX, BP 0x00c2 00194 (append.go:22) JMP 58 0x00c7 00199 (append.go:22) NOP 0x00c7 00199 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00cc 00204 (append.go:21) JMP 0 After: "".p t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 191 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x+8(SB), DX 0x0025 00037 (append.go:19) MOVQ DX, "".autotmp_0+64(SP) 0x002a 00042 (append.go:19) MOVQ "".x(SB), BX 0x0031 00049 (append.go:22) LEAQ 3(DX), BP 0x0035 00053 (append.go:22) MOVQ BP, "".x+8(SB) 0x003c 00060 (append.go:22) CMPQ BP, CX 0x003f 00063 (append.go:22) JGT $0, 84 0x0041 00065 (append.go:22) MOVB $1, (BX)(DX*1) 0x0045 00069 (append.go:22) MOVB $2, 1(BX)(DX*1) 0x004a 00074 (append.go:22) MOVB $3, 2(BX)(DX*1) 0x004f 00079 (append.go:23) ADDQ $72, SP 0x0053 00083 (append.go:23) RET 0x0054 00084 (append.go:22) LEAQ type.[]uint8(SB), AX 0x005b 00091 (append.go:22) MOVQ AX, (SP) 0x005f 00095 (append.go:22) MOVQ BX, 8(SP) 0x0064 00100 (append.go:22) MOVQ DX, 16(SP) 0x0069 00105 (append.go:22) MOVQ CX, 24(SP) 0x006e 00110 (append.go:22) MOVQ BP, 32(SP) 0x0073 00115 (append.go:22) PCDATA $0, $0 0x0073 00115 (append.go:22) CALL runtime.growslice(SB) 0x0078 00120 (append.go:22) MOVQ 40(SP), CX 0x007d 00125 (append.go:22) MOVQ 56(SP), AX 0x0082 00130 (append.go:22) MOVQ AX, "".x+16(SB) 0x0089 00137 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x008f 00143 (append.go:22) TESTB AL, AL 0x0091 00145 (append.go:22) JNE $0, 168 0x0093 00147 (append.go:22) MOVQ CX, "".x(SB) 0x009a 00154 (append.go:22) MOVQ "".x(SB), BX 0x00a1 00161 (append.go:19) MOVQ "".autotmp_0+64(SP), DX 0x00a6 00166 (append.go:22) JMP 65 0x00a8 00168 (append.go:22) LEAQ "".x(SB), DX 0x00af 00175 (append.go:22) MOVQ DX, (SP) 0x00b3 00179 (append.go:22) MOVQ CX, 8(SP) 0x00b8 00184 (append.go:22) PCDATA $0, $0 0x00b8 00184 (append.go:22) CALL runtime.writebarrierptr(SB) 0x00bd 00189 (append.go:22) JMP 154 0x00bf 00191 (append.go:22) NOP 0x00bf 00191 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00c4 00196 (append.go:21) JMP 0 Change-Id: I77a41ad3a22557a4bb4654de7d6d24a029efe34a Reviewed-on: https://go-review.googlesource.com/21813 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:44:17 -07:00
// load the value we just stored to avoid having to spill it
s.vars[&ptrVar] = s.newValue2(ssa.OpLoad, pt, addr, s.mem())
cmd/compile: re-enable in-place append optimization CL 21891 was too clever in its attempts to avoid spills. Storing newlen too early caused uses of append in the runtime itself to receive an inconsistent view of a slice, leading to corruption. This CL makes the generate code much more similar to the old backend. It spills more than before, but those spills have been contained to the grow path. It recalculates newlen unnecessarily on the fast path, but that's measurably cheaper than spilling it. CL 21891 caused runtime failures in 6 of 2000 runs of net/http and crypto/x509 in my test setup. This CL has gone 6000 runs without a failure. Benchmarks going from master to this CL: name old time/op new time/op delta AppendInPlace/NoGrow/Byte-8 439ns ± 2% 436ns ± 2% -0.72% (p=0.001 n=28+27) AppendInPlace/NoGrow/1Ptr-8 901ns ± 0% 856ns ± 0% -4.95% (p=0.000 n=26+29) AppendInPlace/NoGrow/2Ptr-8 2.15µs ± 1% 1.95µs ± 0% -9.07% (p=0.000 n=28+30) AppendInPlace/NoGrow/3Ptr-8 2.66µs ± 0% 2.45µs ± 0% -7.93% (p=0.000 n=29+26) AppendInPlace/NoGrow/4Ptr-8 3.24µs ± 1% 3.02µs ± 1% -6.75% (p=0.000 n=28+30) AppendInPlace/Grow/Byte-8 269ns ± 1% 271ns ± 1% +0.84% (p=0.000 n=30+29) AppendInPlace/Grow/1Ptr-8 275ns ± 1% 280ns ± 1% +1.75% (p=0.000 n=30+30) AppendInPlace/Grow/2Ptr-8 384ns ± 0% 391ns ± 0% +1.94% (p=0.000 n=27+30) AppendInPlace/Grow/3Ptr-8 455ns ± 0% 462ns ± 0% +1.43% (p=0.000 n=29+29) AppendInPlace/Grow/4Ptr-8 478ns ± 0% 479ns ± 0% +0.23% (p=0.000 n=30+27) However, for the large no-grow cases, there is still more work to be done. Going from this CL to the non-SSA backend: name old time/op new time/op delta AppendInPlace/NoGrow/Byte-8 436ns ± 2% 436ns ± 2% ~ (p=0.967 n=27+29) AppendInPlace/NoGrow/1Ptr-8 856ns ± 0% 884ns ± 0% +3.28% (p=0.000 n=29+26) AppendInPlace/NoGrow/2Ptr-8 1.95µs ± 0% 1.56µs ± 0% -20.28% (p=0.000 n=30+29) AppendInPlace/NoGrow/3Ptr-8 2.45µs ± 0% 1.89µs ± 0% -22.88% (p=0.000 n=26+28) AppendInPlace/NoGrow/4Ptr-8 3.02µs ± 1% 2.56µs ± 1% -15.35% (p=0.000 n=30+28) AppendInPlace/Grow/Byte-8 271ns ± 1% 283ns ± 1% +4.56% (p=0.000 n=29+29) AppendInPlace/Grow/1Ptr-8 280ns ± 1% 288ns ± 1% +2.99% (p=0.000 n=30+30) AppendInPlace/Grow/2Ptr-8 391ns ± 0% 409ns ± 0% +4.66% (p=0.000 n=30+29) AppendInPlace/Grow/3Ptr-8 462ns ± 0% 481ns ± 0% +4.13% (p=0.000 n=29+30) AppendInPlace/Grow/4Ptr-8 479ns ± 0% 502ns ± 0% +4.81% (p=0.000 n=27+26) New generated code: var x []byte func a() { x = append(x, 1) } "".a t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (a.go:5) TEXT "".a(SB), $72-0 0x0000 00000 (a.go:5) MOVQ (TLS), CX 0x0009 00009 (a.go:5) CMPQ SP, 16(CX) 0x000d 00013 (a.go:5) JLS 190 0x0013 00019 (a.go:5) SUBQ $72, SP 0x0017 00023 (a.go:5) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (a.go:5) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (a.go:6) MOVQ "".x+16(SB), CX 0x001e 00030 (a.go:6) MOVQ "".x+8(SB), DX 0x0025 00037 (a.go:6) MOVQ "".x(SB), BX 0x002c 00044 (a.go:6) LEAQ 1(DX), BP 0x0030 00048 (a.go:6) CMPQ BP, CX 0x0033 00051 (a.go:6) JGT $0, 73 0x0035 00053 (a.go:6) LEAQ 1(DX), AX 0x0039 00057 (a.go:6) MOVQ AX, "".x+8(SB) 0x0040 00064 (a.go:6) MOVB $1, (BX)(DX*1) 0x0044 00068 (a.go:7) ADDQ $72, SP 0x0048 00072 (a.go:7) RET 0x0049 00073 (a.go:6) LEAQ type.[]uint8(SB), AX 0x0050 00080 (a.go:6) MOVQ AX, (SP) 0x0054 00084 (a.go:6) MOVQ BX, 8(SP) 0x0059 00089 (a.go:6) MOVQ DX, 16(SP) 0x005e 00094 (a.go:6) MOVQ CX, 24(SP) 0x0063 00099 (a.go:6) MOVQ BP, 32(SP) 0x0068 00104 (a.go:6) PCDATA $0, $0 0x0068 00104 (a.go:6) CALL runtime.growslice(SB) 0x006d 00109 (a.go:6) MOVQ 40(SP), CX 0x0072 00114 (a.go:6) MOVQ 48(SP), DX 0x0077 00119 (a.go:6) MOVQ DX, "".autotmp_0+64(SP) 0x007c 00124 (a.go:6) MOVQ 56(SP), BX 0x0081 00129 (a.go:6) MOVQ BX, "".x+16(SB) 0x0088 00136 (a.go:6) MOVL runtime.writeBarrier(SB), AX 0x008e 00142 (a.go:6) TESTB AL, AL 0x0090 00144 (a.go:6) JNE $0, 162 0x0092 00146 (a.go:6) MOVQ CX, "".x(SB) 0x0099 00153 (a.go:6) MOVQ "".x(SB), BX 0x00a0 00160 (a.go:6) JMP 53 0x00a2 00162 (a.go:6) LEAQ "".x(SB), BX 0x00a9 00169 (a.go:6) MOVQ BX, (SP) 0x00ad 00173 (a.go:6) MOVQ CX, 8(SP) 0x00b2 00178 (a.go:6) PCDATA $0, $0 0x00b2 00178 (a.go:6) CALL runtime.writebarrierptr(SB) 0x00b7 00183 (a.go:6) MOVQ "".autotmp_0+64(SP), DX 0x00bc 00188 (a.go:6) JMP 153 0x00be 00190 (a.go:6) NOP 0x00be 00190 (a.go:5) CALL runtime.morestack_noctxt(SB) 0x00c3 00195 (a.go:5) JMP 0 Fixes #14969 again Change-Id: Ia50463b1f506011aad0718a4fef1d4738e43c32d Reviewed-on: https://go-review.googlesource.com/22197 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-18 09:40:30 -07:00
s.vars[&lenVar] = r[1] // avoid a spill in the fast path
cmd/compile: avoid write barrier in append fast path When we are writing the result of an append back to the same slice, we don’t need a write barrier on the fast path. This re-implements an optimization that was present in the old backend. Updates #14921 Fixes #14969 Sample code: var x []byte func p() { x = append(x, 1, 2, 3) } Before: "".p t=1 size=224 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 199 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x(SB), DX 0x0025 00037 (append.go:19) MOVQ "".x+8(SB), BX 0x002c 00044 (append.go:19) MOVQ BX, "".autotmp_0+64(SP) 0x0031 00049 (append.go:22) LEAQ 3(BX), BP 0x0035 00053 (append.go:22) CMPQ BP, CX 0x0038 00056 (append.go:22) JGT $0, 131 0x003a 00058 (append.go:22) MOVB $1, (DX)(BX*1) 0x003e 00062 (append.go:22) MOVB $2, 1(DX)(BX*1) 0x0043 00067 (append.go:22) MOVB $3, 2(DX)(BX*1) 0x0048 00072 (append.go:22) MOVQ BP, "".x+8(SB) 0x004f 00079 (append.go:22) MOVQ CX, "".x+16(SB) 0x0056 00086 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x005c 00092 (append.go:22) TESTB AL, AL 0x005e 00094 (append.go:22) JNE $0, 108 0x0060 00096 (append.go:22) MOVQ DX, "".x(SB) 0x0067 00103 (append.go:23) ADDQ $72, SP 0x006b 00107 (append.go:23) RET 0x006c 00108 (append.go:22) LEAQ "".x(SB), CX 0x0073 00115 (append.go:22) MOVQ CX, (SP) 0x0077 00119 (append.go:22) MOVQ DX, 8(SP) 0x007c 00124 (append.go:22) PCDATA $0, $0 0x007c 00124 (append.go:22) CALL runtime.writebarrierptr(SB) 0x0081 00129 (append.go:23) JMP 103 0x0083 00131 (append.go:22) LEAQ type.[]uint8(SB), AX 0x008a 00138 (append.go:22) MOVQ AX, (SP) 0x008e 00142 (append.go:22) MOVQ DX, 8(SP) 0x0093 00147 (append.go:22) MOVQ BX, 16(SP) 0x0098 00152 (append.go:22) MOVQ CX, 24(SP) 0x009d 00157 (append.go:22) MOVQ BP, 32(SP) 0x00a2 00162 (append.go:22) PCDATA $0, $0 0x00a2 00162 (append.go:22) CALL runtime.growslice(SB) 0x00a7 00167 (append.go:22) MOVQ 40(SP), DX 0x00ac 00172 (append.go:22) MOVQ 48(SP), AX 0x00b1 00177 (append.go:22) MOVQ 56(SP), CX 0x00b6 00182 (append.go:22) ADDQ $3, AX 0x00ba 00186 (append.go:19) MOVQ "".autotmp_0+64(SP), BX 0x00bf 00191 (append.go:22) MOVQ AX, BP 0x00c2 00194 (append.go:22) JMP 58 0x00c7 00199 (append.go:22) NOP 0x00c7 00199 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00cc 00204 (append.go:21) JMP 0 After: "".p t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 191 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x+8(SB), DX 0x0025 00037 (append.go:19) MOVQ DX, "".autotmp_0+64(SP) 0x002a 00042 (append.go:19) MOVQ "".x(SB), BX 0x0031 00049 (append.go:22) LEAQ 3(DX), BP 0x0035 00053 (append.go:22) MOVQ BP, "".x+8(SB) 0x003c 00060 (append.go:22) CMPQ BP, CX 0x003f 00063 (append.go:22) JGT $0, 84 0x0041 00065 (append.go:22) MOVB $1, (BX)(DX*1) 0x0045 00069 (append.go:22) MOVB $2, 1(BX)(DX*1) 0x004a 00074 (append.go:22) MOVB $3, 2(BX)(DX*1) 0x004f 00079 (append.go:23) ADDQ $72, SP 0x0053 00083 (append.go:23) RET 0x0054 00084 (append.go:22) LEAQ type.[]uint8(SB), AX 0x005b 00091 (append.go:22) MOVQ AX, (SP) 0x005f 00095 (append.go:22) MOVQ BX, 8(SP) 0x0064 00100 (append.go:22) MOVQ DX, 16(SP) 0x0069 00105 (append.go:22) MOVQ CX, 24(SP) 0x006e 00110 (append.go:22) MOVQ BP, 32(SP) 0x0073 00115 (append.go:22) PCDATA $0, $0 0x0073 00115 (append.go:22) CALL runtime.growslice(SB) 0x0078 00120 (append.go:22) MOVQ 40(SP), CX 0x007d 00125 (append.go:22) MOVQ 56(SP), AX 0x0082 00130 (append.go:22) MOVQ AX, "".x+16(SB) 0x0089 00137 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x008f 00143 (append.go:22) TESTB AL, AL 0x0091 00145 (append.go:22) JNE $0, 168 0x0093 00147 (append.go:22) MOVQ CX, "".x(SB) 0x009a 00154 (append.go:22) MOVQ "".x(SB), BX 0x00a1 00161 (append.go:19) MOVQ "".autotmp_0+64(SP), DX 0x00a6 00166 (append.go:22) JMP 65 0x00a8 00168 (append.go:22) LEAQ "".x(SB), DX 0x00af 00175 (append.go:22) MOVQ DX, (SP) 0x00b3 00179 (append.go:22) MOVQ CX, 8(SP) 0x00b8 00184 (append.go:22) PCDATA $0, $0 0x00b8 00184 (append.go:22) CALL runtime.writebarrierptr(SB) 0x00bd 00189 (append.go:22) JMP 154 0x00bf 00191 (append.go:22) NOP 0x00bf 00191 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00c4 00196 (append.go:21) JMP 0 Change-Id: I77a41ad3a22557a4bb4654de7d6d24a029efe34a Reviewed-on: https://go-review.googlesource.com/21813 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:44:17 -07:00
} else {
s.vars[&ptrVar] = r[0]
s.vars[&newlenVar] = s.newValue2(s.ssaOp(OADD, Types[TINT]), Types[TINT], r[1], s.constInt(Types[TINT], nargs))
s.vars[&capVar] = r[2]
}
b = s.endBlock()
b.AddEdgeTo(assign)
// assign new elements to slots
s.startBlock(assign)
cmd/compile: re-enable in-place append optimization CL 21891 was too clever in its attempts to avoid spills. Storing newlen too early caused uses of append in the runtime itself to receive an inconsistent view of a slice, leading to corruption. This CL makes the generate code much more similar to the old backend. It spills more than before, but those spills have been contained to the grow path. It recalculates newlen unnecessarily on the fast path, but that's measurably cheaper than spilling it. CL 21891 caused runtime failures in 6 of 2000 runs of net/http and crypto/x509 in my test setup. This CL has gone 6000 runs without a failure. Benchmarks going from master to this CL: name old time/op new time/op delta AppendInPlace/NoGrow/Byte-8 439ns ± 2% 436ns ± 2% -0.72% (p=0.001 n=28+27) AppendInPlace/NoGrow/1Ptr-8 901ns ± 0% 856ns ± 0% -4.95% (p=0.000 n=26+29) AppendInPlace/NoGrow/2Ptr-8 2.15µs ± 1% 1.95µs ± 0% -9.07% (p=0.000 n=28+30) AppendInPlace/NoGrow/3Ptr-8 2.66µs ± 0% 2.45µs ± 0% -7.93% (p=0.000 n=29+26) AppendInPlace/NoGrow/4Ptr-8 3.24µs ± 1% 3.02µs ± 1% -6.75% (p=0.000 n=28+30) AppendInPlace/Grow/Byte-8 269ns ± 1% 271ns ± 1% +0.84% (p=0.000 n=30+29) AppendInPlace/Grow/1Ptr-8 275ns ± 1% 280ns ± 1% +1.75% (p=0.000 n=30+30) AppendInPlace/Grow/2Ptr-8 384ns ± 0% 391ns ± 0% +1.94% (p=0.000 n=27+30) AppendInPlace/Grow/3Ptr-8 455ns ± 0% 462ns ± 0% +1.43% (p=0.000 n=29+29) AppendInPlace/Grow/4Ptr-8 478ns ± 0% 479ns ± 0% +0.23% (p=0.000 n=30+27) However, for the large no-grow cases, there is still more work to be done. Going from this CL to the non-SSA backend: name old time/op new time/op delta AppendInPlace/NoGrow/Byte-8 436ns ± 2% 436ns ± 2% ~ (p=0.967 n=27+29) AppendInPlace/NoGrow/1Ptr-8 856ns ± 0% 884ns ± 0% +3.28% (p=0.000 n=29+26) AppendInPlace/NoGrow/2Ptr-8 1.95µs ± 0% 1.56µs ± 0% -20.28% (p=0.000 n=30+29) AppendInPlace/NoGrow/3Ptr-8 2.45µs ± 0% 1.89µs ± 0% -22.88% (p=0.000 n=26+28) AppendInPlace/NoGrow/4Ptr-8 3.02µs ± 1% 2.56µs ± 1% -15.35% (p=0.000 n=30+28) AppendInPlace/Grow/Byte-8 271ns ± 1% 283ns ± 1% +4.56% (p=0.000 n=29+29) AppendInPlace/Grow/1Ptr-8 280ns ± 1% 288ns ± 1% +2.99% (p=0.000 n=30+30) AppendInPlace/Grow/2Ptr-8 391ns ± 0% 409ns ± 0% +4.66% (p=0.000 n=30+29) AppendInPlace/Grow/3Ptr-8 462ns ± 0% 481ns ± 0% +4.13% (p=0.000 n=29+30) AppendInPlace/Grow/4Ptr-8 479ns ± 0% 502ns ± 0% +4.81% (p=0.000 n=27+26) New generated code: var x []byte func a() { x = append(x, 1) } "".a t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (a.go:5) TEXT "".a(SB), $72-0 0x0000 00000 (a.go:5) MOVQ (TLS), CX 0x0009 00009 (a.go:5) CMPQ SP, 16(CX) 0x000d 00013 (a.go:5) JLS 190 0x0013 00019 (a.go:5) SUBQ $72, SP 0x0017 00023 (a.go:5) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (a.go:5) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (a.go:6) MOVQ "".x+16(SB), CX 0x001e 00030 (a.go:6) MOVQ "".x+8(SB), DX 0x0025 00037 (a.go:6) MOVQ "".x(SB), BX 0x002c 00044 (a.go:6) LEAQ 1(DX), BP 0x0030 00048 (a.go:6) CMPQ BP, CX 0x0033 00051 (a.go:6) JGT $0, 73 0x0035 00053 (a.go:6) LEAQ 1(DX), AX 0x0039 00057 (a.go:6) MOVQ AX, "".x+8(SB) 0x0040 00064 (a.go:6) MOVB $1, (BX)(DX*1) 0x0044 00068 (a.go:7) ADDQ $72, SP 0x0048 00072 (a.go:7) RET 0x0049 00073 (a.go:6) LEAQ type.[]uint8(SB), AX 0x0050 00080 (a.go:6) MOVQ AX, (SP) 0x0054 00084 (a.go:6) MOVQ BX, 8(SP) 0x0059 00089 (a.go:6) MOVQ DX, 16(SP) 0x005e 00094 (a.go:6) MOVQ CX, 24(SP) 0x0063 00099 (a.go:6) MOVQ BP, 32(SP) 0x0068 00104 (a.go:6) PCDATA $0, $0 0x0068 00104 (a.go:6) CALL runtime.growslice(SB) 0x006d 00109 (a.go:6) MOVQ 40(SP), CX 0x0072 00114 (a.go:6) MOVQ 48(SP), DX 0x0077 00119 (a.go:6) MOVQ DX, "".autotmp_0+64(SP) 0x007c 00124 (a.go:6) MOVQ 56(SP), BX 0x0081 00129 (a.go:6) MOVQ BX, "".x+16(SB) 0x0088 00136 (a.go:6) MOVL runtime.writeBarrier(SB), AX 0x008e 00142 (a.go:6) TESTB AL, AL 0x0090 00144 (a.go:6) JNE $0, 162 0x0092 00146 (a.go:6) MOVQ CX, "".x(SB) 0x0099 00153 (a.go:6) MOVQ "".x(SB), BX 0x00a0 00160 (a.go:6) JMP 53 0x00a2 00162 (a.go:6) LEAQ "".x(SB), BX 0x00a9 00169 (a.go:6) MOVQ BX, (SP) 0x00ad 00173 (a.go:6) MOVQ CX, 8(SP) 0x00b2 00178 (a.go:6) PCDATA $0, $0 0x00b2 00178 (a.go:6) CALL runtime.writebarrierptr(SB) 0x00b7 00183 (a.go:6) MOVQ "".autotmp_0+64(SP), DX 0x00bc 00188 (a.go:6) JMP 153 0x00be 00190 (a.go:6) NOP 0x00be 00190 (a.go:5) CALL runtime.morestack_noctxt(SB) 0x00c3 00195 (a.go:5) JMP 0 Fixes #14969 again Change-Id: Ia50463b1f506011aad0718a4fef1d4738e43c32d Reviewed-on: https://go-review.googlesource.com/22197 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-18 09:40:30 -07:00
if inplace {
l = s.variable(&lenVar, Types[TINT]) // generates phi for len
nl = s.newValue2(s.ssaOp(OADD, Types[TINT]), Types[TINT], l, s.constInt(Types[TINT], nargs))
lenaddr := s.newValue1I(ssa.OpOffPtr, ptrto(Types[TINT]), int64(array_nel), addr)
s.vars[&memVar] = s.newValue3A(ssa.OpStore, ssa.TypeMem, Types[TINT], lenaddr, nl, s.mem())
cmd/compile: re-enable in-place append optimization CL 21891 was too clever in its attempts to avoid spills. Storing newlen too early caused uses of append in the runtime itself to receive an inconsistent view of a slice, leading to corruption. This CL makes the generate code much more similar to the old backend. It spills more than before, but those spills have been contained to the grow path. It recalculates newlen unnecessarily on the fast path, but that's measurably cheaper than spilling it. CL 21891 caused runtime failures in 6 of 2000 runs of net/http and crypto/x509 in my test setup. This CL has gone 6000 runs without a failure. Benchmarks going from master to this CL: name old time/op new time/op delta AppendInPlace/NoGrow/Byte-8 439ns ± 2% 436ns ± 2% -0.72% (p=0.001 n=28+27) AppendInPlace/NoGrow/1Ptr-8 901ns ± 0% 856ns ± 0% -4.95% (p=0.000 n=26+29) AppendInPlace/NoGrow/2Ptr-8 2.15µs ± 1% 1.95µs ± 0% -9.07% (p=0.000 n=28+30) AppendInPlace/NoGrow/3Ptr-8 2.66µs ± 0% 2.45µs ± 0% -7.93% (p=0.000 n=29+26) AppendInPlace/NoGrow/4Ptr-8 3.24µs ± 1% 3.02µs ± 1% -6.75% (p=0.000 n=28+30) AppendInPlace/Grow/Byte-8 269ns ± 1% 271ns ± 1% +0.84% (p=0.000 n=30+29) AppendInPlace/Grow/1Ptr-8 275ns ± 1% 280ns ± 1% +1.75% (p=0.000 n=30+30) AppendInPlace/Grow/2Ptr-8 384ns ± 0% 391ns ± 0% +1.94% (p=0.000 n=27+30) AppendInPlace/Grow/3Ptr-8 455ns ± 0% 462ns ± 0% +1.43% (p=0.000 n=29+29) AppendInPlace/Grow/4Ptr-8 478ns ± 0% 479ns ± 0% +0.23% (p=0.000 n=30+27) However, for the large no-grow cases, there is still more work to be done. Going from this CL to the non-SSA backend: name old time/op new time/op delta AppendInPlace/NoGrow/Byte-8 436ns ± 2% 436ns ± 2% ~ (p=0.967 n=27+29) AppendInPlace/NoGrow/1Ptr-8 856ns ± 0% 884ns ± 0% +3.28% (p=0.000 n=29+26) AppendInPlace/NoGrow/2Ptr-8 1.95µs ± 0% 1.56µs ± 0% -20.28% (p=0.000 n=30+29) AppendInPlace/NoGrow/3Ptr-8 2.45µs ± 0% 1.89µs ± 0% -22.88% (p=0.000 n=26+28) AppendInPlace/NoGrow/4Ptr-8 3.02µs ± 1% 2.56µs ± 1% -15.35% (p=0.000 n=30+28) AppendInPlace/Grow/Byte-8 271ns ± 1% 283ns ± 1% +4.56% (p=0.000 n=29+29) AppendInPlace/Grow/1Ptr-8 280ns ± 1% 288ns ± 1% +2.99% (p=0.000 n=30+30) AppendInPlace/Grow/2Ptr-8 391ns ± 0% 409ns ± 0% +4.66% (p=0.000 n=30+29) AppendInPlace/Grow/3Ptr-8 462ns ± 0% 481ns ± 0% +4.13% (p=0.000 n=29+30) AppendInPlace/Grow/4Ptr-8 479ns ± 0% 502ns ± 0% +4.81% (p=0.000 n=27+26) New generated code: var x []byte func a() { x = append(x, 1) } "".a t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (a.go:5) TEXT "".a(SB), $72-0 0x0000 00000 (a.go:5) MOVQ (TLS), CX 0x0009 00009 (a.go:5) CMPQ SP, 16(CX) 0x000d 00013 (a.go:5) JLS 190 0x0013 00019 (a.go:5) SUBQ $72, SP 0x0017 00023 (a.go:5) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (a.go:5) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (a.go:6) MOVQ "".x+16(SB), CX 0x001e 00030 (a.go:6) MOVQ "".x+8(SB), DX 0x0025 00037 (a.go:6) MOVQ "".x(SB), BX 0x002c 00044 (a.go:6) LEAQ 1(DX), BP 0x0030 00048 (a.go:6) CMPQ BP, CX 0x0033 00051 (a.go:6) JGT $0, 73 0x0035 00053 (a.go:6) LEAQ 1(DX), AX 0x0039 00057 (a.go:6) MOVQ AX, "".x+8(SB) 0x0040 00064 (a.go:6) MOVB $1, (BX)(DX*1) 0x0044 00068 (a.go:7) ADDQ $72, SP 0x0048 00072 (a.go:7) RET 0x0049 00073 (a.go:6) LEAQ type.[]uint8(SB), AX 0x0050 00080 (a.go:6) MOVQ AX, (SP) 0x0054 00084 (a.go:6) MOVQ BX, 8(SP) 0x0059 00089 (a.go:6) MOVQ DX, 16(SP) 0x005e 00094 (a.go:6) MOVQ CX, 24(SP) 0x0063 00099 (a.go:6) MOVQ BP, 32(SP) 0x0068 00104 (a.go:6) PCDATA $0, $0 0x0068 00104 (a.go:6) CALL runtime.growslice(SB) 0x006d 00109 (a.go:6) MOVQ 40(SP), CX 0x0072 00114 (a.go:6) MOVQ 48(SP), DX 0x0077 00119 (a.go:6) MOVQ DX, "".autotmp_0+64(SP) 0x007c 00124 (a.go:6) MOVQ 56(SP), BX 0x0081 00129 (a.go:6) MOVQ BX, "".x+16(SB) 0x0088 00136 (a.go:6) MOVL runtime.writeBarrier(SB), AX 0x008e 00142 (a.go:6) TESTB AL, AL 0x0090 00144 (a.go:6) JNE $0, 162 0x0092 00146 (a.go:6) MOVQ CX, "".x(SB) 0x0099 00153 (a.go:6) MOVQ "".x(SB), BX 0x00a0 00160 (a.go:6) JMP 53 0x00a2 00162 (a.go:6) LEAQ "".x(SB), BX 0x00a9 00169 (a.go:6) MOVQ BX, (SP) 0x00ad 00173 (a.go:6) MOVQ CX, 8(SP) 0x00b2 00178 (a.go:6) PCDATA $0, $0 0x00b2 00178 (a.go:6) CALL runtime.writebarrierptr(SB) 0x00b7 00183 (a.go:6) MOVQ "".autotmp_0+64(SP), DX 0x00bc 00188 (a.go:6) JMP 153 0x00be 00190 (a.go:6) NOP 0x00be 00190 (a.go:5) CALL runtime.morestack_noctxt(SB) 0x00c3 00195 (a.go:5) JMP 0 Fixes #14969 again Change-Id: Ia50463b1f506011aad0718a4fef1d4738e43c32d Reviewed-on: https://go-review.googlesource.com/22197 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-18 09:40:30 -07:00
}
// Evaluate args
type argRec struct {
// if store is true, we're appending the value v. If false, we're appending the
// value at *v.
v *ssa.Value
store bool
}
args := make([]argRec, 0, nargs)
for _, n := range n.List.Slice()[1:] {
if canSSAType(n.Type) {
args = append(args, argRec{v: s.expr(n), store: true})
} else {
v := s.addr(n, false)
args = append(args, argRec{v: v})
}
}
cmd/compile: avoid write barrier in append fast path When we are writing the result of an append back to the same slice, we don’t need a write barrier on the fast path. This re-implements an optimization that was present in the old backend. Updates #14921 Fixes #14969 Sample code: var x []byte func p() { x = append(x, 1, 2, 3) } Before: "".p t=1 size=224 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 199 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x(SB), DX 0x0025 00037 (append.go:19) MOVQ "".x+8(SB), BX 0x002c 00044 (append.go:19) MOVQ BX, "".autotmp_0+64(SP) 0x0031 00049 (append.go:22) LEAQ 3(BX), BP 0x0035 00053 (append.go:22) CMPQ BP, CX 0x0038 00056 (append.go:22) JGT $0, 131 0x003a 00058 (append.go:22) MOVB $1, (DX)(BX*1) 0x003e 00062 (append.go:22) MOVB $2, 1(DX)(BX*1) 0x0043 00067 (append.go:22) MOVB $3, 2(DX)(BX*1) 0x0048 00072 (append.go:22) MOVQ BP, "".x+8(SB) 0x004f 00079 (append.go:22) MOVQ CX, "".x+16(SB) 0x0056 00086 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x005c 00092 (append.go:22) TESTB AL, AL 0x005e 00094 (append.go:22) JNE $0, 108 0x0060 00096 (append.go:22) MOVQ DX, "".x(SB) 0x0067 00103 (append.go:23) ADDQ $72, SP 0x006b 00107 (append.go:23) RET 0x006c 00108 (append.go:22) LEAQ "".x(SB), CX 0x0073 00115 (append.go:22) MOVQ CX, (SP) 0x0077 00119 (append.go:22) MOVQ DX, 8(SP) 0x007c 00124 (append.go:22) PCDATA $0, $0 0x007c 00124 (append.go:22) CALL runtime.writebarrierptr(SB) 0x0081 00129 (append.go:23) JMP 103 0x0083 00131 (append.go:22) LEAQ type.[]uint8(SB), AX 0x008a 00138 (append.go:22) MOVQ AX, (SP) 0x008e 00142 (append.go:22) MOVQ DX, 8(SP) 0x0093 00147 (append.go:22) MOVQ BX, 16(SP) 0x0098 00152 (append.go:22) MOVQ CX, 24(SP) 0x009d 00157 (append.go:22) MOVQ BP, 32(SP) 0x00a2 00162 (append.go:22) PCDATA $0, $0 0x00a2 00162 (append.go:22) CALL runtime.growslice(SB) 0x00a7 00167 (append.go:22) MOVQ 40(SP), DX 0x00ac 00172 (append.go:22) MOVQ 48(SP), AX 0x00b1 00177 (append.go:22) MOVQ 56(SP), CX 0x00b6 00182 (append.go:22) ADDQ $3, AX 0x00ba 00186 (append.go:19) MOVQ "".autotmp_0+64(SP), BX 0x00bf 00191 (append.go:22) MOVQ AX, BP 0x00c2 00194 (append.go:22) JMP 58 0x00c7 00199 (append.go:22) NOP 0x00c7 00199 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00cc 00204 (append.go:21) JMP 0 After: "".p t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 191 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x+8(SB), DX 0x0025 00037 (append.go:19) MOVQ DX, "".autotmp_0+64(SP) 0x002a 00042 (append.go:19) MOVQ "".x(SB), BX 0x0031 00049 (append.go:22) LEAQ 3(DX), BP 0x0035 00053 (append.go:22) MOVQ BP, "".x+8(SB) 0x003c 00060 (append.go:22) CMPQ BP, CX 0x003f 00063 (append.go:22) JGT $0, 84 0x0041 00065 (append.go:22) MOVB $1, (BX)(DX*1) 0x0045 00069 (append.go:22) MOVB $2, 1(BX)(DX*1) 0x004a 00074 (append.go:22) MOVB $3, 2(BX)(DX*1) 0x004f 00079 (append.go:23) ADDQ $72, SP 0x0053 00083 (append.go:23) RET 0x0054 00084 (append.go:22) LEAQ type.[]uint8(SB), AX 0x005b 00091 (append.go:22) MOVQ AX, (SP) 0x005f 00095 (append.go:22) MOVQ BX, 8(SP) 0x0064 00100 (append.go:22) MOVQ DX, 16(SP) 0x0069 00105 (append.go:22) MOVQ CX, 24(SP) 0x006e 00110 (append.go:22) MOVQ BP, 32(SP) 0x0073 00115 (append.go:22) PCDATA $0, $0 0x0073 00115 (append.go:22) CALL runtime.growslice(SB) 0x0078 00120 (append.go:22) MOVQ 40(SP), CX 0x007d 00125 (append.go:22) MOVQ 56(SP), AX 0x0082 00130 (append.go:22) MOVQ AX, "".x+16(SB) 0x0089 00137 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x008f 00143 (append.go:22) TESTB AL, AL 0x0091 00145 (append.go:22) JNE $0, 168 0x0093 00147 (append.go:22) MOVQ CX, "".x(SB) 0x009a 00154 (append.go:22) MOVQ "".x(SB), BX 0x00a1 00161 (append.go:19) MOVQ "".autotmp_0+64(SP), DX 0x00a6 00166 (append.go:22) JMP 65 0x00a8 00168 (append.go:22) LEAQ "".x(SB), DX 0x00af 00175 (append.go:22) MOVQ DX, (SP) 0x00b3 00179 (append.go:22) MOVQ CX, 8(SP) 0x00b8 00184 (append.go:22) PCDATA $0, $0 0x00b8 00184 (append.go:22) CALL runtime.writebarrierptr(SB) 0x00bd 00189 (append.go:22) JMP 154 0x00bf 00191 (append.go:22) NOP 0x00bf 00191 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00c4 00196 (append.go:21) JMP 0 Change-Id: I77a41ad3a22557a4bb4654de7d6d24a029efe34a Reviewed-on: https://go-review.googlesource.com/21813 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:44:17 -07:00
p = s.variable(&ptrVar, pt) // generates phi for ptr
if !inplace {
nl = s.variable(&newlenVar, Types[TINT]) // generates phi for nl
c = s.variable(&capVar, Types[TINT]) // generates phi for cap
}
p2 := s.newValue2(ssa.OpPtrIndex, pt, p, l)
for i, arg := range args {
addr := s.newValue2(ssa.OpPtrIndex, pt, p2, s.constInt(Types[TINT], int64(i)))
if arg.store {
s.storeType(et, addr, arg.v, 0)
} else {
store := s.newValue3I(ssa.OpMove, ssa.TypeMem, et.Size(), addr, arg.v, s.mem())
store.Aux = et
s.vars[&memVar] = store
}
}
delete(s.vars, &ptrVar)
cmd/compile: avoid write barrier in append fast path When we are writing the result of an append back to the same slice, we don’t need a write barrier on the fast path. This re-implements an optimization that was present in the old backend. Updates #14921 Fixes #14969 Sample code: var x []byte func p() { x = append(x, 1, 2, 3) } Before: "".p t=1 size=224 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 199 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x(SB), DX 0x0025 00037 (append.go:19) MOVQ "".x+8(SB), BX 0x002c 00044 (append.go:19) MOVQ BX, "".autotmp_0+64(SP) 0x0031 00049 (append.go:22) LEAQ 3(BX), BP 0x0035 00053 (append.go:22) CMPQ BP, CX 0x0038 00056 (append.go:22) JGT $0, 131 0x003a 00058 (append.go:22) MOVB $1, (DX)(BX*1) 0x003e 00062 (append.go:22) MOVB $2, 1(DX)(BX*1) 0x0043 00067 (append.go:22) MOVB $3, 2(DX)(BX*1) 0x0048 00072 (append.go:22) MOVQ BP, "".x+8(SB) 0x004f 00079 (append.go:22) MOVQ CX, "".x+16(SB) 0x0056 00086 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x005c 00092 (append.go:22) TESTB AL, AL 0x005e 00094 (append.go:22) JNE $0, 108 0x0060 00096 (append.go:22) MOVQ DX, "".x(SB) 0x0067 00103 (append.go:23) ADDQ $72, SP 0x006b 00107 (append.go:23) RET 0x006c 00108 (append.go:22) LEAQ "".x(SB), CX 0x0073 00115 (append.go:22) MOVQ CX, (SP) 0x0077 00119 (append.go:22) MOVQ DX, 8(SP) 0x007c 00124 (append.go:22) PCDATA $0, $0 0x007c 00124 (append.go:22) CALL runtime.writebarrierptr(SB) 0x0081 00129 (append.go:23) JMP 103 0x0083 00131 (append.go:22) LEAQ type.[]uint8(SB), AX 0x008a 00138 (append.go:22) MOVQ AX, (SP) 0x008e 00142 (append.go:22) MOVQ DX, 8(SP) 0x0093 00147 (append.go:22) MOVQ BX, 16(SP) 0x0098 00152 (append.go:22) MOVQ CX, 24(SP) 0x009d 00157 (append.go:22) MOVQ BP, 32(SP) 0x00a2 00162 (append.go:22) PCDATA $0, $0 0x00a2 00162 (append.go:22) CALL runtime.growslice(SB) 0x00a7 00167 (append.go:22) MOVQ 40(SP), DX 0x00ac 00172 (append.go:22) MOVQ 48(SP), AX 0x00b1 00177 (append.go:22) MOVQ 56(SP), CX 0x00b6 00182 (append.go:22) ADDQ $3, AX 0x00ba 00186 (append.go:19) MOVQ "".autotmp_0+64(SP), BX 0x00bf 00191 (append.go:22) MOVQ AX, BP 0x00c2 00194 (append.go:22) JMP 58 0x00c7 00199 (append.go:22) NOP 0x00c7 00199 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00cc 00204 (append.go:21) JMP 0 After: "".p t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 191 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x+8(SB), DX 0x0025 00037 (append.go:19) MOVQ DX, "".autotmp_0+64(SP) 0x002a 00042 (append.go:19) MOVQ "".x(SB), BX 0x0031 00049 (append.go:22) LEAQ 3(DX), BP 0x0035 00053 (append.go:22) MOVQ BP, "".x+8(SB) 0x003c 00060 (append.go:22) CMPQ BP, CX 0x003f 00063 (append.go:22) JGT $0, 84 0x0041 00065 (append.go:22) MOVB $1, (BX)(DX*1) 0x0045 00069 (append.go:22) MOVB $2, 1(BX)(DX*1) 0x004a 00074 (append.go:22) MOVB $3, 2(BX)(DX*1) 0x004f 00079 (append.go:23) ADDQ $72, SP 0x0053 00083 (append.go:23) RET 0x0054 00084 (append.go:22) LEAQ type.[]uint8(SB), AX 0x005b 00091 (append.go:22) MOVQ AX, (SP) 0x005f 00095 (append.go:22) MOVQ BX, 8(SP) 0x0064 00100 (append.go:22) MOVQ DX, 16(SP) 0x0069 00105 (append.go:22) MOVQ CX, 24(SP) 0x006e 00110 (append.go:22) MOVQ BP, 32(SP) 0x0073 00115 (append.go:22) PCDATA $0, $0 0x0073 00115 (append.go:22) CALL runtime.growslice(SB) 0x0078 00120 (append.go:22) MOVQ 40(SP), CX 0x007d 00125 (append.go:22) MOVQ 56(SP), AX 0x0082 00130 (append.go:22) MOVQ AX, "".x+16(SB) 0x0089 00137 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x008f 00143 (append.go:22) TESTB AL, AL 0x0091 00145 (append.go:22) JNE $0, 168 0x0093 00147 (append.go:22) MOVQ CX, "".x(SB) 0x009a 00154 (append.go:22) MOVQ "".x(SB), BX 0x00a1 00161 (append.go:19) MOVQ "".autotmp_0+64(SP), DX 0x00a6 00166 (append.go:22) JMP 65 0x00a8 00168 (append.go:22) LEAQ "".x(SB), DX 0x00af 00175 (append.go:22) MOVQ DX, (SP) 0x00b3 00179 (append.go:22) MOVQ CX, 8(SP) 0x00b8 00184 (append.go:22) PCDATA $0, $0 0x00b8 00184 (append.go:22) CALL runtime.writebarrierptr(SB) 0x00bd 00189 (append.go:22) JMP 154 0x00bf 00191 (append.go:22) NOP 0x00bf 00191 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00c4 00196 (append.go:21) JMP 0 Change-Id: I77a41ad3a22557a4bb4654de7d6d24a029efe34a Reviewed-on: https://go-review.googlesource.com/21813 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:44:17 -07:00
if inplace {
cmd/compile: re-enable in-place append optimization CL 21891 was too clever in its attempts to avoid spills. Storing newlen too early caused uses of append in the runtime itself to receive an inconsistent view of a slice, leading to corruption. This CL makes the generate code much more similar to the old backend. It spills more than before, but those spills have been contained to the grow path. It recalculates newlen unnecessarily on the fast path, but that's measurably cheaper than spilling it. CL 21891 caused runtime failures in 6 of 2000 runs of net/http and crypto/x509 in my test setup. This CL has gone 6000 runs without a failure. Benchmarks going from master to this CL: name old time/op new time/op delta AppendInPlace/NoGrow/Byte-8 439ns ± 2% 436ns ± 2% -0.72% (p=0.001 n=28+27) AppendInPlace/NoGrow/1Ptr-8 901ns ± 0% 856ns ± 0% -4.95% (p=0.000 n=26+29) AppendInPlace/NoGrow/2Ptr-8 2.15µs ± 1% 1.95µs ± 0% -9.07% (p=0.000 n=28+30) AppendInPlace/NoGrow/3Ptr-8 2.66µs ± 0% 2.45µs ± 0% -7.93% (p=0.000 n=29+26) AppendInPlace/NoGrow/4Ptr-8 3.24µs ± 1% 3.02µs ± 1% -6.75% (p=0.000 n=28+30) AppendInPlace/Grow/Byte-8 269ns ± 1% 271ns ± 1% +0.84% (p=0.000 n=30+29) AppendInPlace/Grow/1Ptr-8 275ns ± 1% 280ns ± 1% +1.75% (p=0.000 n=30+30) AppendInPlace/Grow/2Ptr-8 384ns ± 0% 391ns ± 0% +1.94% (p=0.000 n=27+30) AppendInPlace/Grow/3Ptr-8 455ns ± 0% 462ns ± 0% +1.43% (p=0.000 n=29+29) AppendInPlace/Grow/4Ptr-8 478ns ± 0% 479ns ± 0% +0.23% (p=0.000 n=30+27) However, for the large no-grow cases, there is still more work to be done. Going from this CL to the non-SSA backend: name old time/op new time/op delta AppendInPlace/NoGrow/Byte-8 436ns ± 2% 436ns ± 2% ~ (p=0.967 n=27+29) AppendInPlace/NoGrow/1Ptr-8 856ns ± 0% 884ns ± 0% +3.28% (p=0.000 n=29+26) AppendInPlace/NoGrow/2Ptr-8 1.95µs ± 0% 1.56µs ± 0% -20.28% (p=0.000 n=30+29) AppendInPlace/NoGrow/3Ptr-8 2.45µs ± 0% 1.89µs ± 0% -22.88% (p=0.000 n=26+28) AppendInPlace/NoGrow/4Ptr-8 3.02µs ± 1% 2.56µs ± 1% -15.35% (p=0.000 n=30+28) AppendInPlace/Grow/Byte-8 271ns ± 1% 283ns ± 1% +4.56% (p=0.000 n=29+29) AppendInPlace/Grow/1Ptr-8 280ns ± 1% 288ns ± 1% +2.99% (p=0.000 n=30+30) AppendInPlace/Grow/2Ptr-8 391ns ± 0% 409ns ± 0% +4.66% (p=0.000 n=30+29) AppendInPlace/Grow/3Ptr-8 462ns ± 0% 481ns ± 0% +4.13% (p=0.000 n=29+30) AppendInPlace/Grow/4Ptr-8 479ns ± 0% 502ns ± 0% +4.81% (p=0.000 n=27+26) New generated code: var x []byte func a() { x = append(x, 1) } "".a t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (a.go:5) TEXT "".a(SB), $72-0 0x0000 00000 (a.go:5) MOVQ (TLS), CX 0x0009 00009 (a.go:5) CMPQ SP, 16(CX) 0x000d 00013 (a.go:5) JLS 190 0x0013 00019 (a.go:5) SUBQ $72, SP 0x0017 00023 (a.go:5) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (a.go:5) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (a.go:6) MOVQ "".x+16(SB), CX 0x001e 00030 (a.go:6) MOVQ "".x+8(SB), DX 0x0025 00037 (a.go:6) MOVQ "".x(SB), BX 0x002c 00044 (a.go:6) LEAQ 1(DX), BP 0x0030 00048 (a.go:6) CMPQ BP, CX 0x0033 00051 (a.go:6) JGT $0, 73 0x0035 00053 (a.go:6) LEAQ 1(DX), AX 0x0039 00057 (a.go:6) MOVQ AX, "".x+8(SB) 0x0040 00064 (a.go:6) MOVB $1, (BX)(DX*1) 0x0044 00068 (a.go:7) ADDQ $72, SP 0x0048 00072 (a.go:7) RET 0x0049 00073 (a.go:6) LEAQ type.[]uint8(SB), AX 0x0050 00080 (a.go:6) MOVQ AX, (SP) 0x0054 00084 (a.go:6) MOVQ BX, 8(SP) 0x0059 00089 (a.go:6) MOVQ DX, 16(SP) 0x005e 00094 (a.go:6) MOVQ CX, 24(SP) 0x0063 00099 (a.go:6) MOVQ BP, 32(SP) 0x0068 00104 (a.go:6) PCDATA $0, $0 0x0068 00104 (a.go:6) CALL runtime.growslice(SB) 0x006d 00109 (a.go:6) MOVQ 40(SP), CX 0x0072 00114 (a.go:6) MOVQ 48(SP), DX 0x0077 00119 (a.go:6) MOVQ DX, "".autotmp_0+64(SP) 0x007c 00124 (a.go:6) MOVQ 56(SP), BX 0x0081 00129 (a.go:6) MOVQ BX, "".x+16(SB) 0x0088 00136 (a.go:6) MOVL runtime.writeBarrier(SB), AX 0x008e 00142 (a.go:6) TESTB AL, AL 0x0090 00144 (a.go:6) JNE $0, 162 0x0092 00146 (a.go:6) MOVQ CX, "".x(SB) 0x0099 00153 (a.go:6) MOVQ "".x(SB), BX 0x00a0 00160 (a.go:6) JMP 53 0x00a2 00162 (a.go:6) LEAQ "".x(SB), BX 0x00a9 00169 (a.go:6) MOVQ BX, (SP) 0x00ad 00173 (a.go:6) MOVQ CX, 8(SP) 0x00b2 00178 (a.go:6) PCDATA $0, $0 0x00b2 00178 (a.go:6) CALL runtime.writebarrierptr(SB) 0x00b7 00183 (a.go:6) MOVQ "".autotmp_0+64(SP), DX 0x00bc 00188 (a.go:6) JMP 153 0x00be 00190 (a.go:6) NOP 0x00be 00190 (a.go:5) CALL runtime.morestack_noctxt(SB) 0x00c3 00195 (a.go:5) JMP 0 Fixes #14969 again Change-Id: Ia50463b1f506011aad0718a4fef1d4738e43c32d Reviewed-on: https://go-review.googlesource.com/22197 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-18 09:40:30 -07:00
delete(s.vars, &lenVar)
cmd/compile: avoid write barrier in append fast path When we are writing the result of an append back to the same slice, we don’t need a write barrier on the fast path. This re-implements an optimization that was present in the old backend. Updates #14921 Fixes #14969 Sample code: var x []byte func p() { x = append(x, 1, 2, 3) } Before: "".p t=1 size=224 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 199 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x(SB), DX 0x0025 00037 (append.go:19) MOVQ "".x+8(SB), BX 0x002c 00044 (append.go:19) MOVQ BX, "".autotmp_0+64(SP) 0x0031 00049 (append.go:22) LEAQ 3(BX), BP 0x0035 00053 (append.go:22) CMPQ BP, CX 0x0038 00056 (append.go:22) JGT $0, 131 0x003a 00058 (append.go:22) MOVB $1, (DX)(BX*1) 0x003e 00062 (append.go:22) MOVB $2, 1(DX)(BX*1) 0x0043 00067 (append.go:22) MOVB $3, 2(DX)(BX*1) 0x0048 00072 (append.go:22) MOVQ BP, "".x+8(SB) 0x004f 00079 (append.go:22) MOVQ CX, "".x+16(SB) 0x0056 00086 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x005c 00092 (append.go:22) TESTB AL, AL 0x005e 00094 (append.go:22) JNE $0, 108 0x0060 00096 (append.go:22) MOVQ DX, "".x(SB) 0x0067 00103 (append.go:23) ADDQ $72, SP 0x006b 00107 (append.go:23) RET 0x006c 00108 (append.go:22) LEAQ "".x(SB), CX 0x0073 00115 (append.go:22) MOVQ CX, (SP) 0x0077 00119 (append.go:22) MOVQ DX, 8(SP) 0x007c 00124 (append.go:22) PCDATA $0, $0 0x007c 00124 (append.go:22) CALL runtime.writebarrierptr(SB) 0x0081 00129 (append.go:23) JMP 103 0x0083 00131 (append.go:22) LEAQ type.[]uint8(SB), AX 0x008a 00138 (append.go:22) MOVQ AX, (SP) 0x008e 00142 (append.go:22) MOVQ DX, 8(SP) 0x0093 00147 (append.go:22) MOVQ BX, 16(SP) 0x0098 00152 (append.go:22) MOVQ CX, 24(SP) 0x009d 00157 (append.go:22) MOVQ BP, 32(SP) 0x00a2 00162 (append.go:22) PCDATA $0, $0 0x00a2 00162 (append.go:22) CALL runtime.growslice(SB) 0x00a7 00167 (append.go:22) MOVQ 40(SP), DX 0x00ac 00172 (append.go:22) MOVQ 48(SP), AX 0x00b1 00177 (append.go:22) MOVQ 56(SP), CX 0x00b6 00182 (append.go:22) ADDQ $3, AX 0x00ba 00186 (append.go:19) MOVQ "".autotmp_0+64(SP), BX 0x00bf 00191 (append.go:22) MOVQ AX, BP 0x00c2 00194 (append.go:22) JMP 58 0x00c7 00199 (append.go:22) NOP 0x00c7 00199 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00cc 00204 (append.go:21) JMP 0 After: "".p t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 191 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x+8(SB), DX 0x0025 00037 (append.go:19) MOVQ DX, "".autotmp_0+64(SP) 0x002a 00042 (append.go:19) MOVQ "".x(SB), BX 0x0031 00049 (append.go:22) LEAQ 3(DX), BP 0x0035 00053 (append.go:22) MOVQ BP, "".x+8(SB) 0x003c 00060 (append.go:22) CMPQ BP, CX 0x003f 00063 (append.go:22) JGT $0, 84 0x0041 00065 (append.go:22) MOVB $1, (BX)(DX*1) 0x0045 00069 (append.go:22) MOVB $2, 1(BX)(DX*1) 0x004a 00074 (append.go:22) MOVB $3, 2(BX)(DX*1) 0x004f 00079 (append.go:23) ADDQ $72, SP 0x0053 00083 (append.go:23) RET 0x0054 00084 (append.go:22) LEAQ type.[]uint8(SB), AX 0x005b 00091 (append.go:22) MOVQ AX, (SP) 0x005f 00095 (append.go:22) MOVQ BX, 8(SP) 0x0064 00100 (append.go:22) MOVQ DX, 16(SP) 0x0069 00105 (append.go:22) MOVQ CX, 24(SP) 0x006e 00110 (append.go:22) MOVQ BP, 32(SP) 0x0073 00115 (append.go:22) PCDATA $0, $0 0x0073 00115 (append.go:22) CALL runtime.growslice(SB) 0x0078 00120 (append.go:22) MOVQ 40(SP), CX 0x007d 00125 (append.go:22) MOVQ 56(SP), AX 0x0082 00130 (append.go:22) MOVQ AX, "".x+16(SB) 0x0089 00137 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x008f 00143 (append.go:22) TESTB AL, AL 0x0091 00145 (append.go:22) JNE $0, 168 0x0093 00147 (append.go:22) MOVQ CX, "".x(SB) 0x009a 00154 (append.go:22) MOVQ "".x(SB), BX 0x00a1 00161 (append.go:19) MOVQ "".autotmp_0+64(SP), DX 0x00a6 00166 (append.go:22) JMP 65 0x00a8 00168 (append.go:22) LEAQ "".x(SB), DX 0x00af 00175 (append.go:22) MOVQ DX, (SP) 0x00b3 00179 (append.go:22) MOVQ CX, 8(SP) 0x00b8 00184 (append.go:22) PCDATA $0, $0 0x00b8 00184 (append.go:22) CALL runtime.writebarrierptr(SB) 0x00bd 00189 (append.go:22) JMP 154 0x00bf 00191 (append.go:22) NOP 0x00bf 00191 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00c4 00196 (append.go:21) JMP 0 Change-Id: I77a41ad3a22557a4bb4654de7d6d24a029efe34a Reviewed-on: https://go-review.googlesource.com/21813 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:44:17 -07:00
return nil
}
cmd/compile: avoid a spill in append fast path Instead of spilling newlen, recalculate it. This removes a spill from the fast path, at the cost of a cheap recalculation on the (rare) growth path. This uses 8 bytes less of stack space. It generates two more bytes of code, but that is due to suboptimal register allocation; see far below. Runtime append microbenchmarks are all over the map, presumably due to incidental code movement. Sample code: func s(b []byte) []byte { b = append(b, 1, 2, 3) return b } Before: "".s t=1 size=160 args=0x30 locals=0x48 0x0000 00000 (append.go:8) TEXT "".s(SB), $72-48 0x0000 00000 (append.go:8) MOVQ (TLS), CX 0x0009 00009 (append.go:8) CMPQ SP, 16(CX) 0x000d 00013 (append.go:8) JLS 149 0x0013 00019 (append.go:8) SUBQ $72, SP 0x0017 00023 (append.go:8) FUNCDATA $0, gclocals·6432f8c6a0d23fa7bee6c5d96f21a92a(SB) 0x0017 00023 (append.go:8) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:9) MOVQ "".b+88(FP), CX 0x001c 00028 (append.go:9) LEAQ 3(CX), DX 0x0020 00032 (append.go:9) MOVQ DX, "".autotmp_0+64(SP) 0x0025 00037 (append.go:9) MOVQ "".b+96(FP), BX 0x002a 00042 (append.go:9) CMPQ DX, BX 0x002d 00045 (append.go:9) JGT $0, 86 0x002f 00047 (append.go:8) MOVQ "".b+80(FP), AX 0x0034 00052 (append.go:9) MOVB $1, (AX)(CX*1) 0x0038 00056 (append.go:9) MOVB $2, 1(AX)(CX*1) 0x003d 00061 (append.go:9) MOVB $3, 2(AX)(CX*1) 0x0042 00066 (append.go:10) MOVQ AX, "".~r1+104(FP) 0x0047 00071 (append.go:10) MOVQ DX, "".~r1+112(FP) 0x004c 00076 (append.go:10) MOVQ BX, "".~r1+120(FP) 0x0051 00081 (append.go:10) ADDQ $72, SP 0x0055 00085 (append.go:10) RET 0x0056 00086 (append.go:9) LEAQ type.[]uint8(SB), AX 0x005d 00093 (append.go:9) MOVQ AX, (SP) 0x0061 00097 (append.go:9) MOVQ "".b+80(FP), BP 0x0066 00102 (append.go:9) MOVQ BP, 8(SP) 0x006b 00107 (append.go:9) MOVQ CX, 16(SP) 0x0070 00112 (append.go:9) MOVQ BX, 24(SP) 0x0075 00117 (append.go:9) MOVQ DX, 32(SP) 0x007a 00122 (append.go:9) PCDATA $0, $0 0x007a 00122 (append.go:9) CALL runtime.growslice(SB) 0x007f 00127 (append.go:9) MOVQ 40(SP), AX 0x0084 00132 (append.go:9) MOVQ 56(SP), BX 0x0089 00137 (append.go:8) MOVQ "".b+88(FP), CX 0x008e 00142 (append.go:9) MOVQ "".autotmp_0+64(SP), DX 0x0093 00147 (append.go:9) JMP 52 0x0095 00149 (append.go:9) NOP 0x0095 00149 (append.go:8) CALL runtime.morestack_noctxt(SB) 0x009a 00154 (append.go:8) JMP 0 After: "".s t=1 size=176 args=0x30 locals=0x40 0x0000 00000 (append.go:8) TEXT "".s(SB), $64-48 0x0000 00000 (append.go:8) MOVQ (TLS), CX 0x0009 00009 (append.go:8) CMPQ SP, 16(CX) 0x000d 00013 (append.go:8) JLS 151 0x0013 00019 (append.go:8) SUBQ $64, SP 0x0017 00023 (append.go:8) FUNCDATA $0, gclocals·6432f8c6a0d23fa7bee6c5d96f21a92a(SB) 0x0017 00023 (append.go:8) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:9) MOVQ "".b+80(FP), CX 0x001c 00028 (append.go:9) LEAQ 3(CX), DX 0x0020 00032 (append.go:9) MOVQ "".b+88(FP), BX 0x0025 00037 (append.go:9) CMPQ DX, BX 0x0028 00040 (append.go:9) JGT $0, 81 0x002a 00042 (append.go:8) MOVQ "".b+72(FP), AX 0x002f 00047 (append.go:9) MOVB $1, (AX)(CX*1) 0x0033 00051 (append.go:9) MOVB $2, 1(AX)(CX*1) 0x0038 00056 (append.go:9) MOVB $3, 2(AX)(CX*1) 0x003d 00061 (append.go:10) MOVQ AX, "".~r1+96(FP) 0x0042 00066 (append.go:10) MOVQ DX, "".~r1+104(FP) 0x0047 00071 (append.go:10) MOVQ BX, "".~r1+112(FP) 0x004c 00076 (append.go:10) ADDQ $64, SP 0x0050 00080 (append.go:10) RET 0x0051 00081 (append.go:9) LEAQ type.[]uint8(SB), AX 0x0058 00088 (append.go:9) MOVQ AX, (SP) 0x005c 00092 (append.go:9) MOVQ "".b+72(FP), BP 0x0061 00097 (append.go:9) MOVQ BP, 8(SP) 0x0066 00102 (append.go:9) MOVQ CX, 16(SP) 0x006b 00107 (append.go:9) MOVQ BX, 24(SP) 0x0070 00112 (append.go:9) MOVQ DX, 32(SP) 0x0075 00117 (append.go:9) PCDATA $0, $0 0x0075 00117 (append.go:9) CALL runtime.growslice(SB) 0x007a 00122 (append.go:9) MOVQ 40(SP), AX 0x007f 00127 (append.go:9) MOVQ 48(SP), CX 0x0084 00132 (append.go:9) MOVQ 56(SP), BX 0x0089 00137 (append.go:9) ADDQ $3, CX 0x008d 00141 (append.go:9) MOVQ CX, DX 0x0090 00144 (append.go:8) MOVQ "".b+80(FP), CX 0x0095 00149 (append.go:9) JMP 47 0x0097 00151 (append.go:9) NOP 0x0097 00151 (append.go:8) CALL runtime.morestack_noctxt(SB) 0x009c 00156 (append.go:8) JMP 0 Observe that in the following sequence, we should use DX directly instead of using CX as a temporary register, which would make the new code a strict improvement on the old: 0x007f 00127 (append.go:9) MOVQ 48(SP), CX 0x0084 00132 (append.go:9) MOVQ 56(SP), BX 0x0089 00137 (append.go:9) ADDQ $3, CX 0x008d 00141 (append.go:9) MOVQ CX, DX 0x0090 00144 (append.go:8) MOVQ "".b+80(FP), CX Change-Id: I4ee50b18fa53865901d2d7f86c2cbb54c6fa6924 Reviewed-on: https://go-review.googlesource.com/21812 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:08:00 -07:00
delete(s.vars, &newlenVar)
delete(s.vars, &capVar)
cmd/compile: avoid write barrier in append fast path When we are writing the result of an append back to the same slice, we don’t need a write barrier on the fast path. This re-implements an optimization that was present in the old backend. Updates #14921 Fixes #14969 Sample code: var x []byte func p() { x = append(x, 1, 2, 3) } Before: "".p t=1 size=224 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 199 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x(SB), DX 0x0025 00037 (append.go:19) MOVQ "".x+8(SB), BX 0x002c 00044 (append.go:19) MOVQ BX, "".autotmp_0+64(SP) 0x0031 00049 (append.go:22) LEAQ 3(BX), BP 0x0035 00053 (append.go:22) CMPQ BP, CX 0x0038 00056 (append.go:22) JGT $0, 131 0x003a 00058 (append.go:22) MOVB $1, (DX)(BX*1) 0x003e 00062 (append.go:22) MOVB $2, 1(DX)(BX*1) 0x0043 00067 (append.go:22) MOVB $3, 2(DX)(BX*1) 0x0048 00072 (append.go:22) MOVQ BP, "".x+8(SB) 0x004f 00079 (append.go:22) MOVQ CX, "".x+16(SB) 0x0056 00086 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x005c 00092 (append.go:22) TESTB AL, AL 0x005e 00094 (append.go:22) JNE $0, 108 0x0060 00096 (append.go:22) MOVQ DX, "".x(SB) 0x0067 00103 (append.go:23) ADDQ $72, SP 0x006b 00107 (append.go:23) RET 0x006c 00108 (append.go:22) LEAQ "".x(SB), CX 0x0073 00115 (append.go:22) MOVQ CX, (SP) 0x0077 00119 (append.go:22) MOVQ DX, 8(SP) 0x007c 00124 (append.go:22) PCDATA $0, $0 0x007c 00124 (append.go:22) CALL runtime.writebarrierptr(SB) 0x0081 00129 (append.go:23) JMP 103 0x0083 00131 (append.go:22) LEAQ type.[]uint8(SB), AX 0x008a 00138 (append.go:22) MOVQ AX, (SP) 0x008e 00142 (append.go:22) MOVQ DX, 8(SP) 0x0093 00147 (append.go:22) MOVQ BX, 16(SP) 0x0098 00152 (append.go:22) MOVQ CX, 24(SP) 0x009d 00157 (append.go:22) MOVQ BP, 32(SP) 0x00a2 00162 (append.go:22) PCDATA $0, $0 0x00a2 00162 (append.go:22) CALL runtime.growslice(SB) 0x00a7 00167 (append.go:22) MOVQ 40(SP), DX 0x00ac 00172 (append.go:22) MOVQ 48(SP), AX 0x00b1 00177 (append.go:22) MOVQ 56(SP), CX 0x00b6 00182 (append.go:22) ADDQ $3, AX 0x00ba 00186 (append.go:19) MOVQ "".autotmp_0+64(SP), BX 0x00bf 00191 (append.go:22) MOVQ AX, BP 0x00c2 00194 (append.go:22) JMP 58 0x00c7 00199 (append.go:22) NOP 0x00c7 00199 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00cc 00204 (append.go:21) JMP 0 After: "".p t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 191 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x+8(SB), DX 0x0025 00037 (append.go:19) MOVQ DX, "".autotmp_0+64(SP) 0x002a 00042 (append.go:19) MOVQ "".x(SB), BX 0x0031 00049 (append.go:22) LEAQ 3(DX), BP 0x0035 00053 (append.go:22) MOVQ BP, "".x+8(SB) 0x003c 00060 (append.go:22) CMPQ BP, CX 0x003f 00063 (append.go:22) JGT $0, 84 0x0041 00065 (append.go:22) MOVB $1, (BX)(DX*1) 0x0045 00069 (append.go:22) MOVB $2, 1(BX)(DX*1) 0x004a 00074 (append.go:22) MOVB $3, 2(BX)(DX*1) 0x004f 00079 (append.go:23) ADDQ $72, SP 0x0053 00083 (append.go:23) RET 0x0054 00084 (append.go:22) LEAQ type.[]uint8(SB), AX 0x005b 00091 (append.go:22) MOVQ AX, (SP) 0x005f 00095 (append.go:22) MOVQ BX, 8(SP) 0x0064 00100 (append.go:22) MOVQ DX, 16(SP) 0x0069 00105 (append.go:22) MOVQ CX, 24(SP) 0x006e 00110 (append.go:22) MOVQ BP, 32(SP) 0x0073 00115 (append.go:22) PCDATA $0, $0 0x0073 00115 (append.go:22) CALL runtime.growslice(SB) 0x0078 00120 (append.go:22) MOVQ 40(SP), CX 0x007d 00125 (append.go:22) MOVQ 56(SP), AX 0x0082 00130 (append.go:22) MOVQ AX, "".x+16(SB) 0x0089 00137 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x008f 00143 (append.go:22) TESTB AL, AL 0x0091 00145 (append.go:22) JNE $0, 168 0x0093 00147 (append.go:22) MOVQ CX, "".x(SB) 0x009a 00154 (append.go:22) MOVQ "".x(SB), BX 0x00a1 00161 (append.go:19) MOVQ "".autotmp_0+64(SP), DX 0x00a6 00166 (append.go:22) JMP 65 0x00a8 00168 (append.go:22) LEAQ "".x(SB), DX 0x00af 00175 (append.go:22) MOVQ DX, (SP) 0x00b3 00179 (append.go:22) MOVQ CX, 8(SP) 0x00b8 00184 (append.go:22) PCDATA $0, $0 0x00b8 00184 (append.go:22) CALL runtime.writebarrierptr(SB) 0x00bd 00189 (append.go:22) JMP 154 0x00bf 00191 (append.go:22) NOP 0x00bf 00191 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00c4 00196 (append.go:21) JMP 0 Change-Id: I77a41ad3a22557a4bb4654de7d6d24a029efe34a Reviewed-on: https://go-review.googlesource.com/21813 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:44:17 -07:00
// make result
return s.newValue3(ssa.OpSliceMake, n.Type, p, nl, c)
}
// condBranch evaluates the boolean expression cond and branches to yes
// if cond is true and no if cond is false.
// This function is intended to handle && and || better than just calling
// s.expr(cond) and branching on the result.
func (s *state) condBranch(cond *Node, yes, no *ssa.Block, likely int8) {
if cond.Op == OANDAND {
mid := s.f.NewBlock(ssa.BlockPlain)
s.stmtList(cond.Ninit)
s.condBranch(cond.Left, mid, no, max8(likely, 0))
s.startBlock(mid)
s.condBranch(cond.Right, yes, no, likely)
return
// Note: if likely==1, then both recursive calls pass 1.
// If likely==-1, then we don't have enough information to decide
// whether the first branch is likely or not. So we pass 0 for
// the likeliness of the first branch.
// TODO: have the frontend give us branch prediction hints for
// OANDAND and OOROR nodes (if it ever has such info).
}
if cond.Op == OOROR {
mid := s.f.NewBlock(ssa.BlockPlain)
s.stmtList(cond.Ninit)
s.condBranch(cond.Left, yes, mid, min8(likely, 0))
s.startBlock(mid)
s.condBranch(cond.Right, yes, no, likely)
return
// Note: if likely==-1, then both recursive calls pass -1.
// If likely==1, then we don't have enough info to decide
// the likelihood of the first branch.
}
if cond.Op == ONOT {
s.stmtList(cond.Ninit)
s.condBranch(cond.Left, no, yes, -likely)
return
}
c := s.expr(cond)
b := s.endBlock()
b.Kind = ssa.BlockIf
b.SetControl(c)
b.Likely = ssa.BranchPrediction(likely) // gc and ssa both use -1/0/+1 for likeliness
b.AddEdgeTo(yes)
b.AddEdgeTo(no)
}
type skipMask uint8
const (
skipPtr skipMask = 1 << iota
skipLen
skipCap
)
// assign does left = right.
// Right has already been evaluated to ssa, left has not.
// If deref is true, then we do left = *right instead (and right has already been nil-checked).
// If deref is true and right == nil, just do left = 0.
// skip indicates assignments (at the top level) that can be avoided.
func (s *state) assign(left *Node, right *ssa.Value, deref bool, skip skipMask) {
if left.Op == ONAME && isblank(left) {
return
}
t := left.Type
dowidth(t)
if s.canSSA(left) {
if deref {
s.Fatalf("can SSA LHS %v but not RHS %s", left, right)
}
if left.Op == ODOT {
// We're assigning to a field of an ssa-able value.
// We need to build a new structure with the new value for the
// field we're assigning and the old values for the other fields.
// For instance:
// type T struct {a, b, c int}
// var T x
// x.b = 5
// For the x.b = 5 assignment we want to generate x = T{x.a, 5, x.c}
// Grab information about the structure type.
t := left.Left.Type
nf := t.NumFields()
idx := fieldIdx(left)
// Grab old value of structure.
old := s.expr(left.Left)
// Make new structure.
new := s.newValue0(ssa.StructMakeOp(t.NumFields()), t)
// Add fields as args.
for i := 0; i < nf; i++ {
if i == idx {
new.AddArg(right)
} else {
new.AddArg(s.newValue1I(ssa.OpStructSelect, t.FieldType(i), int64(i), old))
}
}
// Recursively assign the new value we've made to the base of the dot op.
s.assign(left.Left, new, false, 0)
// TODO: do we need to update named values here?
return
}
if left.Op == OINDEX && left.Left.Type.IsArray() {
// We're assigning to an element of an ssa-able array.
// a[i] = v
t := left.Left.Type
n := t.NumElem()
i := s.expr(left.Right) // index
if n == 0 {
// The bounds check must fail. Might as well
// ignore the actual index and just use zeros.
z := s.constInt(Types[TINT], 0)
s.boundsCheck(z, z)
return
}
if n != 1 {
s.Fatalf("assigning to non-1-length array")
}
// Rewrite to a = [1]{v}
i = s.extendIndex(i, panicindex)
s.boundsCheck(i, s.constInt(Types[TINT], 1))
v := s.newValue1(ssa.OpArrayMake1, t, right)
s.assign(left.Left, v, false, 0)
return
}
// Update variable assignment.
s.vars[left] = right
s.addNamedValue(left, right)
return
}
// Left is not ssa-able. Compute its address.
addr := s.addr(left, false)
if left.Op == ONAME && skip == 0 {
s.vars[&memVar] = s.newValue1A(ssa.OpVarDef, ssa.TypeMem, left, s.mem())
}
if isReflectHeaderDataField(left) {
// Package unsafe's documentation says storing pointers into
// reflect.SliceHeader and reflect.StringHeader's Data fields
// is valid, even though they have type uintptr (#19168).
// Mark it pointer type to signal the writebarrier pass to
// insert a write barrier.
t = Types[TUNSAFEPTR]
}
if deref {
// Treat as a mem->mem move.
var store *ssa.Value
if right == nil {
store = s.newValue2I(ssa.OpZero, ssa.TypeMem, t.Size(), addr, s.mem())
} else {
store = s.newValue3I(ssa.OpMove, ssa.TypeMem, t.Size(), addr, right, s.mem())
}
store.Aux = t
s.vars[&memVar] = store
return
}
// Treat as a store.
s.storeType(t, addr, right, skip)
}
// zeroVal returns the zero value for type t.
func (s *state) zeroVal(t *Type) *ssa.Value {
switch {
case t.IsInteger():
switch t.Size() {
case 1:
return s.constInt8(t, 0)
case 2:
return s.constInt16(t, 0)
case 4:
return s.constInt32(t, 0)
case 8:
return s.constInt64(t, 0)
default:
s.Fatalf("bad sized integer type %v", t)
}
case t.IsFloat():
switch t.Size() {
case 4:
return s.constFloat32(t, 0)
case 8:
return s.constFloat64(t, 0)
default:
s.Fatalf("bad sized float type %v", t)
}
case t.IsComplex():
switch t.Size() {
case 8:
z := s.constFloat32(Types[TFLOAT32], 0)
return s.entryNewValue2(ssa.OpComplexMake, t, z, z)
case 16:
z := s.constFloat64(Types[TFLOAT64], 0)
return s.entryNewValue2(ssa.OpComplexMake, t, z, z)
default:
s.Fatalf("bad sized complex type %v", t)
}
case t.IsString():
return s.constEmptyString(t)
case t.IsPtrShaped():
return s.constNil(t)
case t.IsBoolean():
return s.constBool(false)
case t.IsInterface():
return s.constInterface(t)
case t.IsSlice():
return s.constSlice(t)
case t.IsStruct():
n := t.NumFields()
v := s.entryNewValue0(ssa.StructMakeOp(t.NumFields()), t)
for i := 0; i < n; i++ {
v.AddArg(s.zeroVal(t.FieldType(i).(*Type)))
}
return v
case t.IsArray():
switch t.NumElem() {
case 0:
return s.entryNewValue0(ssa.OpArrayMake0, t)
case 1:
return s.entryNewValue1(ssa.OpArrayMake1, t, s.zeroVal(t.Elem()))
}
}
s.Fatalf("zero for type %v not implemented", t)
return nil
}
type callKind int8
const (
callNormal callKind = iota
callDefer
callGo
)
var intrinsics map[intrinsicKey]intrinsicBuilder
// An intrinsicBuilder converts a call node n into an ssa value that
// implements that call as an intrinsic. args is a list of arguments to the func.
type intrinsicBuilder func(s *state, n *Node, args []*ssa.Value) *ssa.Value
type intrinsicKey struct {
arch *sys.Arch
pkg string
fn string
}
func init() {
intrinsics = map[intrinsicKey]intrinsicBuilder{}
var all []*sys.Arch
var i4 []*sys.Arch
var i8 []*sys.Arch
var p4 []*sys.Arch
var p8 []*sys.Arch
for _, a := range sys.Archs {
all = append(all, a)
if a.IntSize == 4 {
i4 = append(i4, a)
} else {
i8 = append(i8, a)
}
if a.PtrSize == 4 {
p4 = append(p4, a)
} else {
p8 = append(p8, a)
}
}
// add adds the intrinsic b for pkg.fn for the given list of architectures.
add := func(pkg, fn string, b intrinsicBuilder, archs ...*sys.Arch) {
for _, a := range archs {
intrinsics[intrinsicKey{a, pkg, fn}] = b
}
}
// addF does the same as add but operates on architecture families.
addF := func(pkg, fn string, b intrinsicBuilder, archFamilies ...sys.ArchFamily) {
m := 0
for _, f := range archFamilies {
if f >= 32 {
panic("too many architecture families")
}
m |= 1 << uint(f)
}
for _, a := range all {
if m>>uint(a.Family)&1 != 0 {
intrinsics[intrinsicKey{a, pkg, fn}] = b
}
}
}
// alias defines pkg.fn = pkg2.fn2 for all architectures in archs for which pkg2.fn2 exists.
alias := func(pkg, fn, pkg2, fn2 string, archs ...*sys.Arch) {
for _, a := range archs {
if b, ok := intrinsics[intrinsicKey{a, pkg2, fn2}]; ok {
intrinsics[intrinsicKey{a, pkg, fn}] = b
}
}
}
/******** runtime ********/
if !instrumenting {
add("runtime", "slicebytetostringtmp",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
// Compiler frontend optimizations emit OARRAYBYTESTRTMP nodes
// for the backend instead of slicebytetostringtmp calls
// when not instrumenting.
slice := args[0]
ptr := s.newValue1(ssa.OpSlicePtr, ptrto(Types[TUINT8]), slice)
len := s.newValue1(ssa.OpSliceLen, Types[TINT], slice)
return s.newValue2(ssa.OpStringMake, n.Type, ptr, len)
},
all...)
}
add("runtime", "KeepAlive",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
data := s.newValue1(ssa.OpIData, ptrto(Types[TUINT8]), args[0])
s.vars[&memVar] = s.newValue2(ssa.OpKeepAlive, ssa.TypeMem, data, s.mem())
return nil
},
all...)
/******** runtime/internal/sys ********/
addF("runtime/internal/sys", "Ctz32",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return s.newValue1(ssa.OpCtz32, Types[TINT], args[0])
},
sys.AMD64, sys.ARM64, sys.ARM, sys.S390X, sys.MIPS)
addF("runtime/internal/sys", "Ctz64",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return s.newValue1(ssa.OpCtz64, Types[TINT], args[0])
},
sys.AMD64, sys.ARM64, sys.ARM, sys.S390X, sys.MIPS)
addF("runtime/internal/sys", "Bswap32",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return s.newValue1(ssa.OpBswap32, Types[TUINT32], args[0])
},
sys.AMD64, sys.ARM64, sys.ARM, sys.S390X)
addF("runtime/internal/sys", "Bswap64",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return s.newValue1(ssa.OpBswap64, Types[TUINT64], args[0])
},
sys.AMD64, sys.ARM64, sys.ARM, sys.S390X)
/******** runtime/internal/atomic ********/
addF("runtime/internal/atomic", "Load",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
v := s.newValue2(ssa.OpAtomicLoad32, ssa.MakeTuple(Types[TUINT32], ssa.TypeMem), args[0], s.mem())
s.vars[&memVar] = s.newValue1(ssa.OpSelect1, ssa.TypeMem, v)
return s.newValue1(ssa.OpSelect0, Types[TUINT32], v)
},
sys.AMD64, sys.ARM64, sys.S390X, sys.MIPS, sys.PPC64)
addF("runtime/internal/atomic", "Load64",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
v := s.newValue2(ssa.OpAtomicLoad64, ssa.MakeTuple(Types[TUINT64], ssa.TypeMem), args[0], s.mem())
s.vars[&memVar] = s.newValue1(ssa.OpSelect1, ssa.TypeMem, v)
return s.newValue1(ssa.OpSelect0, Types[TUINT64], v)
},
sys.AMD64, sys.ARM64, sys.S390X, sys.PPC64)
addF("runtime/internal/atomic", "Loadp",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
v := s.newValue2(ssa.OpAtomicLoadPtr, ssa.MakeTuple(ptrto(Types[TUINT8]), ssa.TypeMem), args[0], s.mem())
s.vars[&memVar] = s.newValue1(ssa.OpSelect1, ssa.TypeMem, v)
return s.newValue1(ssa.OpSelect0, ptrto(Types[TUINT8]), v)
},
sys.AMD64, sys.ARM64, sys.S390X, sys.MIPS, sys.PPC64)
addF("runtime/internal/atomic", "Store",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
s.vars[&memVar] = s.newValue3(ssa.OpAtomicStore32, ssa.TypeMem, args[0], args[1], s.mem())
return nil
},
sys.AMD64, sys.ARM64, sys.S390X, sys.MIPS, sys.PPC64)
addF("runtime/internal/atomic", "Store64",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
s.vars[&memVar] = s.newValue3(ssa.OpAtomicStore64, ssa.TypeMem, args[0], args[1], s.mem())
return nil
},
sys.AMD64, sys.ARM64, sys.S390X, sys.PPC64)
addF("runtime/internal/atomic", "StorepNoWB",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
s.vars[&memVar] = s.newValue3(ssa.OpAtomicStorePtrNoWB, ssa.TypeMem, args[0], args[1], s.mem())
return nil
},
sys.AMD64, sys.ARM64, sys.S390X, sys.MIPS)
addF("runtime/internal/atomic", "Xchg",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
v := s.newValue3(ssa.OpAtomicExchange32, ssa.MakeTuple(Types[TUINT32], ssa.TypeMem), args[0], args[1], s.mem())
s.vars[&memVar] = s.newValue1(ssa.OpSelect1, ssa.TypeMem, v)
return s.newValue1(ssa.OpSelect0, Types[TUINT32], v)
},
sys.AMD64, sys.ARM64, sys.S390X, sys.MIPS, sys.PPC64)
addF("runtime/internal/atomic", "Xchg64",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
v := s.newValue3(ssa.OpAtomicExchange64, ssa.MakeTuple(Types[TUINT64], ssa.TypeMem), args[0], args[1], s.mem())
s.vars[&memVar] = s.newValue1(ssa.OpSelect1, ssa.TypeMem, v)
return s.newValue1(ssa.OpSelect0, Types[TUINT64], v)
},
sys.AMD64, sys.ARM64, sys.S390X, sys.PPC64)
addF("runtime/internal/atomic", "Xadd",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
v := s.newValue3(ssa.OpAtomicAdd32, ssa.MakeTuple(Types[TUINT32], ssa.TypeMem), args[0], args[1], s.mem())
s.vars[&memVar] = s.newValue1(ssa.OpSelect1, ssa.TypeMem, v)
return s.newValue1(ssa.OpSelect0, Types[TUINT32], v)
},
sys.AMD64, sys.ARM64, sys.S390X, sys.MIPS, sys.PPC64)
addF("runtime/internal/atomic", "Xadd64",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
v := s.newValue3(ssa.OpAtomicAdd64, ssa.MakeTuple(Types[TUINT64], ssa.TypeMem), args[0], args[1], s.mem())
s.vars[&memVar] = s.newValue1(ssa.OpSelect1, ssa.TypeMem, v)
return s.newValue1(ssa.OpSelect0, Types[TUINT64], v)
},
sys.AMD64, sys.ARM64, sys.S390X, sys.PPC64)
addF("runtime/internal/atomic", "Cas",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
v := s.newValue4(ssa.OpAtomicCompareAndSwap32, ssa.MakeTuple(Types[TBOOL], ssa.TypeMem), args[0], args[1], args[2], s.mem())
s.vars[&memVar] = s.newValue1(ssa.OpSelect1, ssa.TypeMem, v)
return s.newValue1(ssa.OpSelect0, Types[TBOOL], v)
},
sys.AMD64, sys.ARM64, sys.S390X, sys.MIPS, sys.PPC64)
addF("runtime/internal/atomic", "Cas64",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
v := s.newValue4(ssa.OpAtomicCompareAndSwap64, ssa.MakeTuple(Types[TBOOL], ssa.TypeMem), args[0], args[1], args[2], s.mem())
s.vars[&memVar] = s.newValue1(ssa.OpSelect1, ssa.TypeMem, v)
return s.newValue1(ssa.OpSelect0, Types[TBOOL], v)
},
sys.AMD64, sys.ARM64, sys.S390X, sys.PPC64)
addF("runtime/internal/atomic", "And8",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
s.vars[&memVar] = s.newValue3(ssa.OpAtomicAnd8, ssa.TypeMem, args[0], args[1], s.mem())
return nil
},
sys.AMD64, sys.ARM64, sys.MIPS, sys.PPC64)
addF("runtime/internal/atomic", "Or8",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
s.vars[&memVar] = s.newValue3(ssa.OpAtomicOr8, ssa.TypeMem, args[0], args[1], s.mem())
return nil
},
sys.AMD64, sys.ARM64, sys.MIPS, sys.PPC64)
alias("runtime/internal/atomic", "Loadint64", "runtime/internal/atomic", "Load64", all...)
alias("runtime/internal/atomic", "Xaddint64", "runtime/internal/atomic", "Xadd64", all...)
alias("runtime/internal/atomic", "Loaduint", "runtime/internal/atomic", "Load", i4...)
alias("runtime/internal/atomic", "Loaduint", "runtime/internal/atomic", "Load64", i8...)
alias("runtime/internal/atomic", "Loaduintptr", "runtime/internal/atomic", "Load", p4...)
alias("runtime/internal/atomic", "Loaduintptr", "runtime/internal/atomic", "Load64", p8...)
alias("runtime/internal/atomic", "Storeuintptr", "runtime/internal/atomic", "Store", p4...)
alias("runtime/internal/atomic", "Storeuintptr", "runtime/internal/atomic", "Store64", p8...)
alias("runtime/internal/atomic", "Xchguintptr", "runtime/internal/atomic", "Xchg", p4...)
alias("runtime/internal/atomic", "Xchguintptr", "runtime/internal/atomic", "Xchg64", p8...)
alias("runtime/internal/atomic", "Xadduintptr", "runtime/internal/atomic", "Xadd", p4...)
alias("runtime/internal/atomic", "Xadduintptr", "runtime/internal/atomic", "Xadd64", p8...)
alias("runtime/internal/atomic", "Casuintptr", "runtime/internal/atomic", "Cas", p4...)
alias("runtime/internal/atomic", "Casuintptr", "runtime/internal/atomic", "Cas64", p8...)
alias("runtime/internal/atomic", "Casp1", "runtime/internal/atomic", "Cas", p4...)
alias("runtime/internal/atomic", "Casp1", "runtime/internal/atomic", "Cas64", p8...)
/******** math ********/
addF("math", "Sqrt",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return s.newValue1(ssa.OpSqrt, Types[TFLOAT64], args[0])
},
sys.AMD64, sys.ARM, sys.ARM64, sys.MIPS, sys.PPC64, sys.S390X)
/******** math/bits ********/
addF("math/bits", "TrailingZeros64",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return s.newValue1(ssa.OpCtz64, Types[TINT], args[0])
},
sys.AMD64, sys.ARM64, sys.ARM, sys.S390X, sys.MIPS)
addF("math/bits", "TrailingZeros32",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return s.newValue1(ssa.OpCtz32, Types[TINT], args[0])
},
sys.AMD64, sys.ARM64, sys.ARM, sys.S390X, sys.MIPS)
addF("math/bits", "TrailingZeros16",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
x := s.newValue1(ssa.OpZeroExt16to32, Types[TUINT32], args[0])
c := s.constInt32(Types[TUINT32], 1<<16)
y := s.newValue2(ssa.OpOr32, Types[TUINT32], x, c)
return s.newValue1(ssa.OpCtz32, Types[TINT], y)
},
sys.ARM, sys.MIPS)
addF("math/bits", "TrailingZeros16",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
x := s.newValue1(ssa.OpZeroExt16to64, Types[TUINT64], args[0])
c := s.constInt64(Types[TUINT64], 1<<16)
y := s.newValue2(ssa.OpOr64, Types[TUINT64], x, c)
return s.newValue1(ssa.OpCtz64, Types[TINT], y)
},
sys.AMD64, sys.ARM64, sys.S390X)
addF("math/bits", "TrailingZeros8",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
x := s.newValue1(ssa.OpZeroExt8to32, Types[TUINT32], args[0])
c := s.constInt32(Types[TUINT32], 1<<8)
y := s.newValue2(ssa.OpOr32, Types[TUINT32], x, c)
return s.newValue1(ssa.OpCtz32, Types[TINT], y)
},
sys.ARM, sys.MIPS)
addF("math/bits", "TrailingZeros8",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
x := s.newValue1(ssa.OpZeroExt8to64, Types[TUINT64], args[0])
c := s.constInt64(Types[TUINT64], 1<<8)
y := s.newValue2(ssa.OpOr64, Types[TUINT64], x, c)
return s.newValue1(ssa.OpCtz64, Types[TINT], y)
},
sys.AMD64, sys.ARM64, sys.S390X)
alias("math/bits", "ReverseBytes64", "runtime/internal/sys", "Bswap64", all...)
alias("math/bits", "ReverseBytes32", "runtime/internal/sys", "Bswap32", all...)
// ReverseBytes inlines correctly, no need to intrinsify it.
// ReverseBytes16 lowers to a rotate, no need for anything special here.
addF("math/bits", "Len64",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return s.newValue1(ssa.OpBitLen64, Types[TINT], args[0])
},
sys.AMD64, sys.ARM64, sys.ARM, sys.S390X, sys.MIPS)
addF("math/bits", "Len32",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
if s.config.IntSize == 4 {
return s.newValue1(ssa.OpBitLen32, Types[TINT], args[0])
}
x := s.newValue1(ssa.OpZeroExt32to64, Types[TUINT64], args[0])
return s.newValue1(ssa.OpBitLen64, Types[TINT], x)
},
sys.AMD64, sys.ARM64, sys.ARM, sys.S390X, sys.MIPS)
addF("math/bits", "Len16",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
if s.config.IntSize == 4 {
x := s.newValue1(ssa.OpZeroExt16to32, Types[TUINT32], args[0])
return s.newValue1(ssa.OpBitLen32, Types[TINT], x)
}
x := s.newValue1(ssa.OpZeroExt16to64, Types[TUINT64], args[0])
return s.newValue1(ssa.OpBitLen64, Types[TINT], x)
},
sys.AMD64, sys.ARM64, sys.ARM, sys.S390X, sys.MIPS)
// Note: disabled on AMD64 because the Go code is faster!
addF("math/bits", "Len8",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
if s.config.IntSize == 4 {
x := s.newValue1(ssa.OpZeroExt8to32, Types[TUINT32], args[0])
return s.newValue1(ssa.OpBitLen32, Types[TINT], x)
}
x := s.newValue1(ssa.OpZeroExt8to64, Types[TUINT64], args[0])
return s.newValue1(ssa.OpBitLen64, Types[TINT], x)
},
sys.ARM64, sys.ARM, sys.S390X, sys.MIPS)
addF("math/bits", "Len",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
if s.config.IntSize == 4 {
return s.newValue1(ssa.OpBitLen32, Types[TINT], args[0])
}
return s.newValue1(ssa.OpBitLen64, Types[TINT], args[0])
},
sys.AMD64, sys.ARM64, sys.ARM, sys.S390X, sys.MIPS)
// LeadingZeros is handled because it trivially calls Len.
/******** sync/atomic ********/
// Note: these are disabled by flag_race in findIntrinsic below.
alias("sync/atomic", "LoadInt32", "runtime/internal/atomic", "Load", all...)
alias("sync/atomic", "LoadInt64", "runtime/internal/atomic", "Load64", all...)
alias("sync/atomic", "LoadPointer", "runtime/internal/atomic", "Loadp", all...)
alias("sync/atomic", "LoadUint32", "runtime/internal/atomic", "Load", all...)
alias("sync/atomic", "LoadUint64", "runtime/internal/atomic", "Load64", all...)
alias("sync/atomic", "LoadUintptr", "runtime/internal/atomic", "Load", p4...)
alias("sync/atomic", "LoadUintptr", "runtime/internal/atomic", "Load64", p8...)
alias("sync/atomic", "StoreInt32", "runtime/internal/atomic", "Store", all...)
alias("sync/atomic", "StoreInt64", "runtime/internal/atomic", "Store64", all...)
// Note: not StorePointer, that needs a write barrier. Same below for {CompareAnd}Swap.
alias("sync/atomic", "StoreUint32", "runtime/internal/atomic", "Store", all...)
alias("sync/atomic", "StoreUint64", "runtime/internal/atomic", "Store64", all...)
alias("sync/atomic", "StoreUintptr", "runtime/internal/atomic", "Store", p4...)
alias("sync/atomic", "StoreUintptr", "runtime/internal/atomic", "Store64", p8...)
alias("sync/atomic", "SwapInt32", "runtime/internal/atomic", "Xchg", all...)
alias("sync/atomic", "SwapInt64", "runtime/internal/atomic", "Xchg64", all...)
alias("sync/atomic", "SwapUint32", "runtime/internal/atomic", "Xchg", all...)
alias("sync/atomic", "SwapUint64", "runtime/internal/atomic", "Xchg64", all...)
alias("sync/atomic", "SwapUintptr", "runtime/internal/atomic", "Xchg", p4...)
alias("sync/atomic", "SwapUintptr", "runtime/internal/atomic", "Xchg64", p8...)
alias("sync/atomic", "CompareAndSwapInt32", "runtime/internal/atomic", "Cas", all...)
alias("sync/atomic", "CompareAndSwapInt64", "runtime/internal/atomic", "Cas64", all...)
alias("sync/atomic", "CompareAndSwapUint32", "runtime/internal/atomic", "Cas", all...)
alias("sync/atomic", "CompareAndSwapUint64", "runtime/internal/atomic", "Cas64", all...)
alias("sync/atomic", "CompareAndSwapUintptr", "runtime/internal/atomic", "Cas", p4...)
alias("sync/atomic", "CompareAndSwapUintptr", "runtime/internal/atomic", "Cas64", p8...)
alias("sync/atomic", "AddInt32", "runtime/internal/atomic", "Xadd", all...)
alias("sync/atomic", "AddInt64", "runtime/internal/atomic", "Xadd64", all...)
alias("sync/atomic", "AddUint32", "runtime/internal/atomic", "Xadd", all...)
alias("sync/atomic", "AddUint64", "runtime/internal/atomic", "Xadd64", all...)
alias("sync/atomic", "AddUintptr", "runtime/internal/atomic", "Xadd", p4...)
alias("sync/atomic", "AddUintptr", "runtime/internal/atomic", "Xadd64", p8...)
/******** math/big ********/
add("math/big", "mulWW",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return s.newValue2(ssa.OpMul64uhilo, ssa.MakeTuple(Types[TUINT64], Types[TUINT64]), args[0], args[1])
},
sys.ArchAMD64)
add("math/big", "divWW",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return s.newValue3(ssa.OpDiv128u, ssa.MakeTuple(Types[TUINT64], Types[TUINT64]), args[0], args[1], args[2])
},
sys.ArchAMD64)
}
// findIntrinsic returns a function which builds the SSA equivalent of the
// function identified by the symbol sym. If sym is not an intrinsic call, returns nil.
func findIntrinsic(sym *Sym) intrinsicBuilder {
if ssa.IntrinsicsDisable {
return nil
}
if sym == nil || sym.Pkg == nil {
return nil
}
pkg := sym.Pkg.Path
if sym.Pkg == localpkg {
pkg = myimportpath
}
if flag_race && pkg == "sync/atomic" {
// The race detector needs to be able to intercept these calls.
// We can't intrinsify them.
return nil
}
fn := sym.Name
return intrinsics[intrinsicKey{Thearch.LinkArch.Arch, pkg, fn}]
}
func isIntrinsicCall(n *Node) bool {
if n == nil || n.Left == nil {
return false
}
return findIntrinsic(n.Left.Sym) != nil
}
// intrinsicCall converts a call to a recognized intrinsic function into the intrinsic SSA operation.
func (s *state) intrinsicCall(n *Node) *ssa.Value {
v := findIntrinsic(n.Left.Sym)(s, n, s.intrinsicArgs(n))
if ssa.IntrinsicsDebug > 0 {
x := v
if x == nil {
x = s.mem()
}
if x.Op == ssa.OpSelect0 || x.Op == ssa.OpSelect1 {
x = x.Args[0]
}
Warnl(n.Pos, "intrinsic substitution for %v with %s", n.Left.Sym.Name, x.LongString())
}
return v
}
type callArg struct {
offset int64
v *ssa.Value
}
type byOffset []callArg
func (x byOffset) Len() int { return len(x) }
func (x byOffset) Swap(i, j int) { x[i], x[j] = x[j], x[i] }
func (x byOffset) Less(i, j int) bool {
return x[i].offset < x[j].offset
}
// intrinsicArgs extracts args from n, evaluates them to SSA values, and returns them.
func (s *state) intrinsicArgs(n *Node) []*ssa.Value {
// This code is complicated because of how walk transforms calls. For a call node,
// each entry in n.List is either an assignment to OINDREGSP which actually
// stores an arg, or an assignment to a temporary which computes an arg
// which is later assigned.
// The args can also be out of order.
// TODO: when walk goes away someday, this code can go away also.
var args []callArg
temps := map[*Node]*ssa.Value{}
for _, a := range n.List.Slice() {
if a.Op != OAS {
s.Fatalf("non-assignment as a function argument %s", opnames[a.Op])
}
l, r := a.Left, a.Right
switch l.Op {
case ONAME:
// Evaluate and store to "temporary".
// Walk ensures these temporaries are dead outside of n.
temps[l] = s.expr(r)
case OINDREGSP:
// Store a value to an argument slot.
var v *ssa.Value
if x, ok := temps[r]; ok {
// This is a previously computed temporary.
v = x
} else {
// This is an explicit value; evaluate it.
v = s.expr(r)
}
args = append(args, callArg{l.Xoffset, v})
default:
s.Fatalf("function argument assignment target not allowed: %s", opnames[l.Op])
}
}
sort.Sort(byOffset(args))
res := make([]*ssa.Value, len(args))
for i, a := range args {
res[i] = a.v
}
return res
}
// Calls the function n using the specified call type.
// Returns the address of the return value (or nil if none).
func (s *state) call(n *Node, k callKind) *ssa.Value {
var sym *Sym // target symbol (if static)
var closure *ssa.Value // ptr to closure to run (if dynamic)
var codeptr *ssa.Value // ptr to target code (if dynamic)
var rcvr *ssa.Value // receiver to set
fn := n.Left
switch n.Op {
case OCALLFUNC:
if k == callNormal && fn.Op == ONAME && fn.Class == PFUNC {
sym = fn.Sym
break
}
closure = s.expr(fn)
case OCALLMETH:
if fn.Op != ODOTMETH {
Fatalf("OCALLMETH: n.Left not an ODOTMETH: %v", fn)
}
if k == callNormal {
sym = fn.Sym
break
}
// Make a name n2 for the function.
// fn.Sym might be sync.(*Mutex).Unlock.
// Make a PFUNC node out of that, then evaluate it.
// We get back an SSA value representing &sync.(*Mutex).Unlock·f.
// We can then pass that to defer or go.
n2 := newname(fn.Sym)
n2.Class = PFUNC
n2.Pos = fn.Pos
n2.Type = Types[TUINT8] // dummy type for a static closure. Could use runtime.funcval if we had it.
closure = s.expr(n2)
// Note: receiver is already assigned in n.List, so we don't
// want to set it here.
case OCALLINTER:
if fn.Op != ODOTINTER {
Fatalf("OCALLINTER: n.Left not an ODOTINTER: %v", fn.Op)
}
i := s.expr(fn.Left)
itab := s.newValue1(ssa.OpITab, Types[TUINTPTR], i)
if k != callNormal {
s.nilCheck(itab)
}
itabidx := fn.Xoffset + 3*int64(Widthptr) + 8 // offset of fun field in runtime.itab
itab = s.newValue1I(ssa.OpOffPtr, ptrto(Types[TUINTPTR]), itabidx, itab)
if k == callNormal {
codeptr = s.newValue2(ssa.OpLoad, Types[TUINTPTR], itab, s.mem())
} else {
closure = itab
}
rcvr = s.newValue1(ssa.OpIData, Types[TUINTPTR], i)
}
dowidth(fn.Type)
stksize := fn.Type.ArgWidth() // includes receiver
// Run all argument assignments. The arg slots have already
// been offset by the appropriate amount (+2*widthptr for go/defer,
// +widthptr for interface calls).
// For OCALLMETH, the receiver is set in these statements.
s.stmtList(n.List)
// Set receiver (for interface calls)
if rcvr != nil {
argStart := Ctxt.FixedFrameSize()
if k != callNormal {
argStart += int64(2 * Widthptr)
}
cmd/compile: add OpOffPtr [c] SP to constant cache They accounted for almost 30% of all CSE'd values. By never creating the duplicates in the first place, we reduce the high water mark of Value IDs, which in turn makes all SSA phases cheaper, particularly regalloc. name old time/op new time/op delta Template 200ms ± 3% 198ms ± 4% -0.87% (p=0.016 n=50+49) Unicode 86.9ms ± 2% 85.5ms ± 3% -1.56% (p=0.000 n=49+50) GoTypes 553ms ± 4% 551ms ± 4% ~ (p=0.183 n=50+49) SSA 3.97s ± 3% 3.93s ± 2% -1.06% (p=0.000 n=48+48) Flate 124ms ± 4% 124ms ± 3% ~ (p=0.545 n=48+50) GoParser 146ms ± 4% 146ms ± 4% ~ (p=0.810 n=49+49) Reflect 357ms ± 3% 355ms ± 3% -0.59% (p=0.049 n=50+48) Tar 106ms ± 4% 107ms ± 5% ~ (p=0.454 n=49+50) XML 203ms ± 4% 203ms ± 4% ~ (p=0.726 n=48+50) name old user-ns/op new user-ns/op delta Template 237M ± 3% 235M ± 4% ~ (p=0.208 n=47+48) Unicode 111M ± 4% 108M ± 9% -2.50% (p=0.000 n=47+50) GoTypes 736M ± 5% 729M ± 4% -0.95% (p=0.017 n=50+46) SSA 5.73G ± 4% 5.74G ± 4% ~ (p=0.765 n=50+50) Flate 150M ± 5% 148M ± 6% -0.89% (p=0.045 n=48+47) GoParser 180M ± 5% 178M ± 7% -1.34% (p=0.012 n=50+50) Reflect 450M ± 4% 444M ± 4% -1.40% (p=0.000 n=50+49) Tar 124M ± 7% 123M ± 7% ~ (p=0.092 n=50+50) XML 248M ± 6% 245M ± 5% ~ (p=0.057 n=50+50) name old alloc/op new alloc/op delta Template 39.4MB ± 0% 39.3MB ± 0% -0.37% (p=0.000 n=50+50) Unicode 30.9MB ± 0% 30.9MB ± 0% -0.27% (p=0.000 n=48+50) GoTypes 114MB ± 0% 113MB ± 0% -1.03% (p=0.000 n=50+49) SSA 882MB ± 0% 865MB ± 0% -1.95% (p=0.000 n=49+49) Flate 25.8MB ± 0% 25.7MB ± 0% -0.21% (p=0.000 n=50+50) GoParser 31.7MB ± 0% 31.6MB ± 0% -0.33% (p=0.000 n=50+50) Reflect 79.7MB ± 0% 79.3MB ± 0% -0.49% (p=0.000 n=44+49) Tar 27.2MB ± 0% 27.1MB ± 0% -0.31% (p=0.000 n=50+50) XML 42.7MB ± 0% 42.3MB ± 0% -1.05% (p=0.000 n=48+49) name old allocs/op new allocs/op delta Template 379k ± 1% 380k ± 1% +0.26% (p=0.000 n=50+50) Unicode 324k ± 1% 324k ± 1% ~ (p=0.964 n=49+50) GoTypes 1.14M ± 0% 1.15M ± 0% +0.14% (p=0.000 n=50+49) SSA 7.89M ± 0% 7.89M ± 0% -0.05% (p=0.000 n=49+49) Flate 240k ± 1% 241k ± 1% +0.27% (p=0.001 n=50+50) GoParser 310k ± 1% 311k ± 1% +0.48% (p=0.000 n=50+49) Reflect 1.00M ± 0% 1.00M ± 0% +0.17% (p=0.000 n=48+50) Tar 254k ± 1% 255k ± 1% +0.23% (p=0.005 n=50+50) XML 395k ± 1% 395k ± 1% +0.19% (p=0.002 n=49+47) Change-Id: Iaa8f5f37e23bd81983409f7359f9dcd4dfe2961f Reviewed-on: https://go-review.googlesource.com/38003 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-03-08 12:50:00 -08:00
addr := s.constOffPtrSP(ptrto(Types[TUINTPTR]), argStart)
s.vars[&memVar] = s.newValue3A(ssa.OpStore, ssa.TypeMem, Types[TUINTPTR], addr, rcvr, s.mem())
}
// Defer/go args
if k != callNormal {
// Write argsize and closure (args to Newproc/Deferproc).
argStart := Ctxt.FixedFrameSize()
argsize := s.constInt32(Types[TUINT32], int32(stksize))
cmd/compile: add OpOffPtr [c] SP to constant cache They accounted for almost 30% of all CSE'd values. By never creating the duplicates in the first place, we reduce the high water mark of Value IDs, which in turn makes all SSA phases cheaper, particularly regalloc. name old time/op new time/op delta Template 200ms ± 3% 198ms ± 4% -0.87% (p=0.016 n=50+49) Unicode 86.9ms ± 2% 85.5ms ± 3% -1.56% (p=0.000 n=49+50) GoTypes 553ms ± 4% 551ms ± 4% ~ (p=0.183 n=50+49) SSA 3.97s ± 3% 3.93s ± 2% -1.06% (p=0.000 n=48+48) Flate 124ms ± 4% 124ms ± 3% ~ (p=0.545 n=48+50) GoParser 146ms ± 4% 146ms ± 4% ~ (p=0.810 n=49+49) Reflect 357ms ± 3% 355ms ± 3% -0.59% (p=0.049 n=50+48) Tar 106ms ± 4% 107ms ± 5% ~ (p=0.454 n=49+50) XML 203ms ± 4% 203ms ± 4% ~ (p=0.726 n=48+50) name old user-ns/op new user-ns/op delta Template 237M ± 3% 235M ± 4% ~ (p=0.208 n=47+48) Unicode 111M ± 4% 108M ± 9% -2.50% (p=0.000 n=47+50) GoTypes 736M ± 5% 729M ± 4% -0.95% (p=0.017 n=50+46) SSA 5.73G ± 4% 5.74G ± 4% ~ (p=0.765 n=50+50) Flate 150M ± 5% 148M ± 6% -0.89% (p=0.045 n=48+47) GoParser 180M ± 5% 178M ± 7% -1.34% (p=0.012 n=50+50) Reflect 450M ± 4% 444M ± 4% -1.40% (p=0.000 n=50+49) Tar 124M ± 7% 123M ± 7% ~ (p=0.092 n=50+50) XML 248M ± 6% 245M ± 5% ~ (p=0.057 n=50+50) name old alloc/op new alloc/op delta Template 39.4MB ± 0% 39.3MB ± 0% -0.37% (p=0.000 n=50+50) Unicode 30.9MB ± 0% 30.9MB ± 0% -0.27% (p=0.000 n=48+50) GoTypes 114MB ± 0% 113MB ± 0% -1.03% (p=0.000 n=50+49) SSA 882MB ± 0% 865MB ± 0% -1.95% (p=0.000 n=49+49) Flate 25.8MB ± 0% 25.7MB ± 0% -0.21% (p=0.000 n=50+50) GoParser 31.7MB ± 0% 31.6MB ± 0% -0.33% (p=0.000 n=50+50) Reflect 79.7MB ± 0% 79.3MB ± 0% -0.49% (p=0.000 n=44+49) Tar 27.2MB ± 0% 27.1MB ± 0% -0.31% (p=0.000 n=50+50) XML 42.7MB ± 0% 42.3MB ± 0% -1.05% (p=0.000 n=48+49) name old allocs/op new allocs/op delta Template 379k ± 1% 380k ± 1% +0.26% (p=0.000 n=50+50) Unicode 324k ± 1% 324k ± 1% ~ (p=0.964 n=49+50) GoTypes 1.14M ± 0% 1.15M ± 0% +0.14% (p=0.000 n=50+49) SSA 7.89M ± 0% 7.89M ± 0% -0.05% (p=0.000 n=49+49) Flate 240k ± 1% 241k ± 1% +0.27% (p=0.001 n=50+50) GoParser 310k ± 1% 311k ± 1% +0.48% (p=0.000 n=50+49) Reflect 1.00M ± 0% 1.00M ± 0% +0.17% (p=0.000 n=48+50) Tar 254k ± 1% 255k ± 1% +0.23% (p=0.005 n=50+50) XML 395k ± 1% 395k ± 1% +0.19% (p=0.002 n=49+47) Change-Id: Iaa8f5f37e23bd81983409f7359f9dcd4dfe2961f Reviewed-on: https://go-review.googlesource.com/38003 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-03-08 12:50:00 -08:00
addr := s.constOffPtrSP(ptrto(Types[TUINT32]), argStart)
s.vars[&memVar] = s.newValue3A(ssa.OpStore, ssa.TypeMem, Types[TUINT32], addr, argsize, s.mem())
cmd/compile: add OpOffPtr [c] SP to constant cache They accounted for almost 30% of all CSE'd values. By never creating the duplicates in the first place, we reduce the high water mark of Value IDs, which in turn makes all SSA phases cheaper, particularly regalloc. name old time/op new time/op delta Template 200ms ± 3% 198ms ± 4% -0.87% (p=0.016 n=50+49) Unicode 86.9ms ± 2% 85.5ms ± 3% -1.56% (p=0.000 n=49+50) GoTypes 553ms ± 4% 551ms ± 4% ~ (p=0.183 n=50+49) SSA 3.97s ± 3% 3.93s ± 2% -1.06% (p=0.000 n=48+48) Flate 124ms ± 4% 124ms ± 3% ~ (p=0.545 n=48+50) GoParser 146ms ± 4% 146ms ± 4% ~ (p=0.810 n=49+49) Reflect 357ms ± 3% 355ms ± 3% -0.59% (p=0.049 n=50+48) Tar 106ms ± 4% 107ms ± 5% ~ (p=0.454 n=49+50) XML 203ms ± 4% 203ms ± 4% ~ (p=0.726 n=48+50) name old user-ns/op new user-ns/op delta Template 237M ± 3% 235M ± 4% ~ (p=0.208 n=47+48) Unicode 111M ± 4% 108M ± 9% -2.50% (p=0.000 n=47+50) GoTypes 736M ± 5% 729M ± 4% -0.95% (p=0.017 n=50+46) SSA 5.73G ± 4% 5.74G ± 4% ~ (p=0.765 n=50+50) Flate 150M ± 5% 148M ± 6% -0.89% (p=0.045 n=48+47) GoParser 180M ± 5% 178M ± 7% -1.34% (p=0.012 n=50+50) Reflect 450M ± 4% 444M ± 4% -1.40% (p=0.000 n=50+49) Tar 124M ± 7% 123M ± 7% ~ (p=0.092 n=50+50) XML 248M ± 6% 245M ± 5% ~ (p=0.057 n=50+50) name old alloc/op new alloc/op delta Template 39.4MB ± 0% 39.3MB ± 0% -0.37% (p=0.000 n=50+50) Unicode 30.9MB ± 0% 30.9MB ± 0% -0.27% (p=0.000 n=48+50) GoTypes 114MB ± 0% 113MB ± 0% -1.03% (p=0.000 n=50+49) SSA 882MB ± 0% 865MB ± 0% -1.95% (p=0.000 n=49+49) Flate 25.8MB ± 0% 25.7MB ± 0% -0.21% (p=0.000 n=50+50) GoParser 31.7MB ± 0% 31.6MB ± 0% -0.33% (p=0.000 n=50+50) Reflect 79.7MB ± 0% 79.3MB ± 0% -0.49% (p=0.000 n=44+49) Tar 27.2MB ± 0% 27.1MB ± 0% -0.31% (p=0.000 n=50+50) XML 42.7MB ± 0% 42.3MB ± 0% -1.05% (p=0.000 n=48+49) name old allocs/op new allocs/op delta Template 379k ± 1% 380k ± 1% +0.26% (p=0.000 n=50+50) Unicode 324k ± 1% 324k ± 1% ~ (p=0.964 n=49+50) GoTypes 1.14M ± 0% 1.15M ± 0% +0.14% (p=0.000 n=50+49) SSA 7.89M ± 0% 7.89M ± 0% -0.05% (p=0.000 n=49+49) Flate 240k ± 1% 241k ± 1% +0.27% (p=0.001 n=50+50) GoParser 310k ± 1% 311k ± 1% +0.48% (p=0.000 n=50+49) Reflect 1.00M ± 0% 1.00M ± 0% +0.17% (p=0.000 n=48+50) Tar 254k ± 1% 255k ± 1% +0.23% (p=0.005 n=50+50) XML 395k ± 1% 395k ± 1% +0.19% (p=0.002 n=49+47) Change-Id: Iaa8f5f37e23bd81983409f7359f9dcd4dfe2961f Reviewed-on: https://go-review.googlesource.com/38003 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-03-08 12:50:00 -08:00
addr = s.constOffPtrSP(ptrto(Types[TUINTPTR]), argStart+int64(Widthptr))
s.vars[&memVar] = s.newValue3A(ssa.OpStore, ssa.TypeMem, Types[TUINTPTR], addr, closure, s.mem())
stksize += 2 * int64(Widthptr)
}
// call target
var call *ssa.Value
switch {
case k == callDefer:
call = s.newValue1A(ssa.OpStaticCall, ssa.TypeMem, Deferproc, s.mem())
case k == callGo:
call = s.newValue1A(ssa.OpStaticCall, ssa.TypeMem, Newproc, s.mem())
case closure != nil:
codeptr = s.newValue2(ssa.OpLoad, Types[TUINTPTR], closure, s.mem())
call = s.newValue3(ssa.OpClosureCall, ssa.TypeMem, codeptr, closure, s.mem())
case codeptr != nil:
call = s.newValue2(ssa.OpInterCall, ssa.TypeMem, codeptr, s.mem())
case sym != nil:
call = s.newValue1A(ssa.OpStaticCall, ssa.TypeMem, Linksym(sym), s.mem())
default:
Fatalf("bad call type %v %v", n.Op, n)
}
call.AuxInt = stksize // Call operations carry the argsize of the callee along with them
s.vars[&memVar] = call
// Finish block for defers
if k == callDefer {
b := s.endBlock()
b.Kind = ssa.BlockDefer
b.SetControl(call)
bNext := s.f.NewBlock(ssa.BlockPlain)
b.AddEdgeTo(bNext)
// Add recover edge to exit code.
r := s.f.NewBlock(ssa.BlockPlain)
s.startBlock(r)
s.exit()
b.AddEdgeTo(r)
b.Likely = ssa.BranchLikely
s.startBlock(bNext)
}
res := n.Left.Type.Results()
if res.NumFields() == 0 || k != callNormal {
// call has no return value. Continue with the next statement.
return nil
}
fp := res.Field(0)
cmd/compile: add OpOffPtr [c] SP to constant cache They accounted for almost 30% of all CSE'd values. By never creating the duplicates in the first place, we reduce the high water mark of Value IDs, which in turn makes all SSA phases cheaper, particularly regalloc. name old time/op new time/op delta Template 200ms ± 3% 198ms ± 4% -0.87% (p=0.016 n=50+49) Unicode 86.9ms ± 2% 85.5ms ± 3% -1.56% (p=0.000 n=49+50) GoTypes 553ms ± 4% 551ms ± 4% ~ (p=0.183 n=50+49) SSA 3.97s ± 3% 3.93s ± 2% -1.06% (p=0.000 n=48+48) Flate 124ms ± 4% 124ms ± 3% ~ (p=0.545 n=48+50) GoParser 146ms ± 4% 146ms ± 4% ~ (p=0.810 n=49+49) Reflect 357ms ± 3% 355ms ± 3% -0.59% (p=0.049 n=50+48) Tar 106ms ± 4% 107ms ± 5% ~ (p=0.454 n=49+50) XML 203ms ± 4% 203ms ± 4% ~ (p=0.726 n=48+50) name old user-ns/op new user-ns/op delta Template 237M ± 3% 235M ± 4% ~ (p=0.208 n=47+48) Unicode 111M ± 4% 108M ± 9% -2.50% (p=0.000 n=47+50) GoTypes 736M ± 5% 729M ± 4% -0.95% (p=0.017 n=50+46) SSA 5.73G ± 4% 5.74G ± 4% ~ (p=0.765 n=50+50) Flate 150M ± 5% 148M ± 6% -0.89% (p=0.045 n=48+47) GoParser 180M ± 5% 178M ± 7% -1.34% (p=0.012 n=50+50) Reflect 450M ± 4% 444M ± 4% -1.40% (p=0.000 n=50+49) Tar 124M ± 7% 123M ± 7% ~ (p=0.092 n=50+50) XML 248M ± 6% 245M ± 5% ~ (p=0.057 n=50+50) name old alloc/op new alloc/op delta Template 39.4MB ± 0% 39.3MB ± 0% -0.37% (p=0.000 n=50+50) Unicode 30.9MB ± 0% 30.9MB ± 0% -0.27% (p=0.000 n=48+50) GoTypes 114MB ± 0% 113MB ± 0% -1.03% (p=0.000 n=50+49) SSA 882MB ± 0% 865MB ± 0% -1.95% (p=0.000 n=49+49) Flate 25.8MB ± 0% 25.7MB ± 0% -0.21% (p=0.000 n=50+50) GoParser 31.7MB ± 0% 31.6MB ± 0% -0.33% (p=0.000 n=50+50) Reflect 79.7MB ± 0% 79.3MB ± 0% -0.49% (p=0.000 n=44+49) Tar 27.2MB ± 0% 27.1MB ± 0% -0.31% (p=0.000 n=50+50) XML 42.7MB ± 0% 42.3MB ± 0% -1.05% (p=0.000 n=48+49) name old allocs/op new allocs/op delta Template 379k ± 1% 380k ± 1% +0.26% (p=0.000 n=50+50) Unicode 324k ± 1% 324k ± 1% ~ (p=0.964 n=49+50) GoTypes 1.14M ± 0% 1.15M ± 0% +0.14% (p=0.000 n=50+49) SSA 7.89M ± 0% 7.89M ± 0% -0.05% (p=0.000 n=49+49) Flate 240k ± 1% 241k ± 1% +0.27% (p=0.001 n=50+50) GoParser 310k ± 1% 311k ± 1% +0.48% (p=0.000 n=50+49) Reflect 1.00M ± 0% 1.00M ± 0% +0.17% (p=0.000 n=48+50) Tar 254k ± 1% 255k ± 1% +0.23% (p=0.005 n=50+50) XML 395k ± 1% 395k ± 1% +0.19% (p=0.002 n=49+47) Change-Id: Iaa8f5f37e23bd81983409f7359f9dcd4dfe2961f Reviewed-on: https://go-review.googlesource.com/38003 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-03-08 12:50:00 -08:00
return s.constOffPtrSP(ptrto(fp.Type), fp.Offset+Ctxt.FixedFrameSize())
}
// etypesign returns the signed-ness of e, for integer/pointer etypes.
// -1 means signed, +1 means unsigned, 0 means non-integer/non-pointer.
func etypesign(e EType) int8 {
switch e {
case TINT8, TINT16, TINT32, TINT64, TINT:
return -1
case TUINT8, TUINT16, TUINT32, TUINT64, TUINT, TUINTPTR, TUNSAFEPTR:
return +1
}
return 0
}
// lookupSymbol is used to retrieve the symbol (Extern, Arg or Auto) used for a particular node.
// This improves the effectiveness of cse by using the same Aux values for the
// same symbols.
func (s *state) lookupSymbol(n *Node, sym interface{}) interface{} {
switch sym.(type) {
default:
s.Fatalf("sym %v is of uknown type %T", sym, sym)
case *ssa.ExternSymbol, *ssa.ArgSymbol, *ssa.AutoSymbol:
// these are the only valid types
}
if lsym, ok := s.varsyms[n]; ok {
return lsym
} else {
s.varsyms[n] = sym
return sym
}
}
// addr converts the address of the expression n to SSA, adds it to s and returns the SSA result.
// The value that the returned Value represents is guaranteed to be non-nil.
// If bounded is true then this address does not require a nil check for its operand
// even if that would otherwise be implied.
func (s *state) addr(n *Node, bounded bool) *ssa.Value {
t := ptrto(n.Type)
switch n.Op {
case ONAME:
switch n.Class {
case PEXTERN:
// global variable
aux := s.lookupSymbol(n, &ssa.ExternSymbol{Typ: n.Type, Sym: Linksym(n.Sym)})
v := s.entryNewValue1A(ssa.OpAddr, t, aux, s.sb)
// TODO: Make OpAddr use AuxInt as well as Aux.
if n.Xoffset != 0 {
v = s.entryNewValue1I(ssa.OpOffPtr, v.Type, n.Xoffset, v)
}
return v
case PPARAM:
// parameter slot
v := s.decladdrs[n]
if v != nil {
return v
}
if n == nodfp {
// Special arg that points to the frame pointer (Used by ORECOVER).
aux := s.lookupSymbol(n, &ssa.ArgSymbol{Typ: n.Type, Node: n})
return s.entryNewValue1A(ssa.OpAddr, t, aux, s.sp)
}
s.Fatalf("addr of undeclared ONAME %v. declared: %v", n, s.decladdrs)
return nil
case PAUTO:
aux := s.lookupSymbol(n, &ssa.AutoSymbol{Typ: n.Type, Node: n})
return s.newValue1A(ssa.OpAddr, t, aux, s.sp)
case PPARAMOUT: // Same as PAUTO -- cannot generate LEA early.
// ensure that we reuse symbols for out parameters so
// that cse works on their addresses
aux := s.lookupSymbol(n, &ssa.ArgSymbol{Typ: n.Type, Node: n})
return s.newValue1A(ssa.OpAddr, t, aux, s.sp)
default:
s.Fatalf("variable address class %v not implemented", classnames[n.Class])
return nil
}
case OINDREGSP:
// indirect off REGSP
// used for storing/loading arguments/returns to/from callees
cmd/compile: add OpOffPtr [c] SP to constant cache They accounted for almost 30% of all CSE'd values. By never creating the duplicates in the first place, we reduce the high water mark of Value IDs, which in turn makes all SSA phases cheaper, particularly regalloc. name old time/op new time/op delta Template 200ms ± 3% 198ms ± 4% -0.87% (p=0.016 n=50+49) Unicode 86.9ms ± 2% 85.5ms ± 3% -1.56% (p=0.000 n=49+50) GoTypes 553ms ± 4% 551ms ± 4% ~ (p=0.183 n=50+49) SSA 3.97s ± 3% 3.93s ± 2% -1.06% (p=0.000 n=48+48) Flate 124ms ± 4% 124ms ± 3% ~ (p=0.545 n=48+50) GoParser 146ms ± 4% 146ms ± 4% ~ (p=0.810 n=49+49) Reflect 357ms ± 3% 355ms ± 3% -0.59% (p=0.049 n=50+48) Tar 106ms ± 4% 107ms ± 5% ~ (p=0.454 n=49+50) XML 203ms ± 4% 203ms ± 4% ~ (p=0.726 n=48+50) name old user-ns/op new user-ns/op delta Template 237M ± 3% 235M ± 4% ~ (p=0.208 n=47+48) Unicode 111M ± 4% 108M ± 9% -2.50% (p=0.000 n=47+50) GoTypes 736M ± 5% 729M ± 4% -0.95% (p=0.017 n=50+46) SSA 5.73G ± 4% 5.74G ± 4% ~ (p=0.765 n=50+50) Flate 150M ± 5% 148M ± 6% -0.89% (p=0.045 n=48+47) GoParser 180M ± 5% 178M ± 7% -1.34% (p=0.012 n=50+50) Reflect 450M ± 4% 444M ± 4% -1.40% (p=0.000 n=50+49) Tar 124M ± 7% 123M ± 7% ~ (p=0.092 n=50+50) XML 248M ± 6% 245M ± 5% ~ (p=0.057 n=50+50) name old alloc/op new alloc/op delta Template 39.4MB ± 0% 39.3MB ± 0% -0.37% (p=0.000 n=50+50) Unicode 30.9MB ± 0% 30.9MB ± 0% -0.27% (p=0.000 n=48+50) GoTypes 114MB ± 0% 113MB ± 0% -1.03% (p=0.000 n=50+49) SSA 882MB ± 0% 865MB ± 0% -1.95% (p=0.000 n=49+49) Flate 25.8MB ± 0% 25.7MB ± 0% -0.21% (p=0.000 n=50+50) GoParser 31.7MB ± 0% 31.6MB ± 0% -0.33% (p=0.000 n=50+50) Reflect 79.7MB ± 0% 79.3MB ± 0% -0.49% (p=0.000 n=44+49) Tar 27.2MB ± 0% 27.1MB ± 0% -0.31% (p=0.000 n=50+50) XML 42.7MB ± 0% 42.3MB ± 0% -1.05% (p=0.000 n=48+49) name old allocs/op new allocs/op delta Template 379k ± 1% 380k ± 1% +0.26% (p=0.000 n=50+50) Unicode 324k ± 1% 324k ± 1% ~ (p=0.964 n=49+50) GoTypes 1.14M ± 0% 1.15M ± 0% +0.14% (p=0.000 n=50+49) SSA 7.89M ± 0% 7.89M ± 0% -0.05% (p=0.000 n=49+49) Flate 240k ± 1% 241k ± 1% +0.27% (p=0.001 n=50+50) GoParser 310k ± 1% 311k ± 1% +0.48% (p=0.000 n=50+49) Reflect 1.00M ± 0% 1.00M ± 0% +0.17% (p=0.000 n=48+50) Tar 254k ± 1% 255k ± 1% +0.23% (p=0.005 n=50+50) XML 395k ± 1% 395k ± 1% +0.19% (p=0.002 n=49+47) Change-Id: Iaa8f5f37e23bd81983409f7359f9dcd4dfe2961f Reviewed-on: https://go-review.googlesource.com/38003 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-03-08 12:50:00 -08:00
return s.constOffPtrSP(t, n.Xoffset)
case OINDEX:
if n.Left.Type.IsSlice() {
a := s.expr(n.Left)
i := s.expr(n.Right)
i = s.extendIndex(i, panicindex)
len := s.newValue1(ssa.OpSliceLen, Types[TINT], a)
cmd/compile: pack bool fields in Node, Name, Func and Type structs to bitsets This reduces compiler memory usage by up to 4% - see compilebench results below. name old time/op new time/op delta Template 245ms ± 4% 241ms ± 2% -1.88% (p=0.029 n=10+10) Unicode 126ms ± 3% 124ms ± 3% ~ (p=0.105 n=10+10) GoTypes 805ms ± 2% 813ms ± 3% ~ (p=0.515 n=8+10) Compiler 3.95s ± 2% 3.83s ± 1% -2.96% (p=0.000 n=9+10) MakeBash 47.4s ± 4% 46.6s ± 1% -1.59% (p=0.028 n=9+10) name old user-ns/op new user-ns/op delta Template 324M ± 5% 326M ± 3% ~ (p=0.935 n=10+10) Unicode 186M ± 5% 178M ±10% ~ (p=0.067 n=9+10) GoTypes 1.08G ± 7% 1.09G ± 4% ~ (p=0.956 n=10+10) Compiler 5.34G ± 4% 5.31G ± 1% ~ (p=0.501 n=10+8) name old alloc/op new alloc/op delta Template 41.0MB ± 0% 39.8MB ± 0% -3.03% (p=0.000 n=10+10) Unicode 32.3MB ± 0% 31.0MB ± 0% -4.13% (p=0.000 n=10+10) GoTypes 119MB ± 0% 116MB ± 0% -2.39% (p=0.000 n=10+10) Compiler 499MB ± 0% 487MB ± 0% -2.48% (p=0.000 n=10+10) name old allocs/op new allocs/op delta Template 380k ± 1% 379k ± 1% ~ (p=0.436 n=10+10) Unicode 324k ± 1% 324k ± 0% ~ (p=0.853 n=10+10) GoTypes 1.15M ± 0% 1.15M ± 0% ~ (p=0.481 n=10+10) Compiler 4.41M ± 0% 4.41M ± 0% -0.12% (p=0.007 n=10+10) name old text-bytes new text-bytes delta HelloSize 623k ± 0% 623k ± 0% ~ (all equal) CmdGoSize 6.64M ± 0% 6.64M ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 5.81k ± 0% 5.81k ± 0% ~ (all equal) CmdGoSize 238k ± 0% 238k ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 134k ± 0% 134k ± 0% ~ (all equal) CmdGoSize 152k ± 0% 152k ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 967k ± 0% 967k ± 0% ~ (all equal) CmdGoSize 10.2M ± 0% 10.2M ± 0% ~ (all equal) Change-Id: I1f40af738254892bd6c8ba2eb43390b175753d52 Reviewed-on: https://go-review.googlesource.com/37445 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-27 19:56:38 +02:00
if !n.Bounded() {
s.boundsCheck(i, len)
}
p := s.newValue1(ssa.OpSlicePtr, t, a)
return s.newValue2(ssa.OpPtrIndex, t, p, i)
} else { // array
a := s.addr(n.Left, bounded)
i := s.expr(n.Right)
i = s.extendIndex(i, panicindex)
len := s.constInt(Types[TINT], n.Left.Type.NumElem())
cmd/compile: pack bool fields in Node, Name, Func and Type structs to bitsets This reduces compiler memory usage by up to 4% - see compilebench results below. name old time/op new time/op delta Template 245ms ± 4% 241ms ± 2% -1.88% (p=0.029 n=10+10) Unicode 126ms ± 3% 124ms ± 3% ~ (p=0.105 n=10+10) GoTypes 805ms ± 2% 813ms ± 3% ~ (p=0.515 n=8+10) Compiler 3.95s ± 2% 3.83s ± 1% -2.96% (p=0.000 n=9+10) MakeBash 47.4s ± 4% 46.6s ± 1% -1.59% (p=0.028 n=9+10) name old user-ns/op new user-ns/op delta Template 324M ± 5% 326M ± 3% ~ (p=0.935 n=10+10) Unicode 186M ± 5% 178M ±10% ~ (p=0.067 n=9+10) GoTypes 1.08G ± 7% 1.09G ± 4% ~ (p=0.956 n=10+10) Compiler 5.34G ± 4% 5.31G ± 1% ~ (p=0.501 n=10+8) name old alloc/op new alloc/op delta Template 41.0MB ± 0% 39.8MB ± 0% -3.03% (p=0.000 n=10+10) Unicode 32.3MB ± 0% 31.0MB ± 0% -4.13% (p=0.000 n=10+10) GoTypes 119MB ± 0% 116MB ± 0% -2.39% (p=0.000 n=10+10) Compiler 499MB ± 0% 487MB ± 0% -2.48% (p=0.000 n=10+10) name old allocs/op new allocs/op delta Template 380k ± 1% 379k ± 1% ~ (p=0.436 n=10+10) Unicode 324k ± 1% 324k ± 0% ~ (p=0.853 n=10+10) GoTypes 1.15M ± 0% 1.15M ± 0% ~ (p=0.481 n=10+10) Compiler 4.41M ± 0% 4.41M ± 0% -0.12% (p=0.007 n=10+10) name old text-bytes new text-bytes delta HelloSize 623k ± 0% 623k ± 0% ~ (all equal) CmdGoSize 6.64M ± 0% 6.64M ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 5.81k ± 0% 5.81k ± 0% ~ (all equal) CmdGoSize 238k ± 0% 238k ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 134k ± 0% 134k ± 0% ~ (all equal) CmdGoSize 152k ± 0% 152k ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 967k ± 0% 967k ± 0% ~ (all equal) CmdGoSize 10.2M ± 0% 10.2M ± 0% ~ (all equal) Change-Id: I1f40af738254892bd6c8ba2eb43390b175753d52 Reviewed-on: https://go-review.googlesource.com/37445 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-27 19:56:38 +02:00
if !n.Bounded() {
s.boundsCheck(i, len)
}
return s.newValue2(ssa.OpPtrIndex, ptrto(n.Left.Type.Elem()), a, i)
}
case OIND:
return s.exprPtr(n.Left, bounded, n.Pos)
case ODOT:
p := s.addr(n.Left, bounded)
return s.newValue1I(ssa.OpOffPtr, t, n.Xoffset, p)
case ODOTPTR:
p := s.exprPtr(n.Left, bounded, n.Pos)
return s.newValue1I(ssa.OpOffPtr, t, n.Xoffset, p)
case OCLOSUREVAR:
return s.newValue1I(ssa.OpOffPtr, t, n.Xoffset,
s.entryNewValue0(ssa.OpGetClosurePtr, ptrto(Types[TUINT8])))
case OCONVNOP:
addr := s.addr(n.Left, bounded)
return s.newValue1(ssa.OpCopy, t, addr) // ensure that addr has the right type
case OCALLFUNC, OCALLINTER, OCALLMETH:
return s.call(n, callNormal)
case ODOTTYPE:
v, _ := s.dottype(n, false)
if v.Op != ssa.OpLoad {
s.Fatalf("dottype of non-load")
}
if v.Args[1] != s.mem() {
s.Fatalf("memory no longer live from dottype load")
}
return v.Args[0]
default:
s.Fatalf("unhandled addr %v", n.Op)
return nil
}
}
// canSSA reports whether n is SSA-able.
// n must be an ONAME (or an ODOT sequence with an ONAME base).
func (s *state) canSSA(n *Node) bool {
if Debug['N'] != 0 {
return false
}
for n.Op == ODOT || (n.Op == OINDEX && n.Left.Type.IsArray()) {
n = n.Left
}
if n.Op != ONAME {
return false
}
cmd/compile: pack bool fields in Node, Name, Func and Type structs to bitsets This reduces compiler memory usage by up to 4% - see compilebench results below. name old time/op new time/op delta Template 245ms ± 4% 241ms ± 2% -1.88% (p=0.029 n=10+10) Unicode 126ms ± 3% 124ms ± 3% ~ (p=0.105 n=10+10) GoTypes 805ms ± 2% 813ms ± 3% ~ (p=0.515 n=8+10) Compiler 3.95s ± 2% 3.83s ± 1% -2.96% (p=0.000 n=9+10) MakeBash 47.4s ± 4% 46.6s ± 1% -1.59% (p=0.028 n=9+10) name old user-ns/op new user-ns/op delta Template 324M ± 5% 326M ± 3% ~ (p=0.935 n=10+10) Unicode 186M ± 5% 178M ±10% ~ (p=0.067 n=9+10) GoTypes 1.08G ± 7% 1.09G ± 4% ~ (p=0.956 n=10+10) Compiler 5.34G ± 4% 5.31G ± 1% ~ (p=0.501 n=10+8) name old alloc/op new alloc/op delta Template 41.0MB ± 0% 39.8MB ± 0% -3.03% (p=0.000 n=10+10) Unicode 32.3MB ± 0% 31.0MB ± 0% -4.13% (p=0.000 n=10+10) GoTypes 119MB ± 0% 116MB ± 0% -2.39% (p=0.000 n=10+10) Compiler 499MB ± 0% 487MB ± 0% -2.48% (p=0.000 n=10+10) name old allocs/op new allocs/op delta Template 380k ± 1% 379k ± 1% ~ (p=0.436 n=10+10) Unicode 324k ± 1% 324k ± 0% ~ (p=0.853 n=10+10) GoTypes 1.15M ± 0% 1.15M ± 0% ~ (p=0.481 n=10+10) Compiler 4.41M ± 0% 4.41M ± 0% -0.12% (p=0.007 n=10+10) name old text-bytes new text-bytes delta HelloSize 623k ± 0% 623k ± 0% ~ (all equal) CmdGoSize 6.64M ± 0% 6.64M ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 5.81k ± 0% 5.81k ± 0% ~ (all equal) CmdGoSize 238k ± 0% 238k ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 134k ± 0% 134k ± 0% ~ (all equal) CmdGoSize 152k ± 0% 152k ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 967k ± 0% 967k ± 0% ~ (all equal) CmdGoSize 10.2M ± 0% 10.2M ± 0% ~ (all equal) Change-Id: I1f40af738254892bd6c8ba2eb43390b175753d52 Reviewed-on: https://go-review.googlesource.com/37445 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-27 19:56:38 +02:00
if n.Addrtaken() {
return false
}
cmd/compile: fix liveness computation for heap-escaped parameters The liveness computation of parameters generally was never correct, but forcing all parameters to be live throughout the function covered up that problem. The new SSA back end is too clever: even though it currently keeps the parameter values live throughout the function, it may find optimizations that mean the current values are not written back to the original parameter stack slots immediately or ever (for example if a parameter is set to nil, SSA constant propagation may replace all later uses of the parameter with a constant nil, eliminating the need to write the nil value back to the stack slot), so the liveness code must now track the actual operations on the stack slots, exposing these problems. One small problem in the handling of arguments is that nodarg can return ONAME PPARAM nodes with adjusted offsets, so that there are actually multiple *Node pointers for the same parameter in the instruction stream. This might be possible to correct, but not in this CL. For now, we fix this by using n.Orig instead of n when considering PPARAM and PPARAMOUT nodes. The major problem in the handling of arguments is general confusion in the liveness code about the meaning of PPARAM|PHEAP and PPARAMOUT|PHEAP nodes, especially as contrasted with PAUTO|PHEAP. The difference between these two is that when a local variable "moves" to the heap, it's really just allocated there to start with; in contrast, when an argument moves to the heap, the actual data has to be copied there from the stack at the beginning of the function, and when a result "moves" to the heap the value in the heap has to be copied back to the stack when the function returns This general confusion is also present in the SSA back end. The PHEAP bit worked decently when I first introduced it 7 years ago (!) in 391425ae. The back end did nothing sophisticated, and in particular there was no analysis at all: no escape analysis, no liveness analysis, and certainly no SSA back end. But the complications caused in the various downstream consumers suggest that this should be a detail kept mainly in the front end. This CL therefore eliminates both the PHEAP bit and even the idea of "heap variables" from the back ends. First, it replaces the PPARAM|PHEAP, PPARAMOUT|PHEAP, and PAUTO|PHEAP variable classes with the single PAUTOHEAP, a pseudo-class indicating a variable maintained on the heap and available by indirecting a local variable kept on the stack (a plain PAUTO). Second, walkexpr replaces all references to PAUTOHEAP variables with indirections of the corresponding PAUTO variable. The back ends and the liveness code now just see plain indirected variables. This may actually produce better code, but the real goal here is to eliminate these little-used and somewhat suspect code paths in the back end analyses. The OPARAM node type goes away too. A followup CL will do the same to PPARAMREF. I'm not sure that the back ends (SSA in particular) are handling those right either, and with the framework established in this CL that change is trivial and the result clearly more correct. Fixes #15747. Change-Id: I2770b1ce3cbc93981bfc7166be66a9da12013d74 Reviewed-on: https://go-review.googlesource.com/23393 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-05-25 01:33:24 -04:00
if n.isParamHeapCopy() {
return false
}
cmd/compile: fix liveness computation for heap-escaped parameters The liveness computation of parameters generally was never correct, but forcing all parameters to be live throughout the function covered up that problem. The new SSA back end is too clever: even though it currently keeps the parameter values live throughout the function, it may find optimizations that mean the current values are not written back to the original parameter stack slots immediately or ever (for example if a parameter is set to nil, SSA constant propagation may replace all later uses of the parameter with a constant nil, eliminating the need to write the nil value back to the stack slot), so the liveness code must now track the actual operations on the stack slots, exposing these problems. One small problem in the handling of arguments is that nodarg can return ONAME PPARAM nodes with adjusted offsets, so that there are actually multiple *Node pointers for the same parameter in the instruction stream. This might be possible to correct, but not in this CL. For now, we fix this by using n.Orig instead of n when considering PPARAM and PPARAMOUT nodes. The major problem in the handling of arguments is general confusion in the liveness code about the meaning of PPARAM|PHEAP and PPARAMOUT|PHEAP nodes, especially as contrasted with PAUTO|PHEAP. The difference between these two is that when a local variable "moves" to the heap, it's really just allocated there to start with; in contrast, when an argument moves to the heap, the actual data has to be copied there from the stack at the beginning of the function, and when a result "moves" to the heap the value in the heap has to be copied back to the stack when the function returns This general confusion is also present in the SSA back end. The PHEAP bit worked decently when I first introduced it 7 years ago (!) in 391425ae. The back end did nothing sophisticated, and in particular there was no analysis at all: no escape analysis, no liveness analysis, and certainly no SSA back end. But the complications caused in the various downstream consumers suggest that this should be a detail kept mainly in the front end. This CL therefore eliminates both the PHEAP bit and even the idea of "heap variables" from the back ends. First, it replaces the PPARAM|PHEAP, PPARAMOUT|PHEAP, and PAUTO|PHEAP variable classes with the single PAUTOHEAP, a pseudo-class indicating a variable maintained on the heap and available by indirecting a local variable kept on the stack (a plain PAUTO). Second, walkexpr replaces all references to PAUTOHEAP variables with indirections of the corresponding PAUTO variable. The back ends and the liveness code now just see plain indirected variables. This may actually produce better code, but the real goal here is to eliminate these little-used and somewhat suspect code paths in the back end analyses. The OPARAM node type goes away too. A followup CL will do the same to PPARAMREF. I'm not sure that the back ends (SSA in particular) are handling those right either, and with the framework established in this CL that change is trivial and the result clearly more correct. Fixes #15747. Change-Id: I2770b1ce3cbc93981bfc7166be66a9da12013d74 Reviewed-on: https://go-review.googlesource.com/23393 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-05-25 01:33:24 -04:00
if n.Class == PAUTOHEAP {
Fatalf("canSSA of PAUTOHEAP %v", n)
}
switch n.Class {
case PEXTERN:
return false
case PPARAMOUT:
if s.hasdefer {
// TODO: handle this case? Named return values must be
// in memory so that the deferred function can see them.
// Maybe do: if !strings.HasPrefix(n.String(), "~") { return false }
// Or maybe not, see issue 18860. Even unnamed return values
// must be written back so if a defer recovers, the caller can see them.
return false
}
if s.cgoUnsafeArgs {
// Cgo effectively takes the address of all result args,
// but the compiler can't see that.
return false
}
}
if n.Class == PPARAM && n.Sym != nil && n.Sym.Name == ".this" {
// wrappers generated by genwrapper need to update
// the .this pointer in place.
// TODO: treat as a PPARMOUT?
return false
}
return canSSAType(n.Type)
// TODO: try to make more variables SSAable?
}
// canSSA reports whether variables of type t are SSA-able.
func canSSAType(t *Type) bool {
dowidth(t)
if t.Width > int64(4*Widthptr) {
// 4*Widthptr is an arbitrary constant. We want it
// to be at least 3*Widthptr so slices can be registerized.
// Too big and we'll introduce too much register pressure.
return false
}
switch t.Etype {
case TARRAY:
// We can't do larger arrays because dynamic indexing is
// not supported on SSA variables.
// TODO: allow if all indexes are constant.
if t.NumElem() == 0 {
return true
}
if t.NumElem() == 1 {
return canSSAType(t.Elem())
}
return false
case TSTRUCT:
if t.NumFields() > ssa.MaxStruct {
return false
}
for _, t1 := range t.Fields().Slice() {
if !canSSAType(t1.Type) {
return false
}
}
return true
default:
return true
}
}
// exprPtr evaluates n to a pointer and nil-checks it.
2016-12-15 17:17:01 -08:00
func (s *state) exprPtr(n *Node, bounded bool, lineno src.XPos) *ssa.Value {
p := s.expr(n)
cmd/compile: pack bool fields in Node, Name, Func and Type structs to bitsets This reduces compiler memory usage by up to 4% - see compilebench results below. name old time/op new time/op delta Template 245ms ± 4% 241ms ± 2% -1.88% (p=0.029 n=10+10) Unicode 126ms ± 3% 124ms ± 3% ~ (p=0.105 n=10+10) GoTypes 805ms ± 2% 813ms ± 3% ~ (p=0.515 n=8+10) Compiler 3.95s ± 2% 3.83s ± 1% -2.96% (p=0.000 n=9+10) MakeBash 47.4s ± 4% 46.6s ± 1% -1.59% (p=0.028 n=9+10) name old user-ns/op new user-ns/op delta Template 324M ± 5% 326M ± 3% ~ (p=0.935 n=10+10) Unicode 186M ± 5% 178M ±10% ~ (p=0.067 n=9+10) GoTypes 1.08G ± 7% 1.09G ± 4% ~ (p=0.956 n=10+10) Compiler 5.34G ± 4% 5.31G ± 1% ~ (p=0.501 n=10+8) name old alloc/op new alloc/op delta Template 41.0MB ± 0% 39.8MB ± 0% -3.03% (p=0.000 n=10+10) Unicode 32.3MB ± 0% 31.0MB ± 0% -4.13% (p=0.000 n=10+10) GoTypes 119MB ± 0% 116MB ± 0% -2.39% (p=0.000 n=10+10) Compiler 499MB ± 0% 487MB ± 0% -2.48% (p=0.000 n=10+10) name old allocs/op new allocs/op delta Template 380k ± 1% 379k ± 1% ~ (p=0.436 n=10+10) Unicode 324k ± 1% 324k ± 0% ~ (p=0.853 n=10+10) GoTypes 1.15M ± 0% 1.15M ± 0% ~ (p=0.481 n=10+10) Compiler 4.41M ± 0% 4.41M ± 0% -0.12% (p=0.007 n=10+10) name old text-bytes new text-bytes delta HelloSize 623k ± 0% 623k ± 0% ~ (all equal) CmdGoSize 6.64M ± 0% 6.64M ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 5.81k ± 0% 5.81k ± 0% ~ (all equal) CmdGoSize 238k ± 0% 238k ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 134k ± 0% 134k ± 0% ~ (all equal) CmdGoSize 152k ± 0% 152k ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 967k ± 0% 967k ± 0% ~ (all equal) CmdGoSize 10.2M ± 0% 10.2M ± 0% ~ (all equal) Change-Id: I1f40af738254892bd6c8ba2eb43390b175753d52 Reviewed-on: https://go-review.googlesource.com/37445 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-27 19:56:38 +02:00
if bounded || n.NonNil() {
if s.f.Config.Debug_checknil() && lineno.Line() > 1 {
s.f.Config.Warnl(lineno, "removed nil check")
}
return p
}
s.nilCheck(p)
return p
}
// nilCheck generates nil pointer checking code.
// Used only for automatically inserted nil checks,
// not for user code like 'x != nil'.
func (s *state) nilCheck(ptr *ssa.Value) {
if disable_checknil != 0 {
return
}
s.newValue2(ssa.OpNilCheck, ssa.TypeVoid, ptr, s.mem())
}
// boundsCheck generates bounds checking code. Checks if 0 <= idx < len, branches to exit if not.
// Starts a new block on return.
// idx is already converted to full int width.
func (s *state) boundsCheck(idx, len *ssa.Value) {
if Debug['B'] != 0 {
return
}
// bounds check
cmp := s.newValue2(ssa.OpIsInBounds, Types[TBOOL], idx, len)
s.check(cmp, panicindex)
}
// sliceBoundsCheck generates slice bounds checking code. Checks if 0 <= idx <= len, branches to exit if not.
// Starts a new block on return.
// idx and len are already converted to full int width.
func (s *state) sliceBoundsCheck(idx, len *ssa.Value) {
if Debug['B'] != 0 {
return
}
// bounds check
cmp := s.newValue2(ssa.OpIsSliceInBounds, Types[TBOOL], idx, len)
s.check(cmp, panicslice)
}
// If cmp (a bool) is false, panic using the given function.
func (s *state) check(cmp *ssa.Value, fn *obj.LSym) {
b := s.endBlock()
b.Kind = ssa.BlockIf
b.SetControl(cmp)
b.Likely = ssa.BranchLikely
bNext := s.f.NewBlock(ssa.BlockPlain)
line := s.peekPos()
bPanic := s.panics[funcLine{fn, line}]
if bPanic == nil {
bPanic = s.f.NewBlock(ssa.BlockPlain)
s.panics[funcLine{fn, line}] = bPanic
s.startBlock(bPanic)
// The panic call takes/returns memory to ensure that the right
// memory state is observed if the panic happens.
s.rtcall(fn, false, nil)
}
b.AddEdgeTo(bNext)
b.AddEdgeTo(bPanic)
s.startBlock(bNext)
}
func (s *state) intDivide(n *Node, a, b *ssa.Value) *ssa.Value {
needcheck := true
switch b.Op {
case ssa.OpConst8, ssa.OpConst16, ssa.OpConst32, ssa.OpConst64:
if b.AuxInt != 0 {
needcheck = false
}
}
if needcheck {
// do a size-appropriate check for zero
cmp := s.newValue2(s.ssaOp(ONE, n.Type), Types[TBOOL], b, s.zeroVal(n.Type))
s.check(cmp, panicdivide)
}
return s.newValue2(s.ssaOp(n.Op, n.Type), a.Type, a, b)
}
// rtcall issues a call to the given runtime function fn with the listed args.
// Returns a slice of results of the given result types.
// The call is added to the end of the current block.
// If returns is false, the block is marked as an exit block.
func (s *state) rtcall(fn *obj.LSym, returns bool, results []*Type, args ...*ssa.Value) []*ssa.Value {
// Write args to the stack
off := Ctxt.FixedFrameSize()
for _, arg := range args {
t := arg.Type
off = Rnd(off, t.Alignment())
cmd/compile: add OpOffPtr [c] SP to constant cache They accounted for almost 30% of all CSE'd values. By never creating the duplicates in the first place, we reduce the high water mark of Value IDs, which in turn makes all SSA phases cheaper, particularly regalloc. name old time/op new time/op delta Template 200ms ± 3% 198ms ± 4% -0.87% (p=0.016 n=50+49) Unicode 86.9ms ± 2% 85.5ms ± 3% -1.56% (p=0.000 n=49+50) GoTypes 553ms ± 4% 551ms ± 4% ~ (p=0.183 n=50+49) SSA 3.97s ± 3% 3.93s ± 2% -1.06% (p=0.000 n=48+48) Flate 124ms ± 4% 124ms ± 3% ~ (p=0.545 n=48+50) GoParser 146ms ± 4% 146ms ± 4% ~ (p=0.810 n=49+49) Reflect 357ms ± 3% 355ms ± 3% -0.59% (p=0.049 n=50+48) Tar 106ms ± 4% 107ms ± 5% ~ (p=0.454 n=49+50) XML 203ms ± 4% 203ms ± 4% ~ (p=0.726 n=48+50) name old user-ns/op new user-ns/op delta Template 237M ± 3% 235M ± 4% ~ (p=0.208 n=47+48) Unicode 111M ± 4% 108M ± 9% -2.50% (p=0.000 n=47+50) GoTypes 736M ± 5% 729M ± 4% -0.95% (p=0.017 n=50+46) SSA 5.73G ± 4% 5.74G ± 4% ~ (p=0.765 n=50+50) Flate 150M ± 5% 148M ± 6% -0.89% (p=0.045 n=48+47) GoParser 180M ± 5% 178M ± 7% -1.34% (p=0.012 n=50+50) Reflect 450M ± 4% 444M ± 4% -1.40% (p=0.000 n=50+49) Tar 124M ± 7% 123M ± 7% ~ (p=0.092 n=50+50) XML 248M ± 6% 245M ± 5% ~ (p=0.057 n=50+50) name old alloc/op new alloc/op delta Template 39.4MB ± 0% 39.3MB ± 0% -0.37% (p=0.000 n=50+50) Unicode 30.9MB ± 0% 30.9MB ± 0% -0.27% (p=0.000 n=48+50) GoTypes 114MB ± 0% 113MB ± 0% -1.03% (p=0.000 n=50+49) SSA 882MB ± 0% 865MB ± 0% -1.95% (p=0.000 n=49+49) Flate 25.8MB ± 0% 25.7MB ± 0% -0.21% (p=0.000 n=50+50) GoParser 31.7MB ± 0% 31.6MB ± 0% -0.33% (p=0.000 n=50+50) Reflect 79.7MB ± 0% 79.3MB ± 0% -0.49% (p=0.000 n=44+49) Tar 27.2MB ± 0% 27.1MB ± 0% -0.31% (p=0.000 n=50+50) XML 42.7MB ± 0% 42.3MB ± 0% -1.05% (p=0.000 n=48+49) name old allocs/op new allocs/op delta Template 379k ± 1% 380k ± 1% +0.26% (p=0.000 n=50+50) Unicode 324k ± 1% 324k ± 1% ~ (p=0.964 n=49+50) GoTypes 1.14M ± 0% 1.15M ± 0% +0.14% (p=0.000 n=50+49) SSA 7.89M ± 0% 7.89M ± 0% -0.05% (p=0.000 n=49+49) Flate 240k ± 1% 241k ± 1% +0.27% (p=0.001 n=50+50) GoParser 310k ± 1% 311k ± 1% +0.48% (p=0.000 n=50+49) Reflect 1.00M ± 0% 1.00M ± 0% +0.17% (p=0.000 n=48+50) Tar 254k ± 1% 255k ± 1% +0.23% (p=0.005 n=50+50) XML 395k ± 1% 395k ± 1% +0.19% (p=0.002 n=49+47) Change-Id: Iaa8f5f37e23bd81983409f7359f9dcd4dfe2961f Reviewed-on: https://go-review.googlesource.com/38003 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-03-08 12:50:00 -08:00
ptr := s.constOffPtrSP(t.PtrTo(), off)
size := t.Size()
s.vars[&memVar] = s.newValue3A(ssa.OpStore, ssa.TypeMem, t, ptr, arg, s.mem())
off += size
}
off = Rnd(off, int64(Widthptr))
if Thearch.LinkArch.Name == "amd64p32" {
// amd64p32 wants 8-byte alignment of the start of the return values.
off = Rnd(off, 8)
}
// Issue call
call := s.newValue1A(ssa.OpStaticCall, ssa.TypeMem, fn, s.mem())
s.vars[&memVar] = call
if !returns {
// Finish block
b := s.endBlock()
b.Kind = ssa.BlockExit
b.SetControl(call)
[dev.ssa] cmd/compile: fix argument size of runtime call in SSA for ARM The argument size for runtime call was incorrectly includes the size of LR (FixedFrameSize in general). This makes the stack frame sometimes unnecessarily 4 bytes larger on ARM. For example, func f(b []byte) byte { return b[0] } compiles to 0x0000 00000 (h.go:6) TEXT "".f(SB), $4-16 // <-- framesize = 4 0x0000 00000 (h.go:6) MOVW 8(g), R1 0x0004 00004 (h.go:6) CMP R1, R13 0x0008 00008 (h.go:6) BLS 52 0x000c 00012 (h.go:6) MOVW.W R14, -8(R13) 0x0010 00016 (h.go:6) FUNCDATA $0, gclocals·8355ad952265fec823c17fcf739bd009(SB) 0x0010 00016 (h.go:6) FUNCDATA $1, gclocals·69c1753bd5f81501d95132d08af04464(SB) 0x0010 00016 (h.go:6) MOVW "".b+4(FP), R0 0x0014 00020 (h.go:6) CMP $0, R0 0x0018 00024 (h.go:6) BLS 44 0x001c 00028 (h.go:6) MOVW "".b(FP), R0 0x0020 00032 (h.go:6) MOVBU (R0), R0 0x0024 00036 (h.go:6) MOVB R0, "".~r1+12(FP) 0x0028 00040 (h.go:6) MOVW.P 8(R13), R15 0x002c 00044 (h.go:6) PCDATA $0, $1 0x002c 00044 (h.go:6) CALL runtime.panicindex(SB) 0x0030 00048 (h.go:6) UNDEF 0x0034 00052 (h.go:6) NOP 0x0034 00052 (h.go:6) MOVW R14, R3 0x0038 00056 (h.go:6) CALL runtime.morestack_noctxt(SB) 0x003c 00060 (h.go:6) JMP 0 Note that the frame size is 4, but there is actually no local. It incorrectly thinks call to runtime.panicindex needs 4 bytes space for argument. This CL fixes it. Updates #15365. Change-Id: Ic65d55283a6aa8a7861d7a3fbc7b63c35785eeec Reviewed-on: https://go-review.googlesource.com/24909 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-07-13 09:22:48 -06:00
call.AuxInt = off - Ctxt.FixedFrameSize()
if len(results) > 0 {
Fatalf("panic call can't have results")
}
return nil
}
// Load results
res := make([]*ssa.Value, len(results))
for i, t := range results {
off = Rnd(off, t.Alignment())
cmd/compile: add OpOffPtr [c] SP to constant cache They accounted for almost 30% of all CSE'd values. By never creating the duplicates in the first place, we reduce the high water mark of Value IDs, which in turn makes all SSA phases cheaper, particularly regalloc. name old time/op new time/op delta Template 200ms ± 3% 198ms ± 4% -0.87% (p=0.016 n=50+49) Unicode 86.9ms ± 2% 85.5ms ± 3% -1.56% (p=0.000 n=49+50) GoTypes 553ms ± 4% 551ms ± 4% ~ (p=0.183 n=50+49) SSA 3.97s ± 3% 3.93s ± 2% -1.06% (p=0.000 n=48+48) Flate 124ms ± 4% 124ms ± 3% ~ (p=0.545 n=48+50) GoParser 146ms ± 4% 146ms ± 4% ~ (p=0.810 n=49+49) Reflect 357ms ± 3% 355ms ± 3% -0.59% (p=0.049 n=50+48) Tar 106ms ± 4% 107ms ± 5% ~ (p=0.454 n=49+50) XML 203ms ± 4% 203ms ± 4% ~ (p=0.726 n=48+50) name old user-ns/op new user-ns/op delta Template 237M ± 3% 235M ± 4% ~ (p=0.208 n=47+48) Unicode 111M ± 4% 108M ± 9% -2.50% (p=0.000 n=47+50) GoTypes 736M ± 5% 729M ± 4% -0.95% (p=0.017 n=50+46) SSA 5.73G ± 4% 5.74G ± 4% ~ (p=0.765 n=50+50) Flate 150M ± 5% 148M ± 6% -0.89% (p=0.045 n=48+47) GoParser 180M ± 5% 178M ± 7% -1.34% (p=0.012 n=50+50) Reflect 450M ± 4% 444M ± 4% -1.40% (p=0.000 n=50+49) Tar 124M ± 7% 123M ± 7% ~ (p=0.092 n=50+50) XML 248M ± 6% 245M ± 5% ~ (p=0.057 n=50+50) name old alloc/op new alloc/op delta Template 39.4MB ± 0% 39.3MB ± 0% -0.37% (p=0.000 n=50+50) Unicode 30.9MB ± 0% 30.9MB ± 0% -0.27% (p=0.000 n=48+50) GoTypes 114MB ± 0% 113MB ± 0% -1.03% (p=0.000 n=50+49) SSA 882MB ± 0% 865MB ± 0% -1.95% (p=0.000 n=49+49) Flate 25.8MB ± 0% 25.7MB ± 0% -0.21% (p=0.000 n=50+50) GoParser 31.7MB ± 0% 31.6MB ± 0% -0.33% (p=0.000 n=50+50) Reflect 79.7MB ± 0% 79.3MB ± 0% -0.49% (p=0.000 n=44+49) Tar 27.2MB ± 0% 27.1MB ± 0% -0.31% (p=0.000 n=50+50) XML 42.7MB ± 0% 42.3MB ± 0% -1.05% (p=0.000 n=48+49) name old allocs/op new allocs/op delta Template 379k ± 1% 380k ± 1% +0.26% (p=0.000 n=50+50) Unicode 324k ± 1% 324k ± 1% ~ (p=0.964 n=49+50) GoTypes 1.14M ± 0% 1.15M ± 0% +0.14% (p=0.000 n=50+49) SSA 7.89M ± 0% 7.89M ± 0% -0.05% (p=0.000 n=49+49) Flate 240k ± 1% 241k ± 1% +0.27% (p=0.001 n=50+50) GoParser 310k ± 1% 311k ± 1% +0.48% (p=0.000 n=50+49) Reflect 1.00M ± 0% 1.00M ± 0% +0.17% (p=0.000 n=48+50) Tar 254k ± 1% 255k ± 1% +0.23% (p=0.005 n=50+50) XML 395k ± 1% 395k ± 1% +0.19% (p=0.002 n=49+47) Change-Id: Iaa8f5f37e23bd81983409f7359f9dcd4dfe2961f Reviewed-on: https://go-review.googlesource.com/38003 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-03-08 12:50:00 -08:00
ptr := s.constOffPtrSP(ptrto(t), off)
res[i] = s.newValue2(ssa.OpLoad, t, ptr, s.mem())
off += t.Size()
}
off = Rnd(off, int64(Widthptr))
// Remember how much callee stack space we needed.
call.AuxInt = off
return res
}
// do *left = right for type t.
func (s *state) storeType(t *Type, left, right *ssa.Value, skip skipMask) {
// store scalar fields first, so write barrier stores for
// pointer fields can be grouped together, and scalar values
// don't need to be live across the write barrier call.
s.storeTypeScalars(t, left, right, skip)
if skip&skipPtr == 0 && haspointers(t) {
s.storeTypePtrs(t, left, right)
}
}
// do *left = right for all scalar (non-pointer) parts of t.
func (s *state) storeTypeScalars(t *Type, left, right *ssa.Value, skip skipMask) {
switch {
case t.IsBoolean() || t.IsInteger() || t.IsFloat() || t.IsComplex():
s.vars[&memVar] = s.newValue3A(ssa.OpStore, ssa.TypeMem, t, left, right, s.mem())
case t.IsPtrShaped():
// no scalar fields.
case t.IsString():
if skip&skipLen != 0 {
return
}
len := s.newValue1(ssa.OpStringLen, Types[TINT], right)
lenAddr := s.newValue1I(ssa.OpOffPtr, ptrto(Types[TINT]), s.config.IntSize, left)
s.vars[&memVar] = s.newValue3A(ssa.OpStore, ssa.TypeMem, Types[TINT], lenAddr, len, s.mem())
case t.IsSlice():
if skip&skipLen == 0 {
len := s.newValue1(ssa.OpSliceLen, Types[TINT], right)
lenAddr := s.newValue1I(ssa.OpOffPtr, ptrto(Types[TINT]), s.config.IntSize, left)
s.vars[&memVar] = s.newValue3A(ssa.OpStore, ssa.TypeMem, Types[TINT], lenAddr, len, s.mem())
}
if skip&skipCap == 0 {
cap := s.newValue1(ssa.OpSliceCap, Types[TINT], right)
capAddr := s.newValue1I(ssa.OpOffPtr, ptrto(Types[TINT]), 2*s.config.IntSize, left)
s.vars[&memVar] = s.newValue3A(ssa.OpStore, ssa.TypeMem, Types[TINT], capAddr, cap, s.mem())
}
case t.IsInterface():
// itab field doesn't need a write barrier (even though it is a pointer).
itab := s.newValue1(ssa.OpITab, ptrto(Types[TUINT8]), right)
s.vars[&memVar] = s.newValue3A(ssa.OpStore, ssa.TypeMem, Types[TUINTPTR], left, itab, s.mem())
case t.IsStruct():
n := t.NumFields()
for i := 0; i < n; i++ {
ft := t.FieldType(i)
addr := s.newValue1I(ssa.OpOffPtr, ft.PtrTo(), t.FieldOff(i), left)
val := s.newValue1I(ssa.OpStructSelect, ft, int64(i), right)
s.storeTypeScalars(ft.(*Type), addr, val, 0)
}
case t.IsArray() && t.NumElem() == 0:
// nothing
case t.IsArray() && t.NumElem() == 1:
s.storeTypeScalars(t.Elem(), left, s.newValue1I(ssa.OpArraySelect, t.Elem(), 0, right), 0)
default:
s.Fatalf("bad write barrier type %v", t)
}
}
// do *left = right for all pointer parts of t.
func (s *state) storeTypePtrs(t *Type, left, right *ssa.Value) {
switch {
case t.IsPtrShaped():
s.vars[&memVar] = s.newValue3A(ssa.OpStore, ssa.TypeMem, t, left, right, s.mem())
case t.IsString():
ptr := s.newValue1(ssa.OpStringPtr, ptrto(Types[TUINT8]), right)
s.vars[&memVar] = s.newValue3A(ssa.OpStore, ssa.TypeMem, ptrto(Types[TUINT8]), left, ptr, s.mem())
case t.IsSlice():
ptr := s.newValue1(ssa.OpSlicePtr, ptrto(Types[TUINT8]), right)
s.vars[&memVar] = s.newValue3A(ssa.OpStore, ssa.TypeMem, ptrto(Types[TUINT8]), left, ptr, s.mem())
case t.IsInterface():
// itab field is treated as a scalar.
idata := s.newValue1(ssa.OpIData, ptrto(Types[TUINT8]), right)
idataAddr := s.newValue1I(ssa.OpOffPtr, ptrto(ptrto(Types[TUINT8])), s.config.PtrSize, left)
s.vars[&memVar] = s.newValue3A(ssa.OpStore, ssa.TypeMem, ptrto(Types[TUINT8]), idataAddr, idata, s.mem())
case t.IsStruct():
n := t.NumFields()
for i := 0; i < n; i++ {
ft := t.FieldType(i)
if !haspointers(ft.(*Type)) {
continue
}
addr := s.newValue1I(ssa.OpOffPtr, ft.PtrTo(), t.FieldOff(i), left)
val := s.newValue1I(ssa.OpStructSelect, ft, int64(i), right)
s.storeTypePtrs(ft.(*Type), addr, val)
}
case t.IsArray() && t.NumElem() == 0:
// nothing
case t.IsArray() && t.NumElem() == 1:
s.storeTypePtrs(t.Elem(), left, s.newValue1I(ssa.OpArraySelect, t.Elem(), 0, right))
default:
s.Fatalf("bad write barrier type %v", t)
}
}
// slice computes the slice v[i:j:k] and returns ptr, len, and cap of result.
// i,j,k may be nil, in which case they are set to their default value.
// t is a slice, ptr to array, or string type.
func (s *state) slice(t *Type, v, i, j, k *ssa.Value) (p, l, c *ssa.Value) {
var elemtype *Type
var ptrtype *Type
var ptr *ssa.Value
var len *ssa.Value
var cap *ssa.Value
zero := s.constInt(Types[TINT], 0)
switch {
case t.IsSlice():
elemtype = t.Elem()
ptrtype = ptrto(elemtype)
ptr = s.newValue1(ssa.OpSlicePtr, ptrtype, v)
len = s.newValue1(ssa.OpSliceLen, Types[TINT], v)
cap = s.newValue1(ssa.OpSliceCap, Types[TINT], v)
case t.IsString():
elemtype = Types[TUINT8]
ptrtype = ptrto(elemtype)
ptr = s.newValue1(ssa.OpStringPtr, ptrtype, v)
len = s.newValue1(ssa.OpStringLen, Types[TINT], v)
cap = len
case t.IsPtr():
if !t.Elem().IsArray() {
s.Fatalf("bad ptr to array in slice %v\n", t)
}
elemtype = t.Elem().Elem()
ptrtype = ptrto(elemtype)
s.nilCheck(v)
ptr = v
len = s.constInt(Types[TINT], t.Elem().NumElem())
cap = len
default:
s.Fatalf("bad type in slice %v\n", t)
}
// Set default values
if i == nil {
i = zero
}
if j == nil {
j = len
}
if k == nil {
k = cap
}
// Panic if slice indices are not in bounds.
s.sliceBoundsCheck(i, j)
if j != k {
s.sliceBoundsCheck(j, k)
}
if k != cap {
s.sliceBoundsCheck(k, cap)
}
// Generate the following code assuming that indexes are in bounds.
// The masking is to make sure that we don't generate a slice
// that points to the next object in memory.
// rlen = j - i
// rcap = k - i
// delta = i * elemsize
// rptr = p + delta&mask(rcap)
// result = (SliceMake rptr rlen rcap)
// where mask(x) is 0 if x==0 and -1 if x>0.
subOp := s.ssaOp(OSUB, Types[TINT])
mulOp := s.ssaOp(OMUL, Types[TINT])
andOp := s.ssaOp(OAND, Types[TINT])
rlen := s.newValue2(subOp, Types[TINT], j, i)
var rcap *ssa.Value
switch {
case t.IsString():
// Capacity of the result is unimportant. However, we use
// rcap to test if we've generated a zero-length slice.
// Use length of strings for that.
rcap = rlen
case j == k:
rcap = rlen
default:
rcap = s.newValue2(subOp, Types[TINT], k, i)
}
var rptr *ssa.Value
if (i.Op == ssa.OpConst64 || i.Op == ssa.OpConst32) && i.AuxInt == 0 {
// No pointer arithmetic necessary.
rptr = ptr
} else {
// delta = # of bytes to offset pointer by.
delta := s.newValue2(mulOp, Types[TINT], i, s.constInt(Types[TINT], elemtype.Width))
// If we're slicing to the point where the capacity is zero,
// zero out the delta.
mask := s.newValue1(ssa.OpSlicemask, Types[TINT], rcap)
delta = s.newValue2(andOp, Types[TINT], delta, mask)
// Compute rptr = ptr + delta
rptr = s.newValue2(ssa.OpAddPtr, ptrtype, ptr, delta)
}
return rptr, rlen, rcap
}
type u642fcvtTab struct {
geq, cvt2F, and, rsh, or, add ssa.Op
one func(*state, ssa.Type, int64) *ssa.Value
}
var u64_f64 u642fcvtTab = u642fcvtTab{
geq: ssa.OpGeq64,
cvt2F: ssa.OpCvt64to64F,
and: ssa.OpAnd64,
rsh: ssa.OpRsh64Ux64,
or: ssa.OpOr64,
add: ssa.OpAdd64F,
one: (*state).constInt64,
}
var u64_f32 u642fcvtTab = u642fcvtTab{
geq: ssa.OpGeq64,
cvt2F: ssa.OpCvt64to32F,
and: ssa.OpAnd64,
rsh: ssa.OpRsh64Ux64,
or: ssa.OpOr64,
add: ssa.OpAdd32F,
one: (*state).constInt64,
}
func (s *state) uint64Tofloat64(n *Node, x *ssa.Value, ft, tt *Type) *ssa.Value {
return s.uint64Tofloat(&u64_f64, n, x, ft, tt)
}
func (s *state) uint64Tofloat32(n *Node, x *ssa.Value, ft, tt *Type) *ssa.Value {
return s.uint64Tofloat(&u64_f32, n, x, ft, tt)
}
func (s *state) uint64Tofloat(cvttab *u642fcvtTab, n *Node, x *ssa.Value, ft, tt *Type) *ssa.Value {
// if x >= 0 {
// result = (floatY) x
// } else {
// y = uintX(x) ; y = x & 1
// z = uintX(x) ; z = z >> 1
// z = z >> 1
// z = z | y
// result = floatY(z)
// result = result + result
// }
//
// Code borrowed from old code generator.
// What's going on: large 64-bit "unsigned" looks like
// negative number to hardware's integer-to-float
// conversion. However, because the mantissa is only
// 63 bits, we don't need the LSB, so instead we do an
// unsigned right shift (divide by two), convert, and
// double. However, before we do that, we need to be
// sure that we do not lose a "1" if that made the
// difference in the resulting rounding. Therefore, we
// preserve it, and OR (not ADD) it back in. The case
// that matters is when the eleven discarded bits are
// equal to 10000000001; that rounds up, and the 1 cannot
// be lost else it would round down if the LSB of the
// candidate mantissa is 0.
cmp := s.newValue2(cvttab.geq, Types[TBOOL], x, s.zeroVal(ft))
b := s.endBlock()
b.Kind = ssa.BlockIf
b.SetControl(cmp)
b.Likely = ssa.BranchLikely
bThen := s.f.NewBlock(ssa.BlockPlain)
bElse := s.f.NewBlock(ssa.BlockPlain)
bAfter := s.f.NewBlock(ssa.BlockPlain)
b.AddEdgeTo(bThen)
s.startBlock(bThen)
a0 := s.newValue1(cvttab.cvt2F, tt, x)
s.vars[n] = a0
s.endBlock()
bThen.AddEdgeTo(bAfter)
b.AddEdgeTo(bElse)
s.startBlock(bElse)
one := cvttab.one(s, ft, 1)
y := s.newValue2(cvttab.and, ft, x, one)
z := s.newValue2(cvttab.rsh, ft, x, one)
z = s.newValue2(cvttab.or, ft, z, y)
a := s.newValue1(cvttab.cvt2F, tt, z)
a1 := s.newValue2(cvttab.add, tt, a, a)
s.vars[n] = a1
s.endBlock()
bElse.AddEdgeTo(bAfter)
s.startBlock(bAfter)
return s.variable(n, n.Type)
}
type u322fcvtTab struct {
cvtI2F, cvtF2F ssa.Op
}
var u32_f64 u322fcvtTab = u322fcvtTab{
cvtI2F: ssa.OpCvt32to64F,
cvtF2F: ssa.OpCopy,
}
var u32_f32 u322fcvtTab = u322fcvtTab{
cvtI2F: ssa.OpCvt32to32F,
cvtF2F: ssa.OpCvt64Fto32F,
}
func (s *state) uint32Tofloat64(n *Node, x *ssa.Value, ft, tt *Type) *ssa.Value {
return s.uint32Tofloat(&u32_f64, n, x, ft, tt)
}
func (s *state) uint32Tofloat32(n *Node, x *ssa.Value, ft, tt *Type) *ssa.Value {
return s.uint32Tofloat(&u32_f32, n, x, ft, tt)
}
func (s *state) uint32Tofloat(cvttab *u322fcvtTab, n *Node, x *ssa.Value, ft, tt *Type) *ssa.Value {
// if x >= 0 {
// result = floatY(x)
// } else {
// result = floatY(float64(x) + (1<<32))
// }
cmp := s.newValue2(ssa.OpGeq32, Types[TBOOL], x, s.zeroVal(ft))
b := s.endBlock()
b.Kind = ssa.BlockIf
b.SetControl(cmp)
b.Likely = ssa.BranchLikely
bThen := s.f.NewBlock(ssa.BlockPlain)
bElse := s.f.NewBlock(ssa.BlockPlain)
bAfter := s.f.NewBlock(ssa.BlockPlain)
b.AddEdgeTo(bThen)
s.startBlock(bThen)
a0 := s.newValue1(cvttab.cvtI2F, tt, x)
s.vars[n] = a0
s.endBlock()
bThen.AddEdgeTo(bAfter)
b.AddEdgeTo(bElse)
s.startBlock(bElse)
a1 := s.newValue1(ssa.OpCvt32to64F, Types[TFLOAT64], x)
twoToThe32 := s.constFloat64(Types[TFLOAT64], float64(1<<32))
a2 := s.newValue2(ssa.OpAdd64F, Types[TFLOAT64], a1, twoToThe32)
a3 := s.newValue1(cvttab.cvtF2F, tt, a2)
s.vars[n] = a3
s.endBlock()
bElse.AddEdgeTo(bAfter)
s.startBlock(bAfter)
return s.variable(n, n.Type)
}
// referenceTypeBuiltin generates code for the len/cap builtins for maps and channels.
func (s *state) referenceTypeBuiltin(n *Node, x *ssa.Value) *ssa.Value {
if !n.Left.Type.IsMap() && !n.Left.Type.IsChan() {
s.Fatalf("node must be a map or a channel")
}
// if n == nil {
// return 0
// } else {
// // len
// return *((*int)n)
// // cap
// return *(((*int)n)+1)
// }
lenType := n.Type
nilValue := s.constNil(Types[TUINTPTR])
cmp := s.newValue2(ssa.OpEqPtr, Types[TBOOL], x, nilValue)
b := s.endBlock()
b.Kind = ssa.BlockIf
b.SetControl(cmp)
b.Likely = ssa.BranchUnlikely
bThen := s.f.NewBlock(ssa.BlockPlain)
bElse := s.f.NewBlock(ssa.BlockPlain)
bAfter := s.f.NewBlock(ssa.BlockPlain)
// length/capacity of a nil map/chan is zero
b.AddEdgeTo(bThen)
s.startBlock(bThen)
s.vars[n] = s.zeroVal(lenType)
s.endBlock()
bThen.AddEdgeTo(bAfter)
b.AddEdgeTo(bElse)
s.startBlock(bElse)
if n.Op == OLEN {
// length is stored in the first word for map/chan
s.vars[n] = s.newValue2(ssa.OpLoad, lenType, x, s.mem())
} else if n.Op == OCAP {
// capacity is stored in the second word for chan
sw := s.newValue1I(ssa.OpOffPtr, lenType.PtrTo(), lenType.Width, x)
s.vars[n] = s.newValue2(ssa.OpLoad, lenType, sw, s.mem())
} else {
s.Fatalf("op must be OLEN or OCAP")
}
s.endBlock()
bElse.AddEdgeTo(bAfter)
s.startBlock(bAfter)
return s.variable(n, lenType)
}
type f2uCvtTab struct {
ltf, cvt2U, subf, or ssa.Op
floatValue func(*state, ssa.Type, float64) *ssa.Value
intValue func(*state, ssa.Type, int64) *ssa.Value
cutoff uint64
}
var f32_u64 f2uCvtTab = f2uCvtTab{
ltf: ssa.OpLess32F,
cvt2U: ssa.OpCvt32Fto64,
subf: ssa.OpSub32F,
or: ssa.OpOr64,
floatValue: (*state).constFloat32,
intValue: (*state).constInt64,
cutoff: 9223372036854775808,
}
var f64_u64 f2uCvtTab = f2uCvtTab{
ltf: ssa.OpLess64F,
cvt2U: ssa.OpCvt64Fto64,
subf: ssa.OpSub64F,
or: ssa.OpOr64,
floatValue: (*state).constFloat64,
intValue: (*state).constInt64,
cutoff: 9223372036854775808,
}
var f32_u32 f2uCvtTab = f2uCvtTab{
ltf: ssa.OpLess32F,
cvt2U: ssa.OpCvt32Fto32,
subf: ssa.OpSub32F,
or: ssa.OpOr32,
floatValue: (*state).constFloat32,
intValue: func(s *state, t ssa.Type, v int64) *ssa.Value { return s.constInt32(t, int32(v)) },
cutoff: 2147483648,
}
var f64_u32 f2uCvtTab = f2uCvtTab{
ltf: ssa.OpLess64F,
cvt2U: ssa.OpCvt64Fto32,
subf: ssa.OpSub64F,
or: ssa.OpOr32,
floatValue: (*state).constFloat64,
intValue: func(s *state, t ssa.Type, v int64) *ssa.Value { return s.constInt32(t, int32(v)) },
cutoff: 2147483648,
}
func (s *state) float32ToUint64(n *Node, x *ssa.Value, ft, tt *Type) *ssa.Value {
return s.floatToUint(&f32_u64, n, x, ft, tt)
}
func (s *state) float64ToUint64(n *Node, x *ssa.Value, ft, tt *Type) *ssa.Value {
return s.floatToUint(&f64_u64, n, x, ft, tt)
}
func (s *state) float32ToUint32(n *Node, x *ssa.Value, ft, tt *Type) *ssa.Value {
return s.floatToUint(&f32_u32, n, x, ft, tt)
}
func (s *state) float64ToUint32(n *Node, x *ssa.Value, ft, tt *Type) *ssa.Value {
return s.floatToUint(&f64_u32, n, x, ft, tt)
}
func (s *state) floatToUint(cvttab *f2uCvtTab, n *Node, x *ssa.Value, ft, tt *Type) *ssa.Value {
// cutoff:=1<<(intY_Size-1)
// if x < floatX(cutoff) {
// result = uintY(x)
// } else {
// y = x - floatX(cutoff)
// z = uintY(y)
// result = z | -(cutoff)
// }
cutoff := cvttab.floatValue(s, ft, float64(cvttab.cutoff))
cmp := s.newValue2(cvttab.ltf, Types[TBOOL], x, cutoff)
b := s.endBlock()
b.Kind = ssa.BlockIf
b.SetControl(cmp)
b.Likely = ssa.BranchLikely
bThen := s.f.NewBlock(ssa.BlockPlain)
bElse := s.f.NewBlock(ssa.BlockPlain)
bAfter := s.f.NewBlock(ssa.BlockPlain)
b.AddEdgeTo(bThen)
s.startBlock(bThen)
a0 := s.newValue1(cvttab.cvt2U, tt, x)
s.vars[n] = a0
s.endBlock()
bThen.AddEdgeTo(bAfter)
b.AddEdgeTo(bElse)
s.startBlock(bElse)
y := s.newValue2(cvttab.subf, ft, x, cutoff)
y = s.newValue1(cvttab.cvt2U, tt, y)
z := cvttab.intValue(s, tt, int64(-cvttab.cutoff))
a1 := s.newValue2(cvttab.or, tt, y, z)
s.vars[n] = a1
s.endBlock()
bElse.AddEdgeTo(bAfter)
s.startBlock(bAfter)
return s.variable(n, n.Type)
}
// dottype generates SSA for a type assertion node.
// commaok indicates whether to panic or return a bool.
// If commaok is false, resok will be nil.
func (s *state) dottype(n *Node, commaok bool) (res, resok *ssa.Value) {
iface := s.expr(n.Left) // input interface
target := s.expr(typename(n.Type)) // target type
byteptr := ptrto(Types[TUINT8])
if n.Type.IsInterface() {
if n.Type.IsEmptyInterface() {
// Converting to an empty interface.
// Input could be an empty or nonempty interface.
if Debug_typeassert > 0 {
Warnl(n.Pos, "type assertion inlined")
}
// Get itab/type field from input.
itab := s.newValue1(ssa.OpITab, byteptr, iface)
// Conversion succeeds iff that field is not nil.
cond := s.newValue2(ssa.OpNeqPtr, Types[TBOOL], itab, s.constNil(byteptr))
if n.Left.Type.IsEmptyInterface() && commaok {
// Converting empty interface to empty interface with ,ok is just a nil check.
return iface, cond
}
// Branch on nilness.
b := s.endBlock()
b.Kind = ssa.BlockIf
b.SetControl(cond)
b.Likely = ssa.BranchLikely
bOk := s.f.NewBlock(ssa.BlockPlain)
bFail := s.f.NewBlock(ssa.BlockPlain)
b.AddEdgeTo(bOk)
b.AddEdgeTo(bFail)
if !commaok {
// On failure, panic by calling panicnildottype.
s.startBlock(bFail)
s.rtcall(panicnildottype, false, nil, target)
// On success, return (perhaps modified) input interface.
s.startBlock(bOk)
if n.Left.Type.IsEmptyInterface() {
res = iface // Use input interface unchanged.
return
}
// Load type out of itab, build interface with existing idata.
off := s.newValue1I(ssa.OpOffPtr, byteptr, int64(Widthptr), itab)
typ := s.newValue2(ssa.OpLoad, byteptr, off, s.mem())
idata := s.newValue1(ssa.OpIData, n.Type, iface)
res = s.newValue2(ssa.OpIMake, n.Type, typ, idata)
return
}
s.startBlock(bOk)
// nonempty -> empty
// Need to load type from itab
off := s.newValue1I(ssa.OpOffPtr, byteptr, int64(Widthptr), itab)
s.vars[&typVar] = s.newValue2(ssa.OpLoad, byteptr, off, s.mem())
s.endBlock()
// itab is nil, might as well use that as the nil result.
s.startBlock(bFail)
s.vars[&typVar] = itab
s.endBlock()
// Merge point.
bEnd := s.f.NewBlock(ssa.BlockPlain)
bOk.AddEdgeTo(bEnd)
bFail.AddEdgeTo(bEnd)
s.startBlock(bEnd)
idata := s.newValue1(ssa.OpIData, n.Type, iface)
res = s.newValue2(ssa.OpIMake, n.Type, s.variable(&typVar, byteptr), idata)
resok = cond
delete(s.vars, &typVar)
return
}
// converting to a nonempty interface needs a runtime call.
if Debug_typeassert > 0 {
Warnl(n.Pos, "type assertion not inlined")
}
if n.Left.Type.IsEmptyInterface() {
if commaok {
call := s.rtcall(assertE2I2, true, []*Type{n.Type, Types[TBOOL]}, target, iface)
return call[0], call[1]
}
return s.rtcall(assertE2I, true, []*Type{n.Type}, target, iface)[0], nil
}
if commaok {
call := s.rtcall(assertI2I2, true, []*Type{n.Type, Types[TBOOL]}, target, iface)
return call[0], call[1]
}
return s.rtcall(assertI2I, true, []*Type{n.Type}, target, iface)[0], nil
}
if Debug_typeassert > 0 {
Warnl(n.Pos, "type assertion inlined")
}
// Converting to a concrete type.
direct := isdirectiface(n.Type)
itab := s.newValue1(ssa.OpITab, byteptr, iface) // type word of interface
if Debug_typeassert > 0 {
Warnl(n.Pos, "type assertion inlined")
}
var targetITab *ssa.Value
if n.Left.Type.IsEmptyInterface() {
// Looking for pointer to target type.
targetITab = target
} else {
// Looking for pointer to itab for target type and source interface.
targetITab = s.expr(itabname(n.Type, n.Left.Type))
}
var tmp *Node // temporary for use with large types
var addr *ssa.Value // address of tmp
if commaok && !canSSAType(n.Type) {
// unSSAable type, use temporary.
// TODO: get rid of some of these temporaries.
tmp = temp(n.Type)
addr = s.addr(tmp, false)
s.vars[&memVar] = s.newValue1A(ssa.OpVarDef, ssa.TypeMem, tmp, s.mem())
}
cond := s.newValue2(ssa.OpEqPtr, Types[TBOOL], itab, targetITab)
b := s.endBlock()
b.Kind = ssa.BlockIf
b.SetControl(cond)
b.Likely = ssa.BranchLikely
bOk := s.f.NewBlock(ssa.BlockPlain)
bFail := s.f.NewBlock(ssa.BlockPlain)
b.AddEdgeTo(bOk)
b.AddEdgeTo(bFail)
if !commaok {
// on failure, panic by calling panicdottype
s.startBlock(bFail)
taddr := s.newValue1A(ssa.OpAddr, byteptr, &ssa.ExternSymbol{Typ: byteptr, Sym: Linksym(typenamesym(n.Left.Type))}, s.sb)
if n.Left.Type.IsEmptyInterface() {
s.rtcall(panicdottypeE, false, nil, itab, target, taddr)
} else {
s.rtcall(panicdottypeI, false, nil, itab, target, taddr)
}
// on success, return data from interface
s.startBlock(bOk)
if direct {
return s.newValue1(ssa.OpIData, n.Type, iface), nil
}
p := s.newValue1(ssa.OpIData, ptrto(n.Type), iface)
return s.newValue2(ssa.OpLoad, n.Type, p, s.mem()), nil
}
// commaok is the more complicated case because we have
// a control flow merge point.
bEnd := s.f.NewBlock(ssa.BlockPlain)
// Note that we need a new valVar each time (unlike okVar where we can
// reuse the variable) because it might have a different type every time.
valVar := &Node{Op: ONAME, Class: Pxxx, Sym: &Sym{Name: "val"}}
// type assertion succeeded
s.startBlock(bOk)
if tmp == nil {
if direct {
s.vars[valVar] = s.newValue1(ssa.OpIData, n.Type, iface)
} else {
p := s.newValue1(ssa.OpIData, ptrto(n.Type), iface)
s.vars[valVar] = s.newValue2(ssa.OpLoad, n.Type, p, s.mem())
}
} else {
p := s.newValue1(ssa.OpIData, ptrto(n.Type), iface)
store := s.newValue3I(ssa.OpMove, ssa.TypeMem, n.Type.Size(), addr, p, s.mem())
store.Aux = n.Type
s.vars[&memVar] = store
}
s.vars[&okVar] = s.constBool(true)
s.endBlock()
bOk.AddEdgeTo(bEnd)
// type assertion failed
s.startBlock(bFail)
if tmp == nil {
s.vars[valVar] = s.zeroVal(n.Type)
} else {
store := s.newValue2I(ssa.OpZero, ssa.TypeMem, n.Type.Size(), addr, s.mem())
store.Aux = n.Type
s.vars[&memVar] = store
}
s.vars[&okVar] = s.constBool(false)
s.endBlock()
bFail.AddEdgeTo(bEnd)
// merge point
s.startBlock(bEnd)
if tmp == nil {
res = s.variable(valVar, n.Type)
delete(s.vars, valVar)
} else {
res = s.newValue2(ssa.OpLoad, n.Type, addr, s.mem())
s.vars[&memVar] = s.newValue1A(ssa.OpVarKill, ssa.TypeMem, tmp, s.mem())
}
resok = s.variable(&okVar, Types[TBOOL])
delete(s.vars, &okVar)
return res, resok
}
// variable returns the value of a variable at the current location.
func (s *state) variable(name *Node, t ssa.Type) *ssa.Value {
v := s.vars[name]
if v != nil {
return v
}
v = s.fwdVars[name]
if v != nil {
return v
}
if s.curBlock == s.f.Entry {
// No variable should be live at entry.
s.Fatalf("Value live at entry. It shouldn't be. func %s, node %v, value %v", s.f.Name, name, v)
}
// Make a FwdRef, which records a value that's live on block input.
// We'll find the matching definition as part of insertPhis.
v = s.newValue0A(ssa.OpFwdRef, t, name)
s.fwdVars[name] = v
s.addNamedValue(name, v)
return v
}
func (s *state) mem() *ssa.Value {
return s.variable(&memVar, ssa.TypeMem)
}
func (s *state) addNamedValue(n *Node, v *ssa.Value) {
if n.Class == Pxxx {
// Don't track our dummy nodes (&memVar etc.).
return
}
if n.IsAutoTmp() {
// Don't track temporary variables.
return
}
if n.Class == PPARAMOUT {
// Don't track named output values. This prevents return values
// from being assigned too early. See #14591 and #14762. TODO: allow this.
return
}
if n.Class == PAUTO && n.Xoffset != 0 {
s.Fatalf("AUTO var with offset %v %d", n, n.Xoffset)
}
loc := ssa.LocalSlot{N: n, Type: n.Type, Off: 0}
values, ok := s.f.NamedValues[loc]
if !ok {
s.f.Names = append(s.f.Names, loc)
}
s.f.NamedValues[loc] = append(values, v)
}
// Branch is an unresolved branch.
type Branch struct {
P *obj.Prog // branch instruction
B *ssa.Block // target
}
// SSAGenState contains state needed during Prog generation.
type SSAGenState struct {
// Branches remembers all the branch instructions we've seen
// and where they would like to go.
Branches []Branch
// bstart remembers where each block starts (indexed by block ID)
bstart []*obj.Prog
// 387 port: maps from SSE registers (REG_X?) to 387 registers (REG_F?)
SSEto387 map[int16]int16
// Some architectures require a 64-bit temporary for FP-related register shuffling. Examples include x86-387, PPC, and Sparc V8.
ScratchFpMem *Node
}
// Pc returns the current Prog.
func (s *SSAGenState) Pc() *obj.Prog {
return pc
}
// SetPos sets the current source position.
2016-12-15 17:17:01 -08:00
func (s *SSAGenState) SetPos(pos src.XPos) {
lineno = pos
}
// genssa appends entries to ptxt for each instruction in f.
// gcargs and gclocals are filled in with pointer maps for the frame.
func genssa(f *ssa.Func, ptxt *obj.Prog, gcargs, gclocals *Sym) {
var s SSAGenState
[dev.ssa] cmd/compile: add GOSSAFUNC and GOSSAPKG These temporary environment variables make it possible to enable using SSA-generated code for a particular function or package without having to rebuild the compiler. This makes it possible to start bulk testing SSA generated code. First, bump up the default stack size (_StackMin in runtime/stack2.go) to something large like 32768, because without stackmaps we can't grow stacks. Then run something like: for pkg in `go list std` do GOGC=off GOSSAPKG=`basename $pkg` go test -a $pkg done When a test fails, you can re-run those tests, selectively enabling one function after another, until you find the one that is causing trouble. Doing this right now yields some interesting results: * There are several packages for which we generate some code and whose tests pass. Yay! * We can generate code for encoding/base64, but tests there fail, so there's a bug to fix. * Attempting to build the runtime yields a panic during codegen: panic: interface conversion: ssa.Location is nil, not *ssa.LocalSlot * The top unimplemented codegen items are (simplified): 59 genValue not implemented: REPMOVSB 18 genValue not implemented: REPSTOSQ 14 genValue not implemented: SUBQ 9 branch not implemented: If v -> b b. Control: XORQconst <bool> [1] 8 genValue not implemented: MOVQstoreidx8 4 branch not implemented: If v -> b b. Control: SETG <bool> 3 branch not implemented: If v -> b b. Control: SETLE <bool> 2 load flags not implemented: LoadReg8 <flags> 2 genValue not implemented: InvertFlags <flags> 1 store flags not implemented: StoreReg8 <flags> 1 branch not implemented: If v -> b b. Control: SETGE <bool> Change-Id: Ib64809ac0c917e25bcae27829ae634c70d290c7f Reviewed-on: https://go-review.googlesource.com/12547 Reviewed-by: Keith Randall <khr@golang.org>
2015-07-22 13:13:53 -07:00
e := f.Config.Frontend().(*ssaExport)
// Remember where each block starts.
s.bstart = make([]*obj.Prog, f.NumBlocks())
var valueProgs map[*obj.Prog]*ssa.Value
var blockProgs map[*obj.Prog]*ssa.Block
var logProgs = e.log
if logProgs {
valueProgs = make(map[*obj.Prog]*ssa.Value, f.NumValues())
blockProgs = make(map[*obj.Prog]*ssa.Block, f.NumBlocks())
f.Logf("genssa %s\n", f.Name)
blockProgs[pc] = f.Blocks[0]
}
if Thearch.Use387 {
s.SSEto387 = map[int16]int16{}
}
s.ScratchFpMem = scratchFpMem
scratchFpMem = nil
// Emit basic blocks
for i, b := range f.Blocks {
s.bstart[b.ID] = pc
// Emit values in block
Thearch.SSAMarkMoves(&s, b)
for _, v := range b.Values {
x := pc
s.SetPos(v.Pos)
switch v.Op {
case ssa.OpInitMem:
// memory arg needs no code
case ssa.OpArg:
// input args need no code
case ssa.OpSP, ssa.OpSB:
// nothing to do
case ssa.OpSelect0, ssa.OpSelect1:
// nothing to do
case ssa.OpGetG:
// nothing to do when there's a g register,
// and checkLower complains if there's not
case ssa.OpVarDef:
Gvardef(v.Aux.(*Node))
case ssa.OpVarKill:
Gvarkill(v.Aux.(*Node))
case ssa.OpVarLive:
Gvarlive(v.Aux.(*Node))
case ssa.OpKeepAlive:
KeepAlive(v)
case ssa.OpPhi:
CheckLoweredPhi(v)
default:
// let the backend handle it
Thearch.SSAGenValue(&s, v)
}
if logProgs {
for ; x != pc; x = x.Link {
valueProgs[x] = v
}
}
}
// Emit control flow instructions for block
var next *ssa.Block
if i < len(f.Blocks)-1 && Debug['N'] == 0 {
// If -N, leave next==nil so every block with successors
// ends in a JMP (except call blocks - plive doesn't like
// select{send,recv} followed by a JMP call). Helps keep
// line numbers for otherwise empty blocks.
next = f.Blocks[i+1]
}
x := pc
Thearch.SSAGenBlock(&s, b, next)
if logProgs {
for ; x != pc; x = x.Link {
blockProgs[x] = b
}
}
}
// Resolve branches
for _, br := range s.Branches {
br.P.To.Val = s.bstart[br.B.ID]
}
if logProgs {
for p := ptxt; p != nil; p = p.Link {
var s string
if v, ok := valueProgs[p]; ok {
s = v.String()
} else if b, ok := blockProgs[p]; ok {
s = b.String()
} else {
s = " " // most value and branch strings are 2-3 characters long
}
f.Logf("%s\t%s\n", s, p)
}
if f.Config.HTML != nil {
[dev.inline] cmd/internal/src: replace src.Pos with syntax.Pos This replaces the src.Pos LineHist-based position tracking with the syntax.Pos implementation and updates all uses. The LineHist table is not used anymore - the respective code is still there but should be removed eventually. CL forthcoming. Passes toolstash -cmp when comparing to the master repo (with the exception of a couple of swapped assembly instructions, likely due to different instruction scheduling because the line-based sorting has changed; though this is won't affect correctness). The sizes of various important compiler data structures have increased significantly (see the various sizes_test.go files); this is probably the reason for an increase of compilation times (to be addressed). Here are the results of compilebench -count 5, run on a "quiet" machine (no apps running besides a terminal): name old time/op new time/op delta Template 256ms ± 1% 280ms ±15% +9.54% (p=0.008 n=5+5) Unicode 132ms ± 1% 132ms ± 1% ~ (p=0.690 n=5+5) GoTypes 891ms ± 1% 917ms ± 2% +2.88% (p=0.008 n=5+5) Compiler 3.84s ± 2% 3.99s ± 2% +3.95% (p=0.016 n=5+5) MakeBash 47.1s ± 1% 47.2s ± 2% ~ (p=0.841 n=5+5) name old user-ns/op new user-ns/op delta Template 309M ± 1% 326M ± 2% +5.18% (p=0.008 n=5+5) Unicode 165M ± 1% 168M ± 4% ~ (p=0.421 n=5+5) GoTypes 1.14G ± 2% 1.18G ± 1% +3.47% (p=0.008 n=5+5) Compiler 5.00G ± 1% 5.16G ± 1% +3.12% (p=0.008 n=5+5) Change-Id: I241c4246cdff627d7ecb95cac23060b38f9775ec Reviewed-on: https://go-review.googlesource.com/34273 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-12-09 17:15:05 -08:00
// LineHist is defunct now - this code won't do
// anything.
// TODO: fix this (ideally without a global variable)
// saved := ptxt.Ctxt.LineHist.PrintFilenameOnly
// ptxt.Ctxt.LineHist.PrintFilenameOnly = true
var buf bytes.Buffer
buf.WriteString("<code>")
buf.WriteString("<dl class=\"ssa-gen\">")
for p := ptxt; p != nil; p = p.Link {
buf.WriteString("<dt class=\"ssa-prog-src\">")
if v, ok := valueProgs[p]; ok {
buf.WriteString(v.HTML())
} else if b, ok := blockProgs[p]; ok {
buf.WriteString(b.HTML())
}
buf.WriteString("</dt>")
buf.WriteString("<dd class=\"ssa-prog\">")
buf.WriteString(html.EscapeString(p.String()))
buf.WriteString("</dd>")
buf.WriteString("</li>")
}
buf.WriteString("</dl>")
buf.WriteString("</code>")
f.Config.HTML.WriteColumn("genssa", buf.String())
[dev.inline] cmd/internal/src: replace src.Pos with syntax.Pos This replaces the src.Pos LineHist-based position tracking with the syntax.Pos implementation and updates all uses. The LineHist table is not used anymore - the respective code is still there but should be removed eventually. CL forthcoming. Passes toolstash -cmp when comparing to the master repo (with the exception of a couple of swapped assembly instructions, likely due to different instruction scheduling because the line-based sorting has changed; though this is won't affect correctness). The sizes of various important compiler data structures have increased significantly (see the various sizes_test.go files); this is probably the reason for an increase of compilation times (to be addressed). Here are the results of compilebench -count 5, run on a "quiet" machine (no apps running besides a terminal): name old time/op new time/op delta Template 256ms ± 1% 280ms ±15% +9.54% (p=0.008 n=5+5) Unicode 132ms ± 1% 132ms ± 1% ~ (p=0.690 n=5+5) GoTypes 891ms ± 1% 917ms ± 2% +2.88% (p=0.008 n=5+5) Compiler 3.84s ± 2% 3.99s ± 2% +3.95% (p=0.016 n=5+5) MakeBash 47.1s ± 1% 47.2s ± 2% ~ (p=0.841 n=5+5) name old user-ns/op new user-ns/op delta Template 309M ± 1% 326M ± 2% +5.18% (p=0.008 n=5+5) Unicode 165M ± 1% 168M ± 4% ~ (p=0.421 n=5+5) GoTypes 1.14G ± 2% 1.18G ± 1% +3.47% (p=0.008 n=5+5) Compiler 5.00G ± 1% 5.16G ± 1% +3.12% (p=0.008 n=5+5) Change-Id: I241c4246cdff627d7ecb95cac23060b38f9775ec Reviewed-on: https://go-review.googlesource.com/34273 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-12-09 17:15:05 -08:00
// ptxt.Ctxt.LineHist.PrintFilenameOnly = saved
}
}
// Generate gc bitmaps.
liveness(Curfn, ptxt, gcargs, gclocals)
// Add frame prologue. Zero ambiguously live variables.
Thearch.Defframe(ptxt)
if Debug['f'] != 0 {
frame(0)
}
// Remove leftover instrumentation from the instruction stream.
removevardef(ptxt)
f.Config.HTML.Close()
f.Config.HTML = nil
}
type FloatingEQNEJump struct {
Jump obj.As
Index int
}
func oneFPJump(b *ssa.Block, jumps *FloatingEQNEJump, likely ssa.BranchPrediction, branches []Branch) []Branch {
p := Prog(jumps.Jump)
p.To.Type = obj.TYPE_BRANCH
to := jumps.Index
branches = append(branches, Branch{p, b.Succs[to].Block()})
if to == 1 {
likely = -likely
}
// liblink reorders the instruction stream as it sees fit.
// Pass along what we know so liblink can make use of it.
// TODO: Once we've fully switched to SSA,
// make liblink leave our output alone.
switch likely {
case ssa.BranchUnlikely:
p.From.Type = obj.TYPE_CONST
p.From.Offset = 0
case ssa.BranchLikely:
p.From.Type = obj.TYPE_CONST
p.From.Offset = 1
}
return branches
}
func SSAGenFPJump(s *SSAGenState, b, next *ssa.Block, jumps *[2][2]FloatingEQNEJump) {
likely := b.Likely
switch next {
case b.Succs[0].Block():
s.Branches = oneFPJump(b, &jumps[0][0], likely, s.Branches)
s.Branches = oneFPJump(b, &jumps[0][1], likely, s.Branches)
case b.Succs[1].Block():
s.Branches = oneFPJump(b, &jumps[1][0], likely, s.Branches)
s.Branches = oneFPJump(b, &jumps[1][1], likely, s.Branches)
default:
s.Branches = oneFPJump(b, &jumps[1][0], likely, s.Branches)
s.Branches = oneFPJump(b, &jumps[1][1], likely, s.Branches)
q := Prog(obj.AJMP)
q.To.Type = obj.TYPE_BRANCH
s.Branches = append(s.Branches, Branch{q, b.Succs[1].Block()})
}
}
func AuxOffset(v *ssa.Value) (offset int64) {
if v.Aux == nil {
return 0
}
switch sym := v.Aux.(type) {
case *ssa.AutoSymbol:
n := sym.Node.(*Node)
return n.Xoffset
}
return 0
}
// AddAux adds the offset in the aux fields (AuxInt and Aux) of v to a.
func AddAux(a *obj.Addr, v *ssa.Value) {
AddAux2(a, v, v.AuxInt)
}
func AddAux2(a *obj.Addr, v *ssa.Value, offset int64) {
if a.Type != obj.TYPE_MEM && a.Type != obj.TYPE_ADDR {
v.Fatalf("bad AddAux addr %v", a)
}
// add integer offset
a.Offset += offset
// If no additional symbol offset, we're done.
if v.Aux == nil {
return
}
// Add symbol's offset from its base register.
switch sym := v.Aux.(type) {
case *ssa.ExternSymbol:
a.Name = obj.NAME_EXTERN
a.Sym = sym.Sym
case *ssa.ArgSymbol:
n := sym.Node.(*Node)
a.Name = obj.NAME_PARAM
a.Node = n
a.Sym = Linksym(n.Orig.Sym)
a.Offset += n.Xoffset
case *ssa.AutoSymbol:
n := sym.Node.(*Node)
a.Name = obj.NAME_AUTO
a.Node = n
a.Sym = Linksym(n.Sym)
a.Offset += n.Xoffset
default:
v.Fatalf("aux in %s not implemented %#v", v, v.Aux)
}
}
// extendIndex extends v to a full int width.
// panic using the given function if v does not fit in an int (only on 32-bit archs).
func (s *state) extendIndex(v *ssa.Value, panicfn *obj.LSym) *ssa.Value {
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
size := v.Type.Size()
if size == s.config.IntSize {
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
return v
}
if size > s.config.IntSize {
// truncate 64-bit indexes on 32-bit pointer archs. Test the
// high word and branch to out-of-bounds failure if it is not 0.
if Debug['B'] == 0 {
hi := s.newValue1(ssa.OpInt64Hi, Types[TUINT32], v)
cmp := s.newValue2(ssa.OpEq32, Types[TBOOL], hi, s.constInt32(Types[TUINT32], 0))
s.check(cmp, panicfn)
}
return s.newValue1(ssa.OpTrunc64to32, Types[TINT], v)
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
}
// Extend value to the required size
var op ssa.Op
if v.Type.IsSigned() {
switch 10*size + s.config.IntSize {
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
case 14:
op = ssa.OpSignExt8to32
case 18:
op = ssa.OpSignExt8to64
case 24:
op = ssa.OpSignExt16to32
case 28:
op = ssa.OpSignExt16to64
case 48:
op = ssa.OpSignExt32to64
default:
s.Fatalf("bad signed index extension %s", v.Type)
}
} else {
switch 10*size + s.config.IntSize {
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
case 14:
op = ssa.OpZeroExt8to32
case 18:
op = ssa.OpZeroExt8to64
case 24:
op = ssa.OpZeroExt16to32
case 28:
op = ssa.OpZeroExt16to64
case 48:
op = ssa.OpZeroExt32to64
default:
s.Fatalf("bad unsigned index extension %s", v.Type)
}
}
return s.newValue1(op, Types[TINT], v)
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
}
// CheckLoweredPhi checks that regalloc and stackalloc correctly handled phi values.
// Called during ssaGenValue.
func CheckLoweredPhi(v *ssa.Value) {
if v.Op != ssa.OpPhi {
v.Fatalf("CheckLoweredPhi called with non-phi value: %v", v.LongString())
}
if v.Type.IsMemory() {
return
}
f := v.Block.Func
loc := f.RegAlloc[v.ID]
for _, a := range v.Args {
if aloc := f.RegAlloc[a.ID]; aloc != loc { // TODO: .Equal() instead?
v.Fatalf("phi arg at different location than phi: %v @ %v, but arg %v @ %v\n%s\n", v, loc, a, aloc, v.Block.Func)
}
}
}
// CheckLoweredGetClosurePtr checks that v is the first instruction in the function's entry block.
// The output of LoweredGetClosurePtr is generally hardwired to the correct register.
// That register contains the closure pointer on closure entry.
func CheckLoweredGetClosurePtr(v *ssa.Value) {
entry := v.Block.Func.Entry
if entry != v.Block || entry.Values[0] != v {
Fatalf("in %s, badly placed LoweredGetClosurePtr: %v %v", v.Block.Func.Name, v.Block, v)
}
}
// KeepAlive marks the variable referenced by OpKeepAlive as live.
// Called during ssaGenValue.
func KeepAlive(v *ssa.Value) {
if v.Op != ssa.OpKeepAlive {
v.Fatalf("KeepAlive called with non-KeepAlive value: %v", v.LongString())
}
if !v.Args[0].Type.IsPtrShaped() {
v.Fatalf("keeping non-pointer alive %v", v.Args[0])
}
n, _ := AutoVar(v.Args[0])
if n == nil {
v.Fatalf("KeepAlive with non-spilled value %s %s", v, v.Args[0])
}
// Note: KeepAlive arg may be a small part of a larger variable n. We keep the
// whole variable n alive at this point. (Typically, this happens when
// we are requested to keep the idata portion of an interface{} alive, and
// we end up keeping the whole interface{} alive. That's ok.)
Gvarlive(n)
}
// AutoVar returns a *Node and int64 representing the auto variable and offset within it
// where v should be spilled.
func AutoVar(v *ssa.Value) (*Node, int64) {
loc := v.Block.Func.RegAlloc[v.ID].(ssa.LocalSlot)
if v.Type.Size() > loc.Type.Size() {
v.Fatalf("spill/restore type %s doesn't fit in slot type %s", v.Type, loc.Type)
}
return loc.N.(*Node), loc.Off
}
func AddrAuto(a *obj.Addr, v *ssa.Value) {
n, off := AutoVar(v)
a.Type = obj.TYPE_MEM
a.Node = n
a.Sym = Linksym(n.Sym)
a.Reg = int16(Thearch.REGSP)
a.Offset = n.Xoffset + off
if n.Class == PPARAM || n.Class == PPARAMOUT {
a.Name = obj.NAME_PARAM
} else {
a.Name = obj.NAME_AUTO
}
}
func (s *SSAGenState) AddrScratch(a *obj.Addr) {
if s.ScratchFpMem == nil {
panic("no scratch memory available; forgot to declare usesScratch for Op?")
}
a.Type = obj.TYPE_MEM
a.Name = obj.NAME_AUTO
a.Node = s.ScratchFpMem
a.Sym = Linksym(s.ScratchFpMem.Sym)
a.Reg = int16(Thearch.REGSP)
a.Offset = s.ScratchFpMem.Xoffset
}
func (s *SSAGenState) Call(v *ssa.Value) *obj.Prog {
if sym, _ := v.Aux.(*obj.LSym); sym == Deferreturn {
// Deferred calls will appear to be returning to
// the CALL deferreturn(SB) that we are about to emit.
// However, the stack trace code will show the line
// of the instruction byte before the return PC.
// To avoid that being an unrelated instruction,
// insert an actual hardware NOP that will have the right line number.
// This is different from obj.ANOP, which is a virtual no-op
// that doesn't make it into the instruction stream.
Thearch.Ginsnop()
}
p := Prog(obj.ACALL)
if sym, ok := v.Aux.(*obj.LSym); ok {
p.To.Type = obj.TYPE_MEM
p.To.Name = obj.NAME_EXTERN
p.To.Sym = sym
} else {
// TODO(mdempsky): Can these differences be eliminated?
switch Thearch.LinkArch.Family {
case sys.AMD64, sys.I386, sys.PPC64, sys.S390X:
p.To.Type = obj.TYPE_REG
case sys.ARM, sys.ARM64, sys.MIPS, sys.MIPS64:
p.To.Type = obj.TYPE_MEM
default:
Fatalf("unknown indirect call family")
}
p.To.Reg = v.Args[0].Reg()
}
if Maxarg < v.AuxInt {
Maxarg = v.AuxInt
}
return p
}
// fieldIdx finds the index of the field referred to by the ODOT node n.
func fieldIdx(n *Node) int {
t := n.Left.Type
f := n.Sym
if !t.IsStruct() {
panic("ODOT's LHS is not a struct")
}
var i int
for _, t1 := range t.Fields().Slice() {
if t1.Sym != f {
i++
continue
}
if t1.Offset != n.Xoffset {
panic("field offset doesn't match")
}
return i
}
panic(fmt.Sprintf("can't find field in expr %v\n", n))
// TODO: keep the result of this function somewhere in the ODOT Node
// so we don't have to recompute it each time we need it.
}
// ssaExport exports a bunch of compiler services for the ssa backend.
[dev.ssa] cmd/compile/ssa: separate logging, work in progress, and fatal errors The SSA implementation logs for three purposes: * debug logging * fatal errors * unimplemented features Separating these three uses lets us attempt an SSA implementation for all functions, not just _ssa functions. This turns the entire standard library into a compilation test, and makes it easy to figure out things like "how much coverage does SSA have now" and "what should we do next to get more coverage?". Functions called _ssa are still special. They log profusely by default and the output of the SSA implementation is used. For all other functions, logging is off, and the implementation is built and discarded, due to lack of support for the runtime. While we're here, fix a few minor bugs and add some extra Unimplementeds to allow all.bash to pass. As of now, SSA handles 20.79% of the functions in the standard library (689 of 3314). The top missing features are: 10.03% 2597 SSA unimplemented: zero for type error not implemented 7.79% 2016 SSA unimplemented: addr: bad op DOTPTR 7.33% 1898 SSA unimplemented: unhandled expr EQ 6.10% 1579 SSA unimplemented: unhandled expr OROR 4.91% 1271 SSA unimplemented: unhandled expr NE 4.49% 1163 SSA unimplemented: unhandled expr LROT 4.00% 1036 SSA unimplemented: unhandled expr LEN 3.56% 923 SSA unimplemented: unhandled stmt CALLFUNC 2.37% 615 SSA unimplemented: zero for type []byte not implemented 1.90% 492 SSA unimplemented: unhandled stmt CALLMETH 1.74% 450 SSA unimplemented: unhandled expr CALLINTER 1.74% 450 SSA unimplemented: unhandled expr DOT 1.71% 444 SSA unimplemented: unhandled expr ANDAND 1.65% 426 SSA unimplemented: unhandled expr CLOSUREVAR 1.54% 400 SSA unimplemented: unhandled expr CALLMETH 1.51% 390 SSA unimplemented: unhandled stmt SWITCH 1.47% 380 SSA unimplemented: unhandled expr CONV 1.33% 345 SSA unimplemented: addr: bad op * 1.30% 336 SSA unimplemented: unhandled OLITERAL 6 Change-Id: I4ca07951e276714dc13c31de28640aead17a1be7 Reviewed-on: https://go-review.googlesource.com/11160 Reviewed-by: Keith Randall <khr@golang.org>
2015-06-12 11:01:13 -07:00
type ssaExport struct {
log bool
[dev.ssa] cmd/compile/ssa: separate logging, work in progress, and fatal errors The SSA implementation logs for three purposes: * debug logging * fatal errors * unimplemented features Separating these three uses lets us attempt an SSA implementation for all functions, not just _ssa functions. This turns the entire standard library into a compilation test, and makes it easy to figure out things like "how much coverage does SSA have now" and "what should we do next to get more coverage?". Functions called _ssa are still special. They log profusely by default and the output of the SSA implementation is used. For all other functions, logging is off, and the implementation is built and discarded, due to lack of support for the runtime. While we're here, fix a few minor bugs and add some extra Unimplementeds to allow all.bash to pass. As of now, SSA handles 20.79% of the functions in the standard library (689 of 3314). The top missing features are: 10.03% 2597 SSA unimplemented: zero for type error not implemented 7.79% 2016 SSA unimplemented: addr: bad op DOTPTR 7.33% 1898 SSA unimplemented: unhandled expr EQ 6.10% 1579 SSA unimplemented: unhandled expr OROR 4.91% 1271 SSA unimplemented: unhandled expr NE 4.49% 1163 SSA unimplemented: unhandled expr LROT 4.00% 1036 SSA unimplemented: unhandled expr LEN 3.56% 923 SSA unimplemented: unhandled stmt CALLFUNC 2.37% 615 SSA unimplemented: zero for type []byte not implemented 1.90% 492 SSA unimplemented: unhandled stmt CALLMETH 1.74% 450 SSA unimplemented: unhandled expr CALLINTER 1.74% 450 SSA unimplemented: unhandled expr DOT 1.71% 444 SSA unimplemented: unhandled expr ANDAND 1.65% 426 SSA unimplemented: unhandled expr CLOSUREVAR 1.54% 400 SSA unimplemented: unhandled expr CALLMETH 1.51% 390 SSA unimplemented: unhandled stmt SWITCH 1.47% 380 SSA unimplemented: unhandled expr CONV 1.33% 345 SSA unimplemented: addr: bad op * 1.30% 336 SSA unimplemented: unhandled OLITERAL 6 Change-Id: I4ca07951e276714dc13c31de28640aead17a1be7 Reviewed-on: https://go-review.googlesource.com/11160 Reviewed-by: Keith Randall <khr@golang.org>
2015-06-12 11:01:13 -07:00
}
func (s *ssaExport) TypeBool() ssa.Type { return Types[TBOOL] }
func (s *ssaExport) TypeInt8() ssa.Type { return Types[TINT8] }
func (s *ssaExport) TypeInt16() ssa.Type { return Types[TINT16] }
func (s *ssaExport) TypeInt32() ssa.Type { return Types[TINT32] }
func (s *ssaExport) TypeInt64() ssa.Type { return Types[TINT64] }
func (s *ssaExport) TypeUInt8() ssa.Type { return Types[TUINT8] }
func (s *ssaExport) TypeUInt16() ssa.Type { return Types[TUINT16] }
func (s *ssaExport) TypeUInt32() ssa.Type { return Types[TUINT32] }
func (s *ssaExport) TypeUInt64() ssa.Type { return Types[TUINT64] }
func (s *ssaExport) TypeFloat32() ssa.Type { return Types[TFLOAT32] }
func (s *ssaExport) TypeFloat64() ssa.Type { return Types[TFLOAT64] }
func (s *ssaExport) TypeInt() ssa.Type { return Types[TINT] }
func (s *ssaExport) TypeUintptr() ssa.Type { return Types[TUINTPTR] }
func (s *ssaExport) TypeString() ssa.Type { return Types[TSTRING] }
func (s *ssaExport) TypeBytePtr() ssa.Type { return ptrto(Types[TUINT8]) }
// StringData returns a symbol (a *Sym wrapped in an interface) which
// is the data component of a global string constant containing s.
func (*ssaExport) StringData(s string) interface{} {
// TODO: is idealstring correct? It might not matter...
data := stringsym(s)
return &ssa.ExternSymbol{Typ: idealstring, Sym: data}
}
[dev.ssa] cmd/compile/ssa: separate logging, work in progress, and fatal errors The SSA implementation logs for three purposes: * debug logging * fatal errors * unimplemented features Separating these three uses lets us attempt an SSA implementation for all functions, not just _ssa functions. This turns the entire standard library into a compilation test, and makes it easy to figure out things like "how much coverage does SSA have now" and "what should we do next to get more coverage?". Functions called _ssa are still special. They log profusely by default and the output of the SSA implementation is used. For all other functions, logging is off, and the implementation is built and discarded, due to lack of support for the runtime. While we're here, fix a few minor bugs and add some extra Unimplementeds to allow all.bash to pass. As of now, SSA handles 20.79% of the functions in the standard library (689 of 3314). The top missing features are: 10.03% 2597 SSA unimplemented: zero for type error not implemented 7.79% 2016 SSA unimplemented: addr: bad op DOTPTR 7.33% 1898 SSA unimplemented: unhandled expr EQ 6.10% 1579 SSA unimplemented: unhandled expr OROR 4.91% 1271 SSA unimplemented: unhandled expr NE 4.49% 1163 SSA unimplemented: unhandled expr LROT 4.00% 1036 SSA unimplemented: unhandled expr LEN 3.56% 923 SSA unimplemented: unhandled stmt CALLFUNC 2.37% 615 SSA unimplemented: zero for type []byte not implemented 1.90% 492 SSA unimplemented: unhandled stmt CALLMETH 1.74% 450 SSA unimplemented: unhandled expr CALLINTER 1.74% 450 SSA unimplemented: unhandled expr DOT 1.71% 444 SSA unimplemented: unhandled expr ANDAND 1.65% 426 SSA unimplemented: unhandled expr CLOSUREVAR 1.54% 400 SSA unimplemented: unhandled expr CALLMETH 1.51% 390 SSA unimplemented: unhandled stmt SWITCH 1.47% 380 SSA unimplemented: unhandled expr CONV 1.33% 345 SSA unimplemented: addr: bad op * 1.30% 336 SSA unimplemented: unhandled OLITERAL 6 Change-Id: I4ca07951e276714dc13c31de28640aead17a1be7 Reviewed-on: https://go-review.googlesource.com/11160 Reviewed-by: Keith Randall <khr@golang.org>
2015-06-12 11:01:13 -07:00
func (e *ssaExport) Auto(t ssa.Type) ssa.GCNode {
n := temp(t.(*Type)) // Note: adds new auto to Curfn.Func.Dcl list
return n
}
cmd/compile: better job of naming compound types Compound AUTO types weren't named previously. That was because live variable analysis (plive.go) doesn't handle spilling to compound types. It can't handle them because there is no valid place to put VARDEFs when regalloc is spilling compound types. compound types = multiword builtin types: complex, string, slice, and interface. Instead, we split named AUTOs into individual one-word variables. For example, a string s gets split into a byte ptr s.ptr and an integer s.len. Those two variables can be spilled to / restored from independently. As a result, live variable analysis can handle them because they are one-word objects. This CL will change how AUTOs are described in DWARF information. Consider the code: func f(s string, i int) int { x := s[i:i+5] g() return lookup(x) } The old compiler would spill x to two consecutive slots on the stack, both named x (at offsets 0 and 8). The new compiler spills the pointer of x to a slot named x.ptr. It doesn't spill x.len at all, as it is a constant (5) and can be rematerialized for the call to lookup. So compound objects may not be spilled in their entirety, and even if they are they won't necessarily be contiguous. Such is the price of optimization. Re-enable live variable analysis tests. One test remains disabled, it fails because of #14904. Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801 Reviewed-on: https://go-review.googlesource.com/21233 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-03-28 11:25:17 -07:00
func (e *ssaExport) SplitString(name ssa.LocalSlot) (ssa.LocalSlot, ssa.LocalSlot) {
n := name.N.(*Node)
ptrType := ptrto(Types[TUINT8])
cmd/compile: better job of naming compound types Compound AUTO types weren't named previously. That was because live variable analysis (plive.go) doesn't handle spilling to compound types. It can't handle them because there is no valid place to put VARDEFs when regalloc is spilling compound types. compound types = multiword builtin types: complex, string, slice, and interface. Instead, we split named AUTOs into individual one-word variables. For example, a string s gets split into a byte ptr s.ptr and an integer s.len. Those two variables can be spilled to / restored from independently. As a result, live variable analysis can handle them because they are one-word objects. This CL will change how AUTOs are described in DWARF information. Consider the code: func f(s string, i int) int { x := s[i:i+5] g() return lookup(x) } The old compiler would spill x to two consecutive slots on the stack, both named x (at offsets 0 and 8). The new compiler spills the pointer of x to a slot named x.ptr. It doesn't spill x.len at all, as it is a constant (5) and can be rematerialized for the call to lookup. So compound objects may not be spilled in their entirety, and even if they are they won't necessarily be contiguous. Such is the price of optimization. Re-enable live variable analysis tests. One test remains disabled, it fails because of #14904. Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801 Reviewed-on: https://go-review.googlesource.com/21233 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-03-28 11:25:17 -07:00
lenType := Types[TINT]
cmd/compile: pack bool fields in Node, Name, Func and Type structs to bitsets This reduces compiler memory usage by up to 4% - see compilebench results below. name old time/op new time/op delta Template 245ms ± 4% 241ms ± 2% -1.88% (p=0.029 n=10+10) Unicode 126ms ± 3% 124ms ± 3% ~ (p=0.105 n=10+10) GoTypes 805ms ± 2% 813ms ± 3% ~ (p=0.515 n=8+10) Compiler 3.95s ± 2% 3.83s ± 1% -2.96% (p=0.000 n=9+10) MakeBash 47.4s ± 4% 46.6s ± 1% -1.59% (p=0.028 n=9+10) name old user-ns/op new user-ns/op delta Template 324M ± 5% 326M ± 3% ~ (p=0.935 n=10+10) Unicode 186M ± 5% 178M ±10% ~ (p=0.067 n=9+10) GoTypes 1.08G ± 7% 1.09G ± 4% ~ (p=0.956 n=10+10) Compiler 5.34G ± 4% 5.31G ± 1% ~ (p=0.501 n=10+8) name old alloc/op new alloc/op delta Template 41.0MB ± 0% 39.8MB ± 0% -3.03% (p=0.000 n=10+10) Unicode 32.3MB ± 0% 31.0MB ± 0% -4.13% (p=0.000 n=10+10) GoTypes 119MB ± 0% 116MB ± 0% -2.39% (p=0.000 n=10+10) Compiler 499MB ± 0% 487MB ± 0% -2.48% (p=0.000 n=10+10) name old allocs/op new allocs/op delta Template 380k ± 1% 379k ± 1% ~ (p=0.436 n=10+10) Unicode 324k ± 1% 324k ± 0% ~ (p=0.853 n=10+10) GoTypes 1.15M ± 0% 1.15M ± 0% ~ (p=0.481 n=10+10) Compiler 4.41M ± 0% 4.41M ± 0% -0.12% (p=0.007 n=10+10) name old text-bytes new text-bytes delta HelloSize 623k ± 0% 623k ± 0% ~ (all equal) CmdGoSize 6.64M ± 0% 6.64M ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 5.81k ± 0% 5.81k ± 0% ~ (all equal) CmdGoSize 238k ± 0% 238k ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 134k ± 0% 134k ± 0% ~ (all equal) CmdGoSize 152k ± 0% 152k ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 967k ± 0% 967k ± 0% ~ (all equal) CmdGoSize 10.2M ± 0% 10.2M ± 0% ~ (all equal) Change-Id: I1f40af738254892bd6c8ba2eb43390b175753d52 Reviewed-on: https://go-review.googlesource.com/37445 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-27 19:56:38 +02:00
if n.Class == PAUTO && !n.Addrtaken() {
cmd/compile: better job of naming compound types Compound AUTO types weren't named previously. That was because live variable analysis (plive.go) doesn't handle spilling to compound types. It can't handle them because there is no valid place to put VARDEFs when regalloc is spilling compound types. compound types = multiword builtin types: complex, string, slice, and interface. Instead, we split named AUTOs into individual one-word variables. For example, a string s gets split into a byte ptr s.ptr and an integer s.len. Those two variables can be spilled to / restored from independently. As a result, live variable analysis can handle them because they are one-word objects. This CL will change how AUTOs are described in DWARF information. Consider the code: func f(s string, i int) int { x := s[i:i+5] g() return lookup(x) } The old compiler would spill x to two consecutive slots on the stack, both named x (at offsets 0 and 8). The new compiler spills the pointer of x to a slot named x.ptr. It doesn't spill x.len at all, as it is a constant (5) and can be rematerialized for the call to lookup. So compound objects may not be spilled in their entirety, and even if they are they won't necessarily be contiguous. Such is the price of optimization. Re-enable live variable analysis tests. One test remains disabled, it fails because of #14904. Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801 Reviewed-on: https://go-review.googlesource.com/21233 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-03-28 11:25:17 -07:00
// Split this string up into two separate variables.
p := e.namedAuto(n.Sym.Name+".ptr", ptrType)
l := e.namedAuto(n.Sym.Name+".len", lenType)
return ssa.LocalSlot{N: p, Type: ptrType, Off: 0}, ssa.LocalSlot{N: l, Type: lenType, Off: 0}
cmd/compile: better job of naming compound types Compound AUTO types weren't named previously. That was because live variable analysis (plive.go) doesn't handle spilling to compound types. It can't handle them because there is no valid place to put VARDEFs when regalloc is spilling compound types. compound types = multiword builtin types: complex, string, slice, and interface. Instead, we split named AUTOs into individual one-word variables. For example, a string s gets split into a byte ptr s.ptr and an integer s.len. Those two variables can be spilled to / restored from independently. As a result, live variable analysis can handle them because they are one-word objects. This CL will change how AUTOs are described in DWARF information. Consider the code: func f(s string, i int) int { x := s[i:i+5] g() return lookup(x) } The old compiler would spill x to two consecutive slots on the stack, both named x (at offsets 0 and 8). The new compiler spills the pointer of x to a slot named x.ptr. It doesn't spill x.len at all, as it is a constant (5) and can be rematerialized for the call to lookup. So compound objects may not be spilled in their entirety, and even if they are they won't necessarily be contiguous. Such is the price of optimization. Re-enable live variable analysis tests. One test remains disabled, it fails because of #14904. Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801 Reviewed-on: https://go-review.googlesource.com/21233 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-03-28 11:25:17 -07:00
}
// Return the two parts of the larger variable.
return ssa.LocalSlot{N: n, Type: ptrType, Off: name.Off}, ssa.LocalSlot{N: n, Type: lenType, Off: name.Off + int64(Widthptr)}
cmd/compile: better job of naming compound types Compound AUTO types weren't named previously. That was because live variable analysis (plive.go) doesn't handle spilling to compound types. It can't handle them because there is no valid place to put VARDEFs when regalloc is spilling compound types. compound types = multiword builtin types: complex, string, slice, and interface. Instead, we split named AUTOs into individual one-word variables. For example, a string s gets split into a byte ptr s.ptr and an integer s.len. Those two variables can be spilled to / restored from independently. As a result, live variable analysis can handle them because they are one-word objects. This CL will change how AUTOs are described in DWARF information. Consider the code: func f(s string, i int) int { x := s[i:i+5] g() return lookup(x) } The old compiler would spill x to two consecutive slots on the stack, both named x (at offsets 0 and 8). The new compiler spills the pointer of x to a slot named x.ptr. It doesn't spill x.len at all, as it is a constant (5) and can be rematerialized for the call to lookup. So compound objects may not be spilled in their entirety, and even if they are they won't necessarily be contiguous. Such is the price of optimization. Re-enable live variable analysis tests. One test remains disabled, it fails because of #14904. Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801 Reviewed-on: https://go-review.googlesource.com/21233 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-03-28 11:25:17 -07:00
}
func (e *ssaExport) SplitInterface(name ssa.LocalSlot) (ssa.LocalSlot, ssa.LocalSlot) {
n := name.N.(*Node)
t := ptrto(Types[TUINT8])
cmd/compile: pack bool fields in Node, Name, Func and Type structs to bitsets This reduces compiler memory usage by up to 4% - see compilebench results below. name old time/op new time/op delta Template 245ms ± 4% 241ms ± 2% -1.88% (p=0.029 n=10+10) Unicode 126ms ± 3% 124ms ± 3% ~ (p=0.105 n=10+10) GoTypes 805ms ± 2% 813ms ± 3% ~ (p=0.515 n=8+10) Compiler 3.95s ± 2% 3.83s ± 1% -2.96% (p=0.000 n=9+10) MakeBash 47.4s ± 4% 46.6s ± 1% -1.59% (p=0.028 n=9+10) name old user-ns/op new user-ns/op delta Template 324M ± 5% 326M ± 3% ~ (p=0.935 n=10+10) Unicode 186M ± 5% 178M ±10% ~ (p=0.067 n=9+10) GoTypes 1.08G ± 7% 1.09G ± 4% ~ (p=0.956 n=10+10) Compiler 5.34G ± 4% 5.31G ± 1% ~ (p=0.501 n=10+8) name old alloc/op new alloc/op delta Template 41.0MB ± 0% 39.8MB ± 0% -3.03% (p=0.000 n=10+10) Unicode 32.3MB ± 0% 31.0MB ± 0% -4.13% (p=0.000 n=10+10) GoTypes 119MB ± 0% 116MB ± 0% -2.39% (p=0.000 n=10+10) Compiler 499MB ± 0% 487MB ± 0% -2.48% (p=0.000 n=10+10) name old allocs/op new allocs/op delta Template 380k ± 1% 379k ± 1% ~ (p=0.436 n=10+10) Unicode 324k ± 1% 324k ± 0% ~ (p=0.853 n=10+10) GoTypes 1.15M ± 0% 1.15M ± 0% ~ (p=0.481 n=10+10) Compiler 4.41M ± 0% 4.41M ± 0% -0.12% (p=0.007 n=10+10) name old text-bytes new text-bytes delta HelloSize 623k ± 0% 623k ± 0% ~ (all equal) CmdGoSize 6.64M ± 0% 6.64M ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 5.81k ± 0% 5.81k ± 0% ~ (all equal) CmdGoSize 238k ± 0% 238k ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 134k ± 0% 134k ± 0% ~ (all equal) CmdGoSize 152k ± 0% 152k ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 967k ± 0% 967k ± 0% ~ (all equal) CmdGoSize 10.2M ± 0% 10.2M ± 0% ~ (all equal) Change-Id: I1f40af738254892bd6c8ba2eb43390b175753d52 Reviewed-on: https://go-review.googlesource.com/37445 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-27 19:56:38 +02:00
if n.Class == PAUTO && !n.Addrtaken() {
cmd/compile: better job of naming compound types Compound AUTO types weren't named previously. That was because live variable analysis (plive.go) doesn't handle spilling to compound types. It can't handle them because there is no valid place to put VARDEFs when regalloc is spilling compound types. compound types = multiword builtin types: complex, string, slice, and interface. Instead, we split named AUTOs into individual one-word variables. For example, a string s gets split into a byte ptr s.ptr and an integer s.len. Those two variables can be spilled to / restored from independently. As a result, live variable analysis can handle them because they are one-word objects. This CL will change how AUTOs are described in DWARF information. Consider the code: func f(s string, i int) int { x := s[i:i+5] g() return lookup(x) } The old compiler would spill x to two consecutive slots on the stack, both named x (at offsets 0 and 8). The new compiler spills the pointer of x to a slot named x.ptr. It doesn't spill x.len at all, as it is a constant (5) and can be rematerialized for the call to lookup. So compound objects may not be spilled in their entirety, and even if they are they won't necessarily be contiguous. Such is the price of optimization. Re-enable live variable analysis tests. One test remains disabled, it fails because of #14904. Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801 Reviewed-on: https://go-review.googlesource.com/21233 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-03-28 11:25:17 -07:00
// Split this interface up into two separate variables.
f := ".itab"
if n.Type.IsEmptyInterface() {
cmd/compile: better job of naming compound types Compound AUTO types weren't named previously. That was because live variable analysis (plive.go) doesn't handle spilling to compound types. It can't handle them because there is no valid place to put VARDEFs when regalloc is spilling compound types. compound types = multiword builtin types: complex, string, slice, and interface. Instead, we split named AUTOs into individual one-word variables. For example, a string s gets split into a byte ptr s.ptr and an integer s.len. Those two variables can be spilled to / restored from independently. As a result, live variable analysis can handle them because they are one-word objects. This CL will change how AUTOs are described in DWARF information. Consider the code: func f(s string, i int) int { x := s[i:i+5] g() return lookup(x) } The old compiler would spill x to two consecutive slots on the stack, both named x (at offsets 0 and 8). The new compiler spills the pointer of x to a slot named x.ptr. It doesn't spill x.len at all, as it is a constant (5) and can be rematerialized for the call to lookup. So compound objects may not be spilled in their entirety, and even if they are they won't necessarily be contiguous. Such is the price of optimization. Re-enable live variable analysis tests. One test remains disabled, it fails because of #14904. Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801 Reviewed-on: https://go-review.googlesource.com/21233 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-03-28 11:25:17 -07:00
f = ".type"
}
c := e.namedAuto(n.Sym.Name+f, t)
d := e.namedAuto(n.Sym.Name+".data", t)
return ssa.LocalSlot{N: c, Type: t, Off: 0}, ssa.LocalSlot{N: d, Type: t, Off: 0}
cmd/compile: better job of naming compound types Compound AUTO types weren't named previously. That was because live variable analysis (plive.go) doesn't handle spilling to compound types. It can't handle them because there is no valid place to put VARDEFs when regalloc is spilling compound types. compound types = multiword builtin types: complex, string, slice, and interface. Instead, we split named AUTOs into individual one-word variables. For example, a string s gets split into a byte ptr s.ptr and an integer s.len. Those two variables can be spilled to / restored from independently. As a result, live variable analysis can handle them because they are one-word objects. This CL will change how AUTOs are described in DWARF information. Consider the code: func f(s string, i int) int { x := s[i:i+5] g() return lookup(x) } The old compiler would spill x to two consecutive slots on the stack, both named x (at offsets 0 and 8). The new compiler spills the pointer of x to a slot named x.ptr. It doesn't spill x.len at all, as it is a constant (5) and can be rematerialized for the call to lookup. So compound objects may not be spilled in their entirety, and even if they are they won't necessarily be contiguous. Such is the price of optimization. Re-enable live variable analysis tests. One test remains disabled, it fails because of #14904. Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801 Reviewed-on: https://go-review.googlesource.com/21233 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-03-28 11:25:17 -07:00
}
// Return the two parts of the larger variable.
return ssa.LocalSlot{N: n, Type: t, Off: name.Off}, ssa.LocalSlot{N: n, Type: t, Off: name.Off + int64(Widthptr)}
cmd/compile: better job of naming compound types Compound AUTO types weren't named previously. That was because live variable analysis (plive.go) doesn't handle spilling to compound types. It can't handle them because there is no valid place to put VARDEFs when regalloc is spilling compound types. compound types = multiword builtin types: complex, string, slice, and interface. Instead, we split named AUTOs into individual one-word variables. For example, a string s gets split into a byte ptr s.ptr and an integer s.len. Those two variables can be spilled to / restored from independently. As a result, live variable analysis can handle them because they are one-word objects. This CL will change how AUTOs are described in DWARF information. Consider the code: func f(s string, i int) int { x := s[i:i+5] g() return lookup(x) } The old compiler would spill x to two consecutive slots on the stack, both named x (at offsets 0 and 8). The new compiler spills the pointer of x to a slot named x.ptr. It doesn't spill x.len at all, as it is a constant (5) and can be rematerialized for the call to lookup. So compound objects may not be spilled in their entirety, and even if they are they won't necessarily be contiguous. Such is the price of optimization. Re-enable live variable analysis tests. One test remains disabled, it fails because of #14904. Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801 Reviewed-on: https://go-review.googlesource.com/21233 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-03-28 11:25:17 -07:00
}
func (e *ssaExport) SplitSlice(name ssa.LocalSlot) (ssa.LocalSlot, ssa.LocalSlot, ssa.LocalSlot) {
n := name.N.(*Node)
ptrType := ptrto(name.Type.ElemType().(*Type))
cmd/compile: better job of naming compound types Compound AUTO types weren't named previously. That was because live variable analysis (plive.go) doesn't handle spilling to compound types. It can't handle them because there is no valid place to put VARDEFs when regalloc is spilling compound types. compound types = multiword builtin types: complex, string, slice, and interface. Instead, we split named AUTOs into individual one-word variables. For example, a string s gets split into a byte ptr s.ptr and an integer s.len. Those two variables can be spilled to / restored from independently. As a result, live variable analysis can handle them because they are one-word objects. This CL will change how AUTOs are described in DWARF information. Consider the code: func f(s string, i int) int { x := s[i:i+5] g() return lookup(x) } The old compiler would spill x to two consecutive slots on the stack, both named x (at offsets 0 and 8). The new compiler spills the pointer of x to a slot named x.ptr. It doesn't spill x.len at all, as it is a constant (5) and can be rematerialized for the call to lookup. So compound objects may not be spilled in their entirety, and even if they are they won't necessarily be contiguous. Such is the price of optimization. Re-enable live variable analysis tests. One test remains disabled, it fails because of #14904. Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801 Reviewed-on: https://go-review.googlesource.com/21233 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-03-28 11:25:17 -07:00
lenType := Types[TINT]
cmd/compile: pack bool fields in Node, Name, Func and Type structs to bitsets This reduces compiler memory usage by up to 4% - see compilebench results below. name old time/op new time/op delta Template 245ms ± 4% 241ms ± 2% -1.88% (p=0.029 n=10+10) Unicode 126ms ± 3% 124ms ± 3% ~ (p=0.105 n=10+10) GoTypes 805ms ± 2% 813ms ± 3% ~ (p=0.515 n=8+10) Compiler 3.95s ± 2% 3.83s ± 1% -2.96% (p=0.000 n=9+10) MakeBash 47.4s ± 4% 46.6s ± 1% -1.59% (p=0.028 n=9+10) name old user-ns/op new user-ns/op delta Template 324M ± 5% 326M ± 3% ~ (p=0.935 n=10+10) Unicode 186M ± 5% 178M ±10% ~ (p=0.067 n=9+10) GoTypes 1.08G ± 7% 1.09G ± 4% ~ (p=0.956 n=10+10) Compiler 5.34G ± 4% 5.31G ± 1% ~ (p=0.501 n=10+8) name old alloc/op new alloc/op delta Template 41.0MB ± 0% 39.8MB ± 0% -3.03% (p=0.000 n=10+10) Unicode 32.3MB ± 0% 31.0MB ± 0% -4.13% (p=0.000 n=10+10) GoTypes 119MB ± 0% 116MB ± 0% -2.39% (p=0.000 n=10+10) Compiler 499MB ± 0% 487MB ± 0% -2.48% (p=0.000 n=10+10) name old allocs/op new allocs/op delta Template 380k ± 1% 379k ± 1% ~ (p=0.436 n=10+10) Unicode 324k ± 1% 324k ± 0% ~ (p=0.853 n=10+10) GoTypes 1.15M ± 0% 1.15M ± 0% ~ (p=0.481 n=10+10) Compiler 4.41M ± 0% 4.41M ± 0% -0.12% (p=0.007 n=10+10) name old text-bytes new text-bytes delta HelloSize 623k ± 0% 623k ± 0% ~ (all equal) CmdGoSize 6.64M ± 0% 6.64M ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 5.81k ± 0% 5.81k ± 0% ~ (all equal) CmdGoSize 238k ± 0% 238k ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 134k ± 0% 134k ± 0% ~ (all equal) CmdGoSize 152k ± 0% 152k ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 967k ± 0% 967k ± 0% ~ (all equal) CmdGoSize 10.2M ± 0% 10.2M ± 0% ~ (all equal) Change-Id: I1f40af738254892bd6c8ba2eb43390b175753d52 Reviewed-on: https://go-review.googlesource.com/37445 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-27 19:56:38 +02:00
if n.Class == PAUTO && !n.Addrtaken() {
cmd/compile: better job of naming compound types Compound AUTO types weren't named previously. That was because live variable analysis (plive.go) doesn't handle spilling to compound types. It can't handle them because there is no valid place to put VARDEFs when regalloc is spilling compound types. compound types = multiword builtin types: complex, string, slice, and interface. Instead, we split named AUTOs into individual one-word variables. For example, a string s gets split into a byte ptr s.ptr and an integer s.len. Those two variables can be spilled to / restored from independently. As a result, live variable analysis can handle them because they are one-word objects. This CL will change how AUTOs are described in DWARF information. Consider the code: func f(s string, i int) int { x := s[i:i+5] g() return lookup(x) } The old compiler would spill x to two consecutive slots on the stack, both named x (at offsets 0 and 8). The new compiler spills the pointer of x to a slot named x.ptr. It doesn't spill x.len at all, as it is a constant (5) and can be rematerialized for the call to lookup. So compound objects may not be spilled in their entirety, and even if they are they won't necessarily be contiguous. Such is the price of optimization. Re-enable live variable analysis tests. One test remains disabled, it fails because of #14904. Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801 Reviewed-on: https://go-review.googlesource.com/21233 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-03-28 11:25:17 -07:00
// Split this slice up into three separate variables.
p := e.namedAuto(n.Sym.Name+".ptr", ptrType)
l := e.namedAuto(n.Sym.Name+".len", lenType)
c := e.namedAuto(n.Sym.Name+".cap", lenType)
return ssa.LocalSlot{N: p, Type: ptrType, Off: 0}, ssa.LocalSlot{N: l, Type: lenType, Off: 0}, ssa.LocalSlot{N: c, Type: lenType, Off: 0}
cmd/compile: better job of naming compound types Compound AUTO types weren't named previously. That was because live variable analysis (plive.go) doesn't handle spilling to compound types. It can't handle them because there is no valid place to put VARDEFs when regalloc is spilling compound types. compound types = multiword builtin types: complex, string, slice, and interface. Instead, we split named AUTOs into individual one-word variables. For example, a string s gets split into a byte ptr s.ptr and an integer s.len. Those two variables can be spilled to / restored from independently. As a result, live variable analysis can handle them because they are one-word objects. This CL will change how AUTOs are described in DWARF information. Consider the code: func f(s string, i int) int { x := s[i:i+5] g() return lookup(x) } The old compiler would spill x to two consecutive slots on the stack, both named x (at offsets 0 and 8). The new compiler spills the pointer of x to a slot named x.ptr. It doesn't spill x.len at all, as it is a constant (5) and can be rematerialized for the call to lookup. So compound objects may not be spilled in their entirety, and even if they are they won't necessarily be contiguous. Such is the price of optimization. Re-enable live variable analysis tests. One test remains disabled, it fails because of #14904. Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801 Reviewed-on: https://go-review.googlesource.com/21233 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-03-28 11:25:17 -07:00
}
// Return the three parts of the larger variable.
return ssa.LocalSlot{N: n, Type: ptrType, Off: name.Off},
ssa.LocalSlot{N: n, Type: lenType, Off: name.Off + int64(Widthptr)},
ssa.LocalSlot{N: n, Type: lenType, Off: name.Off + int64(2*Widthptr)}
cmd/compile: better job of naming compound types Compound AUTO types weren't named previously. That was because live variable analysis (plive.go) doesn't handle spilling to compound types. It can't handle them because there is no valid place to put VARDEFs when regalloc is spilling compound types. compound types = multiword builtin types: complex, string, slice, and interface. Instead, we split named AUTOs into individual one-word variables. For example, a string s gets split into a byte ptr s.ptr and an integer s.len. Those two variables can be spilled to / restored from independently. As a result, live variable analysis can handle them because they are one-word objects. This CL will change how AUTOs are described in DWARF information. Consider the code: func f(s string, i int) int { x := s[i:i+5] g() return lookup(x) } The old compiler would spill x to two consecutive slots on the stack, both named x (at offsets 0 and 8). The new compiler spills the pointer of x to a slot named x.ptr. It doesn't spill x.len at all, as it is a constant (5) and can be rematerialized for the call to lookup. So compound objects may not be spilled in their entirety, and even if they are they won't necessarily be contiguous. Such is the price of optimization. Re-enable live variable analysis tests. One test remains disabled, it fails because of #14904. Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801 Reviewed-on: https://go-review.googlesource.com/21233 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-03-28 11:25:17 -07:00
}
func (e *ssaExport) SplitComplex(name ssa.LocalSlot) (ssa.LocalSlot, ssa.LocalSlot) {
n := name.N.(*Node)
s := name.Type.Size() / 2
var t *Type
if s == 8 {
t = Types[TFLOAT64]
} else {
t = Types[TFLOAT32]
}
cmd/compile: pack bool fields in Node, Name, Func and Type structs to bitsets This reduces compiler memory usage by up to 4% - see compilebench results below. name old time/op new time/op delta Template 245ms ± 4% 241ms ± 2% -1.88% (p=0.029 n=10+10) Unicode 126ms ± 3% 124ms ± 3% ~ (p=0.105 n=10+10) GoTypes 805ms ± 2% 813ms ± 3% ~ (p=0.515 n=8+10) Compiler 3.95s ± 2% 3.83s ± 1% -2.96% (p=0.000 n=9+10) MakeBash 47.4s ± 4% 46.6s ± 1% -1.59% (p=0.028 n=9+10) name old user-ns/op new user-ns/op delta Template 324M ± 5% 326M ± 3% ~ (p=0.935 n=10+10) Unicode 186M ± 5% 178M ±10% ~ (p=0.067 n=9+10) GoTypes 1.08G ± 7% 1.09G ± 4% ~ (p=0.956 n=10+10) Compiler 5.34G ± 4% 5.31G ± 1% ~ (p=0.501 n=10+8) name old alloc/op new alloc/op delta Template 41.0MB ± 0% 39.8MB ± 0% -3.03% (p=0.000 n=10+10) Unicode 32.3MB ± 0% 31.0MB ± 0% -4.13% (p=0.000 n=10+10) GoTypes 119MB ± 0% 116MB ± 0% -2.39% (p=0.000 n=10+10) Compiler 499MB ± 0% 487MB ± 0% -2.48% (p=0.000 n=10+10) name old allocs/op new allocs/op delta Template 380k ± 1% 379k ± 1% ~ (p=0.436 n=10+10) Unicode 324k ± 1% 324k ± 0% ~ (p=0.853 n=10+10) GoTypes 1.15M ± 0% 1.15M ± 0% ~ (p=0.481 n=10+10) Compiler 4.41M ± 0% 4.41M ± 0% -0.12% (p=0.007 n=10+10) name old text-bytes new text-bytes delta HelloSize 623k ± 0% 623k ± 0% ~ (all equal) CmdGoSize 6.64M ± 0% 6.64M ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 5.81k ± 0% 5.81k ± 0% ~ (all equal) CmdGoSize 238k ± 0% 238k ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 134k ± 0% 134k ± 0% ~ (all equal) CmdGoSize 152k ± 0% 152k ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 967k ± 0% 967k ± 0% ~ (all equal) CmdGoSize 10.2M ± 0% 10.2M ± 0% ~ (all equal) Change-Id: I1f40af738254892bd6c8ba2eb43390b175753d52 Reviewed-on: https://go-review.googlesource.com/37445 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-27 19:56:38 +02:00
if n.Class == PAUTO && !n.Addrtaken() {
cmd/compile: better job of naming compound types Compound AUTO types weren't named previously. That was because live variable analysis (plive.go) doesn't handle spilling to compound types. It can't handle them because there is no valid place to put VARDEFs when regalloc is spilling compound types. compound types = multiword builtin types: complex, string, slice, and interface. Instead, we split named AUTOs into individual one-word variables. For example, a string s gets split into a byte ptr s.ptr and an integer s.len. Those two variables can be spilled to / restored from independently. As a result, live variable analysis can handle them because they are one-word objects. This CL will change how AUTOs are described in DWARF information. Consider the code: func f(s string, i int) int { x := s[i:i+5] g() return lookup(x) } The old compiler would spill x to two consecutive slots on the stack, both named x (at offsets 0 and 8). The new compiler spills the pointer of x to a slot named x.ptr. It doesn't spill x.len at all, as it is a constant (5) and can be rematerialized for the call to lookup. So compound objects may not be spilled in their entirety, and even if they are they won't necessarily be contiguous. Such is the price of optimization. Re-enable live variable analysis tests. One test remains disabled, it fails because of #14904. Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801 Reviewed-on: https://go-review.googlesource.com/21233 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-03-28 11:25:17 -07:00
// Split this complex up into two separate variables.
c := e.namedAuto(n.Sym.Name+".real", t)
d := e.namedAuto(n.Sym.Name+".imag", t)
return ssa.LocalSlot{N: c, Type: t, Off: 0}, ssa.LocalSlot{N: d, Type: t, Off: 0}
cmd/compile: better job of naming compound types Compound AUTO types weren't named previously. That was because live variable analysis (plive.go) doesn't handle spilling to compound types. It can't handle them because there is no valid place to put VARDEFs when regalloc is spilling compound types. compound types = multiword builtin types: complex, string, slice, and interface. Instead, we split named AUTOs into individual one-word variables. For example, a string s gets split into a byte ptr s.ptr and an integer s.len. Those two variables can be spilled to / restored from independently. As a result, live variable analysis can handle them because they are one-word objects. This CL will change how AUTOs are described in DWARF information. Consider the code: func f(s string, i int) int { x := s[i:i+5] g() return lookup(x) } The old compiler would spill x to two consecutive slots on the stack, both named x (at offsets 0 and 8). The new compiler spills the pointer of x to a slot named x.ptr. It doesn't spill x.len at all, as it is a constant (5) and can be rematerialized for the call to lookup. So compound objects may not be spilled in their entirety, and even if they are they won't necessarily be contiguous. Such is the price of optimization. Re-enable live variable analysis tests. One test remains disabled, it fails because of #14904. Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801 Reviewed-on: https://go-review.googlesource.com/21233 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-03-28 11:25:17 -07:00
}
// Return the two parts of the larger variable.
return ssa.LocalSlot{N: n, Type: t, Off: name.Off}, ssa.LocalSlot{N: n, Type: t, Off: name.Off + s}
cmd/compile: better job of naming compound types Compound AUTO types weren't named previously. That was because live variable analysis (plive.go) doesn't handle spilling to compound types. It can't handle them because there is no valid place to put VARDEFs when regalloc is spilling compound types. compound types = multiword builtin types: complex, string, slice, and interface. Instead, we split named AUTOs into individual one-word variables. For example, a string s gets split into a byte ptr s.ptr and an integer s.len. Those two variables can be spilled to / restored from independently. As a result, live variable analysis can handle them because they are one-word objects. This CL will change how AUTOs are described in DWARF information. Consider the code: func f(s string, i int) int { x := s[i:i+5] g() return lookup(x) } The old compiler would spill x to two consecutive slots on the stack, both named x (at offsets 0 and 8). The new compiler spills the pointer of x to a slot named x.ptr. It doesn't spill x.len at all, as it is a constant (5) and can be rematerialized for the call to lookup. So compound objects may not be spilled in their entirety, and even if they are they won't necessarily be contiguous. Such is the price of optimization. Re-enable live variable analysis tests. One test remains disabled, it fails because of #14904. Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801 Reviewed-on: https://go-review.googlesource.com/21233 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-03-28 11:25:17 -07:00
}
func (e *ssaExport) SplitInt64(name ssa.LocalSlot) (ssa.LocalSlot, ssa.LocalSlot) {
n := name.N.(*Node)
var t *Type
if name.Type.IsSigned() {
t = Types[TINT32]
} else {
t = Types[TUINT32]
}
cmd/compile: pack bool fields in Node, Name, Func and Type structs to bitsets This reduces compiler memory usage by up to 4% - see compilebench results below. name old time/op new time/op delta Template 245ms ± 4% 241ms ± 2% -1.88% (p=0.029 n=10+10) Unicode 126ms ± 3% 124ms ± 3% ~ (p=0.105 n=10+10) GoTypes 805ms ± 2% 813ms ± 3% ~ (p=0.515 n=8+10) Compiler 3.95s ± 2% 3.83s ± 1% -2.96% (p=0.000 n=9+10) MakeBash 47.4s ± 4% 46.6s ± 1% -1.59% (p=0.028 n=9+10) name old user-ns/op new user-ns/op delta Template 324M ± 5% 326M ± 3% ~ (p=0.935 n=10+10) Unicode 186M ± 5% 178M ±10% ~ (p=0.067 n=9+10) GoTypes 1.08G ± 7% 1.09G ± 4% ~ (p=0.956 n=10+10) Compiler 5.34G ± 4% 5.31G ± 1% ~ (p=0.501 n=10+8) name old alloc/op new alloc/op delta Template 41.0MB ± 0% 39.8MB ± 0% -3.03% (p=0.000 n=10+10) Unicode 32.3MB ± 0% 31.0MB ± 0% -4.13% (p=0.000 n=10+10) GoTypes 119MB ± 0% 116MB ± 0% -2.39% (p=0.000 n=10+10) Compiler 499MB ± 0% 487MB ± 0% -2.48% (p=0.000 n=10+10) name old allocs/op new allocs/op delta Template 380k ± 1% 379k ± 1% ~ (p=0.436 n=10+10) Unicode 324k ± 1% 324k ± 0% ~ (p=0.853 n=10+10) GoTypes 1.15M ± 0% 1.15M ± 0% ~ (p=0.481 n=10+10) Compiler 4.41M ± 0% 4.41M ± 0% -0.12% (p=0.007 n=10+10) name old text-bytes new text-bytes delta HelloSize 623k ± 0% 623k ± 0% ~ (all equal) CmdGoSize 6.64M ± 0% 6.64M ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 5.81k ± 0% 5.81k ± 0% ~ (all equal) CmdGoSize 238k ± 0% 238k ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 134k ± 0% 134k ± 0% ~ (all equal) CmdGoSize 152k ± 0% 152k ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 967k ± 0% 967k ± 0% ~ (all equal) CmdGoSize 10.2M ± 0% 10.2M ± 0% ~ (all equal) Change-Id: I1f40af738254892bd6c8ba2eb43390b175753d52 Reviewed-on: https://go-review.googlesource.com/37445 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-27 19:56:38 +02:00
if n.Class == PAUTO && !n.Addrtaken() {
// Split this int64 up into two separate variables.
h := e.namedAuto(n.Sym.Name+".hi", t)
l := e.namedAuto(n.Sym.Name+".lo", Types[TUINT32])
return ssa.LocalSlot{N: h, Type: t, Off: 0}, ssa.LocalSlot{N: l, Type: Types[TUINT32], Off: 0}
}
// Return the two parts of the larger variable.
if Thearch.LinkArch.ByteOrder == binary.BigEndian {
return ssa.LocalSlot{N: n, Type: t, Off: name.Off}, ssa.LocalSlot{N: n, Type: Types[TUINT32], Off: name.Off + 4}
}
return ssa.LocalSlot{N: n, Type: t, Off: name.Off + 4}, ssa.LocalSlot{N: n, Type: Types[TUINT32], Off: name.Off}
}
func (e *ssaExport) SplitStruct(name ssa.LocalSlot, i int) ssa.LocalSlot {
n := name.N.(*Node)
st := name.Type
ft := st.FieldType(i)
cmd/compile: pack bool fields in Node, Name, Func and Type structs to bitsets This reduces compiler memory usage by up to 4% - see compilebench results below. name old time/op new time/op delta Template 245ms ± 4% 241ms ± 2% -1.88% (p=0.029 n=10+10) Unicode 126ms ± 3% 124ms ± 3% ~ (p=0.105 n=10+10) GoTypes 805ms ± 2% 813ms ± 3% ~ (p=0.515 n=8+10) Compiler 3.95s ± 2% 3.83s ± 1% -2.96% (p=0.000 n=9+10) MakeBash 47.4s ± 4% 46.6s ± 1% -1.59% (p=0.028 n=9+10) name old user-ns/op new user-ns/op delta Template 324M ± 5% 326M ± 3% ~ (p=0.935 n=10+10) Unicode 186M ± 5% 178M ±10% ~ (p=0.067 n=9+10) GoTypes 1.08G ± 7% 1.09G ± 4% ~ (p=0.956 n=10+10) Compiler 5.34G ± 4% 5.31G ± 1% ~ (p=0.501 n=10+8) name old alloc/op new alloc/op delta Template 41.0MB ± 0% 39.8MB ± 0% -3.03% (p=0.000 n=10+10) Unicode 32.3MB ± 0% 31.0MB ± 0% -4.13% (p=0.000 n=10+10) GoTypes 119MB ± 0% 116MB ± 0% -2.39% (p=0.000 n=10+10) Compiler 499MB ± 0% 487MB ± 0% -2.48% (p=0.000 n=10+10) name old allocs/op new allocs/op delta Template 380k ± 1% 379k ± 1% ~ (p=0.436 n=10+10) Unicode 324k ± 1% 324k ± 0% ~ (p=0.853 n=10+10) GoTypes 1.15M ± 0% 1.15M ± 0% ~ (p=0.481 n=10+10) Compiler 4.41M ± 0% 4.41M ± 0% -0.12% (p=0.007 n=10+10) name old text-bytes new text-bytes delta HelloSize 623k ± 0% 623k ± 0% ~ (all equal) CmdGoSize 6.64M ± 0% 6.64M ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 5.81k ± 0% 5.81k ± 0% ~ (all equal) CmdGoSize 238k ± 0% 238k ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 134k ± 0% 134k ± 0% ~ (all equal) CmdGoSize 152k ± 0% 152k ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 967k ± 0% 967k ± 0% ~ (all equal) CmdGoSize 10.2M ± 0% 10.2M ± 0% ~ (all equal) Change-Id: I1f40af738254892bd6c8ba2eb43390b175753d52 Reviewed-on: https://go-review.googlesource.com/37445 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-27 19:56:38 +02:00
if n.Class == PAUTO && !n.Addrtaken() {
// Note: the _ field may appear several times. But
// have no fear, identically-named but distinct Autos are
// ok, albeit maybe confusing for a debugger.
x := e.namedAuto(n.Sym.Name+"."+st.FieldName(i), ft)
return ssa.LocalSlot{N: x, Type: ft, Off: 0}
}
return ssa.LocalSlot{N: n, Type: ft, Off: name.Off + st.FieldOff(i)}
}
func (e *ssaExport) SplitArray(name ssa.LocalSlot) ssa.LocalSlot {
n := name.N.(*Node)
at := name.Type
if at.NumElem() != 1 {
Fatalf("bad array size")
}
et := at.ElemType()
cmd/compile: pack bool fields in Node, Name, Func and Type structs to bitsets This reduces compiler memory usage by up to 4% - see compilebench results below. name old time/op new time/op delta Template 245ms ± 4% 241ms ± 2% -1.88% (p=0.029 n=10+10) Unicode 126ms ± 3% 124ms ± 3% ~ (p=0.105 n=10+10) GoTypes 805ms ± 2% 813ms ± 3% ~ (p=0.515 n=8+10) Compiler 3.95s ± 2% 3.83s ± 1% -2.96% (p=0.000 n=9+10) MakeBash 47.4s ± 4% 46.6s ± 1% -1.59% (p=0.028 n=9+10) name old user-ns/op new user-ns/op delta Template 324M ± 5% 326M ± 3% ~ (p=0.935 n=10+10) Unicode 186M ± 5% 178M ±10% ~ (p=0.067 n=9+10) GoTypes 1.08G ± 7% 1.09G ± 4% ~ (p=0.956 n=10+10) Compiler 5.34G ± 4% 5.31G ± 1% ~ (p=0.501 n=10+8) name old alloc/op new alloc/op delta Template 41.0MB ± 0% 39.8MB ± 0% -3.03% (p=0.000 n=10+10) Unicode 32.3MB ± 0% 31.0MB ± 0% -4.13% (p=0.000 n=10+10) GoTypes 119MB ± 0% 116MB ± 0% -2.39% (p=0.000 n=10+10) Compiler 499MB ± 0% 487MB ± 0% -2.48% (p=0.000 n=10+10) name old allocs/op new allocs/op delta Template 380k ± 1% 379k ± 1% ~ (p=0.436 n=10+10) Unicode 324k ± 1% 324k ± 0% ~ (p=0.853 n=10+10) GoTypes 1.15M ± 0% 1.15M ± 0% ~ (p=0.481 n=10+10) Compiler 4.41M ± 0% 4.41M ± 0% -0.12% (p=0.007 n=10+10) name old text-bytes new text-bytes delta HelloSize 623k ± 0% 623k ± 0% ~ (all equal) CmdGoSize 6.64M ± 0% 6.64M ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 5.81k ± 0% 5.81k ± 0% ~ (all equal) CmdGoSize 238k ± 0% 238k ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 134k ± 0% 134k ± 0% ~ (all equal) CmdGoSize 152k ± 0% 152k ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 967k ± 0% 967k ± 0% ~ (all equal) CmdGoSize 10.2M ± 0% 10.2M ± 0% ~ (all equal) Change-Id: I1f40af738254892bd6c8ba2eb43390b175753d52 Reviewed-on: https://go-review.googlesource.com/37445 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-27 19:56:38 +02:00
if n.Class == PAUTO && !n.Addrtaken() {
x := e.namedAuto(n.Sym.Name+"[0]", et)
return ssa.LocalSlot{N: x, Type: et, Off: 0}
}
return ssa.LocalSlot{N: n, Type: et, Off: name.Off}
}
func (e *ssaExport) DerefItab(it *obj.LSym, offset int64) *obj.LSym {
return itabsym(it, offset)
}
cmd/compile: better job of naming compound types Compound AUTO types weren't named previously. That was because live variable analysis (plive.go) doesn't handle spilling to compound types. It can't handle them because there is no valid place to put VARDEFs when regalloc is spilling compound types. compound types = multiword builtin types: complex, string, slice, and interface. Instead, we split named AUTOs into individual one-word variables. For example, a string s gets split into a byte ptr s.ptr and an integer s.len. Those two variables can be spilled to / restored from independently. As a result, live variable analysis can handle them because they are one-word objects. This CL will change how AUTOs are described in DWARF information. Consider the code: func f(s string, i int) int { x := s[i:i+5] g() return lookup(x) } The old compiler would spill x to two consecutive slots on the stack, both named x (at offsets 0 and 8). The new compiler spills the pointer of x to a slot named x.ptr. It doesn't spill x.len at all, as it is a constant (5) and can be rematerialized for the call to lookup. So compound objects may not be spilled in their entirety, and even if they are they won't necessarily be contiguous. Such is the price of optimization. Re-enable live variable analysis tests. One test remains disabled, it fails because of #14904. Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801 Reviewed-on: https://go-review.googlesource.com/21233 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-03-28 11:25:17 -07:00
// namedAuto returns a new AUTO variable with the given name and type.
// These are exposed to the debugger.
cmd/compile: better job of naming compound types Compound AUTO types weren't named previously. That was because live variable analysis (plive.go) doesn't handle spilling to compound types. It can't handle them because there is no valid place to put VARDEFs when regalloc is spilling compound types. compound types = multiword builtin types: complex, string, slice, and interface. Instead, we split named AUTOs into individual one-word variables. For example, a string s gets split into a byte ptr s.ptr and an integer s.len. Those two variables can be spilled to / restored from independently. As a result, live variable analysis can handle them because they are one-word objects. This CL will change how AUTOs are described in DWARF information. Consider the code: func f(s string, i int) int { x := s[i:i+5] g() return lookup(x) } The old compiler would spill x to two consecutive slots on the stack, both named x (at offsets 0 and 8). The new compiler spills the pointer of x to a slot named x.ptr. It doesn't spill x.len at all, as it is a constant (5) and can be rematerialized for the call to lookup. So compound objects may not be spilled in their entirety, and even if they are they won't necessarily be contiguous. Such is the price of optimization. Re-enable live variable analysis tests. One test remains disabled, it fails because of #14904. Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801 Reviewed-on: https://go-review.googlesource.com/21233 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-03-28 11:25:17 -07:00
func (e *ssaExport) namedAuto(name string, typ ssa.Type) ssa.GCNode {
t := typ.(*Type)
s := &Sym{Name: name, Pkg: localpkg}
n := nod(ONAME, nil, nil)
cmd/compile: better job of naming compound types Compound AUTO types weren't named previously. That was because live variable analysis (plive.go) doesn't handle spilling to compound types. It can't handle them because there is no valid place to put VARDEFs when regalloc is spilling compound types. compound types = multiword builtin types: complex, string, slice, and interface. Instead, we split named AUTOs into individual one-word variables. For example, a string s gets split into a byte ptr s.ptr and an integer s.len. Those two variables can be spilled to / restored from independently. As a result, live variable analysis can handle them because they are one-word objects. This CL will change how AUTOs are described in DWARF information. Consider the code: func f(s string, i int) int { x := s[i:i+5] g() return lookup(x) } The old compiler would spill x to two consecutive slots on the stack, both named x (at offsets 0 and 8). The new compiler spills the pointer of x to a slot named x.ptr. It doesn't spill x.len at all, as it is a constant (5) and can be rematerialized for the call to lookup. So compound objects may not be spilled in their entirety, and even if they are they won't necessarily be contiguous. Such is the price of optimization. Re-enable live variable analysis tests. One test remains disabled, it fails because of #14904. Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801 Reviewed-on: https://go-review.googlesource.com/21233 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-03-28 11:25:17 -07:00
s.Def = n
cmd/compile: pack bool fields in Node, Name, Func and Type structs to bitsets This reduces compiler memory usage by up to 4% - see compilebench results below. name old time/op new time/op delta Template 245ms ± 4% 241ms ± 2% -1.88% (p=0.029 n=10+10) Unicode 126ms ± 3% 124ms ± 3% ~ (p=0.105 n=10+10) GoTypes 805ms ± 2% 813ms ± 3% ~ (p=0.515 n=8+10) Compiler 3.95s ± 2% 3.83s ± 1% -2.96% (p=0.000 n=9+10) MakeBash 47.4s ± 4% 46.6s ± 1% -1.59% (p=0.028 n=9+10) name old user-ns/op new user-ns/op delta Template 324M ± 5% 326M ± 3% ~ (p=0.935 n=10+10) Unicode 186M ± 5% 178M ±10% ~ (p=0.067 n=9+10) GoTypes 1.08G ± 7% 1.09G ± 4% ~ (p=0.956 n=10+10) Compiler 5.34G ± 4% 5.31G ± 1% ~ (p=0.501 n=10+8) name old alloc/op new alloc/op delta Template 41.0MB ± 0% 39.8MB ± 0% -3.03% (p=0.000 n=10+10) Unicode 32.3MB ± 0% 31.0MB ± 0% -4.13% (p=0.000 n=10+10) GoTypes 119MB ± 0% 116MB ± 0% -2.39% (p=0.000 n=10+10) Compiler 499MB ± 0% 487MB ± 0% -2.48% (p=0.000 n=10+10) name old allocs/op new allocs/op delta Template 380k ± 1% 379k ± 1% ~ (p=0.436 n=10+10) Unicode 324k ± 1% 324k ± 0% ~ (p=0.853 n=10+10) GoTypes 1.15M ± 0% 1.15M ± 0% ~ (p=0.481 n=10+10) Compiler 4.41M ± 0% 4.41M ± 0% -0.12% (p=0.007 n=10+10) name old text-bytes new text-bytes delta HelloSize 623k ± 0% 623k ± 0% ~ (all equal) CmdGoSize 6.64M ± 0% 6.64M ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 5.81k ± 0% 5.81k ± 0% ~ (all equal) CmdGoSize 238k ± 0% 238k ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 134k ± 0% 134k ± 0% ~ (all equal) CmdGoSize 152k ± 0% 152k ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 967k ± 0% 967k ± 0% ~ (all equal) CmdGoSize 10.2M ± 0% 10.2M ± 0% ~ (all equal) Change-Id: I1f40af738254892bd6c8ba2eb43390b175753d52 Reviewed-on: https://go-review.googlesource.com/37445 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-27 19:56:38 +02:00
s.Def.SetUsed(true)
cmd/compile: better job of naming compound types Compound AUTO types weren't named previously. That was because live variable analysis (plive.go) doesn't handle spilling to compound types. It can't handle them because there is no valid place to put VARDEFs when regalloc is spilling compound types. compound types = multiword builtin types: complex, string, slice, and interface. Instead, we split named AUTOs into individual one-word variables. For example, a string s gets split into a byte ptr s.ptr and an integer s.len. Those two variables can be spilled to / restored from independently. As a result, live variable analysis can handle them because they are one-word objects. This CL will change how AUTOs are described in DWARF information. Consider the code: func f(s string, i int) int { x := s[i:i+5] g() return lookup(x) } The old compiler would spill x to two consecutive slots on the stack, both named x (at offsets 0 and 8). The new compiler spills the pointer of x to a slot named x.ptr. It doesn't spill x.len at all, as it is a constant (5) and can be rematerialized for the call to lookup. So compound objects may not be spilled in their entirety, and even if they are they won't necessarily be contiguous. Such is the price of optimization. Re-enable live variable analysis tests. One test remains disabled, it fails because of #14904. Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801 Reviewed-on: https://go-review.googlesource.com/21233 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-03-28 11:25:17 -07:00
n.Sym = s
n.Type = t
n.Class = PAUTO
cmd/compile: pack bool fields in Node, Name, Func and Type structs to bitsets This reduces compiler memory usage by up to 4% - see compilebench results below. name old time/op new time/op delta Template 245ms ± 4% 241ms ± 2% -1.88% (p=0.029 n=10+10) Unicode 126ms ± 3% 124ms ± 3% ~ (p=0.105 n=10+10) GoTypes 805ms ± 2% 813ms ± 3% ~ (p=0.515 n=8+10) Compiler 3.95s ± 2% 3.83s ± 1% -2.96% (p=0.000 n=9+10) MakeBash 47.4s ± 4% 46.6s ± 1% -1.59% (p=0.028 n=9+10) name old user-ns/op new user-ns/op delta Template 324M ± 5% 326M ± 3% ~ (p=0.935 n=10+10) Unicode 186M ± 5% 178M ±10% ~ (p=0.067 n=9+10) GoTypes 1.08G ± 7% 1.09G ± 4% ~ (p=0.956 n=10+10) Compiler 5.34G ± 4% 5.31G ± 1% ~ (p=0.501 n=10+8) name old alloc/op new alloc/op delta Template 41.0MB ± 0% 39.8MB ± 0% -3.03% (p=0.000 n=10+10) Unicode 32.3MB ± 0% 31.0MB ± 0% -4.13% (p=0.000 n=10+10) GoTypes 119MB ± 0% 116MB ± 0% -2.39% (p=0.000 n=10+10) Compiler 499MB ± 0% 487MB ± 0% -2.48% (p=0.000 n=10+10) name old allocs/op new allocs/op delta Template 380k ± 1% 379k ± 1% ~ (p=0.436 n=10+10) Unicode 324k ± 1% 324k ± 0% ~ (p=0.853 n=10+10) GoTypes 1.15M ± 0% 1.15M ± 0% ~ (p=0.481 n=10+10) Compiler 4.41M ± 0% 4.41M ± 0% -0.12% (p=0.007 n=10+10) name old text-bytes new text-bytes delta HelloSize 623k ± 0% 623k ± 0% ~ (all equal) CmdGoSize 6.64M ± 0% 6.64M ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 5.81k ± 0% 5.81k ± 0% ~ (all equal) CmdGoSize 238k ± 0% 238k ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 134k ± 0% 134k ± 0% ~ (all equal) CmdGoSize 152k ± 0% 152k ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 967k ± 0% 967k ± 0% ~ (all equal) CmdGoSize 10.2M ± 0% 10.2M ± 0% ~ (all equal) Change-Id: I1f40af738254892bd6c8ba2eb43390b175753d52 Reviewed-on: https://go-review.googlesource.com/37445 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-27 19:56:38 +02:00
n.SetAddable(true)
cmd/compile: better job of naming compound types Compound AUTO types weren't named previously. That was because live variable analysis (plive.go) doesn't handle spilling to compound types. It can't handle them because there is no valid place to put VARDEFs when regalloc is spilling compound types. compound types = multiword builtin types: complex, string, slice, and interface. Instead, we split named AUTOs into individual one-word variables. For example, a string s gets split into a byte ptr s.ptr and an integer s.len. Those two variables can be spilled to / restored from independently. As a result, live variable analysis can handle them because they are one-word objects. This CL will change how AUTOs are described in DWARF information. Consider the code: func f(s string, i int) int { x := s[i:i+5] g() return lookup(x) } The old compiler would spill x to two consecutive slots on the stack, both named x (at offsets 0 and 8). The new compiler spills the pointer of x to a slot named x.ptr. It doesn't spill x.len at all, as it is a constant (5) and can be rematerialized for the call to lookup. So compound objects may not be spilled in their entirety, and even if they are they won't necessarily be contiguous. Such is the price of optimization. Re-enable live variable analysis tests. One test remains disabled, it fails because of #14904. Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801 Reviewed-on: https://go-review.googlesource.com/21233 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-03-28 11:25:17 -07:00
n.Esc = EscNever
n.Xoffset = 0
n.Name.Curfn = Curfn
Curfn.Func.Dcl = append(Curfn.Func.Dcl, n)
dowidth(t)
return n
}
func (e *ssaExport) CanSSA(t ssa.Type) bool {
return canSSAType(t.(*Type))
}
2016-12-15 17:17:01 -08:00
func (e *ssaExport) Line(pos src.XPos) string {
[dev.inline] cmd/internal/src: replace src.Pos with syntax.Pos This replaces the src.Pos LineHist-based position tracking with the syntax.Pos implementation and updates all uses. The LineHist table is not used anymore - the respective code is still there but should be removed eventually. CL forthcoming. Passes toolstash -cmp when comparing to the master repo (with the exception of a couple of swapped assembly instructions, likely due to different instruction scheduling because the line-based sorting has changed; though this is won't affect correctness). The sizes of various important compiler data structures have increased significantly (see the various sizes_test.go files); this is probably the reason for an increase of compilation times (to be addressed). Here are the results of compilebench -count 5, run on a "quiet" machine (no apps running besides a terminal): name old time/op new time/op delta Template 256ms ± 1% 280ms ±15% +9.54% (p=0.008 n=5+5) Unicode 132ms ± 1% 132ms ± 1% ~ (p=0.690 n=5+5) GoTypes 891ms ± 1% 917ms ± 2% +2.88% (p=0.008 n=5+5) Compiler 3.84s ± 2% 3.99s ± 2% +3.95% (p=0.016 n=5+5) MakeBash 47.1s ± 1% 47.2s ± 2% ~ (p=0.841 n=5+5) name old user-ns/op new user-ns/op delta Template 309M ± 1% 326M ± 2% +5.18% (p=0.008 n=5+5) Unicode 165M ± 1% 168M ± 4% ~ (p=0.421 n=5+5) GoTypes 1.14G ± 2% 1.18G ± 1% +3.47% (p=0.008 n=5+5) Compiler 5.00G ± 1% 5.16G ± 1% +3.12% (p=0.008 n=5+5) Change-Id: I241c4246cdff627d7ecb95cac23060b38f9775ec Reviewed-on: https://go-review.googlesource.com/34273 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-12-09 17:15:05 -08:00
return linestr(pos)
}
[dev.ssa] cmd/compile/ssa: separate logging, work in progress, and fatal errors The SSA implementation logs for three purposes: * debug logging * fatal errors * unimplemented features Separating these three uses lets us attempt an SSA implementation for all functions, not just _ssa functions. This turns the entire standard library into a compilation test, and makes it easy to figure out things like "how much coverage does SSA have now" and "what should we do next to get more coverage?". Functions called _ssa are still special. They log profusely by default and the output of the SSA implementation is used. For all other functions, logging is off, and the implementation is built and discarded, due to lack of support for the runtime. While we're here, fix a few minor bugs and add some extra Unimplementeds to allow all.bash to pass. As of now, SSA handles 20.79% of the functions in the standard library (689 of 3314). The top missing features are: 10.03% 2597 SSA unimplemented: zero for type error not implemented 7.79% 2016 SSA unimplemented: addr: bad op DOTPTR 7.33% 1898 SSA unimplemented: unhandled expr EQ 6.10% 1579 SSA unimplemented: unhandled expr OROR 4.91% 1271 SSA unimplemented: unhandled expr NE 4.49% 1163 SSA unimplemented: unhandled expr LROT 4.00% 1036 SSA unimplemented: unhandled expr LEN 3.56% 923 SSA unimplemented: unhandled stmt CALLFUNC 2.37% 615 SSA unimplemented: zero for type []byte not implemented 1.90% 492 SSA unimplemented: unhandled stmt CALLMETH 1.74% 450 SSA unimplemented: unhandled expr CALLINTER 1.74% 450 SSA unimplemented: unhandled expr DOT 1.71% 444 SSA unimplemented: unhandled expr ANDAND 1.65% 426 SSA unimplemented: unhandled expr CLOSUREVAR 1.54% 400 SSA unimplemented: unhandled expr CALLMETH 1.51% 390 SSA unimplemented: unhandled stmt SWITCH 1.47% 380 SSA unimplemented: unhandled expr CONV 1.33% 345 SSA unimplemented: addr: bad op * 1.30% 336 SSA unimplemented: unhandled OLITERAL 6 Change-Id: I4ca07951e276714dc13c31de28640aead17a1be7 Reviewed-on: https://go-review.googlesource.com/11160 Reviewed-by: Keith Randall <khr@golang.org>
2015-06-12 11:01:13 -07:00
// Log logs a message from the compiler.
func (e *ssaExport) Logf(msg string, args ...interface{}) {
if e.log {
[dev.ssa] cmd/compile/ssa: separate logging, work in progress, and fatal errors The SSA implementation logs for three purposes: * debug logging * fatal errors * unimplemented features Separating these three uses lets us attempt an SSA implementation for all functions, not just _ssa functions. This turns the entire standard library into a compilation test, and makes it easy to figure out things like "how much coverage does SSA have now" and "what should we do next to get more coverage?". Functions called _ssa are still special. They log profusely by default and the output of the SSA implementation is used. For all other functions, logging is off, and the implementation is built and discarded, due to lack of support for the runtime. While we're here, fix a few minor bugs and add some extra Unimplementeds to allow all.bash to pass. As of now, SSA handles 20.79% of the functions in the standard library (689 of 3314). The top missing features are: 10.03% 2597 SSA unimplemented: zero for type error not implemented 7.79% 2016 SSA unimplemented: addr: bad op DOTPTR 7.33% 1898 SSA unimplemented: unhandled expr EQ 6.10% 1579 SSA unimplemented: unhandled expr OROR 4.91% 1271 SSA unimplemented: unhandled expr NE 4.49% 1163 SSA unimplemented: unhandled expr LROT 4.00% 1036 SSA unimplemented: unhandled expr LEN 3.56% 923 SSA unimplemented: unhandled stmt CALLFUNC 2.37% 615 SSA unimplemented: zero for type []byte not implemented 1.90% 492 SSA unimplemented: unhandled stmt CALLMETH 1.74% 450 SSA unimplemented: unhandled expr CALLINTER 1.74% 450 SSA unimplemented: unhandled expr DOT 1.71% 444 SSA unimplemented: unhandled expr ANDAND 1.65% 426 SSA unimplemented: unhandled expr CLOSUREVAR 1.54% 400 SSA unimplemented: unhandled expr CALLMETH 1.51% 390 SSA unimplemented: unhandled stmt SWITCH 1.47% 380 SSA unimplemented: unhandled expr CONV 1.33% 345 SSA unimplemented: addr: bad op * 1.30% 336 SSA unimplemented: unhandled OLITERAL 6 Change-Id: I4ca07951e276714dc13c31de28640aead17a1be7 Reviewed-on: https://go-review.googlesource.com/11160 Reviewed-by: Keith Randall <khr@golang.org>
2015-06-12 11:01:13 -07:00
fmt.Printf(msg, args...)
}
}
func (e *ssaExport) Log() bool {
return e.log
}
[dev.ssa] cmd/compile/ssa: separate logging, work in progress, and fatal errors The SSA implementation logs for three purposes: * debug logging * fatal errors * unimplemented features Separating these three uses lets us attempt an SSA implementation for all functions, not just _ssa functions. This turns the entire standard library into a compilation test, and makes it easy to figure out things like "how much coverage does SSA have now" and "what should we do next to get more coverage?". Functions called _ssa are still special. They log profusely by default and the output of the SSA implementation is used. For all other functions, logging is off, and the implementation is built and discarded, due to lack of support for the runtime. While we're here, fix a few minor bugs and add some extra Unimplementeds to allow all.bash to pass. As of now, SSA handles 20.79% of the functions in the standard library (689 of 3314). The top missing features are: 10.03% 2597 SSA unimplemented: zero for type error not implemented 7.79% 2016 SSA unimplemented: addr: bad op DOTPTR 7.33% 1898 SSA unimplemented: unhandled expr EQ 6.10% 1579 SSA unimplemented: unhandled expr OROR 4.91% 1271 SSA unimplemented: unhandled expr NE 4.49% 1163 SSA unimplemented: unhandled expr LROT 4.00% 1036 SSA unimplemented: unhandled expr LEN 3.56% 923 SSA unimplemented: unhandled stmt CALLFUNC 2.37% 615 SSA unimplemented: zero for type []byte not implemented 1.90% 492 SSA unimplemented: unhandled stmt CALLMETH 1.74% 450 SSA unimplemented: unhandled expr CALLINTER 1.74% 450 SSA unimplemented: unhandled expr DOT 1.71% 444 SSA unimplemented: unhandled expr ANDAND 1.65% 426 SSA unimplemented: unhandled expr CLOSUREVAR 1.54% 400 SSA unimplemented: unhandled expr CALLMETH 1.51% 390 SSA unimplemented: unhandled stmt SWITCH 1.47% 380 SSA unimplemented: unhandled expr CONV 1.33% 345 SSA unimplemented: addr: bad op * 1.30% 336 SSA unimplemented: unhandled OLITERAL 6 Change-Id: I4ca07951e276714dc13c31de28640aead17a1be7 Reviewed-on: https://go-review.googlesource.com/11160 Reviewed-by: Keith Randall <khr@golang.org>
2015-06-12 11:01:13 -07:00
// Fatal reports a compiler error and exits.
2016-12-15 17:17:01 -08:00
func (e *ssaExport) Fatalf(pos src.XPos, msg string, args ...interface{}) {
[dev.inline] cmd/internal/src: replace src.Pos with syntax.Pos This replaces the src.Pos LineHist-based position tracking with the syntax.Pos implementation and updates all uses. The LineHist table is not used anymore - the respective code is still there but should be removed eventually. CL forthcoming. Passes toolstash -cmp when comparing to the master repo (with the exception of a couple of swapped assembly instructions, likely due to different instruction scheduling because the line-based sorting has changed; though this is won't affect correctness). The sizes of various important compiler data structures have increased significantly (see the various sizes_test.go files); this is probably the reason for an increase of compilation times (to be addressed). Here are the results of compilebench -count 5, run on a "quiet" machine (no apps running besides a terminal): name old time/op new time/op delta Template 256ms ± 1% 280ms ±15% +9.54% (p=0.008 n=5+5) Unicode 132ms ± 1% 132ms ± 1% ~ (p=0.690 n=5+5) GoTypes 891ms ± 1% 917ms ± 2% +2.88% (p=0.008 n=5+5) Compiler 3.84s ± 2% 3.99s ± 2% +3.95% (p=0.016 n=5+5) MakeBash 47.1s ± 1% 47.2s ± 2% ~ (p=0.841 n=5+5) name old user-ns/op new user-ns/op delta Template 309M ± 1% 326M ± 2% +5.18% (p=0.008 n=5+5) Unicode 165M ± 1% 168M ± 4% ~ (p=0.421 n=5+5) GoTypes 1.14G ± 2% 1.18G ± 1% +3.47% (p=0.008 n=5+5) Compiler 5.00G ± 1% 5.16G ± 1% +3.12% (p=0.008 n=5+5) Change-Id: I241c4246cdff627d7ecb95cac23060b38f9775ec Reviewed-on: https://go-review.googlesource.com/34273 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-12-09 17:15:05 -08:00
lineno = pos
Fatalf(msg, args...)
[dev.ssa] cmd/compile/ssa: separate logging, work in progress, and fatal errors The SSA implementation logs for three purposes: * debug logging * fatal errors * unimplemented features Separating these three uses lets us attempt an SSA implementation for all functions, not just _ssa functions. This turns the entire standard library into a compilation test, and makes it easy to figure out things like "how much coverage does SSA have now" and "what should we do next to get more coverage?". Functions called _ssa are still special. They log profusely by default and the output of the SSA implementation is used. For all other functions, logging is off, and the implementation is built and discarded, due to lack of support for the runtime. While we're here, fix a few minor bugs and add some extra Unimplementeds to allow all.bash to pass. As of now, SSA handles 20.79% of the functions in the standard library (689 of 3314). The top missing features are: 10.03% 2597 SSA unimplemented: zero for type error not implemented 7.79% 2016 SSA unimplemented: addr: bad op DOTPTR 7.33% 1898 SSA unimplemented: unhandled expr EQ 6.10% 1579 SSA unimplemented: unhandled expr OROR 4.91% 1271 SSA unimplemented: unhandled expr NE 4.49% 1163 SSA unimplemented: unhandled expr LROT 4.00% 1036 SSA unimplemented: unhandled expr LEN 3.56% 923 SSA unimplemented: unhandled stmt CALLFUNC 2.37% 615 SSA unimplemented: zero for type []byte not implemented 1.90% 492 SSA unimplemented: unhandled stmt CALLMETH 1.74% 450 SSA unimplemented: unhandled expr CALLINTER 1.74% 450 SSA unimplemented: unhandled expr DOT 1.71% 444 SSA unimplemented: unhandled expr ANDAND 1.65% 426 SSA unimplemented: unhandled expr CLOSUREVAR 1.54% 400 SSA unimplemented: unhandled expr CALLMETH 1.51% 390 SSA unimplemented: unhandled stmt SWITCH 1.47% 380 SSA unimplemented: unhandled expr CONV 1.33% 345 SSA unimplemented: addr: bad op * 1.30% 336 SSA unimplemented: unhandled OLITERAL 6 Change-Id: I4ca07951e276714dc13c31de28640aead17a1be7 Reviewed-on: https://go-review.googlesource.com/11160 Reviewed-by: Keith Randall <khr@golang.org>
2015-06-12 11:01:13 -07:00
}
// Error reports a compiler error but keep going.
func (e *ssaExport) Error(pos src.XPos, msg string, args ...interface{}) {
yyerrorl(pos, msg, args...)
}
// Warnl reports a "warning", which is usually flag-triggered
// logging output for the benefit of tests.
2016-12-15 17:17:01 -08:00
func (e *ssaExport) Warnl(pos src.XPos, fmt_ string, args ...interface{}) {
[dev.inline] cmd/internal/src: replace src.Pos with syntax.Pos This replaces the src.Pos LineHist-based position tracking with the syntax.Pos implementation and updates all uses. The LineHist table is not used anymore - the respective code is still there but should be removed eventually. CL forthcoming. Passes toolstash -cmp when comparing to the master repo (with the exception of a couple of swapped assembly instructions, likely due to different instruction scheduling because the line-based sorting has changed; though this is won't affect correctness). The sizes of various important compiler data structures have increased significantly (see the various sizes_test.go files); this is probably the reason for an increase of compilation times (to be addressed). Here are the results of compilebench -count 5, run on a "quiet" machine (no apps running besides a terminal): name old time/op new time/op delta Template 256ms ± 1% 280ms ±15% +9.54% (p=0.008 n=5+5) Unicode 132ms ± 1% 132ms ± 1% ~ (p=0.690 n=5+5) GoTypes 891ms ± 1% 917ms ± 2% +2.88% (p=0.008 n=5+5) Compiler 3.84s ± 2% 3.99s ± 2% +3.95% (p=0.016 n=5+5) MakeBash 47.1s ± 1% 47.2s ± 2% ~ (p=0.841 n=5+5) name old user-ns/op new user-ns/op delta Template 309M ± 1% 326M ± 2% +5.18% (p=0.008 n=5+5) Unicode 165M ± 1% 168M ± 4% ~ (p=0.421 n=5+5) GoTypes 1.14G ± 2% 1.18G ± 1% +3.47% (p=0.008 n=5+5) Compiler 5.00G ± 1% 5.16G ± 1% +3.12% (p=0.008 n=5+5) Change-Id: I241c4246cdff627d7ecb95cac23060b38f9775ec Reviewed-on: https://go-review.googlesource.com/34273 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-12-09 17:15:05 -08:00
Warnl(pos, fmt_, args...)
}
func (e *ssaExport) Debug_checknil() bool {
return Debug_checknil != 0
}
func (e *ssaExport) Debug_wb() bool {
return Debug_wb != 0
}
func (e *ssaExport) UseWriteBarrier() bool {
return use_writebarrier
}
func (e *ssaExport) Syslook(name string) *obj.LSym {
return Linksym(syslook(name).Sym)
}
func (n *Node) Typ() ssa.Type {
return n.Type
}