go/src/cmd/compile/internal/gc/ssa.go

6114 lines
196 KiB
Go
Raw Normal View History

// Copyright 2015 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
package gc
import (
"bufio"
"bytes"
"encoding/binary"
[dev.ssa] cmd/compile/ssa: separate logging, work in progress, and fatal errors The SSA implementation logs for three purposes: * debug logging * fatal errors * unimplemented features Separating these three uses lets us attempt an SSA implementation for all functions, not just _ssa functions. This turns the entire standard library into a compilation test, and makes it easy to figure out things like "how much coverage does SSA have now" and "what should we do next to get more coverage?". Functions called _ssa are still special. They log profusely by default and the output of the SSA implementation is used. For all other functions, logging is off, and the implementation is built and discarded, due to lack of support for the runtime. While we're here, fix a few minor bugs and add some extra Unimplementeds to allow all.bash to pass. As of now, SSA handles 20.79% of the functions in the standard library (689 of 3314). The top missing features are: 10.03% 2597 SSA unimplemented: zero for type error not implemented 7.79% 2016 SSA unimplemented: addr: bad op DOTPTR 7.33% 1898 SSA unimplemented: unhandled expr EQ 6.10% 1579 SSA unimplemented: unhandled expr OROR 4.91% 1271 SSA unimplemented: unhandled expr NE 4.49% 1163 SSA unimplemented: unhandled expr LROT 4.00% 1036 SSA unimplemented: unhandled expr LEN 3.56% 923 SSA unimplemented: unhandled stmt CALLFUNC 2.37% 615 SSA unimplemented: zero for type []byte not implemented 1.90% 492 SSA unimplemented: unhandled stmt CALLMETH 1.74% 450 SSA unimplemented: unhandled expr CALLINTER 1.74% 450 SSA unimplemented: unhandled expr DOT 1.71% 444 SSA unimplemented: unhandled expr ANDAND 1.65% 426 SSA unimplemented: unhandled expr CLOSUREVAR 1.54% 400 SSA unimplemented: unhandled expr CALLMETH 1.51% 390 SSA unimplemented: unhandled stmt SWITCH 1.47% 380 SSA unimplemented: unhandled expr CONV 1.33% 345 SSA unimplemented: addr: bad op * 1.30% 336 SSA unimplemented: unhandled OLITERAL 6 Change-Id: I4ca07951e276714dc13c31de28640aead17a1be7 Reviewed-on: https://go-review.googlesource.com/11160 Reviewed-by: Keith Randall <khr@golang.org>
2015-06-12 11:01:13 -07:00
"fmt"
"html"
[dev.ssa] cmd/compile: add GOSSAFUNC and GOSSAPKG These temporary environment variables make it possible to enable using SSA-generated code for a particular function or package without having to rebuild the compiler. This makes it possible to start bulk testing SSA generated code. First, bump up the default stack size (_StackMin in runtime/stack2.go) to something large like 32768, because without stackmaps we can't grow stacks. Then run something like: for pkg in `go list std` do GOGC=off GOSSAPKG=`basename $pkg` go test -a $pkg done When a test fails, you can re-run those tests, selectively enabling one function after another, until you find the one that is causing trouble. Doing this right now yields some interesting results: * There are several packages for which we generate some code and whose tests pass. Yay! * We can generate code for encoding/base64, but tests there fail, so there's a bug to fix. * Attempting to build the runtime yields a panic during codegen: panic: interface conversion: ssa.Location is nil, not *ssa.LocalSlot * The top unimplemented codegen items are (simplified): 59 genValue not implemented: REPMOVSB 18 genValue not implemented: REPSTOSQ 14 genValue not implemented: SUBQ 9 branch not implemented: If v -> b b. Control: XORQconst <bool> [1] 8 genValue not implemented: MOVQstoreidx8 4 branch not implemented: If v -> b b. Control: SETG <bool> 3 branch not implemented: If v -> b b. Control: SETLE <bool> 2 load flags not implemented: LoadReg8 <flags> 2 genValue not implemented: InvertFlags <flags> 1 store flags not implemented: StoreReg8 <flags> 1 branch not implemented: If v -> b b. Control: SETGE <bool> Change-Id: Ib64809ac0c917e25bcae27829ae634c70d290c7f Reviewed-on: https://go-review.googlesource.com/12547 Reviewed-by: Keith Randall <khr@golang.org>
2015-07-22 13:13:53 -07:00
"os"
"sort"
"cmd/compile/internal/ssa"
"cmd/compile/internal/types"
"cmd/internal/obj"
"cmd/internal/objabi"
"cmd/internal/src"
"cmd/internal/sys"
)
var ssaConfig *ssa.Config
cmd/compile: add initial backend concurrency support This CL adds initial support for concurrent backend compilation. BACKGROUND The compiler currently consists (very roughly) of the following phases: 1. Initialization. 2. Lexing and parsing into the cmd/compile/internal/syntax AST. 3. Translation into the cmd/compile/internal/gc AST. 4. Some gc AST passes: typechecking, escape analysis, inlining, closure handling, expression evaluation ordering (order.go), and some lowering and optimization (walk.go). 5. Translation into the cmd/compile/internal/ssa SSA form. 6. Optimization and lowering of SSA form. 7. Translation from SSA form to assembler instructions. 8. Translation from assembler instructions to machine code. 9. Writing lots of output: machine code, DWARF symbols, type and reflection info, export data. Phase 2 was already concurrent as of Go 1.8. Phase 3 is planned for eventual removal; we hope to go straight from syntax AST to SSA. Phases 5–8 are per-function; this CL adds support for processing multiple functions concurrently. The slowest phases in the compiler are 5 and 6, so this offers the opportunity for some good speed-ups. Unfortunately, it's not quite that straightforward. In the current compiler, the latter parts of phase 4 (order, walk) are done function-at-a-time as needed. Making order and walk concurrency-safe proved hard, and they're not particularly slow, so there wasn't much reward. To enable phases 5–8 to be done concurrently, when concurrent backend compilation is requested, we complete phase 4 for all functions before starting later phases for any functions. Also, in reality, we automatically generate new functions in phase 9, such as method wrappers and equality and has routines. Those new functions then go through phases 4–8. This CL disables concurrent backend compilation after the first, big, user-provided batch of functions has been compiled. This is done to keep things simple, and because the autogenerated functions tend to be small, few, simple, and fast to compile. USAGE Concurrent backend compilation still defaults to off. To set the number of functions that may be backend-compiled concurrently, use the compiler flag -c. In future work, cmd/go will automatically set -c. Furthermore, this CL has been intentionally written so that the c=1 path has no backend concurrency whatsoever, not even spawning any goroutines. This helps ensure that, should problems arise late in the development cycle, we can simply have cmd/go set c=1 always, and revert to the original compiler behavior. MUTEXES Most of the work required to make concurrent backend compilation safe has occurred over the past month. This CL adds a handful of mutexes to get the rest of the way there; they are the mutexes that I didn't see a clean way to avoid. Some of them may still be eliminable in future work. In no particular order: * gc.funcsymsmu. The global funcsyms slice is populated lazily when we need function symbols for closures. This occurs during gc AST to SSA translation. The function funcsym also does a package lookup, which is a source of races on types.Pkg.Syms; funcsymsmu also covers that package lookup. This mutex is low priority: it adds a single global, it is in an infrequently used code path, and it is low contention. Since funcsyms may now be added in any order, we must sort them to preserve reproducible builds. * gc.largeStackFramesMu. We don't discover until after SSA compilation that a function's stack frame is gigantic. Recording that error happens basically never, but it does happen concurrently. Fix with a low priority mutex and sorting. * obj.Link.hashmu. ctxt.hash stores the mapping from types.Syms (compiler symbols) to obj.LSyms (linker symbols). It is accessed fairly heavily through all the phases. This is the only heavily contended mutex. * gc.signatlistmu. The global signatlist map is populated with types through several of the concurrent phases, including notably via ngotype during DWARF generation. It is low priority for removal. * gc.typepkgmu. Looking up symbols in the types package happens a fair amount during backend compilation and DWARF generation, particularly via ngotype. This mutex helps us to avoid a broader mutex on types.Pkg.Syms. It has low-to-moderate contention. * types.internedStringsmu. gc AST to SSA conversion and some SSA work introduce new autotmps. Those autotmps have their names interned to reduce allocations. That interning requires protecting types.internedStrings. The autotmp names are heavily re-used, and the mutex overhead and contention here are low, so it is probably a worthwhile performance optimization to keep this mutex. TESTING I have been testing this code locally by running 'go install -race cmd/compile' and then doing 'go build -a -gcflags=-c=128 std cmd' for all architectures and a variety of compiler flags. This obviously needs to be made part of the builders, but it is too expensive to make part of all.bash. I have filed #19962 for this. REPRODUCIBLE BUILDS This version of the compiler generates reproducible builds. Testing reproducible builds also needs automation, however, and is also too expensive for all.bash. This is #19961. Also of note is that some of the compiler flags used by 'toolstash -cmp' are currently incompatible with concurrent backend compilation. They still work fine with c=1. Time will tell whether this is a problem. NEXT STEPS * Continue to find and fix races and bugs, using a combination of code inspection, fuzzing, and hopefully some community experimentation. I do not know of any outstanding races, but there probably are some. * Improve testing. * Improve performance, for many values of c. * Integrate with cmd/go and fine tune. * Support concurrent compilation with the -race flag. It is a sad irony that it does not yet work. * Minor code cleanup that has been deferred during the last month due to uncertainty about the ultimate shape of this CL. PERFORMANCE Here's the buried lede, at last. :) All benchmarks are from my 8 core 2.9 GHz Intel Core i7 darwin/amd64 laptop. First, going from tip to this CL with c=1 has almost no impact. name old time/op new time/op delta Template 195ms ± 3% 194ms ± 5% ~ (p=0.370 n=30+29) Unicode 86.6ms ± 3% 87.0ms ± 7% ~ (p=0.958 n=29+30) GoTypes 548ms ± 3% 555ms ± 4% +1.35% (p=0.001 n=30+28) Compiler 2.51s ± 2% 2.54s ± 2% +1.17% (p=0.000 n=28+30) SSA 5.16s ± 3% 5.16s ± 2% ~ (p=0.910 n=30+29) Flate 124ms ± 5% 124ms ± 4% ~ (p=0.947 n=30+30) GoParser 146ms ± 3% 146ms ± 3% ~ (p=0.150 n=29+28) Reflect 354ms ± 3% 352ms ± 4% ~ (p=0.096 n=29+29) Tar 107ms ± 5% 106ms ± 3% ~ (p=0.370 n=30+29) XML 200ms ± 4% 201ms ± 4% ~ (p=0.313 n=29+28) [Geo mean] 332ms 333ms +0.10% name old user-time/op new user-time/op delta Template 227ms ± 5% 225ms ± 5% ~ (p=0.457 n=28+27) Unicode 109ms ± 4% 109ms ± 5% ~ (p=0.758 n=29+29) GoTypes 713ms ± 4% 721ms ± 5% ~ (p=0.051 n=30+29) Compiler 3.36s ± 2% 3.38s ± 3% ~ (p=0.146 n=30+30) SSA 7.46s ± 3% 7.47s ± 3% ~ (p=0.804 n=30+29) Flate 146ms ± 7% 147ms ± 3% ~ (p=0.833 n=29+27) GoParser 179ms ± 5% 179ms ± 5% ~ (p=0.866 n=30+30) Reflect 431ms ± 4% 429ms ± 4% ~ (p=0.593 n=29+30) Tar 124ms ± 5% 123ms ± 5% ~ (p=0.140 n=29+29) XML 243ms ± 4% 242ms ± 7% ~ (p=0.404 n=29+29) [Geo mean] 415ms 415ms +0.02% name old obj-bytes new obj-bytes delta Template 382k ± 0% 382k ± 0% ~ (all equal) Unicode 203k ± 0% 203k ± 0% ~ (all equal) GoTypes 1.18M ± 0% 1.18M ± 0% ~ (all equal) Compiler 3.98M ± 0% 3.98M ± 0% ~ (all equal) SSA 8.28M ± 0% 8.28M ± 0% ~ (all equal) Flate 230k ± 0% 230k ± 0% ~ (all equal) GoParser 287k ± 0% 287k ± 0% ~ (all equal) Reflect 1.00M ± 0% 1.00M ± 0% ~ (all equal) Tar 190k ± 0% 190k ± 0% ~ (all equal) XML 416k ± 0% 416k ± 0% ~ (all equal) [Geo mean] 660k 660k +0.00% Comparing this CL to itself, from c=1 to c=2 improves real times 20-30%, costs 5-10% more CPU time, and adds about 2% alloc. The allocation increase comes from allocating more ssa.Caches. name old time/op new time/op delta Template 202ms ± 3% 149ms ± 3% -26.15% (p=0.000 n=49+49) Unicode 87.4ms ± 4% 84.2ms ± 3% -3.68% (p=0.000 n=48+48) GoTypes 560ms ± 2% 398ms ± 2% -28.96% (p=0.000 n=49+49) Compiler 2.46s ± 3% 1.76s ± 2% -28.61% (p=0.000 n=48+46) SSA 6.17s ± 2% 4.04s ± 1% -34.52% (p=0.000 n=49+49) Flate 126ms ± 3% 92ms ± 2% -26.81% (p=0.000 n=49+48) GoParser 148ms ± 4% 107ms ± 2% -27.78% (p=0.000 n=49+48) Reflect 361ms ± 3% 281ms ± 3% -22.10% (p=0.000 n=49+49) Tar 109ms ± 4% 86ms ± 3% -20.81% (p=0.000 n=49+47) XML 204ms ± 3% 144ms ± 2% -29.53% (p=0.000 n=48+45) name old user-time/op new user-time/op delta Template 246ms ± 9% 246ms ± 4% ~ (p=0.401 n=50+48) Unicode 109ms ± 4% 111ms ± 4% +1.47% (p=0.000 n=44+50) GoTypes 728ms ± 3% 765ms ± 3% +5.04% (p=0.000 n=46+50) Compiler 3.33s ± 3% 3.41s ± 2% +2.31% (p=0.000 n=49+48) SSA 8.52s ± 2% 9.11s ± 2% +6.93% (p=0.000 n=49+47) Flate 149ms ± 4% 161ms ± 3% +8.13% (p=0.000 n=50+47) GoParser 181ms ± 5% 192ms ± 2% +6.40% (p=0.000 n=49+46) Reflect 452ms ± 9% 474ms ± 2% +4.99% (p=0.000 n=50+48) Tar 126ms ± 6% 136ms ± 4% +7.95% (p=0.000 n=50+49) XML 247ms ± 5% 264ms ± 3% +6.94% (p=0.000 n=48+50) name old alloc/op new alloc/op delta Template 38.8MB ± 0% 39.3MB ± 0% +1.48% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 30.2MB ± 0% +1.19% (p=0.008 n=5+5) GoTypes 113MB ± 0% 114MB ± 0% +0.69% (p=0.008 n=5+5) Compiler 443MB ± 0% 447MB ± 0% +0.95% (p=0.008 n=5+5) SSA 1.25GB ± 0% 1.26GB ± 0% +0.89% (p=0.008 n=5+5) Flate 25.3MB ± 0% 25.9MB ± 1% +2.35% (p=0.008 n=5+5) GoParser 31.7MB ± 0% 32.2MB ± 0% +1.59% (p=0.008 n=5+5) Reflect 78.2MB ± 0% 78.9MB ± 0% +0.91% (p=0.008 n=5+5) Tar 26.6MB ± 0% 27.0MB ± 0% +1.80% (p=0.008 n=5+5) XML 42.4MB ± 0% 43.4MB ± 0% +2.35% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Template 379k ± 0% 378k ± 0% ~ (p=0.421 n=5+5) Unicode 322k ± 0% 321k ± 0% ~ (p=0.222 n=5+5) GoTypes 1.14M ± 0% 1.14M ± 0% ~ (p=0.548 n=5+5) Compiler 4.12M ± 0% 4.11M ± 0% -0.14% (p=0.032 n=5+5) SSA 9.72M ± 0% 9.72M ± 0% ~ (p=0.421 n=5+5) Flate 234k ± 1% 234k ± 0% ~ (p=0.421 n=5+5) GoParser 316k ± 1% 315k ± 0% ~ (p=0.222 n=5+5) Reflect 980k ± 0% 979k ± 0% ~ (p=0.095 n=5+5) Tar 249k ± 1% 249k ± 1% ~ (p=0.841 n=5+5) XML 392k ± 0% 391k ± 0% ~ (p=0.095 n=5+5) From c=1 to c=4, real time is down ~40%, CPU usage up 10-20%, alloc up ~5%: name old time/op new time/op delta Template 203ms ± 3% 131ms ± 5% -35.45% (p=0.000 n=50+50) Unicode 87.2ms ± 4% 84.1ms ± 2% -3.61% (p=0.000 n=48+47) GoTypes 560ms ± 4% 310ms ± 2% -44.65% (p=0.000 n=50+49) Compiler 2.47s ± 3% 1.41s ± 2% -43.10% (p=0.000 n=50+46) SSA 6.17s ± 2% 3.20s ± 2% -48.06% (p=0.000 n=49+49) Flate 126ms ± 4% 74ms ± 2% -41.06% (p=0.000 n=49+48) GoParser 148ms ± 4% 89ms ± 3% -39.97% (p=0.000 n=49+50) Reflect 360ms ± 3% 242ms ± 3% -32.81% (p=0.000 n=49+49) Tar 108ms ± 4% 73ms ± 4% -32.48% (p=0.000 n=50+49) XML 203ms ± 3% 119ms ± 3% -41.56% (p=0.000 n=49+48) name old user-time/op new user-time/op delta Template 246ms ± 9% 287ms ± 9% +16.98% (p=0.000 n=50+50) Unicode 109ms ± 4% 118ms ± 5% +7.56% (p=0.000 n=46+50) GoTypes 735ms ± 4% 806ms ± 2% +9.62% (p=0.000 n=50+50) Compiler 3.34s ± 4% 3.56s ± 2% +6.78% (p=0.000 n=49+49) SSA 8.54s ± 3% 10.04s ± 3% +17.55% (p=0.000 n=50+50) Flate 149ms ± 6% 176ms ± 3% +17.82% (p=0.000 n=50+48) GoParser 181ms ± 5% 213ms ± 3% +17.47% (p=0.000 n=50+50) Reflect 453ms ± 6% 499ms ± 2% +10.11% (p=0.000 n=50+48) Tar 126ms ± 5% 149ms ±11% +18.76% (p=0.000 n=50+50) XML 246ms ± 5% 287ms ± 4% +16.53% (p=0.000 n=49+50) name old alloc/op new alloc/op delta Template 38.8MB ± 0% 40.4MB ± 0% +4.21% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 30.9MB ± 0% +3.68% (p=0.008 n=5+5) GoTypes 113MB ± 0% 116MB ± 0% +2.71% (p=0.008 n=5+5) Compiler 443MB ± 0% 455MB ± 0% +2.75% (p=0.008 n=5+5) SSA 1.25GB ± 0% 1.27GB ± 0% +1.84% (p=0.008 n=5+5) Flate 25.3MB ± 0% 26.9MB ± 1% +6.31% (p=0.008 n=5+5) GoParser 31.7MB ± 0% 33.2MB ± 0% +4.61% (p=0.008 n=5+5) Reflect 78.2MB ± 0% 80.2MB ± 0% +2.53% (p=0.008 n=5+5) Tar 26.6MB ± 0% 27.9MB ± 0% +5.19% (p=0.008 n=5+5) XML 42.4MB ± 0% 44.6MB ± 0% +5.20% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Template 380k ± 0% 379k ± 0% -0.39% (p=0.032 n=5+5) Unicode 321k ± 0% 321k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.14M ± 0% 1.14M ± 0% ~ (p=0.421 n=5+5) Compiler 4.12M ± 0% 4.14M ± 0% +0.52% (p=0.008 n=5+5) SSA 9.72M ± 0% 9.76M ± 0% +0.37% (p=0.008 n=5+5) Flate 234k ± 1% 234k ± 1% ~ (p=0.690 n=5+5) GoParser 316k ± 0% 317k ± 1% ~ (p=0.841 n=5+5) Reflect 981k ± 0% 981k ± 0% ~ (p=1.000 n=5+5) Tar 250k ± 0% 249k ± 1% ~ (p=0.151 n=5+5) XML 393k ± 0% 392k ± 0% ~ (p=0.056 n=5+5) Going beyond c=4 on my machine tends to increase CPU time and allocs without impacting real time. The CPU time numbers matter, because when there are many concurrent compilation processes, that will impact the overall throughput. The numbers above are in many ways the best case scenario; we can take full advantage of all cores. Fortunately, the most common compilation scenario is incremental re-compilation of a single package during a build/test cycle. Updates #15756 Change-Id: I6725558ca2069edec0ac5b0d1683105a9fff6bea Reviewed-on: https://go-review.googlesource.com/40693 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Robert Griesemer <gri@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-19 08:27:26 -07:00
var ssaCaches []ssa.Cache
var ssaDump string // early copy of $GOSSAFUNC; the func name to dump output for
var ssaDumpStdout bool // whether to dump to stdout
var ssaDumpCFG string // generate CFGs for these phases
const ssaDumpFile = "ssa.html"
// ssaDumpInlined holds all inlined functions when ssaDump contains a function name.
var ssaDumpInlined []*Node
func initssaconfig() {
types_ := ssa.NewTypes()
if thearch.SoftFloat {
softfloatInit()
}
cmd/compile: disable typPtr caching in the backend The only new Types that the backend introduces are pointers to Types generated by the frontend. Usually, when we generate a *T, we cache the resulting Type in T, to avoid recreating it later. However, that caching is not concurrency safe. Rather than add mutexes, this CL disables that caching before starting the backend. The backend generates few enough new *Ts that the performance impact of this is small, particularly if we pre-create some commonly used *Ts. Updates #15756 name old alloc/op new alloc/op delta Template 40.3MB ± 0% 40.4MB ± 0% +0.18% (p=0.001 n=10+10) Unicode 29.8MB ± 0% 29.8MB ± 0% +0.11% (p=0.043 n=10+9) GoTypes 114MB ± 0% 115MB ± 0% +0.33% (p=0.000 n=9+10) SSA 855MB ± 0% 859MB ± 0% +0.40% (p=0.000 n=10+10) Flate 25.7MB ± 0% 25.8MB ± 0% +0.35% (p=0.000 n=10+10) GoParser 31.9MB ± 0% 32.1MB ± 0% +0.58% (p=0.000 n=10+10) Reflect 79.6MB ± 0% 79.9MB ± 0% +0.31% (p=0.000 n=10+10) Tar 26.9MB ± 0% 26.9MB ± 0% +0.21% (p=0.000 n=10+10) XML 42.5MB ± 0% 42.7MB ± 0% +0.52% (p=0.000 n=10+9) name old allocs/op new allocs/op delta Template 394k ± 1% 393k ± 0% ~ (p=0.529 n=10+10) Unicode 319k ± 1% 319k ± 0% ~ (p=0.720 n=10+9) GoTypes 1.15M ± 0% 1.15M ± 0% +0.14% (p=0.035 n=10+10) SSA 7.53M ± 0% 7.56M ± 0% +0.45% (p=0.000 n=9+10) Flate 238k ± 0% 238k ± 1% ~ (p=0.579 n=10+10) GoParser 318k ± 1% 320k ± 1% +0.64% (p=0.001 n=10+10) Reflect 1.00M ± 0% 1.00M ± 0% ~ (p=0.393 n=10+10) Tar 254k ± 0% 254k ± 1% ~ (p=0.075 n=10+10) XML 395k ± 0% 397k ± 0% +0.44% (p=0.001 n=10+9) Change-Id: I6c031ed4f39108f26969c5712b73aa2fc08cd10a Reviewed-on: https://go-review.googlesource.com/38417 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-18 08:11:49 -07:00
// Generate a few pointer types that are uncommon in the frontend but common in the backend.
// Caching is disabled in the backend, so generating these here avoids allocations.
_ = types.NewPtr(types.Types[TINTER]) // *interface{}
_ = types.NewPtr(types.NewPtr(types.Types[TSTRING])) // **string
_ = types.NewPtr(types.NewPtr(types.Idealstring)) // **string
_ = types.NewPtr(types.NewSlice(types.Types[TINTER])) // *[]interface{}
_ = types.NewPtr(types.NewPtr(types.Bytetype)) // **byte
_ = types.NewPtr(types.NewSlice(types.Bytetype)) // *[]byte
_ = types.NewPtr(types.NewSlice(types.Types[TSTRING])) // *[]string
_ = types.NewPtr(types.NewSlice(types.Idealstring)) // *[]string
_ = types.NewPtr(types.NewPtr(types.NewPtr(types.Types[TUINT8]))) // ***uint8
_ = types.NewPtr(types.Types[TINT16]) // *int16
_ = types.NewPtr(types.Types[TINT64]) // *int64
_ = types.NewPtr(types.Errortype) // *error
types.NewPtrCacheEnabled = false
ssaConfig = ssa.NewConfig(thearch.LinkArch.Name, *types_, Ctxt, Debug['N'] == 0)
if thearch.LinkArch.Name == "386" {
ssaConfig.Set387(thearch.Use387)
}
ssaConfig.SoftFloat = thearch.SoftFloat
cmd/compile: fix racy setting of gc's Config.Race ssaConfig.Race was being set by many goroutines concurrently, resulting in a data race seen below. This was very likely introduced by CL 121235. WARNING: DATA RACE Write at 0x00c000344408 by goroutine 12: cmd/compile/internal/gc.buildssa() /workdir/go/src/cmd/compile/internal/gc/ssa.go:134 +0x7a8 cmd/compile/internal/gc.compileSSA() /workdir/go/src/cmd/compile/internal/gc/pgen.go:259 +0x5d cmd/compile/internal/gc.compileFunctions.func2() /workdir/go/src/cmd/compile/internal/gc/pgen.go:323 +0x5a Previous write at 0x00c000344408 by goroutine 11: cmd/compile/internal/gc.buildssa() /workdir/go/src/cmd/compile/internal/gc/ssa.go:134 +0x7a8 cmd/compile/internal/gc.compileSSA() /workdir/go/src/cmd/compile/internal/gc/pgen.go:259 +0x5d cmd/compile/internal/gc.compileFunctions.func2() /workdir/go/src/cmd/compile/internal/gc/pgen.go:323 +0x5a Goroutine 12 (running) created at: cmd/compile/internal/gc.compileFunctions() /workdir/go/src/cmd/compile/internal/gc/pgen.go:321 +0x39b cmd/compile/internal/gc.Main() /workdir/go/src/cmd/compile/internal/gc/main.go:651 +0x437d main.main() /workdir/go/src/cmd/compile/main.go:51 +0x100 Goroutine 11 (running) created at: cmd/compile/internal/gc.compileFunctions() /workdir/go/src/cmd/compile/internal/gc/pgen.go:321 +0x39b cmd/compile/internal/gc.Main() /workdir/go/src/cmd/compile/internal/gc/main.go:651 +0x437d main.main() /workdir/go/src/cmd/compile/main.go:51 +0x100 Instead, set up the field exactly once as part of initssaconfig. Change-Id: I2c30c6b1cf92b8fd98e7cb5c2e10c526467d0b0a Reviewed-on: https://go-review.googlesource.com/130375 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Dave Cheney <dave@cheney.net>
2018-08-21 10:26:45 +01:00
ssaConfig.Race = flag_race
cmd/compile: add initial backend concurrency support This CL adds initial support for concurrent backend compilation. BACKGROUND The compiler currently consists (very roughly) of the following phases: 1. Initialization. 2. Lexing and parsing into the cmd/compile/internal/syntax AST. 3. Translation into the cmd/compile/internal/gc AST. 4. Some gc AST passes: typechecking, escape analysis, inlining, closure handling, expression evaluation ordering (order.go), and some lowering and optimization (walk.go). 5. Translation into the cmd/compile/internal/ssa SSA form. 6. Optimization and lowering of SSA form. 7. Translation from SSA form to assembler instructions. 8. Translation from assembler instructions to machine code. 9. Writing lots of output: machine code, DWARF symbols, type and reflection info, export data. Phase 2 was already concurrent as of Go 1.8. Phase 3 is planned for eventual removal; we hope to go straight from syntax AST to SSA. Phases 5–8 are per-function; this CL adds support for processing multiple functions concurrently. The slowest phases in the compiler are 5 and 6, so this offers the opportunity for some good speed-ups. Unfortunately, it's not quite that straightforward. In the current compiler, the latter parts of phase 4 (order, walk) are done function-at-a-time as needed. Making order and walk concurrency-safe proved hard, and they're not particularly slow, so there wasn't much reward. To enable phases 5–8 to be done concurrently, when concurrent backend compilation is requested, we complete phase 4 for all functions before starting later phases for any functions. Also, in reality, we automatically generate new functions in phase 9, such as method wrappers and equality and has routines. Those new functions then go through phases 4–8. This CL disables concurrent backend compilation after the first, big, user-provided batch of functions has been compiled. This is done to keep things simple, and because the autogenerated functions tend to be small, few, simple, and fast to compile. USAGE Concurrent backend compilation still defaults to off. To set the number of functions that may be backend-compiled concurrently, use the compiler flag -c. In future work, cmd/go will automatically set -c. Furthermore, this CL has been intentionally written so that the c=1 path has no backend concurrency whatsoever, not even spawning any goroutines. This helps ensure that, should problems arise late in the development cycle, we can simply have cmd/go set c=1 always, and revert to the original compiler behavior. MUTEXES Most of the work required to make concurrent backend compilation safe has occurred over the past month. This CL adds a handful of mutexes to get the rest of the way there; they are the mutexes that I didn't see a clean way to avoid. Some of them may still be eliminable in future work. In no particular order: * gc.funcsymsmu. The global funcsyms slice is populated lazily when we need function symbols for closures. This occurs during gc AST to SSA translation. The function funcsym also does a package lookup, which is a source of races on types.Pkg.Syms; funcsymsmu also covers that package lookup. This mutex is low priority: it adds a single global, it is in an infrequently used code path, and it is low contention. Since funcsyms may now be added in any order, we must sort them to preserve reproducible builds. * gc.largeStackFramesMu. We don't discover until after SSA compilation that a function's stack frame is gigantic. Recording that error happens basically never, but it does happen concurrently. Fix with a low priority mutex and sorting. * obj.Link.hashmu. ctxt.hash stores the mapping from types.Syms (compiler symbols) to obj.LSyms (linker symbols). It is accessed fairly heavily through all the phases. This is the only heavily contended mutex. * gc.signatlistmu. The global signatlist map is populated with types through several of the concurrent phases, including notably via ngotype during DWARF generation. It is low priority for removal. * gc.typepkgmu. Looking up symbols in the types package happens a fair amount during backend compilation and DWARF generation, particularly via ngotype. This mutex helps us to avoid a broader mutex on types.Pkg.Syms. It has low-to-moderate contention. * types.internedStringsmu. gc AST to SSA conversion and some SSA work introduce new autotmps. Those autotmps have their names interned to reduce allocations. That interning requires protecting types.internedStrings. The autotmp names are heavily re-used, and the mutex overhead and contention here are low, so it is probably a worthwhile performance optimization to keep this mutex. TESTING I have been testing this code locally by running 'go install -race cmd/compile' and then doing 'go build -a -gcflags=-c=128 std cmd' for all architectures and a variety of compiler flags. This obviously needs to be made part of the builders, but it is too expensive to make part of all.bash. I have filed #19962 for this. REPRODUCIBLE BUILDS This version of the compiler generates reproducible builds. Testing reproducible builds also needs automation, however, and is also too expensive for all.bash. This is #19961. Also of note is that some of the compiler flags used by 'toolstash -cmp' are currently incompatible with concurrent backend compilation. They still work fine with c=1. Time will tell whether this is a problem. NEXT STEPS * Continue to find and fix races and bugs, using a combination of code inspection, fuzzing, and hopefully some community experimentation. I do not know of any outstanding races, but there probably are some. * Improve testing. * Improve performance, for many values of c. * Integrate with cmd/go and fine tune. * Support concurrent compilation with the -race flag. It is a sad irony that it does not yet work. * Minor code cleanup that has been deferred during the last month due to uncertainty about the ultimate shape of this CL. PERFORMANCE Here's the buried lede, at last. :) All benchmarks are from my 8 core 2.9 GHz Intel Core i7 darwin/amd64 laptop. First, going from tip to this CL with c=1 has almost no impact. name old time/op new time/op delta Template 195ms ± 3% 194ms ± 5% ~ (p=0.370 n=30+29) Unicode 86.6ms ± 3% 87.0ms ± 7% ~ (p=0.958 n=29+30) GoTypes 548ms ± 3% 555ms ± 4% +1.35% (p=0.001 n=30+28) Compiler 2.51s ± 2% 2.54s ± 2% +1.17% (p=0.000 n=28+30) SSA 5.16s ± 3% 5.16s ± 2% ~ (p=0.910 n=30+29) Flate 124ms ± 5% 124ms ± 4% ~ (p=0.947 n=30+30) GoParser 146ms ± 3% 146ms ± 3% ~ (p=0.150 n=29+28) Reflect 354ms ± 3% 352ms ± 4% ~ (p=0.096 n=29+29) Tar 107ms ± 5% 106ms ± 3% ~ (p=0.370 n=30+29) XML 200ms ± 4% 201ms ± 4% ~ (p=0.313 n=29+28) [Geo mean] 332ms 333ms +0.10% name old user-time/op new user-time/op delta Template 227ms ± 5% 225ms ± 5% ~ (p=0.457 n=28+27) Unicode 109ms ± 4% 109ms ± 5% ~ (p=0.758 n=29+29) GoTypes 713ms ± 4% 721ms ± 5% ~ (p=0.051 n=30+29) Compiler 3.36s ± 2% 3.38s ± 3% ~ (p=0.146 n=30+30) SSA 7.46s ± 3% 7.47s ± 3% ~ (p=0.804 n=30+29) Flate 146ms ± 7% 147ms ± 3% ~ (p=0.833 n=29+27) GoParser 179ms ± 5% 179ms ± 5% ~ (p=0.866 n=30+30) Reflect 431ms ± 4% 429ms ± 4% ~ (p=0.593 n=29+30) Tar 124ms ± 5% 123ms ± 5% ~ (p=0.140 n=29+29) XML 243ms ± 4% 242ms ± 7% ~ (p=0.404 n=29+29) [Geo mean] 415ms 415ms +0.02% name old obj-bytes new obj-bytes delta Template 382k ± 0% 382k ± 0% ~ (all equal) Unicode 203k ± 0% 203k ± 0% ~ (all equal) GoTypes 1.18M ± 0% 1.18M ± 0% ~ (all equal) Compiler 3.98M ± 0% 3.98M ± 0% ~ (all equal) SSA 8.28M ± 0% 8.28M ± 0% ~ (all equal) Flate 230k ± 0% 230k ± 0% ~ (all equal) GoParser 287k ± 0% 287k ± 0% ~ (all equal) Reflect 1.00M ± 0% 1.00M ± 0% ~ (all equal) Tar 190k ± 0% 190k ± 0% ~ (all equal) XML 416k ± 0% 416k ± 0% ~ (all equal) [Geo mean] 660k 660k +0.00% Comparing this CL to itself, from c=1 to c=2 improves real times 20-30%, costs 5-10% more CPU time, and adds about 2% alloc. The allocation increase comes from allocating more ssa.Caches. name old time/op new time/op delta Template 202ms ± 3% 149ms ± 3% -26.15% (p=0.000 n=49+49) Unicode 87.4ms ± 4% 84.2ms ± 3% -3.68% (p=0.000 n=48+48) GoTypes 560ms ± 2% 398ms ± 2% -28.96% (p=0.000 n=49+49) Compiler 2.46s ± 3% 1.76s ± 2% -28.61% (p=0.000 n=48+46) SSA 6.17s ± 2% 4.04s ± 1% -34.52% (p=0.000 n=49+49) Flate 126ms ± 3% 92ms ± 2% -26.81% (p=0.000 n=49+48) GoParser 148ms ± 4% 107ms ± 2% -27.78% (p=0.000 n=49+48) Reflect 361ms ± 3% 281ms ± 3% -22.10% (p=0.000 n=49+49) Tar 109ms ± 4% 86ms ± 3% -20.81% (p=0.000 n=49+47) XML 204ms ± 3% 144ms ± 2% -29.53% (p=0.000 n=48+45) name old user-time/op new user-time/op delta Template 246ms ± 9% 246ms ± 4% ~ (p=0.401 n=50+48) Unicode 109ms ± 4% 111ms ± 4% +1.47% (p=0.000 n=44+50) GoTypes 728ms ± 3% 765ms ± 3% +5.04% (p=0.000 n=46+50) Compiler 3.33s ± 3% 3.41s ± 2% +2.31% (p=0.000 n=49+48) SSA 8.52s ± 2% 9.11s ± 2% +6.93% (p=0.000 n=49+47) Flate 149ms ± 4% 161ms ± 3% +8.13% (p=0.000 n=50+47) GoParser 181ms ± 5% 192ms ± 2% +6.40% (p=0.000 n=49+46) Reflect 452ms ± 9% 474ms ± 2% +4.99% (p=0.000 n=50+48) Tar 126ms ± 6% 136ms ± 4% +7.95% (p=0.000 n=50+49) XML 247ms ± 5% 264ms ± 3% +6.94% (p=0.000 n=48+50) name old alloc/op new alloc/op delta Template 38.8MB ± 0% 39.3MB ± 0% +1.48% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 30.2MB ± 0% +1.19% (p=0.008 n=5+5) GoTypes 113MB ± 0% 114MB ± 0% +0.69% (p=0.008 n=5+5) Compiler 443MB ± 0% 447MB ± 0% +0.95% (p=0.008 n=5+5) SSA 1.25GB ± 0% 1.26GB ± 0% +0.89% (p=0.008 n=5+5) Flate 25.3MB ± 0% 25.9MB ± 1% +2.35% (p=0.008 n=5+5) GoParser 31.7MB ± 0% 32.2MB ± 0% +1.59% (p=0.008 n=5+5) Reflect 78.2MB ± 0% 78.9MB ± 0% +0.91% (p=0.008 n=5+5) Tar 26.6MB ± 0% 27.0MB ± 0% +1.80% (p=0.008 n=5+5) XML 42.4MB ± 0% 43.4MB ± 0% +2.35% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Template 379k ± 0% 378k ± 0% ~ (p=0.421 n=5+5) Unicode 322k ± 0% 321k ± 0% ~ (p=0.222 n=5+5) GoTypes 1.14M ± 0% 1.14M ± 0% ~ (p=0.548 n=5+5) Compiler 4.12M ± 0% 4.11M ± 0% -0.14% (p=0.032 n=5+5) SSA 9.72M ± 0% 9.72M ± 0% ~ (p=0.421 n=5+5) Flate 234k ± 1% 234k ± 0% ~ (p=0.421 n=5+5) GoParser 316k ± 1% 315k ± 0% ~ (p=0.222 n=5+5) Reflect 980k ± 0% 979k ± 0% ~ (p=0.095 n=5+5) Tar 249k ± 1% 249k ± 1% ~ (p=0.841 n=5+5) XML 392k ± 0% 391k ± 0% ~ (p=0.095 n=5+5) From c=1 to c=4, real time is down ~40%, CPU usage up 10-20%, alloc up ~5%: name old time/op new time/op delta Template 203ms ± 3% 131ms ± 5% -35.45% (p=0.000 n=50+50) Unicode 87.2ms ± 4% 84.1ms ± 2% -3.61% (p=0.000 n=48+47) GoTypes 560ms ± 4% 310ms ± 2% -44.65% (p=0.000 n=50+49) Compiler 2.47s ± 3% 1.41s ± 2% -43.10% (p=0.000 n=50+46) SSA 6.17s ± 2% 3.20s ± 2% -48.06% (p=0.000 n=49+49) Flate 126ms ± 4% 74ms ± 2% -41.06% (p=0.000 n=49+48) GoParser 148ms ± 4% 89ms ± 3% -39.97% (p=0.000 n=49+50) Reflect 360ms ± 3% 242ms ± 3% -32.81% (p=0.000 n=49+49) Tar 108ms ± 4% 73ms ± 4% -32.48% (p=0.000 n=50+49) XML 203ms ± 3% 119ms ± 3% -41.56% (p=0.000 n=49+48) name old user-time/op new user-time/op delta Template 246ms ± 9% 287ms ± 9% +16.98% (p=0.000 n=50+50) Unicode 109ms ± 4% 118ms ± 5% +7.56% (p=0.000 n=46+50) GoTypes 735ms ± 4% 806ms ± 2% +9.62% (p=0.000 n=50+50) Compiler 3.34s ± 4% 3.56s ± 2% +6.78% (p=0.000 n=49+49) SSA 8.54s ± 3% 10.04s ± 3% +17.55% (p=0.000 n=50+50) Flate 149ms ± 6% 176ms ± 3% +17.82% (p=0.000 n=50+48) GoParser 181ms ± 5% 213ms ± 3% +17.47% (p=0.000 n=50+50) Reflect 453ms ± 6% 499ms ± 2% +10.11% (p=0.000 n=50+48) Tar 126ms ± 5% 149ms ±11% +18.76% (p=0.000 n=50+50) XML 246ms ± 5% 287ms ± 4% +16.53% (p=0.000 n=49+50) name old alloc/op new alloc/op delta Template 38.8MB ± 0% 40.4MB ± 0% +4.21% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 30.9MB ± 0% +3.68% (p=0.008 n=5+5) GoTypes 113MB ± 0% 116MB ± 0% +2.71% (p=0.008 n=5+5) Compiler 443MB ± 0% 455MB ± 0% +2.75% (p=0.008 n=5+5) SSA 1.25GB ± 0% 1.27GB ± 0% +1.84% (p=0.008 n=5+5) Flate 25.3MB ± 0% 26.9MB ± 1% +6.31% (p=0.008 n=5+5) GoParser 31.7MB ± 0% 33.2MB ± 0% +4.61% (p=0.008 n=5+5) Reflect 78.2MB ± 0% 80.2MB ± 0% +2.53% (p=0.008 n=5+5) Tar 26.6MB ± 0% 27.9MB ± 0% +5.19% (p=0.008 n=5+5) XML 42.4MB ± 0% 44.6MB ± 0% +5.20% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Template 380k ± 0% 379k ± 0% -0.39% (p=0.032 n=5+5) Unicode 321k ± 0% 321k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.14M ± 0% 1.14M ± 0% ~ (p=0.421 n=5+5) Compiler 4.12M ± 0% 4.14M ± 0% +0.52% (p=0.008 n=5+5) SSA 9.72M ± 0% 9.76M ± 0% +0.37% (p=0.008 n=5+5) Flate 234k ± 1% 234k ± 1% ~ (p=0.690 n=5+5) GoParser 316k ± 0% 317k ± 1% ~ (p=0.841 n=5+5) Reflect 981k ± 0% 981k ± 0% ~ (p=1.000 n=5+5) Tar 250k ± 0% 249k ± 1% ~ (p=0.151 n=5+5) XML 393k ± 0% 392k ± 0% ~ (p=0.056 n=5+5) Going beyond c=4 on my machine tends to increase CPU time and allocs without impacting real time. The CPU time numbers matter, because when there are many concurrent compilation processes, that will impact the overall throughput. The numbers above are in many ways the best case scenario; we can take full advantage of all cores. Fortunately, the most common compilation scenario is incremental re-compilation of a single package during a build/test cycle. Updates #15756 Change-Id: I6725558ca2069edec0ac5b0d1683105a9fff6bea Reviewed-on: https://go-review.googlesource.com/40693 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Robert Griesemer <gri@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-19 08:27:26 -07:00
ssaCaches = make([]ssa.Cache, nBackendWorkers)
// Set up some runtime functions we'll need to call.
assertE2I = sysfunc("assertE2I")
assertE2I2 = sysfunc("assertE2I2")
assertI2I = sysfunc("assertI2I")
assertI2I2 = sysfunc("assertI2I2")
deferproc = sysfunc("deferproc")
Deferreturn = sysfunc("deferreturn")
cmd/compile, cmd/link: separate stable and internal ABIs This implements compiler and linker support for separating the function calling ABI into two ABIs: a stable and an internal ABI. At the moment, the two ABIs are identical, but we'll be able to evolve the internal ABI without breaking existing assembly code that depends on the stable ABI for calling to and from Go. The Go compiler generates internal ABI symbols for all Go functions. It uses the symabis information produced by the assembler to create ABI wrappers whenever it encounters a body-less Go function that's defined in assembly or a Go function that's referenced from assembly. Since the two ABIs are currently identical, for the moment this is implemented using "ABI alias" symbols, which are just forwarding references to the native ABI symbol for a function. This way there's no actual code involved in the ABI wrapper, which is good because we're not deriving any benefit from it right now. Once the ABIs diverge, we can eliminate ABI aliases. The linker represents these different ABIs internally as different versions of the same symbol. This way, the linker keeps us honest, since every symbol definition and reference also specifies its version. The linker is responsible for resolving ABI aliases. Fixes #27539. Change-Id: I197c52ec9f8fc435db8f7a4259029b20f6d65e95 Reviewed-on: https://go-review.googlesource.com/c/147160 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-11-01 12:30:23 -04:00
Duffcopy = sysvar("duffcopy") // asm func with special ABI
Duffzero = sysvar("duffzero") // asm func with special ABI
gcWriteBarrier = sysvar("gcWriteBarrier") // asm func with special ABI
goschedguarded = sysfunc("goschedguarded")
growslice = sysfunc("growslice")
msanread = sysfunc("msanread")
msanwrite = sysfunc("msanwrite")
cmd/compile: port callnew to ssa conversion This is part of a general effort to shrink walk. In an ideal world, we'd have an SSA op for allocation, but we don't yet have a good mechanism for introducing function calling during SSA compilation. In the meantime, SSA conversion is a better place for it. This also makes it easier to introduce new optimizations; instead of doing the typecheck walk dance, we can simply write what we want the backend to do. I introduced a new opcode in this change because: (a) It avoids a class of bugs involving correctly detecting whether this ONEW is a "before walk" ONEW or an "after walk" ONEW. It also means that using ONEW or ONEWOBJ in the wrong context will generally result in a faster failure. (b) Opcodes are cheap. (c) It provides a better place to put documentation. This change also is also marginally more performant: name old alloc/op new alloc/op delta Template 39.1MB ± 0% 39.0MB ± 0% -0.14% (p=0.008 n=5+5) Unicode 28.4MB ± 0% 28.4MB ± 0% ~ (p=0.421 n=5+5) GoTypes 132MB ± 0% 132MB ± 0% -0.23% (p=0.008 n=5+5) Compiler 608MB ± 0% 607MB ± 0% -0.25% (p=0.008 n=5+5) SSA 2.04GB ± 0% 2.04GB ± 0% -0.01% (p=0.008 n=5+5) Flate 24.4MB ± 0% 24.3MB ± 0% -0.13% (p=0.008 n=5+5) GoParser 29.3MB ± 0% 29.1MB ± 0% -0.54% (p=0.008 n=5+5) Reflect 84.8MB ± 0% 84.7MB ± 0% -0.21% (p=0.008 n=5+5) Tar 36.7MB ± 0% 36.6MB ± 0% -0.10% (p=0.008 n=5+5) XML 48.7MB ± 0% 48.6MB ± 0% -0.24% (p=0.008 n=5+5) [Geo mean] 85.0MB 84.8MB -0.19% name old allocs/op new allocs/op delta Template 383k ± 0% 382k ± 0% -0.26% (p=0.008 n=5+5) Unicode 341k ± 0% 341k ± 0% ~ (p=0.579 n=5+5) GoTypes 1.37M ± 0% 1.36M ± 0% -0.39% (p=0.008 n=5+5) Compiler 5.59M ± 0% 5.56M ± 0% -0.49% (p=0.008 n=5+5) SSA 16.9M ± 0% 16.9M ± 0% -0.03% (p=0.008 n=5+5) Flate 238k ± 0% 238k ± 0% -0.23% (p=0.008 n=5+5) GoParser 306k ± 0% 303k ± 0% -0.93% (p=0.008 n=5+5) Reflect 990k ± 0% 987k ± 0% -0.33% (p=0.008 n=5+5) Tar 356k ± 0% 355k ± 0% -0.20% (p=0.008 n=5+5) XML 444k ± 0% 442k ± 0% -0.45% (p=0.008 n=5+5) [Geo mean] 848k 845k -0.33% Change-Id: I2c36003a7cbf71b53857b7de734852b698f49310 Reviewed-on: https://go-review.googlesource.com/c/go/+/167957 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2019-03-17 07:14:12 -07:00
newobject = sysfunc("newobject")
newproc = sysfunc("newproc")
panicdivide = sysfunc("panicdivide")
panicdottypeE = sysfunc("panicdottypeE")
panicdottypeI = sysfunc("panicdottypeI")
panicnildottype = sysfunc("panicnildottype")
panicoverflow = sysfunc("panicoverflow")
panicshift = sysfunc("panicshift")
raceread = sysfunc("raceread")
racereadrange = sysfunc("racereadrange")
racewrite = sysfunc("racewrite")
racewriterange = sysfunc("racewriterange")
x86HasPOPCNT = sysvar("x86HasPOPCNT") // bool
x86HasSSE41 = sysvar("x86HasSSE41") // bool
arm64HasATOMICS = sysvar("arm64HasATOMICS") // bool
typedmemclr = sysfunc("typedmemclr")
typedmemmove = sysfunc("typedmemmove")
cmd/compile, cmd/link: separate stable and internal ABIs This implements compiler and linker support for separating the function calling ABI into two ABIs: a stable and an internal ABI. At the moment, the two ABIs are identical, but we'll be able to evolve the internal ABI without breaking existing assembly code that depends on the stable ABI for calling to and from Go. The Go compiler generates internal ABI symbols for all Go functions. It uses the symabis information produced by the assembler to create ABI wrappers whenever it encounters a body-less Go function that's defined in assembly or a Go function that's referenced from assembly. Since the two ABIs are currently identical, for the moment this is implemented using "ABI alias" symbols, which are just forwarding references to the native ABI symbol for a function. This way there's no actual code involved in the ABI wrapper, which is good because we're not deriving any benefit from it right now. Once the ABIs diverge, we can eliminate ABI aliases. The linker represents these different ABIs internally as different versions of the same symbol. This way, the linker keeps us honest, since every symbol definition and reference also specifies its version. The linker is responsible for resolving ABI aliases. Fixes #27539. Change-Id: I197c52ec9f8fc435db8f7a4259029b20f6d65e95 Reviewed-on: https://go-review.googlesource.com/c/147160 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-11-01 12:30:23 -04:00
Udiv = sysvar("udiv") // asm func with special ABI
writeBarrier = sysvar("writeBarrier") // struct { bool; ... }
cmd/compile: port callnew to ssa conversion This is part of a general effort to shrink walk. In an ideal world, we'd have an SSA op for allocation, but we don't yet have a good mechanism for introducing function calling during SSA compilation. In the meantime, SSA conversion is a better place for it. This also makes it easier to introduce new optimizations; instead of doing the typecheck walk dance, we can simply write what we want the backend to do. I introduced a new opcode in this change because: (a) It avoids a class of bugs involving correctly detecting whether this ONEW is a "before walk" ONEW or an "after walk" ONEW. It also means that using ONEW or ONEWOBJ in the wrong context will generally result in a faster failure. (b) Opcodes are cheap. (c) It provides a better place to put documentation. This change also is also marginally more performant: name old alloc/op new alloc/op delta Template 39.1MB ± 0% 39.0MB ± 0% -0.14% (p=0.008 n=5+5) Unicode 28.4MB ± 0% 28.4MB ± 0% ~ (p=0.421 n=5+5) GoTypes 132MB ± 0% 132MB ± 0% -0.23% (p=0.008 n=5+5) Compiler 608MB ± 0% 607MB ± 0% -0.25% (p=0.008 n=5+5) SSA 2.04GB ± 0% 2.04GB ± 0% -0.01% (p=0.008 n=5+5) Flate 24.4MB ± 0% 24.3MB ± 0% -0.13% (p=0.008 n=5+5) GoParser 29.3MB ± 0% 29.1MB ± 0% -0.54% (p=0.008 n=5+5) Reflect 84.8MB ± 0% 84.7MB ± 0% -0.21% (p=0.008 n=5+5) Tar 36.7MB ± 0% 36.6MB ± 0% -0.10% (p=0.008 n=5+5) XML 48.7MB ± 0% 48.6MB ± 0% -0.24% (p=0.008 n=5+5) [Geo mean] 85.0MB 84.8MB -0.19% name old allocs/op new allocs/op delta Template 383k ± 0% 382k ± 0% -0.26% (p=0.008 n=5+5) Unicode 341k ± 0% 341k ± 0% ~ (p=0.579 n=5+5) GoTypes 1.37M ± 0% 1.36M ± 0% -0.39% (p=0.008 n=5+5) Compiler 5.59M ± 0% 5.56M ± 0% -0.49% (p=0.008 n=5+5) SSA 16.9M ± 0% 16.9M ± 0% -0.03% (p=0.008 n=5+5) Flate 238k ± 0% 238k ± 0% -0.23% (p=0.008 n=5+5) GoParser 306k ± 0% 303k ± 0% -0.93% (p=0.008 n=5+5) Reflect 990k ± 0% 987k ± 0% -0.33% (p=0.008 n=5+5) Tar 356k ± 0% 355k ± 0% -0.20% (p=0.008 n=5+5) XML 444k ± 0% 442k ± 0% -0.45% (p=0.008 n=5+5) [Geo mean] 848k 845k -0.33% Change-Id: I2c36003a7cbf71b53857b7de734852b698f49310 Reviewed-on: https://go-review.googlesource.com/c/go/+/167957 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2019-03-17 07:14:12 -07:00
zerobaseSym = sysvar("zerobase")
if thearch.LinkArch.Family == sys.Wasm {
BoundsCheckFunc[ssa.BoundsIndex] = sysvar("goPanicIndex")
BoundsCheckFunc[ssa.BoundsIndexU] = sysvar("goPanicIndexU")
BoundsCheckFunc[ssa.BoundsSliceAlen] = sysvar("goPanicSliceAlen")
BoundsCheckFunc[ssa.BoundsSliceAlenU] = sysvar("goPanicSliceAlenU")
BoundsCheckFunc[ssa.BoundsSliceAcap] = sysvar("goPanicSliceAcap")
BoundsCheckFunc[ssa.BoundsSliceAcapU] = sysvar("goPanicSliceAcapU")
BoundsCheckFunc[ssa.BoundsSliceB] = sysvar("goPanicSliceB")
BoundsCheckFunc[ssa.BoundsSliceBU] = sysvar("goPanicSliceBU")
BoundsCheckFunc[ssa.BoundsSlice3Alen] = sysvar("goPanicSlice3Alen")
BoundsCheckFunc[ssa.BoundsSlice3AlenU] = sysvar("goPanicSlice3AlenU")
BoundsCheckFunc[ssa.BoundsSlice3Acap] = sysvar("goPanicSlice3Acap")
BoundsCheckFunc[ssa.BoundsSlice3AcapU] = sysvar("goPanicSlice3AcapU")
BoundsCheckFunc[ssa.BoundsSlice3B] = sysvar("goPanicSlice3B")
BoundsCheckFunc[ssa.BoundsSlice3BU] = sysvar("goPanicSlice3BU")
BoundsCheckFunc[ssa.BoundsSlice3C] = sysvar("goPanicSlice3C")
BoundsCheckFunc[ssa.BoundsSlice3CU] = sysvar("goPanicSlice3CU")
} else {
BoundsCheckFunc[ssa.BoundsIndex] = sysvar("panicIndex")
BoundsCheckFunc[ssa.BoundsIndexU] = sysvar("panicIndexU")
BoundsCheckFunc[ssa.BoundsSliceAlen] = sysvar("panicSliceAlen")
BoundsCheckFunc[ssa.BoundsSliceAlenU] = sysvar("panicSliceAlenU")
BoundsCheckFunc[ssa.BoundsSliceAcap] = sysvar("panicSliceAcap")
BoundsCheckFunc[ssa.BoundsSliceAcapU] = sysvar("panicSliceAcapU")
BoundsCheckFunc[ssa.BoundsSliceB] = sysvar("panicSliceB")
BoundsCheckFunc[ssa.BoundsSliceBU] = sysvar("panicSliceBU")
BoundsCheckFunc[ssa.BoundsSlice3Alen] = sysvar("panicSlice3Alen")
BoundsCheckFunc[ssa.BoundsSlice3AlenU] = sysvar("panicSlice3AlenU")
BoundsCheckFunc[ssa.BoundsSlice3Acap] = sysvar("panicSlice3Acap")
BoundsCheckFunc[ssa.BoundsSlice3AcapU] = sysvar("panicSlice3AcapU")
BoundsCheckFunc[ssa.BoundsSlice3B] = sysvar("panicSlice3B")
BoundsCheckFunc[ssa.BoundsSlice3BU] = sysvar("panicSlice3BU")
BoundsCheckFunc[ssa.BoundsSlice3C] = sysvar("panicSlice3C")
BoundsCheckFunc[ssa.BoundsSlice3CU] = sysvar("panicSlice3CU")
}
if thearch.LinkArch.PtrSize == 4 {
ExtendCheckFunc[ssa.BoundsIndex] = sysvar("panicExtendIndex")
ExtendCheckFunc[ssa.BoundsIndexU] = sysvar("panicExtendIndexU")
ExtendCheckFunc[ssa.BoundsSliceAlen] = sysvar("panicExtendSliceAlen")
ExtendCheckFunc[ssa.BoundsSliceAlenU] = sysvar("panicExtendSliceAlenU")
ExtendCheckFunc[ssa.BoundsSliceAcap] = sysvar("panicExtendSliceAcap")
ExtendCheckFunc[ssa.BoundsSliceAcapU] = sysvar("panicExtendSliceAcapU")
ExtendCheckFunc[ssa.BoundsSliceB] = sysvar("panicExtendSliceB")
ExtendCheckFunc[ssa.BoundsSliceBU] = sysvar("panicExtendSliceBU")
ExtendCheckFunc[ssa.BoundsSlice3Alen] = sysvar("panicExtendSlice3Alen")
ExtendCheckFunc[ssa.BoundsSlice3AlenU] = sysvar("panicExtendSlice3AlenU")
ExtendCheckFunc[ssa.BoundsSlice3Acap] = sysvar("panicExtendSlice3Acap")
ExtendCheckFunc[ssa.BoundsSlice3AcapU] = sysvar("panicExtendSlice3AcapU")
ExtendCheckFunc[ssa.BoundsSlice3B] = sysvar("panicExtendSlice3B")
ExtendCheckFunc[ssa.BoundsSlice3BU] = sysvar("panicExtendSlice3BU")
ExtendCheckFunc[ssa.BoundsSlice3C] = sysvar("panicExtendSlice3C")
ExtendCheckFunc[ssa.BoundsSlice3CU] = sysvar("panicExtendSlice3CU")
}
cmd/compile, cmd/link: separate stable and internal ABIs This implements compiler and linker support for separating the function calling ABI into two ABIs: a stable and an internal ABI. At the moment, the two ABIs are identical, but we'll be able to evolve the internal ABI without breaking existing assembly code that depends on the stable ABI for calling to and from Go. The Go compiler generates internal ABI symbols for all Go functions. It uses the symabis information produced by the assembler to create ABI wrappers whenever it encounters a body-less Go function that's defined in assembly or a Go function that's referenced from assembly. Since the two ABIs are currently identical, for the moment this is implemented using "ABI alias" symbols, which are just forwarding references to the native ABI symbol for a function. This way there's no actual code involved in the ABI wrapper, which is good because we're not deriving any benefit from it right now. Once the ABIs diverge, we can eliminate ABI aliases. The linker represents these different ABIs internally as different versions of the same symbol. This way, the linker keeps us honest, since every symbol definition and reference also specifies its version. The linker is responsible for resolving ABI aliases. Fixes #27539. Change-Id: I197c52ec9f8fc435db8f7a4259029b20f6d65e95 Reviewed-on: https://go-review.googlesource.com/c/147160 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-11-01 12:30:23 -04:00
// GO386=387 runtime definitions
ControlWord64trunc = sysvar("controlWord64trunc") // uint16
ControlWord32 = sysvar("controlWord32") // uint16
cmd/compile, cmd/link: separate stable and internal ABIs This implements compiler and linker support for separating the function calling ABI into two ABIs: a stable and an internal ABI. At the moment, the two ABIs are identical, but we'll be able to evolve the internal ABI without breaking existing assembly code that depends on the stable ABI for calling to and from Go. The Go compiler generates internal ABI symbols for all Go functions. It uses the symabis information produced by the assembler to create ABI wrappers whenever it encounters a body-less Go function that's defined in assembly or a Go function that's referenced from assembly. Since the two ABIs are currently identical, for the moment this is implemented using "ABI alias" symbols, which are just forwarding references to the native ABI symbol for a function. This way there's no actual code involved in the ABI wrapper, which is good because we're not deriving any benefit from it right now. Once the ABIs diverge, we can eliminate ABI aliases. The linker represents these different ABIs internally as different versions of the same symbol. This way, the linker keeps us honest, since every symbol definition and reference also specifies its version. The linker is responsible for resolving ABI aliases. Fixes #27539. Change-Id: I197c52ec9f8fc435db8f7a4259029b20f6d65e95 Reviewed-on: https://go-review.googlesource.com/c/147160 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-11-01 12:30:23 -04:00
// Wasm (all asm funcs with special ABIs)
WasmMove = sysvar("wasmMove")
WasmZero = sysvar("wasmZero")
WasmDiv = sysvar("wasmDiv")
WasmTruncS = sysvar("wasmTruncS")
WasmTruncU = sysvar("wasmTruncU")
cmd/compile: separate data and function LSyms Currently, obj.Ctxt's symbol table does not distinguish between ABI0 and ABIInternal symbols. This is *almost* okay, since a given symbol name in the final object file is only going to belong to one ABI or the other, but it requires that the compiler mark a Sym as being a function symbol before it retrieves its LSym. If it retrieves the LSym first, that LSym will be created as ABI0, and later marking the Sym as a function symbol won't change the LSym's ABI. Marking a Sym as a function symbol before looking up its LSym sounds easy, except Syms have a dual purpose: they are used just as interned strings (every function, variable, parameter, etc with the same textual name shares a Sym), and *also* to store state for whatever package global has that name. As a result, it's easy to slip up and look up an LSym when a Sym is serving as the name of a local variable, and then later mark it as a function when it's serving as the global with the name. In general, we were careful to avoid this, but #29610 demonstrates one case where we messed up. Because of on-demand importing from indexed export data, it's possible to compile a method wrapper for a type imported from another package before importing an init function from that package. If the argument of the method is named "init", the "init" LSym will be created as a data symbol when compiling the wrapper, before it gets marked as a function symbol. To fix this, we separate obj.Ctxt's symbol tables for ABI0 and ABIInternal symbols. This way, the compiler will simply get a different LSym once the Sym takes on its package-global meaning as a function. This fixes the above ordering issue, and means we no longer need to go out of our way to create the "init" function early and mark it as a function symbol. Fixes #29610. Updates #27539. Change-Id: Id9458b40017893d46ef9e4a3f9b47fc49e1ce8df Reviewed-on: https://go-review.googlesource.com/c/157017 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>
2019-01-08 22:23:52 -05:00
SigPanic = sysfunc("sigpanic")
}
cmd/compile: add initial backend concurrency support This CL adds initial support for concurrent backend compilation. BACKGROUND The compiler currently consists (very roughly) of the following phases: 1. Initialization. 2. Lexing and parsing into the cmd/compile/internal/syntax AST. 3. Translation into the cmd/compile/internal/gc AST. 4. Some gc AST passes: typechecking, escape analysis, inlining, closure handling, expression evaluation ordering (order.go), and some lowering and optimization (walk.go). 5. Translation into the cmd/compile/internal/ssa SSA form. 6. Optimization and lowering of SSA form. 7. Translation from SSA form to assembler instructions. 8. Translation from assembler instructions to machine code. 9. Writing lots of output: machine code, DWARF symbols, type and reflection info, export data. Phase 2 was already concurrent as of Go 1.8. Phase 3 is planned for eventual removal; we hope to go straight from syntax AST to SSA. Phases 5–8 are per-function; this CL adds support for processing multiple functions concurrently. The slowest phases in the compiler are 5 and 6, so this offers the opportunity for some good speed-ups. Unfortunately, it's not quite that straightforward. In the current compiler, the latter parts of phase 4 (order, walk) are done function-at-a-time as needed. Making order and walk concurrency-safe proved hard, and they're not particularly slow, so there wasn't much reward. To enable phases 5–8 to be done concurrently, when concurrent backend compilation is requested, we complete phase 4 for all functions before starting later phases for any functions. Also, in reality, we automatically generate new functions in phase 9, such as method wrappers and equality and has routines. Those new functions then go through phases 4–8. This CL disables concurrent backend compilation after the first, big, user-provided batch of functions has been compiled. This is done to keep things simple, and because the autogenerated functions tend to be small, few, simple, and fast to compile. USAGE Concurrent backend compilation still defaults to off. To set the number of functions that may be backend-compiled concurrently, use the compiler flag -c. In future work, cmd/go will automatically set -c. Furthermore, this CL has been intentionally written so that the c=1 path has no backend concurrency whatsoever, not even spawning any goroutines. This helps ensure that, should problems arise late in the development cycle, we can simply have cmd/go set c=1 always, and revert to the original compiler behavior. MUTEXES Most of the work required to make concurrent backend compilation safe has occurred over the past month. This CL adds a handful of mutexes to get the rest of the way there; they are the mutexes that I didn't see a clean way to avoid. Some of them may still be eliminable in future work. In no particular order: * gc.funcsymsmu. The global funcsyms slice is populated lazily when we need function symbols for closures. This occurs during gc AST to SSA translation. The function funcsym also does a package lookup, which is a source of races on types.Pkg.Syms; funcsymsmu also covers that package lookup. This mutex is low priority: it adds a single global, it is in an infrequently used code path, and it is low contention. Since funcsyms may now be added in any order, we must sort them to preserve reproducible builds. * gc.largeStackFramesMu. We don't discover until after SSA compilation that a function's stack frame is gigantic. Recording that error happens basically never, but it does happen concurrently. Fix with a low priority mutex and sorting. * obj.Link.hashmu. ctxt.hash stores the mapping from types.Syms (compiler symbols) to obj.LSyms (linker symbols). It is accessed fairly heavily through all the phases. This is the only heavily contended mutex. * gc.signatlistmu. The global signatlist map is populated with types through several of the concurrent phases, including notably via ngotype during DWARF generation. It is low priority for removal. * gc.typepkgmu. Looking up symbols in the types package happens a fair amount during backend compilation and DWARF generation, particularly via ngotype. This mutex helps us to avoid a broader mutex on types.Pkg.Syms. It has low-to-moderate contention. * types.internedStringsmu. gc AST to SSA conversion and some SSA work introduce new autotmps. Those autotmps have their names interned to reduce allocations. That interning requires protecting types.internedStrings. The autotmp names are heavily re-used, and the mutex overhead and contention here are low, so it is probably a worthwhile performance optimization to keep this mutex. TESTING I have been testing this code locally by running 'go install -race cmd/compile' and then doing 'go build -a -gcflags=-c=128 std cmd' for all architectures and a variety of compiler flags. This obviously needs to be made part of the builders, but it is too expensive to make part of all.bash. I have filed #19962 for this. REPRODUCIBLE BUILDS This version of the compiler generates reproducible builds. Testing reproducible builds also needs automation, however, and is also too expensive for all.bash. This is #19961. Also of note is that some of the compiler flags used by 'toolstash -cmp' are currently incompatible with concurrent backend compilation. They still work fine with c=1. Time will tell whether this is a problem. NEXT STEPS * Continue to find and fix races and bugs, using a combination of code inspection, fuzzing, and hopefully some community experimentation. I do not know of any outstanding races, but there probably are some. * Improve testing. * Improve performance, for many values of c. * Integrate with cmd/go and fine tune. * Support concurrent compilation with the -race flag. It is a sad irony that it does not yet work. * Minor code cleanup that has been deferred during the last month due to uncertainty about the ultimate shape of this CL. PERFORMANCE Here's the buried lede, at last. :) All benchmarks are from my 8 core 2.9 GHz Intel Core i7 darwin/amd64 laptop. First, going from tip to this CL with c=1 has almost no impact. name old time/op new time/op delta Template 195ms ± 3% 194ms ± 5% ~ (p=0.370 n=30+29) Unicode 86.6ms ± 3% 87.0ms ± 7% ~ (p=0.958 n=29+30) GoTypes 548ms ± 3% 555ms ± 4% +1.35% (p=0.001 n=30+28) Compiler 2.51s ± 2% 2.54s ± 2% +1.17% (p=0.000 n=28+30) SSA 5.16s ± 3% 5.16s ± 2% ~ (p=0.910 n=30+29) Flate 124ms ± 5% 124ms ± 4% ~ (p=0.947 n=30+30) GoParser 146ms ± 3% 146ms ± 3% ~ (p=0.150 n=29+28) Reflect 354ms ± 3% 352ms ± 4% ~ (p=0.096 n=29+29) Tar 107ms ± 5% 106ms ± 3% ~ (p=0.370 n=30+29) XML 200ms ± 4% 201ms ± 4% ~ (p=0.313 n=29+28) [Geo mean] 332ms 333ms +0.10% name old user-time/op new user-time/op delta Template 227ms ± 5% 225ms ± 5% ~ (p=0.457 n=28+27) Unicode 109ms ± 4% 109ms ± 5% ~ (p=0.758 n=29+29) GoTypes 713ms ± 4% 721ms ± 5% ~ (p=0.051 n=30+29) Compiler 3.36s ± 2% 3.38s ± 3% ~ (p=0.146 n=30+30) SSA 7.46s ± 3% 7.47s ± 3% ~ (p=0.804 n=30+29) Flate 146ms ± 7% 147ms ± 3% ~ (p=0.833 n=29+27) GoParser 179ms ± 5% 179ms ± 5% ~ (p=0.866 n=30+30) Reflect 431ms ± 4% 429ms ± 4% ~ (p=0.593 n=29+30) Tar 124ms ± 5% 123ms ± 5% ~ (p=0.140 n=29+29) XML 243ms ± 4% 242ms ± 7% ~ (p=0.404 n=29+29) [Geo mean] 415ms 415ms +0.02% name old obj-bytes new obj-bytes delta Template 382k ± 0% 382k ± 0% ~ (all equal) Unicode 203k ± 0% 203k ± 0% ~ (all equal) GoTypes 1.18M ± 0% 1.18M ± 0% ~ (all equal) Compiler 3.98M ± 0% 3.98M ± 0% ~ (all equal) SSA 8.28M ± 0% 8.28M ± 0% ~ (all equal) Flate 230k ± 0% 230k ± 0% ~ (all equal) GoParser 287k ± 0% 287k ± 0% ~ (all equal) Reflect 1.00M ± 0% 1.00M ± 0% ~ (all equal) Tar 190k ± 0% 190k ± 0% ~ (all equal) XML 416k ± 0% 416k ± 0% ~ (all equal) [Geo mean] 660k 660k +0.00% Comparing this CL to itself, from c=1 to c=2 improves real times 20-30%, costs 5-10% more CPU time, and adds about 2% alloc. The allocation increase comes from allocating more ssa.Caches. name old time/op new time/op delta Template 202ms ± 3% 149ms ± 3% -26.15% (p=0.000 n=49+49) Unicode 87.4ms ± 4% 84.2ms ± 3% -3.68% (p=0.000 n=48+48) GoTypes 560ms ± 2% 398ms ± 2% -28.96% (p=0.000 n=49+49) Compiler 2.46s ± 3% 1.76s ± 2% -28.61% (p=0.000 n=48+46) SSA 6.17s ± 2% 4.04s ± 1% -34.52% (p=0.000 n=49+49) Flate 126ms ± 3% 92ms ± 2% -26.81% (p=0.000 n=49+48) GoParser 148ms ± 4% 107ms ± 2% -27.78% (p=0.000 n=49+48) Reflect 361ms ± 3% 281ms ± 3% -22.10% (p=0.000 n=49+49) Tar 109ms ± 4% 86ms ± 3% -20.81% (p=0.000 n=49+47) XML 204ms ± 3% 144ms ± 2% -29.53% (p=0.000 n=48+45) name old user-time/op new user-time/op delta Template 246ms ± 9% 246ms ± 4% ~ (p=0.401 n=50+48) Unicode 109ms ± 4% 111ms ± 4% +1.47% (p=0.000 n=44+50) GoTypes 728ms ± 3% 765ms ± 3% +5.04% (p=0.000 n=46+50) Compiler 3.33s ± 3% 3.41s ± 2% +2.31% (p=0.000 n=49+48) SSA 8.52s ± 2% 9.11s ± 2% +6.93% (p=0.000 n=49+47) Flate 149ms ± 4% 161ms ± 3% +8.13% (p=0.000 n=50+47) GoParser 181ms ± 5% 192ms ± 2% +6.40% (p=0.000 n=49+46) Reflect 452ms ± 9% 474ms ± 2% +4.99% (p=0.000 n=50+48) Tar 126ms ± 6% 136ms ± 4% +7.95% (p=0.000 n=50+49) XML 247ms ± 5% 264ms ± 3% +6.94% (p=0.000 n=48+50) name old alloc/op new alloc/op delta Template 38.8MB ± 0% 39.3MB ± 0% +1.48% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 30.2MB ± 0% +1.19% (p=0.008 n=5+5) GoTypes 113MB ± 0% 114MB ± 0% +0.69% (p=0.008 n=5+5) Compiler 443MB ± 0% 447MB ± 0% +0.95% (p=0.008 n=5+5) SSA 1.25GB ± 0% 1.26GB ± 0% +0.89% (p=0.008 n=5+5) Flate 25.3MB ± 0% 25.9MB ± 1% +2.35% (p=0.008 n=5+5) GoParser 31.7MB ± 0% 32.2MB ± 0% +1.59% (p=0.008 n=5+5) Reflect 78.2MB ± 0% 78.9MB ± 0% +0.91% (p=0.008 n=5+5) Tar 26.6MB ± 0% 27.0MB ± 0% +1.80% (p=0.008 n=5+5) XML 42.4MB ± 0% 43.4MB ± 0% +2.35% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Template 379k ± 0% 378k ± 0% ~ (p=0.421 n=5+5) Unicode 322k ± 0% 321k ± 0% ~ (p=0.222 n=5+5) GoTypes 1.14M ± 0% 1.14M ± 0% ~ (p=0.548 n=5+5) Compiler 4.12M ± 0% 4.11M ± 0% -0.14% (p=0.032 n=5+5) SSA 9.72M ± 0% 9.72M ± 0% ~ (p=0.421 n=5+5) Flate 234k ± 1% 234k ± 0% ~ (p=0.421 n=5+5) GoParser 316k ± 1% 315k ± 0% ~ (p=0.222 n=5+5) Reflect 980k ± 0% 979k ± 0% ~ (p=0.095 n=5+5) Tar 249k ± 1% 249k ± 1% ~ (p=0.841 n=5+5) XML 392k ± 0% 391k ± 0% ~ (p=0.095 n=5+5) From c=1 to c=4, real time is down ~40%, CPU usage up 10-20%, alloc up ~5%: name old time/op new time/op delta Template 203ms ± 3% 131ms ± 5% -35.45% (p=0.000 n=50+50) Unicode 87.2ms ± 4% 84.1ms ± 2% -3.61% (p=0.000 n=48+47) GoTypes 560ms ± 4% 310ms ± 2% -44.65% (p=0.000 n=50+49) Compiler 2.47s ± 3% 1.41s ± 2% -43.10% (p=0.000 n=50+46) SSA 6.17s ± 2% 3.20s ± 2% -48.06% (p=0.000 n=49+49) Flate 126ms ± 4% 74ms ± 2% -41.06% (p=0.000 n=49+48) GoParser 148ms ± 4% 89ms ± 3% -39.97% (p=0.000 n=49+50) Reflect 360ms ± 3% 242ms ± 3% -32.81% (p=0.000 n=49+49) Tar 108ms ± 4% 73ms ± 4% -32.48% (p=0.000 n=50+49) XML 203ms ± 3% 119ms ± 3% -41.56% (p=0.000 n=49+48) name old user-time/op new user-time/op delta Template 246ms ± 9% 287ms ± 9% +16.98% (p=0.000 n=50+50) Unicode 109ms ± 4% 118ms ± 5% +7.56% (p=0.000 n=46+50) GoTypes 735ms ± 4% 806ms ± 2% +9.62% (p=0.000 n=50+50) Compiler 3.34s ± 4% 3.56s ± 2% +6.78% (p=0.000 n=49+49) SSA 8.54s ± 3% 10.04s ± 3% +17.55% (p=0.000 n=50+50) Flate 149ms ± 6% 176ms ± 3% +17.82% (p=0.000 n=50+48) GoParser 181ms ± 5% 213ms ± 3% +17.47% (p=0.000 n=50+50) Reflect 453ms ± 6% 499ms ± 2% +10.11% (p=0.000 n=50+48) Tar 126ms ± 5% 149ms ±11% +18.76% (p=0.000 n=50+50) XML 246ms ± 5% 287ms ± 4% +16.53% (p=0.000 n=49+50) name old alloc/op new alloc/op delta Template 38.8MB ± 0% 40.4MB ± 0% +4.21% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 30.9MB ± 0% +3.68% (p=0.008 n=5+5) GoTypes 113MB ± 0% 116MB ± 0% +2.71% (p=0.008 n=5+5) Compiler 443MB ± 0% 455MB ± 0% +2.75% (p=0.008 n=5+5) SSA 1.25GB ± 0% 1.27GB ± 0% +1.84% (p=0.008 n=5+5) Flate 25.3MB ± 0% 26.9MB ± 1% +6.31% (p=0.008 n=5+5) GoParser 31.7MB ± 0% 33.2MB ± 0% +4.61% (p=0.008 n=5+5) Reflect 78.2MB ± 0% 80.2MB ± 0% +2.53% (p=0.008 n=5+5) Tar 26.6MB ± 0% 27.9MB ± 0% +5.19% (p=0.008 n=5+5) XML 42.4MB ± 0% 44.6MB ± 0% +5.20% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Template 380k ± 0% 379k ± 0% -0.39% (p=0.032 n=5+5) Unicode 321k ± 0% 321k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.14M ± 0% 1.14M ± 0% ~ (p=0.421 n=5+5) Compiler 4.12M ± 0% 4.14M ± 0% +0.52% (p=0.008 n=5+5) SSA 9.72M ± 0% 9.76M ± 0% +0.37% (p=0.008 n=5+5) Flate 234k ± 1% 234k ± 1% ~ (p=0.690 n=5+5) GoParser 316k ± 0% 317k ± 1% ~ (p=0.841 n=5+5) Reflect 981k ± 0% 981k ± 0% ~ (p=1.000 n=5+5) Tar 250k ± 0% 249k ± 1% ~ (p=0.151 n=5+5) XML 393k ± 0% 392k ± 0% ~ (p=0.056 n=5+5) Going beyond c=4 on my machine tends to increase CPU time and allocs without impacting real time. The CPU time numbers matter, because when there are many concurrent compilation processes, that will impact the overall throughput. The numbers above are in many ways the best case scenario; we can take full advantage of all cores. Fortunately, the most common compilation scenario is incremental re-compilation of a single package during a build/test cycle. Updates #15756 Change-Id: I6725558ca2069edec0ac5b0d1683105a9fff6bea Reviewed-on: https://go-review.googlesource.com/40693 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Robert Griesemer <gri@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-19 08:27:26 -07:00
// buildssa builds an SSA function for fn.
// worker indicates which of the backend workers is doing the processing.
func buildssa(fn *Node, worker int) *ssa.Func {
name := fn.funcname()
printssa := name == ssaDump
var astBuf *bytes.Buffer
if printssa {
astBuf = &bytes.Buffer{}
fdumplist(astBuf, "buildssa-enter", fn.Func.Enter)
fdumplist(astBuf, "buildssa-body", fn.Nbody)
fdumplist(astBuf, "buildssa-exit", fn.Func.Exit)
if ssaDumpStdout {
fmt.Println("generating SSA for", name)
fmt.Print(astBuf.String())
}
[dev.ssa] cmd/compile/ssa: separate logging, work in progress, and fatal errors The SSA implementation logs for three purposes: * debug logging * fatal errors * unimplemented features Separating these three uses lets us attempt an SSA implementation for all functions, not just _ssa functions. This turns the entire standard library into a compilation test, and makes it easy to figure out things like "how much coverage does SSA have now" and "what should we do next to get more coverage?". Functions called _ssa are still special. They log profusely by default and the output of the SSA implementation is used. For all other functions, logging is off, and the implementation is built and discarded, due to lack of support for the runtime. While we're here, fix a few minor bugs and add some extra Unimplementeds to allow all.bash to pass. As of now, SSA handles 20.79% of the functions in the standard library (689 of 3314). The top missing features are: 10.03% 2597 SSA unimplemented: zero for type error not implemented 7.79% 2016 SSA unimplemented: addr: bad op DOTPTR 7.33% 1898 SSA unimplemented: unhandled expr EQ 6.10% 1579 SSA unimplemented: unhandled expr OROR 4.91% 1271 SSA unimplemented: unhandled expr NE 4.49% 1163 SSA unimplemented: unhandled expr LROT 4.00% 1036 SSA unimplemented: unhandled expr LEN 3.56% 923 SSA unimplemented: unhandled stmt CALLFUNC 2.37% 615 SSA unimplemented: zero for type []byte not implemented 1.90% 492 SSA unimplemented: unhandled stmt CALLMETH 1.74% 450 SSA unimplemented: unhandled expr CALLINTER 1.74% 450 SSA unimplemented: unhandled expr DOT 1.71% 444 SSA unimplemented: unhandled expr ANDAND 1.65% 426 SSA unimplemented: unhandled expr CLOSUREVAR 1.54% 400 SSA unimplemented: unhandled expr CALLMETH 1.51% 390 SSA unimplemented: unhandled stmt SWITCH 1.47% 380 SSA unimplemented: unhandled expr CONV 1.33% 345 SSA unimplemented: addr: bad op * 1.30% 336 SSA unimplemented: unhandled OLITERAL 6 Change-Id: I4ca07951e276714dc13c31de28640aead17a1be7 Reviewed-on: https://go-review.googlesource.com/11160 Reviewed-by: Keith Randall <khr@golang.org>
2015-06-12 11:01:13 -07:00
}
var s state
s.pushLine(fn.Pos)
defer s.popLine()
s.hasdefer = fn.Func.HasDefer()
if fn.Func.Pragma&CgoUnsafeArgs != 0 {
s.cgoUnsafeArgs = true
}
[dev.ssa] cmd/compile/ssa: separate logging, work in progress, and fatal errors The SSA implementation logs for three purposes: * debug logging * fatal errors * unimplemented features Separating these three uses lets us attempt an SSA implementation for all functions, not just _ssa functions. This turns the entire standard library into a compilation test, and makes it easy to figure out things like "how much coverage does SSA have now" and "what should we do next to get more coverage?". Functions called _ssa are still special. They log profusely by default and the output of the SSA implementation is used. For all other functions, logging is off, and the implementation is built and discarded, due to lack of support for the runtime. While we're here, fix a few minor bugs and add some extra Unimplementeds to allow all.bash to pass. As of now, SSA handles 20.79% of the functions in the standard library (689 of 3314). The top missing features are: 10.03% 2597 SSA unimplemented: zero for type error not implemented 7.79% 2016 SSA unimplemented: addr: bad op DOTPTR 7.33% 1898 SSA unimplemented: unhandled expr EQ 6.10% 1579 SSA unimplemented: unhandled expr OROR 4.91% 1271 SSA unimplemented: unhandled expr NE 4.49% 1163 SSA unimplemented: unhandled expr LROT 4.00% 1036 SSA unimplemented: unhandled expr LEN 3.56% 923 SSA unimplemented: unhandled stmt CALLFUNC 2.37% 615 SSA unimplemented: zero for type []byte not implemented 1.90% 492 SSA unimplemented: unhandled stmt CALLMETH 1.74% 450 SSA unimplemented: unhandled expr CALLINTER 1.74% 450 SSA unimplemented: unhandled expr DOT 1.71% 444 SSA unimplemented: unhandled expr ANDAND 1.65% 426 SSA unimplemented: unhandled expr CLOSUREVAR 1.54% 400 SSA unimplemented: unhandled expr CALLMETH 1.51% 390 SSA unimplemented: unhandled stmt SWITCH 1.47% 380 SSA unimplemented: unhandled expr CONV 1.33% 345 SSA unimplemented: addr: bad op * 1.30% 336 SSA unimplemented: unhandled OLITERAL 6 Change-Id: I4ca07951e276714dc13c31de28640aead17a1be7 Reviewed-on: https://go-review.googlesource.com/11160 Reviewed-by: Keith Randall <khr@golang.org>
2015-06-12 11:01:13 -07:00
fe := ssafn{
curfn: fn,
log: printssa && ssaDumpStdout,
}
s.curfn = fn
s.f = ssa.NewFunc(&fe)
s.config = ssaConfig
s.f.Type = fn.Type
s.f.Config = ssaConfig
cmd/compile: add initial backend concurrency support This CL adds initial support for concurrent backend compilation. BACKGROUND The compiler currently consists (very roughly) of the following phases: 1. Initialization. 2. Lexing and parsing into the cmd/compile/internal/syntax AST. 3. Translation into the cmd/compile/internal/gc AST. 4. Some gc AST passes: typechecking, escape analysis, inlining, closure handling, expression evaluation ordering (order.go), and some lowering and optimization (walk.go). 5. Translation into the cmd/compile/internal/ssa SSA form. 6. Optimization and lowering of SSA form. 7. Translation from SSA form to assembler instructions. 8. Translation from assembler instructions to machine code. 9. Writing lots of output: machine code, DWARF symbols, type and reflection info, export data. Phase 2 was already concurrent as of Go 1.8. Phase 3 is planned for eventual removal; we hope to go straight from syntax AST to SSA. Phases 5–8 are per-function; this CL adds support for processing multiple functions concurrently. The slowest phases in the compiler are 5 and 6, so this offers the opportunity for some good speed-ups. Unfortunately, it's not quite that straightforward. In the current compiler, the latter parts of phase 4 (order, walk) are done function-at-a-time as needed. Making order and walk concurrency-safe proved hard, and they're not particularly slow, so there wasn't much reward. To enable phases 5–8 to be done concurrently, when concurrent backend compilation is requested, we complete phase 4 for all functions before starting later phases for any functions. Also, in reality, we automatically generate new functions in phase 9, such as method wrappers and equality and has routines. Those new functions then go through phases 4–8. This CL disables concurrent backend compilation after the first, big, user-provided batch of functions has been compiled. This is done to keep things simple, and because the autogenerated functions tend to be small, few, simple, and fast to compile. USAGE Concurrent backend compilation still defaults to off. To set the number of functions that may be backend-compiled concurrently, use the compiler flag -c. In future work, cmd/go will automatically set -c. Furthermore, this CL has been intentionally written so that the c=1 path has no backend concurrency whatsoever, not even spawning any goroutines. This helps ensure that, should problems arise late in the development cycle, we can simply have cmd/go set c=1 always, and revert to the original compiler behavior. MUTEXES Most of the work required to make concurrent backend compilation safe has occurred over the past month. This CL adds a handful of mutexes to get the rest of the way there; they are the mutexes that I didn't see a clean way to avoid. Some of them may still be eliminable in future work. In no particular order: * gc.funcsymsmu. The global funcsyms slice is populated lazily when we need function symbols for closures. This occurs during gc AST to SSA translation. The function funcsym also does a package lookup, which is a source of races on types.Pkg.Syms; funcsymsmu also covers that package lookup. This mutex is low priority: it adds a single global, it is in an infrequently used code path, and it is low contention. Since funcsyms may now be added in any order, we must sort them to preserve reproducible builds. * gc.largeStackFramesMu. We don't discover until after SSA compilation that a function's stack frame is gigantic. Recording that error happens basically never, but it does happen concurrently. Fix with a low priority mutex and sorting. * obj.Link.hashmu. ctxt.hash stores the mapping from types.Syms (compiler symbols) to obj.LSyms (linker symbols). It is accessed fairly heavily through all the phases. This is the only heavily contended mutex. * gc.signatlistmu. The global signatlist map is populated with types through several of the concurrent phases, including notably via ngotype during DWARF generation. It is low priority for removal. * gc.typepkgmu. Looking up symbols in the types package happens a fair amount during backend compilation and DWARF generation, particularly via ngotype. This mutex helps us to avoid a broader mutex on types.Pkg.Syms. It has low-to-moderate contention. * types.internedStringsmu. gc AST to SSA conversion and some SSA work introduce new autotmps. Those autotmps have their names interned to reduce allocations. That interning requires protecting types.internedStrings. The autotmp names are heavily re-used, and the mutex overhead and contention here are low, so it is probably a worthwhile performance optimization to keep this mutex. TESTING I have been testing this code locally by running 'go install -race cmd/compile' and then doing 'go build -a -gcflags=-c=128 std cmd' for all architectures and a variety of compiler flags. This obviously needs to be made part of the builders, but it is too expensive to make part of all.bash. I have filed #19962 for this. REPRODUCIBLE BUILDS This version of the compiler generates reproducible builds. Testing reproducible builds also needs automation, however, and is also too expensive for all.bash. This is #19961. Also of note is that some of the compiler flags used by 'toolstash -cmp' are currently incompatible with concurrent backend compilation. They still work fine with c=1. Time will tell whether this is a problem. NEXT STEPS * Continue to find and fix races and bugs, using a combination of code inspection, fuzzing, and hopefully some community experimentation. I do not know of any outstanding races, but there probably are some. * Improve testing. * Improve performance, for many values of c. * Integrate with cmd/go and fine tune. * Support concurrent compilation with the -race flag. It is a sad irony that it does not yet work. * Minor code cleanup that has been deferred during the last month due to uncertainty about the ultimate shape of this CL. PERFORMANCE Here's the buried lede, at last. :) All benchmarks are from my 8 core 2.9 GHz Intel Core i7 darwin/amd64 laptop. First, going from tip to this CL with c=1 has almost no impact. name old time/op new time/op delta Template 195ms ± 3% 194ms ± 5% ~ (p=0.370 n=30+29) Unicode 86.6ms ± 3% 87.0ms ± 7% ~ (p=0.958 n=29+30) GoTypes 548ms ± 3% 555ms ± 4% +1.35% (p=0.001 n=30+28) Compiler 2.51s ± 2% 2.54s ± 2% +1.17% (p=0.000 n=28+30) SSA 5.16s ± 3% 5.16s ± 2% ~ (p=0.910 n=30+29) Flate 124ms ± 5% 124ms ± 4% ~ (p=0.947 n=30+30) GoParser 146ms ± 3% 146ms ± 3% ~ (p=0.150 n=29+28) Reflect 354ms ± 3% 352ms ± 4% ~ (p=0.096 n=29+29) Tar 107ms ± 5% 106ms ± 3% ~ (p=0.370 n=30+29) XML 200ms ± 4% 201ms ± 4% ~ (p=0.313 n=29+28) [Geo mean] 332ms 333ms +0.10% name old user-time/op new user-time/op delta Template 227ms ± 5% 225ms ± 5% ~ (p=0.457 n=28+27) Unicode 109ms ± 4% 109ms ± 5% ~ (p=0.758 n=29+29) GoTypes 713ms ± 4% 721ms ± 5% ~ (p=0.051 n=30+29) Compiler 3.36s ± 2% 3.38s ± 3% ~ (p=0.146 n=30+30) SSA 7.46s ± 3% 7.47s ± 3% ~ (p=0.804 n=30+29) Flate 146ms ± 7% 147ms ± 3% ~ (p=0.833 n=29+27) GoParser 179ms ± 5% 179ms ± 5% ~ (p=0.866 n=30+30) Reflect 431ms ± 4% 429ms ± 4% ~ (p=0.593 n=29+30) Tar 124ms ± 5% 123ms ± 5% ~ (p=0.140 n=29+29) XML 243ms ± 4% 242ms ± 7% ~ (p=0.404 n=29+29) [Geo mean] 415ms 415ms +0.02% name old obj-bytes new obj-bytes delta Template 382k ± 0% 382k ± 0% ~ (all equal) Unicode 203k ± 0% 203k ± 0% ~ (all equal) GoTypes 1.18M ± 0% 1.18M ± 0% ~ (all equal) Compiler 3.98M ± 0% 3.98M ± 0% ~ (all equal) SSA 8.28M ± 0% 8.28M ± 0% ~ (all equal) Flate 230k ± 0% 230k ± 0% ~ (all equal) GoParser 287k ± 0% 287k ± 0% ~ (all equal) Reflect 1.00M ± 0% 1.00M ± 0% ~ (all equal) Tar 190k ± 0% 190k ± 0% ~ (all equal) XML 416k ± 0% 416k ± 0% ~ (all equal) [Geo mean] 660k 660k +0.00% Comparing this CL to itself, from c=1 to c=2 improves real times 20-30%, costs 5-10% more CPU time, and adds about 2% alloc. The allocation increase comes from allocating more ssa.Caches. name old time/op new time/op delta Template 202ms ± 3% 149ms ± 3% -26.15% (p=0.000 n=49+49) Unicode 87.4ms ± 4% 84.2ms ± 3% -3.68% (p=0.000 n=48+48) GoTypes 560ms ± 2% 398ms ± 2% -28.96% (p=0.000 n=49+49) Compiler 2.46s ± 3% 1.76s ± 2% -28.61% (p=0.000 n=48+46) SSA 6.17s ± 2% 4.04s ± 1% -34.52% (p=0.000 n=49+49) Flate 126ms ± 3% 92ms ± 2% -26.81% (p=0.000 n=49+48) GoParser 148ms ± 4% 107ms ± 2% -27.78% (p=0.000 n=49+48) Reflect 361ms ± 3% 281ms ± 3% -22.10% (p=0.000 n=49+49) Tar 109ms ± 4% 86ms ± 3% -20.81% (p=0.000 n=49+47) XML 204ms ± 3% 144ms ± 2% -29.53% (p=0.000 n=48+45) name old user-time/op new user-time/op delta Template 246ms ± 9% 246ms ± 4% ~ (p=0.401 n=50+48) Unicode 109ms ± 4% 111ms ± 4% +1.47% (p=0.000 n=44+50) GoTypes 728ms ± 3% 765ms ± 3% +5.04% (p=0.000 n=46+50) Compiler 3.33s ± 3% 3.41s ± 2% +2.31% (p=0.000 n=49+48) SSA 8.52s ± 2% 9.11s ± 2% +6.93% (p=0.000 n=49+47) Flate 149ms ± 4% 161ms ± 3% +8.13% (p=0.000 n=50+47) GoParser 181ms ± 5% 192ms ± 2% +6.40% (p=0.000 n=49+46) Reflect 452ms ± 9% 474ms ± 2% +4.99% (p=0.000 n=50+48) Tar 126ms ± 6% 136ms ± 4% +7.95% (p=0.000 n=50+49) XML 247ms ± 5% 264ms ± 3% +6.94% (p=0.000 n=48+50) name old alloc/op new alloc/op delta Template 38.8MB ± 0% 39.3MB ± 0% +1.48% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 30.2MB ± 0% +1.19% (p=0.008 n=5+5) GoTypes 113MB ± 0% 114MB ± 0% +0.69% (p=0.008 n=5+5) Compiler 443MB ± 0% 447MB ± 0% +0.95% (p=0.008 n=5+5) SSA 1.25GB ± 0% 1.26GB ± 0% +0.89% (p=0.008 n=5+5) Flate 25.3MB ± 0% 25.9MB ± 1% +2.35% (p=0.008 n=5+5) GoParser 31.7MB ± 0% 32.2MB ± 0% +1.59% (p=0.008 n=5+5) Reflect 78.2MB ± 0% 78.9MB ± 0% +0.91% (p=0.008 n=5+5) Tar 26.6MB ± 0% 27.0MB ± 0% +1.80% (p=0.008 n=5+5) XML 42.4MB ± 0% 43.4MB ± 0% +2.35% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Template 379k ± 0% 378k ± 0% ~ (p=0.421 n=5+5) Unicode 322k ± 0% 321k ± 0% ~ (p=0.222 n=5+5) GoTypes 1.14M ± 0% 1.14M ± 0% ~ (p=0.548 n=5+5) Compiler 4.12M ± 0% 4.11M ± 0% -0.14% (p=0.032 n=5+5) SSA 9.72M ± 0% 9.72M ± 0% ~ (p=0.421 n=5+5) Flate 234k ± 1% 234k ± 0% ~ (p=0.421 n=5+5) GoParser 316k ± 1% 315k ± 0% ~ (p=0.222 n=5+5) Reflect 980k ± 0% 979k ± 0% ~ (p=0.095 n=5+5) Tar 249k ± 1% 249k ± 1% ~ (p=0.841 n=5+5) XML 392k ± 0% 391k ± 0% ~ (p=0.095 n=5+5) From c=1 to c=4, real time is down ~40%, CPU usage up 10-20%, alloc up ~5%: name old time/op new time/op delta Template 203ms ± 3% 131ms ± 5% -35.45% (p=0.000 n=50+50) Unicode 87.2ms ± 4% 84.1ms ± 2% -3.61% (p=0.000 n=48+47) GoTypes 560ms ± 4% 310ms ± 2% -44.65% (p=0.000 n=50+49) Compiler 2.47s ± 3% 1.41s ± 2% -43.10% (p=0.000 n=50+46) SSA 6.17s ± 2% 3.20s ± 2% -48.06% (p=0.000 n=49+49) Flate 126ms ± 4% 74ms ± 2% -41.06% (p=0.000 n=49+48) GoParser 148ms ± 4% 89ms ± 3% -39.97% (p=0.000 n=49+50) Reflect 360ms ± 3% 242ms ± 3% -32.81% (p=0.000 n=49+49) Tar 108ms ± 4% 73ms ± 4% -32.48% (p=0.000 n=50+49) XML 203ms ± 3% 119ms ± 3% -41.56% (p=0.000 n=49+48) name old user-time/op new user-time/op delta Template 246ms ± 9% 287ms ± 9% +16.98% (p=0.000 n=50+50) Unicode 109ms ± 4% 118ms ± 5% +7.56% (p=0.000 n=46+50) GoTypes 735ms ± 4% 806ms ± 2% +9.62% (p=0.000 n=50+50) Compiler 3.34s ± 4% 3.56s ± 2% +6.78% (p=0.000 n=49+49) SSA 8.54s ± 3% 10.04s ± 3% +17.55% (p=0.000 n=50+50) Flate 149ms ± 6% 176ms ± 3% +17.82% (p=0.000 n=50+48) GoParser 181ms ± 5% 213ms ± 3% +17.47% (p=0.000 n=50+50) Reflect 453ms ± 6% 499ms ± 2% +10.11% (p=0.000 n=50+48) Tar 126ms ± 5% 149ms ±11% +18.76% (p=0.000 n=50+50) XML 246ms ± 5% 287ms ± 4% +16.53% (p=0.000 n=49+50) name old alloc/op new alloc/op delta Template 38.8MB ± 0% 40.4MB ± 0% +4.21% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 30.9MB ± 0% +3.68% (p=0.008 n=5+5) GoTypes 113MB ± 0% 116MB ± 0% +2.71% (p=0.008 n=5+5) Compiler 443MB ± 0% 455MB ± 0% +2.75% (p=0.008 n=5+5) SSA 1.25GB ± 0% 1.27GB ± 0% +1.84% (p=0.008 n=5+5) Flate 25.3MB ± 0% 26.9MB ± 1% +6.31% (p=0.008 n=5+5) GoParser 31.7MB ± 0% 33.2MB ± 0% +4.61% (p=0.008 n=5+5) Reflect 78.2MB ± 0% 80.2MB ± 0% +2.53% (p=0.008 n=5+5) Tar 26.6MB ± 0% 27.9MB ± 0% +5.19% (p=0.008 n=5+5) XML 42.4MB ± 0% 44.6MB ± 0% +5.20% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Template 380k ± 0% 379k ± 0% -0.39% (p=0.032 n=5+5) Unicode 321k ± 0% 321k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.14M ± 0% 1.14M ± 0% ~ (p=0.421 n=5+5) Compiler 4.12M ± 0% 4.14M ± 0% +0.52% (p=0.008 n=5+5) SSA 9.72M ± 0% 9.76M ± 0% +0.37% (p=0.008 n=5+5) Flate 234k ± 1% 234k ± 1% ~ (p=0.690 n=5+5) GoParser 316k ± 0% 317k ± 1% ~ (p=0.841 n=5+5) Reflect 981k ± 0% 981k ± 0% ~ (p=1.000 n=5+5) Tar 250k ± 0% 249k ± 1% ~ (p=0.151 n=5+5) XML 393k ± 0% 392k ± 0% ~ (p=0.056 n=5+5) Going beyond c=4 on my machine tends to increase CPU time and allocs without impacting real time. The CPU time numbers matter, because when there are many concurrent compilation processes, that will impact the overall throughput. The numbers above are in many ways the best case scenario; we can take full advantage of all cores. Fortunately, the most common compilation scenario is incremental re-compilation of a single package during a build/test cycle. Updates #15756 Change-Id: I6725558ca2069edec0ac5b0d1683105a9fff6bea Reviewed-on: https://go-review.googlesource.com/40693 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Robert Griesemer <gri@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-19 08:27:26 -07:00
s.f.Cache = &ssaCaches[worker]
s.f.Cache.Reset()
s.f.DebugTest = s.f.DebugHashMatch("GOSSAHASH", name)
[dev.ssa] cmd/compile/ssa: separate logging, work in progress, and fatal errors The SSA implementation logs for three purposes: * debug logging * fatal errors * unimplemented features Separating these three uses lets us attempt an SSA implementation for all functions, not just _ssa functions. This turns the entire standard library into a compilation test, and makes it easy to figure out things like "how much coverage does SSA have now" and "what should we do next to get more coverage?". Functions called _ssa are still special. They log profusely by default and the output of the SSA implementation is used. For all other functions, logging is off, and the implementation is built and discarded, due to lack of support for the runtime. While we're here, fix a few minor bugs and add some extra Unimplementeds to allow all.bash to pass. As of now, SSA handles 20.79% of the functions in the standard library (689 of 3314). The top missing features are: 10.03% 2597 SSA unimplemented: zero for type error not implemented 7.79% 2016 SSA unimplemented: addr: bad op DOTPTR 7.33% 1898 SSA unimplemented: unhandled expr EQ 6.10% 1579 SSA unimplemented: unhandled expr OROR 4.91% 1271 SSA unimplemented: unhandled expr NE 4.49% 1163 SSA unimplemented: unhandled expr LROT 4.00% 1036 SSA unimplemented: unhandled expr LEN 3.56% 923 SSA unimplemented: unhandled stmt CALLFUNC 2.37% 615 SSA unimplemented: zero for type []byte not implemented 1.90% 492 SSA unimplemented: unhandled stmt CALLMETH 1.74% 450 SSA unimplemented: unhandled expr CALLINTER 1.74% 450 SSA unimplemented: unhandled expr DOT 1.71% 444 SSA unimplemented: unhandled expr ANDAND 1.65% 426 SSA unimplemented: unhandled expr CLOSUREVAR 1.54% 400 SSA unimplemented: unhandled expr CALLMETH 1.51% 390 SSA unimplemented: unhandled stmt SWITCH 1.47% 380 SSA unimplemented: unhandled expr CONV 1.33% 345 SSA unimplemented: addr: bad op * 1.30% 336 SSA unimplemented: unhandled OLITERAL 6 Change-Id: I4ca07951e276714dc13c31de28640aead17a1be7 Reviewed-on: https://go-review.googlesource.com/11160 Reviewed-by: Keith Randall <khr@golang.org>
2015-06-12 11:01:13 -07:00
s.f.Name = name
s.f.PrintOrHtmlSSA = printssa
if fn.Func.Pragma&Nosplit != 0 {
s.f.NoSplit = true
}
s.panics = map[funcLine]*ssa.Block{}
s.softFloat = s.config.SoftFloat
[dev.ssa] cmd/compile/ssa: separate logging, work in progress, and fatal errors The SSA implementation logs for three purposes: * debug logging * fatal errors * unimplemented features Separating these three uses lets us attempt an SSA implementation for all functions, not just _ssa functions. This turns the entire standard library into a compilation test, and makes it easy to figure out things like "how much coverage does SSA have now" and "what should we do next to get more coverage?". Functions called _ssa are still special. They log profusely by default and the output of the SSA implementation is used. For all other functions, logging is off, and the implementation is built and discarded, due to lack of support for the runtime. While we're here, fix a few minor bugs and add some extra Unimplementeds to allow all.bash to pass. As of now, SSA handles 20.79% of the functions in the standard library (689 of 3314). The top missing features are: 10.03% 2597 SSA unimplemented: zero for type error not implemented 7.79% 2016 SSA unimplemented: addr: bad op DOTPTR 7.33% 1898 SSA unimplemented: unhandled expr EQ 6.10% 1579 SSA unimplemented: unhandled expr OROR 4.91% 1271 SSA unimplemented: unhandled expr NE 4.49% 1163 SSA unimplemented: unhandled expr LROT 4.00% 1036 SSA unimplemented: unhandled expr LEN 3.56% 923 SSA unimplemented: unhandled stmt CALLFUNC 2.37% 615 SSA unimplemented: zero for type []byte not implemented 1.90% 492 SSA unimplemented: unhandled stmt CALLMETH 1.74% 450 SSA unimplemented: unhandled expr CALLINTER 1.74% 450 SSA unimplemented: unhandled expr DOT 1.71% 444 SSA unimplemented: unhandled expr ANDAND 1.65% 426 SSA unimplemented: unhandled expr CLOSUREVAR 1.54% 400 SSA unimplemented: unhandled expr CALLMETH 1.51% 390 SSA unimplemented: unhandled stmt SWITCH 1.47% 380 SSA unimplemented: unhandled expr CONV 1.33% 345 SSA unimplemented: addr: bad op * 1.30% 336 SSA unimplemented: unhandled OLITERAL 6 Change-Id: I4ca07951e276714dc13c31de28640aead17a1be7 Reviewed-on: https://go-review.googlesource.com/11160 Reviewed-by: Keith Randall <khr@golang.org>
2015-06-12 11:01:13 -07:00
if printssa {
s.f.HTMLWriter = ssa.NewHTMLWriter(ssaDumpFile, s.f.Frontend(), name, ssaDumpCFG)
// TODO: generate and print a mapping from nodes to values and blocks
dumpSourcesColumn(s.f.HTMLWriter, fn)
s.f.HTMLWriter.WriteAST("AST", astBuf)
}
// Allocate starting block
s.f.Entry = s.f.NewBlock(ssa.BlockPlain)
// Allocate starting values
s.labels = map[string]*ssaLabel{}
s.labeledNodes = map[*Node]*ssaLabel{}
s.fwdVars = map[*Node]*ssa.Value{}
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
s.startmem = s.entryNewValue0(ssa.OpInitMem, types.TypeMem)
s.sp = s.entryNewValue0(ssa.OpSP, types.Types[TUINTPTR]) // TODO: use generic pointer type (unsafe.Pointer?) instead
s.sb = s.entryNewValue0(ssa.OpSB, types.Types[TUINTPTR])
s.startBlock(s.f.Entry)
s.vars[&memVar] = s.startmem
// Generate addresses of local declarations
s.decladdrs = map[*Node]*ssa.Value{}
for _, n := range fn.Func.Dcl {
cmd/compile: move Node.Class to flags Put it at position zero, since it is fairly hot. This shrinks gc.Node into a smaller size class on 64 bit systems. name old time/op new time/op delta Template 193ms ± 5% 192ms ± 3% ~ (p=0.353 n=94+93) Unicode 86.1ms ± 5% 85.0ms ± 4% -1.23% (p=0.000 n=95+98) GoTypes 546ms ± 3% 544ms ± 4% -0.40% (p=0.007 n=94+97) Compiler 2.56s ± 3% 2.54s ± 3% -0.67% (p=0.000 n=99+97) SSA 5.13s ± 2% 5.10s ± 3% -0.55% (p=0.000 n=94+98) Flate 122ms ± 6% 121ms ± 4% -0.75% (p=0.002 n=97+95) GoParser 144ms ± 5% 144ms ± 4% ~ (p=0.298 n=98+97) Reflect 348ms ± 4% 349ms ± 4% ~ (p=0.350 n=98+97) Tar 105ms ± 5% 104ms ± 5% ~ (p=0.154 n=96+98) XML 200ms ± 5% 198ms ± 4% -0.71% (p=0.015 n=97+98) [Geo mean] 330ms 328ms -0.52% name old user-time/op new user-time/op delta Template 229ms ±11% 224ms ± 7% -2.16% (p=0.001 n=100+87) Unicode 109ms ± 5% 109ms ± 6% ~ (p=0.897 n=96+91) GoTypes 712ms ± 4% 709ms ± 4% ~ (p=0.085 n=96+98) Compiler 3.41s ± 3% 3.36s ± 3% -1.43% (p=0.000 n=98+98) SSA 7.46s ± 3% 7.31s ± 3% -2.02% (p=0.000 n=100+99) Flate 145ms ± 6% 143ms ± 6% -1.11% (p=0.001 n=99+97) GoParser 177ms ± 5% 176ms ± 5% -0.78% (p=0.018 n=95+95) Reflect 432ms ± 7% 435ms ± 9% ~ (p=0.296 n=100+100) Tar 121ms ± 7% 121ms ± 5% ~ (p=0.072 n=100+95) XML 241ms ± 4% 239ms ± 5% ~ (p=0.085 n=97+99) [Geo mean] 413ms 410ms -0.73% name old alloc/op new alloc/op delta Template 38.4MB ± 0% 37.7MB ± 0% -1.85% (p=0.008 n=5+5) Unicode 30.1MB ± 0% 28.8MB ± 0% -4.09% (p=0.008 n=5+5) GoTypes 112MB ± 0% 110MB ± 0% -1.69% (p=0.008 n=5+5) Compiler 470MB ± 0% 461MB ± 0% -1.91% (p=0.008 n=5+5) SSA 1.13GB ± 0% 1.11GB ± 0% -1.70% (p=0.008 n=5+5) Flate 25.0MB ± 0% 24.6MB ± 0% -1.67% (p=0.008 n=5+5) GoParser 31.6MB ± 0% 31.1MB ± 0% -1.66% (p=0.008 n=5+5) Reflect 77.1MB ± 0% 75.8MB ± 0% -1.69% (p=0.008 n=5+5) Tar 26.3MB ± 0% 25.7MB ± 0% -2.06% (p=0.008 n=5+5) XML 41.9MB ± 0% 41.1MB ± 0% -1.93% (p=0.008 n=5+5) [Geo mean] 73.5MB 72.0MB -2.03% name old allocs/op new allocs/op delta Template 383k ± 0% 383k ± 0% ~ (p=0.690 n=5+5) Unicode 343k ± 0% 343k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.16M ± 0% 1.16M ± 0% ~ (p=0.310 n=5+5) Compiler 4.43M ± 0% 4.42M ± 0% -0.17% (p=0.008 n=5+5) SSA 9.85M ± 0% 9.85M ± 0% ~ (p=0.310 n=5+5) Flate 236k ± 0% 236k ± 1% ~ (p=0.841 n=5+5) GoParser 320k ± 0% 320k ± 0% ~ (p=0.421 n=5+5) Reflect 988k ± 0% 987k ± 0% ~ (p=0.690 n=5+5) Tar 252k ± 0% 251k ± 0% ~ (p=0.095 n=5+5) XML 399k ± 0% 399k ± 0% ~ (p=1.000 n=5+5) [Geo mean] 741k 740k -0.07% Change-Id: I9e952b58a98e30a12494304db9ce50d0a85e459c Reviewed-on: https://go-review.googlesource.com/41797 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
2017-04-25 18:14:12 -07:00
switch n.Class() {
case PPARAM, PPARAMOUT:
s.decladdrs[n] = s.entryNewValue2A(ssa.OpLocalAddr, types.NewPtr(n.Type), n, s.sp, s.startmem)
cmd/compile: move Node.Class to flags Put it at position zero, since it is fairly hot. This shrinks gc.Node into a smaller size class on 64 bit systems. name old time/op new time/op delta Template 193ms ± 5% 192ms ± 3% ~ (p=0.353 n=94+93) Unicode 86.1ms ± 5% 85.0ms ± 4% -1.23% (p=0.000 n=95+98) GoTypes 546ms ± 3% 544ms ± 4% -0.40% (p=0.007 n=94+97) Compiler 2.56s ± 3% 2.54s ± 3% -0.67% (p=0.000 n=99+97) SSA 5.13s ± 2% 5.10s ± 3% -0.55% (p=0.000 n=94+98) Flate 122ms ± 6% 121ms ± 4% -0.75% (p=0.002 n=97+95) GoParser 144ms ± 5% 144ms ± 4% ~ (p=0.298 n=98+97) Reflect 348ms ± 4% 349ms ± 4% ~ (p=0.350 n=98+97) Tar 105ms ± 5% 104ms ± 5% ~ (p=0.154 n=96+98) XML 200ms ± 5% 198ms ± 4% -0.71% (p=0.015 n=97+98) [Geo mean] 330ms 328ms -0.52% name old user-time/op new user-time/op delta Template 229ms ±11% 224ms ± 7% -2.16% (p=0.001 n=100+87) Unicode 109ms ± 5% 109ms ± 6% ~ (p=0.897 n=96+91) GoTypes 712ms ± 4% 709ms ± 4% ~ (p=0.085 n=96+98) Compiler 3.41s ± 3% 3.36s ± 3% -1.43% (p=0.000 n=98+98) SSA 7.46s ± 3% 7.31s ± 3% -2.02% (p=0.000 n=100+99) Flate 145ms ± 6% 143ms ± 6% -1.11% (p=0.001 n=99+97) GoParser 177ms ± 5% 176ms ± 5% -0.78% (p=0.018 n=95+95) Reflect 432ms ± 7% 435ms ± 9% ~ (p=0.296 n=100+100) Tar 121ms ± 7% 121ms ± 5% ~ (p=0.072 n=100+95) XML 241ms ± 4% 239ms ± 5% ~ (p=0.085 n=97+99) [Geo mean] 413ms 410ms -0.73% name old alloc/op new alloc/op delta Template 38.4MB ± 0% 37.7MB ± 0% -1.85% (p=0.008 n=5+5) Unicode 30.1MB ± 0% 28.8MB ± 0% -4.09% (p=0.008 n=5+5) GoTypes 112MB ± 0% 110MB ± 0% -1.69% (p=0.008 n=5+5) Compiler 470MB ± 0% 461MB ± 0% -1.91% (p=0.008 n=5+5) SSA 1.13GB ± 0% 1.11GB ± 0% -1.70% (p=0.008 n=5+5) Flate 25.0MB ± 0% 24.6MB ± 0% -1.67% (p=0.008 n=5+5) GoParser 31.6MB ± 0% 31.1MB ± 0% -1.66% (p=0.008 n=5+5) Reflect 77.1MB ± 0% 75.8MB ± 0% -1.69% (p=0.008 n=5+5) Tar 26.3MB ± 0% 25.7MB ± 0% -2.06% (p=0.008 n=5+5) XML 41.9MB ± 0% 41.1MB ± 0% -1.93% (p=0.008 n=5+5) [Geo mean] 73.5MB 72.0MB -2.03% name old allocs/op new allocs/op delta Template 383k ± 0% 383k ± 0% ~ (p=0.690 n=5+5) Unicode 343k ± 0% 343k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.16M ± 0% 1.16M ± 0% ~ (p=0.310 n=5+5) Compiler 4.43M ± 0% 4.42M ± 0% -0.17% (p=0.008 n=5+5) SSA 9.85M ± 0% 9.85M ± 0% ~ (p=0.310 n=5+5) Flate 236k ± 0% 236k ± 1% ~ (p=0.841 n=5+5) GoParser 320k ± 0% 320k ± 0% ~ (p=0.421 n=5+5) Reflect 988k ± 0% 987k ± 0% ~ (p=0.690 n=5+5) Tar 252k ± 0% 251k ± 0% ~ (p=0.095 n=5+5) XML 399k ± 0% 399k ± 0% ~ (p=1.000 n=5+5) [Geo mean] 741k 740k -0.07% Change-Id: I9e952b58a98e30a12494304db9ce50d0a85e459c Reviewed-on: https://go-review.googlesource.com/41797 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
2017-04-25 18:14:12 -07:00
if n.Class() == PPARAMOUT && s.canSSA(n) {
// Save ssa-able PPARAMOUT variables so we can
// store them back to the stack at the end of
// the function.
s.returns = append(s.returns, n)
}
case PAUTO:
// processed at each use, to prevent Addr coming
// before the decl.
cmd/compile: fix liveness computation for heap-escaped parameters The liveness computation of parameters generally was never correct, but forcing all parameters to be live throughout the function covered up that problem. The new SSA back end is too clever: even though it currently keeps the parameter values live throughout the function, it may find optimizations that mean the current values are not written back to the original parameter stack slots immediately or ever (for example if a parameter is set to nil, SSA constant propagation may replace all later uses of the parameter with a constant nil, eliminating the need to write the nil value back to the stack slot), so the liveness code must now track the actual operations on the stack slots, exposing these problems. One small problem in the handling of arguments is that nodarg can return ONAME PPARAM nodes with adjusted offsets, so that there are actually multiple *Node pointers for the same parameter in the instruction stream. This might be possible to correct, but not in this CL. For now, we fix this by using n.Orig instead of n when considering PPARAM and PPARAMOUT nodes. The major problem in the handling of arguments is general confusion in the liveness code about the meaning of PPARAM|PHEAP and PPARAMOUT|PHEAP nodes, especially as contrasted with PAUTO|PHEAP. The difference between these two is that when a local variable "moves" to the heap, it's really just allocated there to start with; in contrast, when an argument moves to the heap, the actual data has to be copied there from the stack at the beginning of the function, and when a result "moves" to the heap the value in the heap has to be copied back to the stack when the function returns This general confusion is also present in the SSA back end. The PHEAP bit worked decently when I first introduced it 7 years ago (!) in 391425ae. The back end did nothing sophisticated, and in particular there was no analysis at all: no escape analysis, no liveness analysis, and certainly no SSA back end. But the complications caused in the various downstream consumers suggest that this should be a detail kept mainly in the front end. This CL therefore eliminates both the PHEAP bit and even the idea of "heap variables" from the back ends. First, it replaces the PPARAM|PHEAP, PPARAMOUT|PHEAP, and PAUTO|PHEAP variable classes with the single PAUTOHEAP, a pseudo-class indicating a variable maintained on the heap and available by indirecting a local variable kept on the stack (a plain PAUTO). Second, walkexpr replaces all references to PAUTOHEAP variables with indirections of the corresponding PAUTO variable. The back ends and the liveness code now just see plain indirected variables. This may actually produce better code, but the real goal here is to eliminate these little-used and somewhat suspect code paths in the back end analyses. The OPARAM node type goes away too. A followup CL will do the same to PPARAMREF. I'm not sure that the back ends (SSA in particular) are handling those right either, and with the framework established in this CL that change is trivial and the result clearly more correct. Fixes #15747. Change-Id: I2770b1ce3cbc93981bfc7166be66a9da12013d74 Reviewed-on: https://go-review.googlesource.com/23393 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-05-25 01:33:24 -04:00
case PAUTOHEAP:
// moved to heap - already handled by frontend
case PFUNC:
// local function - already handled by frontend
default:
s.Fatalf("local variable with class %v unimplemented", n.Class())
}
}
// Populate SSAable arguments.
for _, n := range fn.Func.Dcl {
cmd/compile: move Node.Class to flags Put it at position zero, since it is fairly hot. This shrinks gc.Node into a smaller size class on 64 bit systems. name old time/op new time/op delta Template 193ms ± 5% 192ms ± 3% ~ (p=0.353 n=94+93) Unicode 86.1ms ± 5% 85.0ms ± 4% -1.23% (p=0.000 n=95+98) GoTypes 546ms ± 3% 544ms ± 4% -0.40% (p=0.007 n=94+97) Compiler 2.56s ± 3% 2.54s ± 3% -0.67% (p=0.000 n=99+97) SSA 5.13s ± 2% 5.10s ± 3% -0.55% (p=0.000 n=94+98) Flate 122ms ± 6% 121ms ± 4% -0.75% (p=0.002 n=97+95) GoParser 144ms ± 5% 144ms ± 4% ~ (p=0.298 n=98+97) Reflect 348ms ± 4% 349ms ± 4% ~ (p=0.350 n=98+97) Tar 105ms ± 5% 104ms ± 5% ~ (p=0.154 n=96+98) XML 200ms ± 5% 198ms ± 4% -0.71% (p=0.015 n=97+98) [Geo mean] 330ms 328ms -0.52% name old user-time/op new user-time/op delta Template 229ms ±11% 224ms ± 7% -2.16% (p=0.001 n=100+87) Unicode 109ms ± 5% 109ms ± 6% ~ (p=0.897 n=96+91) GoTypes 712ms ± 4% 709ms ± 4% ~ (p=0.085 n=96+98) Compiler 3.41s ± 3% 3.36s ± 3% -1.43% (p=0.000 n=98+98) SSA 7.46s ± 3% 7.31s ± 3% -2.02% (p=0.000 n=100+99) Flate 145ms ± 6% 143ms ± 6% -1.11% (p=0.001 n=99+97) GoParser 177ms ± 5% 176ms ± 5% -0.78% (p=0.018 n=95+95) Reflect 432ms ± 7% 435ms ± 9% ~ (p=0.296 n=100+100) Tar 121ms ± 7% 121ms ± 5% ~ (p=0.072 n=100+95) XML 241ms ± 4% 239ms ± 5% ~ (p=0.085 n=97+99) [Geo mean] 413ms 410ms -0.73% name old alloc/op new alloc/op delta Template 38.4MB ± 0% 37.7MB ± 0% -1.85% (p=0.008 n=5+5) Unicode 30.1MB ± 0% 28.8MB ± 0% -4.09% (p=0.008 n=5+5) GoTypes 112MB ± 0% 110MB ± 0% -1.69% (p=0.008 n=5+5) Compiler 470MB ± 0% 461MB ± 0% -1.91% (p=0.008 n=5+5) SSA 1.13GB ± 0% 1.11GB ± 0% -1.70% (p=0.008 n=5+5) Flate 25.0MB ± 0% 24.6MB ± 0% -1.67% (p=0.008 n=5+5) GoParser 31.6MB ± 0% 31.1MB ± 0% -1.66% (p=0.008 n=5+5) Reflect 77.1MB ± 0% 75.8MB ± 0% -1.69% (p=0.008 n=5+5) Tar 26.3MB ± 0% 25.7MB ± 0% -2.06% (p=0.008 n=5+5) XML 41.9MB ± 0% 41.1MB ± 0% -1.93% (p=0.008 n=5+5) [Geo mean] 73.5MB 72.0MB -2.03% name old allocs/op new allocs/op delta Template 383k ± 0% 383k ± 0% ~ (p=0.690 n=5+5) Unicode 343k ± 0% 343k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.16M ± 0% 1.16M ± 0% ~ (p=0.310 n=5+5) Compiler 4.43M ± 0% 4.42M ± 0% -0.17% (p=0.008 n=5+5) SSA 9.85M ± 0% 9.85M ± 0% ~ (p=0.310 n=5+5) Flate 236k ± 0% 236k ± 1% ~ (p=0.841 n=5+5) GoParser 320k ± 0% 320k ± 0% ~ (p=0.421 n=5+5) Reflect 988k ± 0% 987k ± 0% ~ (p=0.690 n=5+5) Tar 252k ± 0% 251k ± 0% ~ (p=0.095 n=5+5) XML 399k ± 0% 399k ± 0% ~ (p=1.000 n=5+5) [Geo mean] 741k 740k -0.07% Change-Id: I9e952b58a98e30a12494304db9ce50d0a85e459c Reviewed-on: https://go-review.googlesource.com/41797 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
2017-04-25 18:14:12 -07:00
if n.Class() == PPARAM && s.canSSA(n) {
v := s.newValue0A(ssa.OpArg, n.Type, n)
s.vars[n] = v
s.addNamedValue(n, v) // This helps with debugging information, not needed for compilation itself.
}
}
// Convert the AST-based IR to the SSA-based IR
s.stmtList(fn.Func.Enter)
s.stmtList(fn.Nbody)
// fallthrough to exit
if s.curBlock != nil {
s.pushLine(fn.Func.Endlineno)
s.exit()
s.popLine()
}
for _, b := range s.f.Blocks {
if b.Pos != src.NoXPos {
s.updateUnsetPredPos(b)
}
}
s.insertPhis()
// Main call to ssa package to compile function
ssa.Compile(s.f)
return s.f
}
func dumpSourcesColumn(writer *ssa.HTMLWriter, fn *Node) {
// Read sources of target function fn.
fname := Ctxt.PosTable.Pos(fn.Pos).Filename()
targetFn, err := readFuncLines(fname, fn.Pos.Line(), fn.Func.Endlineno.Line())
if err != nil {
writer.Logger.Logf("cannot read sources for function %v: %v", fn, err)
}
// Read sources of inlined functions.
var inlFns []*ssa.FuncLines
for _, fi := range ssaDumpInlined {
var elno src.XPos
if fi.Name.Defn == nil {
// Endlineno is filled from exported data.
elno = fi.Func.Endlineno
} else {
elno = fi.Name.Defn.Func.Endlineno
}
fname := Ctxt.PosTable.Pos(fi.Pos).Filename()
fnLines, err := readFuncLines(fname, fi.Pos.Line(), elno.Line())
if err != nil {
writer.Logger.Logf("cannot read sources for function %v: %v", fi, err)
continue
}
inlFns = append(inlFns, fnLines)
}
sort.Sort(ssa.ByTopo(inlFns))
if targetFn != nil {
inlFns = append([]*ssa.FuncLines{targetFn}, inlFns...)
}
writer.WriteSources("sources", inlFns)
}
func readFuncLines(file string, start, end uint) (*ssa.FuncLines, error) {
f, err := os.Open(os.ExpandEnv(file))
if err != nil {
return nil, err
}
defer f.Close()
var lines []string
ln := uint(1)
scanner := bufio.NewScanner(f)
for scanner.Scan() && ln <= end {
if ln >= start {
lines = append(lines, scanner.Text())
}
ln++
}
return &ssa.FuncLines{Filename: file, StartLineno: start, Lines: lines}, nil
}
// updateUnsetPredPos propagates the earliest-value position information for b
// towards all of b's predecessors that need a position, and recurs on that
// predecessor if its position is updated. B should have a non-empty position.
func (s *state) updateUnsetPredPos(b *ssa.Block) {
if b.Pos == src.NoXPos {
s.Fatalf("Block %s should have a position", b)
}
bestPos := src.NoXPos
for _, e := range b.Preds {
p := e.Block()
if !p.LackingPos() {
continue
}
if bestPos == src.NoXPos {
bestPos = b.Pos
for _, v := range b.Values {
if v.LackingPos() {
continue
}
if v.Pos != src.NoXPos {
// Assume values are still in roughly textual order;
// TODO: could also seek minimum position?
bestPos = v.Pos
break
}
}
}
p.Pos = bestPos
s.updateUnsetPredPos(p) // We do not expect long chains of these, thus recursion is okay.
}
}
type state struct {
// configuration (arch) information
config *ssa.Config
// function we're building
f *ssa.Func
// Node for function
curfn *Node
// labels and labeled control flow nodes (OFOR, OFORUNTIL, OSWITCH, OSELECT) in f
labels map[string]*ssaLabel
labeledNodes map[*Node]*ssaLabel
// unlabeled break and continue statement tracking
breakTo *ssa.Block // current target for plain break statement
continueTo *ssa.Block // current target for plain continue statement
// current location where we're interpreting the AST
curBlock *ssa.Block
// variable assignments in the current block (map from variable symbol to ssa value)
// *Node is the unique identifier (an ONAME Node) for the variable.
// TODO: keep a single varnum map, then make all of these maps slices instead?
vars map[*Node]*ssa.Value
// fwdVars are variables that are used before they are defined in the current block.
// This map exists just to coalesce multiple references into a single FwdRef op.
// *Node is the unique identifier (an ONAME Node) for the variable.
fwdVars map[*Node]*ssa.Value
// all defined variables at the end of each block. Indexed by block ID.
defvars []map[*Node]*ssa.Value
// addresses of PPARAM and PPARAMOUT variables.
decladdrs map[*Node]*ssa.Value
// starting values. Memory, stack pointer, and globals pointer
startmem *ssa.Value
sp *ssa.Value
sb *ssa.Value
// line number stack. The current line number is top of stack
2016-12-15 17:17:01 -08:00
line []src.XPos
// the last line number processed; it may have been popped
lastPos src.XPos
// list of panic calls by function name and line number.
// Used to deduplicate panic calls.
panics map[funcLine]*ssa.Block
cmd/compile: fix liveness computation for heap-escaped parameters The liveness computation of parameters generally was never correct, but forcing all parameters to be live throughout the function covered up that problem. The new SSA back end is too clever: even though it currently keeps the parameter values live throughout the function, it may find optimizations that mean the current values are not written back to the original parameter stack slots immediately or ever (for example if a parameter is set to nil, SSA constant propagation may replace all later uses of the parameter with a constant nil, eliminating the need to write the nil value back to the stack slot), so the liveness code must now track the actual operations on the stack slots, exposing these problems. One small problem in the handling of arguments is that nodarg can return ONAME PPARAM nodes with adjusted offsets, so that there are actually multiple *Node pointers for the same parameter in the instruction stream. This might be possible to correct, but not in this CL. For now, we fix this by using n.Orig instead of n when considering PPARAM and PPARAMOUT nodes. The major problem in the handling of arguments is general confusion in the liveness code about the meaning of PPARAM|PHEAP and PPARAMOUT|PHEAP nodes, especially as contrasted with PAUTO|PHEAP. The difference between these two is that when a local variable "moves" to the heap, it's really just allocated there to start with; in contrast, when an argument moves to the heap, the actual data has to be copied there from the stack at the beginning of the function, and when a result "moves" to the heap the value in the heap has to be copied back to the stack when the function returns This general confusion is also present in the SSA back end. The PHEAP bit worked decently when I first introduced it 7 years ago (!) in 391425ae. The back end did nothing sophisticated, and in particular there was no analysis at all: no escape analysis, no liveness analysis, and certainly no SSA back end. But the complications caused in the various downstream consumers suggest that this should be a detail kept mainly in the front end. This CL therefore eliminates both the PHEAP bit and even the idea of "heap variables" from the back ends. First, it replaces the PPARAM|PHEAP, PPARAMOUT|PHEAP, and PAUTO|PHEAP variable classes with the single PAUTOHEAP, a pseudo-class indicating a variable maintained on the heap and available by indirecting a local variable kept on the stack (a plain PAUTO). Second, walkexpr replaces all references to PAUTOHEAP variables with indirections of the corresponding PAUTO variable. The back ends and the liveness code now just see plain indirected variables. This may actually produce better code, but the real goal here is to eliminate these little-used and somewhat suspect code paths in the back end analyses. The OPARAM node type goes away too. A followup CL will do the same to PPARAMREF. I'm not sure that the back ends (SSA in particular) are handling those right either, and with the framework established in this CL that change is trivial and the result clearly more correct. Fixes #15747. Change-Id: I2770b1ce3cbc93981bfc7166be66a9da12013d74 Reviewed-on: https://go-review.googlesource.com/23393 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-05-25 01:33:24 -04:00
// list of PPARAMOUT (return) variables.
returns []*Node
cgoUnsafeArgs bool
hasdefer bool // whether the function contains a defer statement
softFloat bool
}
type funcLine struct {
f *obj.LSym
base *src.PosBase
cmd/compile: restore panic deduplication The switch to detailed position information broke the removal of duplicate panics on the same line. Restore it. Neutral compiler performance impact: name old alloc/op new alloc/op delta Template 38.8MB ± 0% 38.8MB ± 0% ~ (p=0.690 n=5+5) Unicode 28.7MB ± 0% 28.7MB ± 0% +0.13% (p=0.032 n=5+5) GoTypes 109MB ± 0% 109MB ± 0% ~ (p=1.000 n=5+5) Compiler 457MB ± 0% 457MB ± 0% ~ (p=0.151 n=5+5) SSA 1.09GB ± 0% 1.10GB ± 0% +0.17% (p=0.008 n=5+5) Flate 24.6MB ± 0% 24.5MB ± 0% -0.35% (p=0.008 n=5+5) GoParser 30.9MB ± 0% 31.0MB ± 0% ~ (p=0.421 n=5+5) Reflect 73.4MB ± 0% 73.4MB ± 0% ~ (p=0.056 n=5+5) Tar 25.6MB ± 0% 25.5MB ± 0% -0.61% (p=0.008 n=5+5) XML 40.9MB ± 0% 40.9MB ± 0% ~ (p=0.841 n=5+5) [Geo mean] 71.6MB 71.6MB -0.07% name old allocs/op new allocs/op delta Template 394k ± 0% 395k ± 1% ~ (p=0.151 n=5+5) Unicode 343k ± 0% 344k ± 0% +0.38% (p=0.032 n=5+5) GoTypes 1.16M ± 0% 1.16M ± 0% ~ (p=1.000 n=5+5) Compiler 4.41M ± 0% 4.42M ± 0% ~ (p=0.151 n=5+5) SSA 9.79M ± 0% 9.79M ± 0% ~ (p=0.690 n=5+5) Flate 238k ± 1% 238k ± 0% ~ (p=0.151 n=5+5) GoParser 321k ± 0% 321k ± 1% ~ (p=0.548 n=5+5) Reflect 958k ± 0% 957k ± 0% ~ (p=0.841 n=5+5) Tar 252k ± 0% 252k ± 1% ~ (p=0.151 n=5+5) XML 401k ± 0% 400k ± 0% ~ (p=1.000 n=5+5) [Geo mean] 741k 742k +0.08% Reduces object files a little bit: name old object-bytes new object-bytes delta Template 386k ± 0% 386k ± 0% -0.04% (p=0.008 n=5+5) Unicode 202k ± 0% 202k ± 0% ~ (all equal) GoTypes 1.16M ± 0% 1.16M ± 0% -0.04% (p=0.008 n=5+5) Compiler 3.91M ± 0% 3.91M ± 0% -0.08% (p=0.008 n=5+5) SSA 7.91M ± 0% 7.91M ± 0% -0.04% (p=0.008 n=5+5) Flate 228k ± 0% 227k ± 0% -0.28% (p=0.008 n=5+5) GoParser 283k ± 0% 283k ± 0% -0.01% (p=0.008 n=5+5) Reflect 952k ± 0% 951k ± 0% -0.03% (p=0.008 n=5+5) Tar 188k ± 0% 187k ± 0% -0.09% (p=0.008 n=5+5) XML 406k ± 0% 406k ± 0% -0.04% (p=0.008 n=5+5) [Geo mean] 648k 648k -0.06% This was discovered in the context for the Fannkuch benchmark. It shrinks the number of panicindex calls in that function from 13 back to 9, their 1.8.1 level. It shrinks the function text a bit, from 829 to 801 bytes. It slows down execution a little, presumably due to alignment (?). name old time/op new time/op delta Fannkuch11-8 2.68s ± 2% 2.74s ± 1% +2.09% (p=0.000 n=19+20) After this CL, 1.8.1 and tip are identical: name old time/op new time/op delta Fannkuch11-8 2.74s ± 2% 2.74s ± 1% ~ (p=0.301 n=20+20) Fixes #20332 Change-Id: I2aeacc3e8cf2ac1ff10f36c572a27856f4f8f7c9 Reviewed-on: https://go-review.googlesource.com/43291 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-05-11 11:41:02 -07:00
line uint
}
type ssaLabel struct {
target *ssa.Block // block identified by this label
breakTarget *ssa.Block // block to break to in control flow node identified by this label
continueTarget *ssa.Block // block to continue to in control flow node identified by this label
}
// label returns the label associated with sym, creating it if necessary.
func (s *state) label(sym *types.Sym) *ssaLabel {
lab := s.labels[sym.Name]
if lab == nil {
lab = new(ssaLabel)
s.labels[sym.Name] = lab
}
return lab
}
func (s *state) Logf(msg string, args ...interface{}) { s.f.Logf(msg, args...) }
func (s *state) Log() bool { return s.f.Log() }
func (s *state) Fatalf(msg string, args ...interface{}) {
s.f.Frontend().Fatalf(s.peekPos(), msg, args...)
}
func (s *state) Warnl(pos src.XPos, msg string, args ...interface{}) { s.f.Warnl(pos, msg, args...) }
func (s *state) Debug_checknil() bool { return s.f.Frontend().Debug_checknil() }
[dev.ssa] cmd/compile/ssa: separate logging, work in progress, and fatal errors The SSA implementation logs for three purposes: * debug logging * fatal errors * unimplemented features Separating these three uses lets us attempt an SSA implementation for all functions, not just _ssa functions. This turns the entire standard library into a compilation test, and makes it easy to figure out things like "how much coverage does SSA have now" and "what should we do next to get more coverage?". Functions called _ssa are still special. They log profusely by default and the output of the SSA implementation is used. For all other functions, logging is off, and the implementation is built and discarded, due to lack of support for the runtime. While we're here, fix a few minor bugs and add some extra Unimplementeds to allow all.bash to pass. As of now, SSA handles 20.79% of the functions in the standard library (689 of 3314). The top missing features are: 10.03% 2597 SSA unimplemented: zero for type error not implemented 7.79% 2016 SSA unimplemented: addr: bad op DOTPTR 7.33% 1898 SSA unimplemented: unhandled expr EQ 6.10% 1579 SSA unimplemented: unhandled expr OROR 4.91% 1271 SSA unimplemented: unhandled expr NE 4.49% 1163 SSA unimplemented: unhandled expr LROT 4.00% 1036 SSA unimplemented: unhandled expr LEN 3.56% 923 SSA unimplemented: unhandled stmt CALLFUNC 2.37% 615 SSA unimplemented: zero for type []byte not implemented 1.90% 492 SSA unimplemented: unhandled stmt CALLMETH 1.74% 450 SSA unimplemented: unhandled expr CALLINTER 1.74% 450 SSA unimplemented: unhandled expr DOT 1.71% 444 SSA unimplemented: unhandled expr ANDAND 1.65% 426 SSA unimplemented: unhandled expr CLOSUREVAR 1.54% 400 SSA unimplemented: unhandled expr CALLMETH 1.51% 390 SSA unimplemented: unhandled stmt SWITCH 1.47% 380 SSA unimplemented: unhandled expr CONV 1.33% 345 SSA unimplemented: addr: bad op * 1.30% 336 SSA unimplemented: unhandled OLITERAL 6 Change-Id: I4ca07951e276714dc13c31de28640aead17a1be7 Reviewed-on: https://go-review.googlesource.com/11160 Reviewed-by: Keith Randall <khr@golang.org>
2015-06-12 11:01:13 -07:00
var (
// dummy node for the memory variable
cmd/compile: move Node.Class to flags Put it at position zero, since it is fairly hot. This shrinks gc.Node into a smaller size class on 64 bit systems. name old time/op new time/op delta Template 193ms ± 5% 192ms ± 3% ~ (p=0.353 n=94+93) Unicode 86.1ms ± 5% 85.0ms ± 4% -1.23% (p=0.000 n=95+98) GoTypes 546ms ± 3% 544ms ± 4% -0.40% (p=0.007 n=94+97) Compiler 2.56s ± 3% 2.54s ± 3% -0.67% (p=0.000 n=99+97) SSA 5.13s ± 2% 5.10s ± 3% -0.55% (p=0.000 n=94+98) Flate 122ms ± 6% 121ms ± 4% -0.75% (p=0.002 n=97+95) GoParser 144ms ± 5% 144ms ± 4% ~ (p=0.298 n=98+97) Reflect 348ms ± 4% 349ms ± 4% ~ (p=0.350 n=98+97) Tar 105ms ± 5% 104ms ± 5% ~ (p=0.154 n=96+98) XML 200ms ± 5% 198ms ± 4% -0.71% (p=0.015 n=97+98) [Geo mean] 330ms 328ms -0.52% name old user-time/op new user-time/op delta Template 229ms ±11% 224ms ± 7% -2.16% (p=0.001 n=100+87) Unicode 109ms ± 5% 109ms ± 6% ~ (p=0.897 n=96+91) GoTypes 712ms ± 4% 709ms ± 4% ~ (p=0.085 n=96+98) Compiler 3.41s ± 3% 3.36s ± 3% -1.43% (p=0.000 n=98+98) SSA 7.46s ± 3% 7.31s ± 3% -2.02% (p=0.000 n=100+99) Flate 145ms ± 6% 143ms ± 6% -1.11% (p=0.001 n=99+97) GoParser 177ms ± 5% 176ms ± 5% -0.78% (p=0.018 n=95+95) Reflect 432ms ± 7% 435ms ± 9% ~ (p=0.296 n=100+100) Tar 121ms ± 7% 121ms ± 5% ~ (p=0.072 n=100+95) XML 241ms ± 4% 239ms ± 5% ~ (p=0.085 n=97+99) [Geo mean] 413ms 410ms -0.73% name old alloc/op new alloc/op delta Template 38.4MB ± 0% 37.7MB ± 0% -1.85% (p=0.008 n=5+5) Unicode 30.1MB ± 0% 28.8MB ± 0% -4.09% (p=0.008 n=5+5) GoTypes 112MB ± 0% 110MB ± 0% -1.69% (p=0.008 n=5+5) Compiler 470MB ± 0% 461MB ± 0% -1.91% (p=0.008 n=5+5) SSA 1.13GB ± 0% 1.11GB ± 0% -1.70% (p=0.008 n=5+5) Flate 25.0MB ± 0% 24.6MB ± 0% -1.67% (p=0.008 n=5+5) GoParser 31.6MB ± 0% 31.1MB ± 0% -1.66% (p=0.008 n=5+5) Reflect 77.1MB ± 0% 75.8MB ± 0% -1.69% (p=0.008 n=5+5) Tar 26.3MB ± 0% 25.7MB ± 0% -2.06% (p=0.008 n=5+5) XML 41.9MB ± 0% 41.1MB ± 0% -1.93% (p=0.008 n=5+5) [Geo mean] 73.5MB 72.0MB -2.03% name old allocs/op new allocs/op delta Template 383k ± 0% 383k ± 0% ~ (p=0.690 n=5+5) Unicode 343k ± 0% 343k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.16M ± 0% 1.16M ± 0% ~ (p=0.310 n=5+5) Compiler 4.43M ± 0% 4.42M ± 0% -0.17% (p=0.008 n=5+5) SSA 9.85M ± 0% 9.85M ± 0% ~ (p=0.310 n=5+5) Flate 236k ± 0% 236k ± 1% ~ (p=0.841 n=5+5) GoParser 320k ± 0% 320k ± 0% ~ (p=0.421 n=5+5) Reflect 988k ± 0% 987k ± 0% ~ (p=0.690 n=5+5) Tar 252k ± 0% 251k ± 0% ~ (p=0.095 n=5+5) XML 399k ± 0% 399k ± 0% ~ (p=1.000 n=5+5) [Geo mean] 741k 740k -0.07% Change-Id: I9e952b58a98e30a12494304db9ce50d0a85e459c Reviewed-on: https://go-review.googlesource.com/41797 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
2017-04-25 18:14:12 -07:00
memVar = Node{Op: ONAME, Sym: &types.Sym{Name: "mem"}}
// dummy nodes for temporary variables
cmd/compile: move Node.Class to flags Put it at position zero, since it is fairly hot. This shrinks gc.Node into a smaller size class on 64 bit systems. name old time/op new time/op delta Template 193ms ± 5% 192ms ± 3% ~ (p=0.353 n=94+93) Unicode 86.1ms ± 5% 85.0ms ± 4% -1.23% (p=0.000 n=95+98) GoTypes 546ms ± 3% 544ms ± 4% -0.40% (p=0.007 n=94+97) Compiler 2.56s ± 3% 2.54s ± 3% -0.67% (p=0.000 n=99+97) SSA 5.13s ± 2% 5.10s ± 3% -0.55% (p=0.000 n=94+98) Flate 122ms ± 6% 121ms ± 4% -0.75% (p=0.002 n=97+95) GoParser 144ms ± 5% 144ms ± 4% ~ (p=0.298 n=98+97) Reflect 348ms ± 4% 349ms ± 4% ~ (p=0.350 n=98+97) Tar 105ms ± 5% 104ms ± 5% ~ (p=0.154 n=96+98) XML 200ms ± 5% 198ms ± 4% -0.71% (p=0.015 n=97+98) [Geo mean] 330ms 328ms -0.52% name old user-time/op new user-time/op delta Template 229ms ±11% 224ms ± 7% -2.16% (p=0.001 n=100+87) Unicode 109ms ± 5% 109ms ± 6% ~ (p=0.897 n=96+91) GoTypes 712ms ± 4% 709ms ± 4% ~ (p=0.085 n=96+98) Compiler 3.41s ± 3% 3.36s ± 3% -1.43% (p=0.000 n=98+98) SSA 7.46s ± 3% 7.31s ± 3% -2.02% (p=0.000 n=100+99) Flate 145ms ± 6% 143ms ± 6% -1.11% (p=0.001 n=99+97) GoParser 177ms ± 5% 176ms ± 5% -0.78% (p=0.018 n=95+95) Reflect 432ms ± 7% 435ms ± 9% ~ (p=0.296 n=100+100) Tar 121ms ± 7% 121ms ± 5% ~ (p=0.072 n=100+95) XML 241ms ± 4% 239ms ± 5% ~ (p=0.085 n=97+99) [Geo mean] 413ms 410ms -0.73% name old alloc/op new alloc/op delta Template 38.4MB ± 0% 37.7MB ± 0% -1.85% (p=0.008 n=5+5) Unicode 30.1MB ± 0% 28.8MB ± 0% -4.09% (p=0.008 n=5+5) GoTypes 112MB ± 0% 110MB ± 0% -1.69% (p=0.008 n=5+5) Compiler 470MB ± 0% 461MB ± 0% -1.91% (p=0.008 n=5+5) SSA 1.13GB ± 0% 1.11GB ± 0% -1.70% (p=0.008 n=5+5) Flate 25.0MB ± 0% 24.6MB ± 0% -1.67% (p=0.008 n=5+5) GoParser 31.6MB ± 0% 31.1MB ± 0% -1.66% (p=0.008 n=5+5) Reflect 77.1MB ± 0% 75.8MB ± 0% -1.69% (p=0.008 n=5+5) Tar 26.3MB ± 0% 25.7MB ± 0% -2.06% (p=0.008 n=5+5) XML 41.9MB ± 0% 41.1MB ± 0% -1.93% (p=0.008 n=5+5) [Geo mean] 73.5MB 72.0MB -2.03% name old allocs/op new allocs/op delta Template 383k ± 0% 383k ± 0% ~ (p=0.690 n=5+5) Unicode 343k ± 0% 343k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.16M ± 0% 1.16M ± 0% ~ (p=0.310 n=5+5) Compiler 4.43M ± 0% 4.42M ± 0% -0.17% (p=0.008 n=5+5) SSA 9.85M ± 0% 9.85M ± 0% ~ (p=0.310 n=5+5) Flate 236k ± 0% 236k ± 1% ~ (p=0.841 n=5+5) GoParser 320k ± 0% 320k ± 0% ~ (p=0.421 n=5+5) Reflect 988k ± 0% 987k ± 0% ~ (p=0.690 n=5+5) Tar 252k ± 0% 251k ± 0% ~ (p=0.095 n=5+5) XML 399k ± 0% 399k ± 0% ~ (p=1.000 n=5+5) [Geo mean] 741k 740k -0.07% Change-Id: I9e952b58a98e30a12494304db9ce50d0a85e459c Reviewed-on: https://go-review.googlesource.com/41797 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
2017-04-25 18:14:12 -07:00
ptrVar = Node{Op: ONAME, Sym: &types.Sym{Name: "ptr"}}
lenVar = Node{Op: ONAME, Sym: &types.Sym{Name: "len"}}
newlenVar = Node{Op: ONAME, Sym: &types.Sym{Name: "newlen"}}
capVar = Node{Op: ONAME, Sym: &types.Sym{Name: "cap"}}
typVar = Node{Op: ONAME, Sym: &types.Sym{Name: "typ"}}
okVar = Node{Op: ONAME, Sym: &types.Sym{Name: "ok"}}
)
// startBlock sets the current block we're generating code in to b.
func (s *state) startBlock(b *ssa.Block) {
if s.curBlock != nil {
s.Fatalf("starting block %v when block %v has not ended", b, s.curBlock)
}
s.curBlock = b
s.vars = map[*Node]*ssa.Value{}
for n := range s.fwdVars {
delete(s.fwdVars, n)
}
}
// endBlock marks the end of generating code for the current block.
// Returns the (former) current block. Returns nil if there is no current
// block, i.e. if no code flows to the current execution point.
func (s *state) endBlock() *ssa.Block {
b := s.curBlock
if b == nil {
return nil
}
for len(s.defvars) <= int(b.ID) {
s.defvars = append(s.defvars, nil)
}
s.defvars[b.ID] = s.vars
s.curBlock = nil
s.vars = nil
if b.LackingPos() {
// Empty plain blocks get the line of their successor (handled after all blocks created),
// except for increment blocks in For statements (handled in ssa conversion of OFOR),
// and for blocks ending in GOTO/BREAK/CONTINUE.
b.Pos = src.NoXPos
} else {
b.Pos = s.lastPos
}
return b
}
// pushLine pushes a line number on the line number stack.
2016-12-15 17:17:01 -08:00
func (s *state) pushLine(line src.XPos) {
if !line.IsKnown() {
// the frontend may emit node with line number missing,
// use the parent line number in this case.
line = s.peekPos()
if Debug['K'] != 0 {
Warn("buildssa: unknown position (line 0)")
}
} else {
s.lastPos = line
}
s.line = append(s.line, line)
}
// popLine pops the top of the line number stack.
func (s *state) popLine() {
s.line = s.line[:len(s.line)-1]
}
// peekPos peeks the top of the line number stack.
2016-12-15 17:17:01 -08:00
func (s *state) peekPos() src.XPos {
return s.line[len(s.line)-1]
}
// newValue0 adds a new value with no arguments to the current block.
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
func (s *state) newValue0(op ssa.Op, t *types.Type) *ssa.Value {
return s.curBlock.NewValue0(s.peekPos(), op, t)
}
// newValue0A adds a new value with no arguments and an aux value to the current block.
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
func (s *state) newValue0A(op ssa.Op, t *types.Type, aux interface{}) *ssa.Value {
return s.curBlock.NewValue0A(s.peekPos(), op, t, aux)
}
// newValue0I adds a new value with no arguments and an auxint value to the current block.
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
func (s *state) newValue0I(op ssa.Op, t *types.Type, auxint int64) *ssa.Value {
return s.curBlock.NewValue0I(s.peekPos(), op, t, auxint)
}
// newValue1 adds a new value with one argument to the current block.
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
func (s *state) newValue1(op ssa.Op, t *types.Type, arg *ssa.Value) *ssa.Value {
return s.curBlock.NewValue1(s.peekPos(), op, t, arg)
}
// newValue1A adds a new value with one argument and an aux value to the current block.
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
func (s *state) newValue1A(op ssa.Op, t *types.Type, aux interface{}, arg *ssa.Value) *ssa.Value {
return s.curBlock.NewValue1A(s.peekPos(), op, t, aux, arg)
}
// newValue1Apos adds a new value with one argument and an aux value to the current block.
// isStmt determines whether the created values may be a statement or not
// (i.e., false means never, yes means maybe).
func (s *state) newValue1Apos(op ssa.Op, t *types.Type, aux interface{}, arg *ssa.Value, isStmt bool) *ssa.Value {
if isStmt {
return s.curBlock.NewValue1A(s.peekPos(), op, t, aux, arg)
}
return s.curBlock.NewValue1A(s.peekPos().WithNotStmt(), op, t, aux, arg)
}
// newValue1I adds a new value with one argument and an auxint value to the current block.
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
func (s *state) newValue1I(op ssa.Op, t *types.Type, aux int64, arg *ssa.Value) *ssa.Value {
return s.curBlock.NewValue1I(s.peekPos(), op, t, aux, arg)
}
// newValue2 adds a new value with two arguments to the current block.
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
func (s *state) newValue2(op ssa.Op, t *types.Type, arg0, arg1 *ssa.Value) *ssa.Value {
return s.curBlock.NewValue2(s.peekPos(), op, t, arg0, arg1)
}
// newValue2Apos adds a new value with two arguments and an aux value to the current block.
// isStmt determines whether the created values may be a statement or not
// (i.e., false means never, yes means maybe).
func (s *state) newValue2Apos(op ssa.Op, t *types.Type, aux interface{}, arg0, arg1 *ssa.Value, isStmt bool) *ssa.Value {
if isStmt {
return s.curBlock.NewValue2A(s.peekPos(), op, t, aux, arg0, arg1)
}
return s.curBlock.NewValue2A(s.peekPos().WithNotStmt(), op, t, aux, arg0, arg1)
}
// newValue2I adds a new value with two arguments and an auxint value to the current block.
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
func (s *state) newValue2I(op ssa.Op, t *types.Type, aux int64, arg0, arg1 *ssa.Value) *ssa.Value {
return s.curBlock.NewValue2I(s.peekPos(), op, t, aux, arg0, arg1)
}
// newValue3 adds a new value with three arguments to the current block.
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
func (s *state) newValue3(op ssa.Op, t *types.Type, arg0, arg1, arg2 *ssa.Value) *ssa.Value {
return s.curBlock.NewValue3(s.peekPos(), op, t, arg0, arg1, arg2)
}
// newValue3I adds a new value with three arguments and an auxint value to the current block.
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
func (s *state) newValue3I(op ssa.Op, t *types.Type, aux int64, arg0, arg1, arg2 *ssa.Value) *ssa.Value {
return s.curBlock.NewValue3I(s.peekPos(), op, t, aux, arg0, arg1, arg2)
}
// newValue3A adds a new value with three arguments and an aux value to the current block.
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
func (s *state) newValue3A(op ssa.Op, t *types.Type, aux interface{}, arg0, arg1, arg2 *ssa.Value) *ssa.Value {
return s.curBlock.NewValue3A(s.peekPos(), op, t, aux, arg0, arg1, arg2)
}
// newValue3Apos adds a new value with three arguments and an aux value to the current block.
// isStmt determines whether the created values may be a statement or not
// (i.e., false means never, yes means maybe).
func (s *state) newValue3Apos(op ssa.Op, t *types.Type, aux interface{}, arg0, arg1, arg2 *ssa.Value, isStmt bool) *ssa.Value {
if isStmt {
return s.curBlock.NewValue3A(s.peekPos(), op, t, aux, arg0, arg1, arg2)
}
return s.curBlock.NewValue3A(s.peekPos().WithNotStmt(), op, t, aux, arg0, arg1, arg2)
}
// newValue4 adds a new value with four arguments to the current block.
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
func (s *state) newValue4(op ssa.Op, t *types.Type, arg0, arg1, arg2, arg3 *ssa.Value) *ssa.Value {
return s.curBlock.NewValue4(s.peekPos(), op, t, arg0, arg1, arg2, arg3)
}
// newValue4 adds a new value with four arguments and an auxint value to the current block.
func (s *state) newValue4I(op ssa.Op, t *types.Type, aux int64, arg0, arg1, arg2, arg3 *ssa.Value) *ssa.Value {
return s.curBlock.NewValue4I(s.peekPos(), op, t, aux, arg0, arg1, arg2, arg3)
}
// entryNewValue0 adds a new value with no arguments to the entry block.
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
func (s *state) entryNewValue0(op ssa.Op, t *types.Type) *ssa.Value {
return s.f.Entry.NewValue0(src.NoXPos, op, t)
}
// entryNewValue0A adds a new value with no arguments and an aux value to the entry block.
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
func (s *state) entryNewValue0A(op ssa.Op, t *types.Type, aux interface{}) *ssa.Value {
return s.f.Entry.NewValue0A(src.NoXPos, op, t, aux)
}
// entryNewValue1 adds a new value with one argument to the entry block.
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
func (s *state) entryNewValue1(op ssa.Op, t *types.Type, arg *ssa.Value) *ssa.Value {
return s.f.Entry.NewValue1(src.NoXPos, op, t, arg)
}
// entryNewValue1 adds a new value with one argument and an auxint value to the entry block.
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
func (s *state) entryNewValue1I(op ssa.Op, t *types.Type, auxint int64, arg *ssa.Value) *ssa.Value {
return s.f.Entry.NewValue1I(src.NoXPos, op, t, auxint, arg)
}
// entryNewValue1A adds a new value with one argument and an aux value to the entry block.
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
func (s *state) entryNewValue1A(op ssa.Op, t *types.Type, aux interface{}, arg *ssa.Value) *ssa.Value {
return s.f.Entry.NewValue1A(src.NoXPos, op, t, aux, arg)
}
// entryNewValue2 adds a new value with two arguments to the entry block.
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
func (s *state) entryNewValue2(op ssa.Op, t *types.Type, arg0, arg1 *ssa.Value) *ssa.Value {
return s.f.Entry.NewValue2(src.NoXPos, op, t, arg0, arg1)
}
// entryNewValue2A adds a new value with two arguments and an aux value to the entry block.
func (s *state) entryNewValue2A(op ssa.Op, t *types.Type, aux interface{}, arg0, arg1 *ssa.Value) *ssa.Value {
return s.f.Entry.NewValue2A(src.NoXPos, op, t, aux, arg0, arg1)
}
// const* routines add a new const value to the entry block.
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
func (s *state) constSlice(t *types.Type) *ssa.Value {
return s.f.ConstSlice(t)
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
}
func (s *state) constInterface(t *types.Type) *ssa.Value {
return s.f.ConstInterface(t)
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
}
func (s *state) constNil(t *types.Type) *ssa.Value { return s.f.ConstNil(t) }
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
func (s *state) constEmptyString(t *types.Type) *ssa.Value {
return s.f.ConstEmptyString(t)
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
}
func (s *state) constBool(c bool) *ssa.Value {
return s.f.ConstBool(types.Types[TBOOL], c)
}
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
func (s *state) constInt8(t *types.Type, c int8) *ssa.Value {
return s.f.ConstInt8(t, c)
}
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
func (s *state) constInt16(t *types.Type, c int16) *ssa.Value {
return s.f.ConstInt16(t, c)
}
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
func (s *state) constInt32(t *types.Type, c int32) *ssa.Value {
return s.f.ConstInt32(t, c)
}
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
func (s *state) constInt64(t *types.Type, c int64) *ssa.Value {
return s.f.ConstInt64(t, c)
}
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
func (s *state) constFloat32(t *types.Type, c float64) *ssa.Value {
return s.f.ConstFloat32(t, c)
}
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
func (s *state) constFloat64(t *types.Type, c float64) *ssa.Value {
return s.f.ConstFloat64(t, c)
}
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
func (s *state) constInt(t *types.Type, c int64) *ssa.Value {
if s.config.PtrSize == 8 {
return s.constInt64(t, c)
}
if int64(int32(c)) != c {
s.Fatalf("integer constant too big %d", c)
}
return s.constInt32(t, int32(c))
}
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
func (s *state) constOffPtrSP(t *types.Type, c int64) *ssa.Value {
return s.f.ConstOffPtrSP(t, c, s.sp)
cmd/compile: add OpOffPtr [c] SP to constant cache They accounted for almost 30% of all CSE'd values. By never creating the duplicates in the first place, we reduce the high water mark of Value IDs, which in turn makes all SSA phases cheaper, particularly regalloc. name old time/op new time/op delta Template 200ms ± 3% 198ms ± 4% -0.87% (p=0.016 n=50+49) Unicode 86.9ms ± 2% 85.5ms ± 3% -1.56% (p=0.000 n=49+50) GoTypes 553ms ± 4% 551ms ± 4% ~ (p=0.183 n=50+49) SSA 3.97s ± 3% 3.93s ± 2% -1.06% (p=0.000 n=48+48) Flate 124ms ± 4% 124ms ± 3% ~ (p=0.545 n=48+50) GoParser 146ms ± 4% 146ms ± 4% ~ (p=0.810 n=49+49) Reflect 357ms ± 3% 355ms ± 3% -0.59% (p=0.049 n=50+48) Tar 106ms ± 4% 107ms ± 5% ~ (p=0.454 n=49+50) XML 203ms ± 4% 203ms ± 4% ~ (p=0.726 n=48+50) name old user-ns/op new user-ns/op delta Template 237M ± 3% 235M ± 4% ~ (p=0.208 n=47+48) Unicode 111M ± 4% 108M ± 9% -2.50% (p=0.000 n=47+50) GoTypes 736M ± 5% 729M ± 4% -0.95% (p=0.017 n=50+46) SSA 5.73G ± 4% 5.74G ± 4% ~ (p=0.765 n=50+50) Flate 150M ± 5% 148M ± 6% -0.89% (p=0.045 n=48+47) GoParser 180M ± 5% 178M ± 7% -1.34% (p=0.012 n=50+50) Reflect 450M ± 4% 444M ± 4% -1.40% (p=0.000 n=50+49) Tar 124M ± 7% 123M ± 7% ~ (p=0.092 n=50+50) XML 248M ± 6% 245M ± 5% ~ (p=0.057 n=50+50) name old alloc/op new alloc/op delta Template 39.4MB ± 0% 39.3MB ± 0% -0.37% (p=0.000 n=50+50) Unicode 30.9MB ± 0% 30.9MB ± 0% -0.27% (p=0.000 n=48+50) GoTypes 114MB ± 0% 113MB ± 0% -1.03% (p=0.000 n=50+49) SSA 882MB ± 0% 865MB ± 0% -1.95% (p=0.000 n=49+49) Flate 25.8MB ± 0% 25.7MB ± 0% -0.21% (p=0.000 n=50+50) GoParser 31.7MB ± 0% 31.6MB ± 0% -0.33% (p=0.000 n=50+50) Reflect 79.7MB ± 0% 79.3MB ± 0% -0.49% (p=0.000 n=44+49) Tar 27.2MB ± 0% 27.1MB ± 0% -0.31% (p=0.000 n=50+50) XML 42.7MB ± 0% 42.3MB ± 0% -1.05% (p=0.000 n=48+49) name old allocs/op new allocs/op delta Template 379k ± 1% 380k ± 1% +0.26% (p=0.000 n=50+50) Unicode 324k ± 1% 324k ± 1% ~ (p=0.964 n=49+50) GoTypes 1.14M ± 0% 1.15M ± 0% +0.14% (p=0.000 n=50+49) SSA 7.89M ± 0% 7.89M ± 0% -0.05% (p=0.000 n=49+49) Flate 240k ± 1% 241k ± 1% +0.27% (p=0.001 n=50+50) GoParser 310k ± 1% 311k ± 1% +0.48% (p=0.000 n=50+49) Reflect 1.00M ± 0% 1.00M ± 0% +0.17% (p=0.000 n=48+50) Tar 254k ± 1% 255k ± 1% +0.23% (p=0.005 n=50+50) XML 395k ± 1% 395k ± 1% +0.19% (p=0.002 n=49+47) Change-Id: Iaa8f5f37e23bd81983409f7359f9dcd4dfe2961f Reviewed-on: https://go-review.googlesource.com/38003 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-03-08 12:50:00 -08:00
}
// newValueOrSfCall* are wrappers around newValue*, which may create a call to a
// soft-float runtime function instead (when emitting soft-float code).
func (s *state) newValueOrSfCall1(op ssa.Op, t *types.Type, arg *ssa.Value) *ssa.Value {
if s.softFloat {
if c, ok := s.sfcall(op, arg); ok {
return c
}
}
return s.newValue1(op, t, arg)
}
func (s *state) newValueOrSfCall2(op ssa.Op, t *types.Type, arg0, arg1 *ssa.Value) *ssa.Value {
if s.softFloat {
if c, ok := s.sfcall(op, arg0, arg1); ok {
return c
}
}
return s.newValue2(op, t, arg0, arg1)
}
func (s *state) instrument(t *types.Type, addr *ssa.Value, wr bool) {
if !s.curfn.Func.InstrumentBody() {
return
}
w := t.Size()
if w == 0 {
return // can't race on zero-sized things
}
if ssa.IsSanitizerSafeAddr(addr) {
return
}
var fn *obj.LSym
needWidth := false
if flag_msan {
fn = msanread
if wr {
fn = msanwrite
}
needWidth = true
} else if flag_race && t.NumComponents(types.CountBlankFields) > 1 {
// for composite objects we have to write every address
// because a write might happen to any subobject.
// composites with only one element don't have subobjects, though.
fn = racereadrange
if wr {
fn = racewriterange
}
needWidth = true
} else if flag_race {
// for non-composite objects we can write just the start
// address, as any write must write the first byte.
fn = raceread
if wr {
fn = racewrite
}
} else {
panic("unreachable")
}
args := []*ssa.Value{addr}
if needWidth {
args = append(args, s.constInt(types.Types[TUINTPTR], w))
}
s.rtcall(fn, true, nil, args...)
}
func (s *state) load(t *types.Type, src *ssa.Value) *ssa.Value {
s.instrument(t, src, false)
return s.rawLoad(t, src)
}
func (s *state) rawLoad(t *types.Type, src *ssa.Value) *ssa.Value {
return s.newValue2(ssa.OpLoad, t, src, s.mem())
}
func (s *state) store(t *types.Type, dst, val *ssa.Value) {
s.vars[&memVar] = s.newValue3A(ssa.OpStore, types.TypeMem, t, dst, val, s.mem())
}
func (s *state) zero(t *types.Type, dst *ssa.Value) {
s.instrument(t, dst, true)
store := s.newValue2I(ssa.OpZero, types.TypeMem, t.Size(), dst, s.mem())
store.Aux = t
s.vars[&memVar] = store
}
func (s *state) move(t *types.Type, dst, src *ssa.Value) {
s.instrument(t, src, false)
s.instrument(t, dst, true)
store := s.newValue3I(ssa.OpMove, types.TypeMem, t.Size(), dst, src, s.mem())
store.Aux = t
s.vars[&memVar] = store
}
// stmtList converts the statement list n to SSA and adds it to s.
func (s *state) stmtList(l Nodes) {
for _, n := range l.Slice() {
s.stmt(n)
}
}
// stmt converts the statement n to SSA and adds it to s.
func (s *state) stmt(n *Node) {
if !(n.Op == OVARKILL || n.Op == OVARLIVE || n.Op == OVARDEF) {
// OVARKILL, OVARLIVE, and OVARDEF are invisible to the programmer, so we don't use their line numbers to avoid confusion in debugging.
s.pushLine(n.Pos)
defer s.popLine()
}
cmd/compile: check labels and gotos before building SSA This CL introduces yet another compiler pass, which checks for correct control flow constructs prior to converting from AST to SSA form. It cannot be integrated with walk, since walk rewrites switch and select statements on the fly. To reduce code duplication, this CL also does some minor refactoring. With this pass in place, the AST to SSA converter can now stop generating SSA for any known-dead code. This minor savings pays for the minor cost of the new pass. Performance is almost a wash: name old time/op new time/op delta Template 206ms ± 4% 205ms ± 4% ~ (p=0.108 n=43+43) Unicode 84.0ms ± 4% 84.0ms ± 4% ~ (p=0.979 n=43+43) GoTypes 550ms ± 3% 553ms ± 3% ~ (p=0.065 n=40+41) Compiler 2.57s ± 4% 2.58s ± 2% ~ (p=0.103 n=44+41) SSA 3.94s ± 3% 3.93s ± 2% ~ (p=0.833 n=44+42) Flate 126ms ± 6% 125ms ± 4% ~ (p=0.941 n=43+39) GoParser 147ms ± 4% 148ms ± 3% ~ (p=0.164 n=42+39) Reflect 359ms ± 3% 357ms ± 5% ~ (p=0.241 n=43+44) Tar 106ms ± 5% 106ms ± 7% ~ (p=0.853 n=40+43) XML 202ms ± 3% 203ms ± 3% ~ (p=0.488 n=42+41) name old user-ns/op new user-ns/op delta Template 240M ± 4% 239M ± 4% ~ (p=0.844 n=42+43) Unicode 107M ± 5% 107M ± 4% ~ (p=0.332 n=40+43) GoTypes 735M ± 3% 731M ± 4% ~ (p=0.141 n=43+44) Compiler 3.51G ± 3% 3.52G ± 3% ~ (p=0.208 n=42+43) SSA 5.72G ± 4% 5.72G ± 3% ~ (p=0.928 n=44+42) Flate 151M ± 7% 150M ± 8% ~ (p=0.662 n=44+43) GoParser 181M ± 5% 181M ± 4% ~ (p=0.379 n=41+44) Reflect 447M ± 4% 445M ± 4% ~ (p=0.344 n=43+43) Tar 125M ± 7% 124M ± 6% ~ (p=0.353 n=43+43) XML 248M ± 4% 250M ± 6% ~ (p=0.158 n=44+44) name old alloc/op new alloc/op delta Template 40.3MB ± 0% 40.2MB ± 0% -0.27% (p=0.000 n=10+10) Unicode 30.3MB ± 0% 30.2MB ± 0% -0.10% (p=0.015 n=10+10) GoTypes 114MB ± 0% 114MB ± 0% -0.06% (p=0.000 n=7+9) Compiler 480MB ± 0% 481MB ± 0% +0.07% (p=0.000 n=10+10) SSA 864MB ± 0% 862MB ± 0% -0.25% (p=0.000 n=9+10) Flate 25.9MB ± 0% 25.9MB ± 0% ~ (p=0.123 n=10+10) GoParser 32.1MB ± 0% 32.1MB ± 0% ~ (p=0.631 n=10+10) Reflect 79.9MB ± 0% 79.6MB ± 0% -0.39% (p=0.000 n=10+9) Tar 27.1MB ± 0% 27.0MB ± 0% -0.18% (p=0.003 n=10+10) XML 42.6MB ± 0% 42.6MB ± 0% ~ (p=0.143 n=10+10) name old allocs/op new allocs/op delta Template 401k ± 0% 401k ± 1% ~ (p=0.353 n=10+10) Unicode 322k ± 0% 322k ± 0% ~ (p=0.739 n=10+10) GoTypes 1.18M ± 0% 1.18M ± 0% +0.25% (p=0.001 n=7+8) Compiler 4.51M ± 0% 4.53M ± 0% +0.37% (p=0.000 n=10+10) SSA 7.91M ± 0% 7.93M ± 0% +0.20% (p=0.000 n=9+10) Flate 244k ± 0% 245k ± 0% ~ (p=0.123 n=10+10) GoParser 323k ± 1% 324k ± 1% +0.40% (p=0.035 n=10+10) Reflect 1.01M ± 0% 1.02M ± 0% +0.37% (p=0.000 n=10+9) Tar 258k ± 1% 258k ± 1% ~ (p=0.661 n=10+9) XML 403k ± 0% 405k ± 0% +0.47% (p=0.004 n=10+10) Updates #15756 Updates #19250 Change-Id: I647bfbb745c35630447eb79dfcaa994b490ce942 Reviewed-on: https://go-review.googlesource.com/38159 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-14 11:05:03 -07:00
// If s.curBlock is nil, and n isn't a label (which might have an associated goto somewhere),
// then this code is dead. Stop here.
if s.curBlock == nil && n.Op != OLABEL {
return
}
s.stmtList(n.Ninit)
switch n.Op {
case OBLOCK:
s.stmtList(n.List)
// No-ops
case OEMPTY, ODCLCONST, ODCLTYPE, OFALL:
// Expression statements
case OCALLFUNC:
if isIntrinsicCall(n) {
s.intrinsicCall(n)
return
}
fallthrough
case OCALLMETH, OCALLINTER:
s.call(n, callNormal)
cmd/compile: move Node.Class to flags Put it at position zero, since it is fairly hot. This shrinks gc.Node into a smaller size class on 64 bit systems. name old time/op new time/op delta Template 193ms ± 5% 192ms ± 3% ~ (p=0.353 n=94+93) Unicode 86.1ms ± 5% 85.0ms ± 4% -1.23% (p=0.000 n=95+98) GoTypes 546ms ± 3% 544ms ± 4% -0.40% (p=0.007 n=94+97) Compiler 2.56s ± 3% 2.54s ± 3% -0.67% (p=0.000 n=99+97) SSA 5.13s ± 2% 5.10s ± 3% -0.55% (p=0.000 n=94+98) Flate 122ms ± 6% 121ms ± 4% -0.75% (p=0.002 n=97+95) GoParser 144ms ± 5% 144ms ± 4% ~ (p=0.298 n=98+97) Reflect 348ms ± 4% 349ms ± 4% ~ (p=0.350 n=98+97) Tar 105ms ± 5% 104ms ± 5% ~ (p=0.154 n=96+98) XML 200ms ± 5% 198ms ± 4% -0.71% (p=0.015 n=97+98) [Geo mean] 330ms 328ms -0.52% name old user-time/op new user-time/op delta Template 229ms ±11% 224ms ± 7% -2.16% (p=0.001 n=100+87) Unicode 109ms ± 5% 109ms ± 6% ~ (p=0.897 n=96+91) GoTypes 712ms ± 4% 709ms ± 4% ~ (p=0.085 n=96+98) Compiler 3.41s ± 3% 3.36s ± 3% -1.43% (p=0.000 n=98+98) SSA 7.46s ± 3% 7.31s ± 3% -2.02% (p=0.000 n=100+99) Flate 145ms ± 6% 143ms ± 6% -1.11% (p=0.001 n=99+97) GoParser 177ms ± 5% 176ms ± 5% -0.78% (p=0.018 n=95+95) Reflect 432ms ± 7% 435ms ± 9% ~ (p=0.296 n=100+100) Tar 121ms ± 7% 121ms ± 5% ~ (p=0.072 n=100+95) XML 241ms ± 4% 239ms ± 5% ~ (p=0.085 n=97+99) [Geo mean] 413ms 410ms -0.73% name old alloc/op new alloc/op delta Template 38.4MB ± 0% 37.7MB ± 0% -1.85% (p=0.008 n=5+5) Unicode 30.1MB ± 0% 28.8MB ± 0% -4.09% (p=0.008 n=5+5) GoTypes 112MB ± 0% 110MB ± 0% -1.69% (p=0.008 n=5+5) Compiler 470MB ± 0% 461MB ± 0% -1.91% (p=0.008 n=5+5) SSA 1.13GB ± 0% 1.11GB ± 0% -1.70% (p=0.008 n=5+5) Flate 25.0MB ± 0% 24.6MB ± 0% -1.67% (p=0.008 n=5+5) GoParser 31.6MB ± 0% 31.1MB ± 0% -1.66% (p=0.008 n=5+5) Reflect 77.1MB ± 0% 75.8MB ± 0% -1.69% (p=0.008 n=5+5) Tar 26.3MB ± 0% 25.7MB ± 0% -2.06% (p=0.008 n=5+5) XML 41.9MB ± 0% 41.1MB ± 0% -1.93% (p=0.008 n=5+5) [Geo mean] 73.5MB 72.0MB -2.03% name old allocs/op new allocs/op delta Template 383k ± 0% 383k ± 0% ~ (p=0.690 n=5+5) Unicode 343k ± 0% 343k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.16M ± 0% 1.16M ± 0% ~ (p=0.310 n=5+5) Compiler 4.43M ± 0% 4.42M ± 0% -0.17% (p=0.008 n=5+5) SSA 9.85M ± 0% 9.85M ± 0% ~ (p=0.310 n=5+5) Flate 236k ± 0% 236k ± 1% ~ (p=0.841 n=5+5) GoParser 320k ± 0% 320k ± 0% ~ (p=0.421 n=5+5) Reflect 988k ± 0% 987k ± 0% ~ (p=0.690 n=5+5) Tar 252k ± 0% 251k ± 0% ~ (p=0.095 n=5+5) XML 399k ± 0% 399k ± 0% ~ (p=1.000 n=5+5) [Geo mean] 741k 740k -0.07% Change-Id: I9e952b58a98e30a12494304db9ce50d0a85e459c Reviewed-on: https://go-review.googlesource.com/41797 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
2017-04-25 18:14:12 -07:00
if n.Op == OCALLFUNC && n.Left.Op == ONAME && n.Left.Class() == PFUNC {
if fn := n.Left.Sym.Name; compiling_runtime && fn == "throw" ||
n.Left.Sym.Pkg == Runtimepkg && (fn == "throwinit" || fn == "gopanic" || fn == "panicwrap" || fn == "block" || fn == "panicmakeslicelen" || fn == "panicmakeslicecap") {
m := s.mem()
b := s.endBlock()
b.Kind = ssa.BlockExit
b.SetControl(m)
// TODO: never rewrite OPANIC to OCALLFUNC in the
// first place. Need to wait until all backends
// go through SSA.
}
}
case ODEFER:
s.call(n.Left, callDefer)
cmd/compile: bulk rename This change does a bulk rename of several identifiers in the compiler. See #27167 and https://docs.google.com/document/d/19_ExiylD9MRfeAjKIfEsMU1_RGhuxB9sA0b5Zv7byVI/ for context and for discussion of these particular renames. Commands run to generate this change: gorename -from '"cmd/compile/internal/gc".OPROC' -to OGO gorename -from '"cmd/compile/internal/gc".OCOM' -to OBITNOT gorename -from '"cmd/compile/internal/gc".OMINUS' -to ONEG gorename -from '"cmd/compile/internal/gc".OIND' -to ODEREF gorename -from '"cmd/compile/internal/gc".OARRAYBYTESTR' -to OBYTES2STR gorename -from '"cmd/compile/internal/gc".OARRAYBYTESTRTMP' -to OBYTES2STRTMP gorename -from '"cmd/compile/internal/gc".OARRAYRUNESTR' -to ORUNES2STR gorename -from '"cmd/compile/internal/gc".OSTRARRAYBYTE' -to OSTR2BYTES gorename -from '"cmd/compile/internal/gc".OSTRARRAYBYTETMP' -to OSTR2BYTESTMP gorename -from '"cmd/compile/internal/gc".OSTRARRAYRUNE' -to OSTR2RUNES gorename -from '"cmd/compile/internal/gc".Etop' -to ctxStmt gorename -from '"cmd/compile/internal/gc".Erv' -to ctxExpr gorename -from '"cmd/compile/internal/gc".Ecall' -to ctxCallee gorename -from '"cmd/compile/internal/gc".Efnstruct' -to ctxMultiOK gorename -from '"cmd/compile/internal/gc".Easgn' -to ctxAssign gorename -from '"cmd/compile/internal/gc".Ecomplit' -to ctxCompLit Not altered: parameters and local variables (mostly in typecheck.go) named top, which should probably now be called ctx (and which should probably have a named type). Also not altered: Field called Top in gc.Func. gorename -from '"cmd/compile/internal/gc".Node.Isddd' -to IsDDD gorename -from '"cmd/compile/internal/gc".Node.SetIsddd' -to SetIsDDD gorename -from '"cmd/compile/internal/gc".nodeIsddd' -to nodeIsDDD gorename -from '"cmd/compile/internal/types".Field.Isddd' -to IsDDD gorename -from '"cmd/compile/internal/types".Field.SetIsddd' -to SetIsDDD gorename -from '"cmd/compile/internal/types".fieldIsddd' -to fieldIsDDD Not altered: function gc.hasddd, params and local variables called isddd Also not altered: fmt.go prints nodes using "isddd(%v)". cd cmd/compile/internal/gc; go generate I then manually found impacted comments using exact string match and fixed them up by hand. The comment changes were trivial. Passes toolstash-check. Fixes #27167. If this experiment is deemed a success, we will open a new tracking issue for renames to do at the end of the 1.13 cycles. Change-Id: I2dc541533d2ab0d06cb3d31d65df205ecfb151e8 Reviewed-on: https://go-review.googlesource.com/c/150140 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-11-18 08:34:38 -08:00
case OGO:
s.call(n.Left, callGo)
case OAS2DOTTYPE:
res, resok := s.dottype(n.Rlist.First(), true)
deref := false
if !canSSAType(n.Rlist.First().Type) {
if res.Op != ssa.OpLoad {
s.Fatalf("dottype of non-load")
}
mem := s.mem()
if mem.Op == ssa.OpVarKill {
mem = mem.Args[0]
}
if res.Args[1] != mem {
s.Fatalf("memory no longer live from 2-result dottype load")
}
deref = true
res = res.Args[0]
}
s.assign(n.List.First(), res, deref, 0)
s.assign(n.List.Second(), resok, false, 0)
return
case OAS2FUNC:
// We come here only when it is an intrinsic call returning two values.
if !isIntrinsicCall(n.Rlist.First()) {
s.Fatalf("non-intrinsic AS2FUNC not expanded %v", n.Rlist.First())
}
v := s.intrinsicCall(n.Rlist.First())
v1 := s.newValue1(ssa.OpSelect0, n.List.First().Type, v)
v2 := s.newValue1(ssa.OpSelect1, n.List.Second().Type, v)
s.assign(n.List.First(), v1, false, 0)
s.assign(n.List.Second(), v2, false, 0)
return
case ODCL:
cmd/compile: move Node.Class to flags Put it at position zero, since it is fairly hot. This shrinks gc.Node into a smaller size class on 64 bit systems. name old time/op new time/op delta Template 193ms ± 5% 192ms ± 3% ~ (p=0.353 n=94+93) Unicode 86.1ms ± 5% 85.0ms ± 4% -1.23% (p=0.000 n=95+98) GoTypes 546ms ± 3% 544ms ± 4% -0.40% (p=0.007 n=94+97) Compiler 2.56s ± 3% 2.54s ± 3% -0.67% (p=0.000 n=99+97) SSA 5.13s ± 2% 5.10s ± 3% -0.55% (p=0.000 n=94+98) Flate 122ms ± 6% 121ms ± 4% -0.75% (p=0.002 n=97+95) GoParser 144ms ± 5% 144ms ± 4% ~ (p=0.298 n=98+97) Reflect 348ms ± 4% 349ms ± 4% ~ (p=0.350 n=98+97) Tar 105ms ± 5% 104ms ± 5% ~ (p=0.154 n=96+98) XML 200ms ± 5% 198ms ± 4% -0.71% (p=0.015 n=97+98) [Geo mean] 330ms 328ms -0.52% name old user-time/op new user-time/op delta Template 229ms ±11% 224ms ± 7% -2.16% (p=0.001 n=100+87) Unicode 109ms ± 5% 109ms ± 6% ~ (p=0.897 n=96+91) GoTypes 712ms ± 4% 709ms ± 4% ~ (p=0.085 n=96+98) Compiler 3.41s ± 3% 3.36s ± 3% -1.43% (p=0.000 n=98+98) SSA 7.46s ± 3% 7.31s ± 3% -2.02% (p=0.000 n=100+99) Flate 145ms ± 6% 143ms ± 6% -1.11% (p=0.001 n=99+97) GoParser 177ms ± 5% 176ms ± 5% -0.78% (p=0.018 n=95+95) Reflect 432ms ± 7% 435ms ± 9% ~ (p=0.296 n=100+100) Tar 121ms ± 7% 121ms ± 5% ~ (p=0.072 n=100+95) XML 241ms ± 4% 239ms ± 5% ~ (p=0.085 n=97+99) [Geo mean] 413ms 410ms -0.73% name old alloc/op new alloc/op delta Template 38.4MB ± 0% 37.7MB ± 0% -1.85% (p=0.008 n=5+5) Unicode 30.1MB ± 0% 28.8MB ± 0% -4.09% (p=0.008 n=5+5) GoTypes 112MB ± 0% 110MB ± 0% -1.69% (p=0.008 n=5+5) Compiler 470MB ± 0% 461MB ± 0% -1.91% (p=0.008 n=5+5) SSA 1.13GB ± 0% 1.11GB ± 0% -1.70% (p=0.008 n=5+5) Flate 25.0MB ± 0% 24.6MB ± 0% -1.67% (p=0.008 n=5+5) GoParser 31.6MB ± 0% 31.1MB ± 0% -1.66% (p=0.008 n=5+5) Reflect 77.1MB ± 0% 75.8MB ± 0% -1.69% (p=0.008 n=5+5) Tar 26.3MB ± 0% 25.7MB ± 0% -2.06% (p=0.008 n=5+5) XML 41.9MB ± 0% 41.1MB ± 0% -1.93% (p=0.008 n=5+5) [Geo mean] 73.5MB 72.0MB -2.03% name old allocs/op new allocs/op delta Template 383k ± 0% 383k ± 0% ~ (p=0.690 n=5+5) Unicode 343k ± 0% 343k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.16M ± 0% 1.16M ± 0% ~ (p=0.310 n=5+5) Compiler 4.43M ± 0% 4.42M ± 0% -0.17% (p=0.008 n=5+5) SSA 9.85M ± 0% 9.85M ± 0% ~ (p=0.310 n=5+5) Flate 236k ± 0% 236k ± 1% ~ (p=0.841 n=5+5) GoParser 320k ± 0% 320k ± 0% ~ (p=0.421 n=5+5) Reflect 988k ± 0% 987k ± 0% ~ (p=0.690 n=5+5) Tar 252k ± 0% 251k ± 0% ~ (p=0.095 n=5+5) XML 399k ± 0% 399k ± 0% ~ (p=1.000 n=5+5) [Geo mean] 741k 740k -0.07% Change-Id: I9e952b58a98e30a12494304db9ce50d0a85e459c Reviewed-on: https://go-review.googlesource.com/41797 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
2017-04-25 18:14:12 -07:00
if n.Left.Class() == PAUTOHEAP {
cmd/compile: fix liveness computation for heap-escaped parameters The liveness computation of parameters generally was never correct, but forcing all parameters to be live throughout the function covered up that problem. The new SSA back end is too clever: even though it currently keeps the parameter values live throughout the function, it may find optimizations that mean the current values are not written back to the original parameter stack slots immediately or ever (for example if a parameter is set to nil, SSA constant propagation may replace all later uses of the parameter with a constant nil, eliminating the need to write the nil value back to the stack slot), so the liveness code must now track the actual operations on the stack slots, exposing these problems. One small problem in the handling of arguments is that nodarg can return ONAME PPARAM nodes with adjusted offsets, so that there are actually multiple *Node pointers for the same parameter in the instruction stream. This might be possible to correct, but not in this CL. For now, we fix this by using n.Orig instead of n when considering PPARAM and PPARAMOUT nodes. The major problem in the handling of arguments is general confusion in the liveness code about the meaning of PPARAM|PHEAP and PPARAMOUT|PHEAP nodes, especially as contrasted with PAUTO|PHEAP. The difference between these two is that when a local variable "moves" to the heap, it's really just allocated there to start with; in contrast, when an argument moves to the heap, the actual data has to be copied there from the stack at the beginning of the function, and when a result "moves" to the heap the value in the heap has to be copied back to the stack when the function returns This general confusion is also present in the SSA back end. The PHEAP bit worked decently when I first introduced it 7 years ago (!) in 391425ae. The back end did nothing sophisticated, and in particular there was no analysis at all: no escape analysis, no liveness analysis, and certainly no SSA back end. But the complications caused in the various downstream consumers suggest that this should be a detail kept mainly in the front end. This CL therefore eliminates both the PHEAP bit and even the idea of "heap variables" from the back ends. First, it replaces the PPARAM|PHEAP, PPARAMOUT|PHEAP, and PAUTO|PHEAP variable classes with the single PAUTOHEAP, a pseudo-class indicating a variable maintained on the heap and available by indirecting a local variable kept on the stack (a plain PAUTO). Second, walkexpr replaces all references to PAUTOHEAP variables with indirections of the corresponding PAUTO variable. The back ends and the liveness code now just see plain indirected variables. This may actually produce better code, but the real goal here is to eliminate these little-used and somewhat suspect code paths in the back end analyses. The OPARAM node type goes away too. A followup CL will do the same to PPARAMREF. I'm not sure that the back ends (SSA in particular) are handling those right either, and with the framework established in this CL that change is trivial and the result clearly more correct. Fixes #15747. Change-Id: I2770b1ce3cbc93981bfc7166be66a9da12013d74 Reviewed-on: https://go-review.googlesource.com/23393 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-05-25 01:33:24 -04:00
Fatalf("DCL %v", n)
}
case OLABEL:
sym := n.Sym
lab := s.label(sym)
// Associate label with its control flow node, if any
cmd/compile: check labels and gotos before building SSA This CL introduces yet another compiler pass, which checks for correct control flow constructs prior to converting from AST to SSA form. It cannot be integrated with walk, since walk rewrites switch and select statements on the fly. To reduce code duplication, this CL also does some minor refactoring. With this pass in place, the AST to SSA converter can now stop generating SSA for any known-dead code. This minor savings pays for the minor cost of the new pass. Performance is almost a wash: name old time/op new time/op delta Template 206ms ± 4% 205ms ± 4% ~ (p=0.108 n=43+43) Unicode 84.0ms ± 4% 84.0ms ± 4% ~ (p=0.979 n=43+43) GoTypes 550ms ± 3% 553ms ± 3% ~ (p=0.065 n=40+41) Compiler 2.57s ± 4% 2.58s ± 2% ~ (p=0.103 n=44+41) SSA 3.94s ± 3% 3.93s ± 2% ~ (p=0.833 n=44+42) Flate 126ms ± 6% 125ms ± 4% ~ (p=0.941 n=43+39) GoParser 147ms ± 4% 148ms ± 3% ~ (p=0.164 n=42+39) Reflect 359ms ± 3% 357ms ± 5% ~ (p=0.241 n=43+44) Tar 106ms ± 5% 106ms ± 7% ~ (p=0.853 n=40+43) XML 202ms ± 3% 203ms ± 3% ~ (p=0.488 n=42+41) name old user-ns/op new user-ns/op delta Template 240M ± 4% 239M ± 4% ~ (p=0.844 n=42+43) Unicode 107M ± 5% 107M ± 4% ~ (p=0.332 n=40+43) GoTypes 735M ± 3% 731M ± 4% ~ (p=0.141 n=43+44) Compiler 3.51G ± 3% 3.52G ± 3% ~ (p=0.208 n=42+43) SSA 5.72G ± 4% 5.72G ± 3% ~ (p=0.928 n=44+42) Flate 151M ± 7% 150M ± 8% ~ (p=0.662 n=44+43) GoParser 181M ± 5% 181M ± 4% ~ (p=0.379 n=41+44) Reflect 447M ± 4% 445M ± 4% ~ (p=0.344 n=43+43) Tar 125M ± 7% 124M ± 6% ~ (p=0.353 n=43+43) XML 248M ± 4% 250M ± 6% ~ (p=0.158 n=44+44) name old alloc/op new alloc/op delta Template 40.3MB ± 0% 40.2MB ± 0% -0.27% (p=0.000 n=10+10) Unicode 30.3MB ± 0% 30.2MB ± 0% -0.10% (p=0.015 n=10+10) GoTypes 114MB ± 0% 114MB ± 0% -0.06% (p=0.000 n=7+9) Compiler 480MB ± 0% 481MB ± 0% +0.07% (p=0.000 n=10+10) SSA 864MB ± 0% 862MB ± 0% -0.25% (p=0.000 n=9+10) Flate 25.9MB ± 0% 25.9MB ± 0% ~ (p=0.123 n=10+10) GoParser 32.1MB ± 0% 32.1MB ± 0% ~ (p=0.631 n=10+10) Reflect 79.9MB ± 0% 79.6MB ± 0% -0.39% (p=0.000 n=10+9) Tar 27.1MB ± 0% 27.0MB ± 0% -0.18% (p=0.003 n=10+10) XML 42.6MB ± 0% 42.6MB ± 0% ~ (p=0.143 n=10+10) name old allocs/op new allocs/op delta Template 401k ± 0% 401k ± 1% ~ (p=0.353 n=10+10) Unicode 322k ± 0% 322k ± 0% ~ (p=0.739 n=10+10) GoTypes 1.18M ± 0% 1.18M ± 0% +0.25% (p=0.001 n=7+8) Compiler 4.51M ± 0% 4.53M ± 0% +0.37% (p=0.000 n=10+10) SSA 7.91M ± 0% 7.93M ± 0% +0.20% (p=0.000 n=9+10) Flate 244k ± 0% 245k ± 0% ~ (p=0.123 n=10+10) GoParser 323k ± 1% 324k ± 1% +0.40% (p=0.035 n=10+10) Reflect 1.01M ± 0% 1.02M ± 0% +0.37% (p=0.000 n=10+9) Tar 258k ± 1% 258k ± 1% ~ (p=0.661 n=10+9) XML 403k ± 0% 405k ± 0% +0.47% (p=0.004 n=10+10) Updates #15756 Updates #19250 Change-Id: I647bfbb745c35630447eb79dfcaa994b490ce942 Reviewed-on: https://go-review.googlesource.com/38159 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-14 11:05:03 -07:00
if ctl := n.labeledControl(); ctl != nil {
s.labeledNodes[ctl] = lab
}
// The label might already have a target block via a goto.
if lab.target == nil {
lab.target = s.f.NewBlock(ssa.BlockPlain)
}
cmd/compile: avoid generating some dead blocks We generate a lot of pointless dead blocks during the AST to SSA conversion. There are a few commonly occurring kinds of statements that contain neither variables nor code and that switch to a new block themselves. Stop making dead blocks for them. For the code in #19379, this reduces compilation wall time by 36% and max rss by 28%. This also helps a little for regular code, particularly code heavy on switch statements. name old time/op new time/op delta Template 231ms ± 3% 230ms ± 5% ~ (p=0.402 n=17+16) Unicode 101ms ± 4% 103ms ± 5% ~ (p=0.221 n=19+18) GoTypes 635ms ± 5% 625ms ± 4% ~ (p=0.063 n=20+18) Compiler 2.93s ± 2% 2.89s ± 2% -1.22% (p=0.003 n=20+19) SSA 4.53s ± 3% 4.52s ± 3% ~ (p=0.380 n=20+19) Flate 132ms ± 4% 133ms ± 5% ~ (p=0.647 n=20+19) GoParser 161ms ± 3% 161ms ± 4% ~ (p=0.749 n=20+19) Reflect 403ms ± 4% 397ms ± 3% -1.53% (p=0.030 n=20+19) Tar 121ms ± 2% 121ms ± 8% ~ (p=0.544 n=19+19) XML 225ms ± 3% 224ms ± 4% ~ (p=0.396 n=20+19) name old user-ns/op new user-ns/op delta Template 302user-ms ± 1% 297user-ms ± 7% -1.49% (p=0.048 n=15+18) Unicode 142user-ms ± 3% 143user-ms ± 5% ~ (p=0.363 n=19+17) GoTypes 852user-ms ± 5% 851user-ms ± 3% ~ (p=0.851 n=20+18) Compiler 4.11user-s ± 6% 3.98user-s ± 3% -3.08% (p=0.000 n=20+19) SSA 6.91user-s ± 5% 6.82user-s ± 7% ~ (p=0.113 n=20+19) Flate 164user-ms ± 4% 168user-ms ± 4% +2.42% (p=0.001 n=18+19) GoParser 207user-ms ± 4% 206user-ms ± 4% ~ (p=0.176 n=20+18) Reflect 509user-ms ± 4% 505user-ms ± 4% ~ (p=0.113 n=20+19) Tar 153user-ms ± 7% 151user-ms ± 9% ~ (p=0.283 n=20+19) XML 284user-ms ± 4% 282user-ms ± 4% ~ (p=0.270 n=20+19) name old alloc/op new alloc/op delta Template 42.6MB ± 0% 41.9MB ± 0% -1.55% (p=0.000 n=19+19) Unicode 31.7MB ± 0% 31.7MB ± 0% ~ (p=0.828 n=20+18) GoTypes 124MB ± 0% 121MB ± 0% -2.11% (p=0.000 n=20+17) Compiler 534MB ± 0% 523MB ± 0% -2.06% (p=0.000 n=20+19) SSA 989MB ± 0% 977MB ± 0% -1.28% (p=0.000 n=20+19) Flate 27.8MB ± 0% 27.5MB ± 0% -0.98% (p=0.000 n=20+19) GoParser 34.3MB ± 0% 34.0MB ± 0% -0.81% (p=0.000 n=20+19) Reflect 84.6MB ± 0% 82.9MB ± 0% -2.00% (p=0.000 n=17+18) Tar 28.8MB ± 0% 28.3MB ± 0% -1.52% (p=0.000 n=16+18) XML 47.2MB ± 0% 45.8MB ± 0% -2.99% (p=0.000 n=20+19) name old allocs/op new allocs/op delta Template 421k ± 1% 419k ± 1% -0.41% (p=0.001 n=20+19) Unicode 338k ± 1% 338k ± 1% ~ (p=0.478 n=20+19) GoTypes 1.28M ± 0% 1.28M ± 0% -0.36% (p=0.000 n=20+18) Compiler 5.06M ± 0% 5.03M ± 0% -0.63% (p=0.000 n=20+19) SSA 9.14M ± 0% 9.11M ± 0% -0.34% (p=0.000 n=20+19) Flate 267k ± 1% 266k ± 1% ~ (p=0.149 n=20+19) GoParser 347k ± 0% 347k ± 1% ~ (p=0.103 n=19+19) Reflect 1.07M ± 0% 1.07M ± 0% -0.42% (p=0.000 n=16+18) Tar 274k ± 0% 273k ± 1% ~ (p=0.116 n=19+19) XML 449k ± 0% 446k ± 1% -0.60% (p=0.000 n=20+19) Updates #19379 Change-Id: Ie798c347a0c081f5e349e1529880bebaae290967 Reviewed-on: https://go-review.googlesource.com/37760 Reviewed-by: David Chase <drchase@google.com>
2017-03-03 16:15:35 -08:00
// Go to that label.
// (We pretend "label:" is preceded by "goto label", unless the predecessor is unreachable.)
if s.curBlock != nil {
b := s.endBlock()
b.AddEdgeTo(lab.target)
}
s.startBlock(lab.target)
case OGOTO:
sym := n.Sym
lab := s.label(sym)
if lab.target == nil {
lab.target = s.f.NewBlock(ssa.BlockPlain)
}
b := s.endBlock()
cmd/compile: assign and preserve statement boundaries. A new pass run after ssa building (before any other optimization) identifies the "first" ssa node for each statement. Other "noise" nodes are tagged as being never appropriate for a statement boundary (e.g., VarKill, VarDef, Phi). Rewrite, deadcode, cse, and nilcheck are modified to move the statement boundaries forward whenever possible if a boundary-tagged ssa value is removed; never-boundary nodes are ignored in this search (some operations involving constants are also tagged as never-boundary and also ignored because they are likely to be moved or removed during optimization). Code generation treats all nodes except those explicitly marked as statement boundaries as "not statement" nodes, and floats statement boundaries to the beginning of each same-line run of instructions found within a basic block. Line number html conversion was modified to make statement boundary nodes a bit more obvious by prepending a "+". The code in fuse.go that glued together the value slices of two blocks produced a result that depended on the former capacities (not lengths) of the two slices. This causes differences in the 386 bootstrap, and also can sometimes put values into an order that does a worse job of preserving statement boundaries when values are removed. Portions of two delve tests that had caught problems were incorporated into ssa/debug_test.go. There are some opportunities to do better with optimized code, but the next-ing is not lying or overly jumpy. Over 4 CLs, compilebench geomean measured binary size increase of 3.5% and compile user time increase of 3.8% (this is after optimization to reuse a sparse map instead of creating multiple maps.) This CL worsens the optimized-debugging experience with Delve; we need to work with the delve team so that they can use the is_stmt marks that we're emitting now. The reference output changes from time to time depending on other changes in the compiler, sometimes better, sometimes worse. This CL now includes a test ensuring that 99+% of the lines in the Go command itself (a handy optimized binary) include is_stmt markers. Change-Id: I359c94e06843f1eb41f9da437bd614885aa9644a Reviewed-on: https://go-review.googlesource.com/102435 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>
2018-03-23 22:46:06 -04:00
b.Pos = s.lastPos.WithIsStmt() // Do this even if b is an empty block.
b.AddEdgeTo(lab.target)
case OAS:
if n.Left == n.Right && n.Left.Op == ONAME {
// An x=x assignment. No point in doing anything
// here. In addition, skipping this assignment
// prevents generating:
// VARDEF x
// COPY x -> x
// which is bad because x is incorrectly considered
// dead before the vardef. See issue #14904.
return
}
// Evaluate RHS.
rhs := n.Right
cmd/compile: avoid write barrier in append fast path When we are writing the result of an append back to the same slice, we don’t need a write barrier on the fast path. This re-implements an optimization that was present in the old backend. Updates #14921 Fixes #14969 Sample code: var x []byte func p() { x = append(x, 1, 2, 3) } Before: "".p t=1 size=224 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 199 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x(SB), DX 0x0025 00037 (append.go:19) MOVQ "".x+8(SB), BX 0x002c 00044 (append.go:19) MOVQ BX, "".autotmp_0+64(SP) 0x0031 00049 (append.go:22) LEAQ 3(BX), BP 0x0035 00053 (append.go:22) CMPQ BP, CX 0x0038 00056 (append.go:22) JGT $0, 131 0x003a 00058 (append.go:22) MOVB $1, (DX)(BX*1) 0x003e 00062 (append.go:22) MOVB $2, 1(DX)(BX*1) 0x0043 00067 (append.go:22) MOVB $3, 2(DX)(BX*1) 0x0048 00072 (append.go:22) MOVQ BP, "".x+8(SB) 0x004f 00079 (append.go:22) MOVQ CX, "".x+16(SB) 0x0056 00086 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x005c 00092 (append.go:22) TESTB AL, AL 0x005e 00094 (append.go:22) JNE $0, 108 0x0060 00096 (append.go:22) MOVQ DX, "".x(SB) 0x0067 00103 (append.go:23) ADDQ $72, SP 0x006b 00107 (append.go:23) RET 0x006c 00108 (append.go:22) LEAQ "".x(SB), CX 0x0073 00115 (append.go:22) MOVQ CX, (SP) 0x0077 00119 (append.go:22) MOVQ DX, 8(SP) 0x007c 00124 (append.go:22) PCDATA $0, $0 0x007c 00124 (append.go:22) CALL runtime.writebarrierptr(SB) 0x0081 00129 (append.go:23) JMP 103 0x0083 00131 (append.go:22) LEAQ type.[]uint8(SB), AX 0x008a 00138 (append.go:22) MOVQ AX, (SP) 0x008e 00142 (append.go:22) MOVQ DX, 8(SP) 0x0093 00147 (append.go:22) MOVQ BX, 16(SP) 0x0098 00152 (append.go:22) MOVQ CX, 24(SP) 0x009d 00157 (append.go:22) MOVQ BP, 32(SP) 0x00a2 00162 (append.go:22) PCDATA $0, $0 0x00a2 00162 (append.go:22) CALL runtime.growslice(SB) 0x00a7 00167 (append.go:22) MOVQ 40(SP), DX 0x00ac 00172 (append.go:22) MOVQ 48(SP), AX 0x00b1 00177 (append.go:22) MOVQ 56(SP), CX 0x00b6 00182 (append.go:22) ADDQ $3, AX 0x00ba 00186 (append.go:19) MOVQ "".autotmp_0+64(SP), BX 0x00bf 00191 (append.go:22) MOVQ AX, BP 0x00c2 00194 (append.go:22) JMP 58 0x00c7 00199 (append.go:22) NOP 0x00c7 00199 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00cc 00204 (append.go:21) JMP 0 After: "".p t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 191 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x+8(SB), DX 0x0025 00037 (append.go:19) MOVQ DX, "".autotmp_0+64(SP) 0x002a 00042 (append.go:19) MOVQ "".x(SB), BX 0x0031 00049 (append.go:22) LEAQ 3(DX), BP 0x0035 00053 (append.go:22) MOVQ BP, "".x+8(SB) 0x003c 00060 (append.go:22) CMPQ BP, CX 0x003f 00063 (append.go:22) JGT $0, 84 0x0041 00065 (append.go:22) MOVB $1, (BX)(DX*1) 0x0045 00069 (append.go:22) MOVB $2, 1(BX)(DX*1) 0x004a 00074 (append.go:22) MOVB $3, 2(BX)(DX*1) 0x004f 00079 (append.go:23) ADDQ $72, SP 0x0053 00083 (append.go:23) RET 0x0054 00084 (append.go:22) LEAQ type.[]uint8(SB), AX 0x005b 00091 (append.go:22) MOVQ AX, (SP) 0x005f 00095 (append.go:22) MOVQ BX, 8(SP) 0x0064 00100 (append.go:22) MOVQ DX, 16(SP) 0x0069 00105 (append.go:22) MOVQ CX, 24(SP) 0x006e 00110 (append.go:22) MOVQ BP, 32(SP) 0x0073 00115 (append.go:22) PCDATA $0, $0 0x0073 00115 (append.go:22) CALL runtime.growslice(SB) 0x0078 00120 (append.go:22) MOVQ 40(SP), CX 0x007d 00125 (append.go:22) MOVQ 56(SP), AX 0x0082 00130 (append.go:22) MOVQ AX, "".x+16(SB) 0x0089 00137 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x008f 00143 (append.go:22) TESTB AL, AL 0x0091 00145 (append.go:22) JNE $0, 168 0x0093 00147 (append.go:22) MOVQ CX, "".x(SB) 0x009a 00154 (append.go:22) MOVQ "".x(SB), BX 0x00a1 00161 (append.go:19) MOVQ "".autotmp_0+64(SP), DX 0x00a6 00166 (append.go:22) JMP 65 0x00a8 00168 (append.go:22) LEAQ "".x(SB), DX 0x00af 00175 (append.go:22) MOVQ DX, (SP) 0x00b3 00179 (append.go:22) MOVQ CX, 8(SP) 0x00b8 00184 (append.go:22) PCDATA $0, $0 0x00b8 00184 (append.go:22) CALL runtime.writebarrierptr(SB) 0x00bd 00189 (append.go:22) JMP 154 0x00bf 00191 (append.go:22) NOP 0x00bf 00191 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00c4 00196 (append.go:21) JMP 0 Change-Id: I77a41ad3a22557a4bb4654de7d6d24a029efe34a Reviewed-on: https://go-review.googlesource.com/21813 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:44:17 -07:00
if rhs != nil {
switch rhs.Op {
case OSTRUCTLIT, OARRAYLIT, OSLICELIT:
cmd/compile: avoid write barrier in append fast path When we are writing the result of an append back to the same slice, we don’t need a write barrier on the fast path. This re-implements an optimization that was present in the old backend. Updates #14921 Fixes #14969 Sample code: var x []byte func p() { x = append(x, 1, 2, 3) } Before: "".p t=1 size=224 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 199 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x(SB), DX 0x0025 00037 (append.go:19) MOVQ "".x+8(SB), BX 0x002c 00044 (append.go:19) MOVQ BX, "".autotmp_0+64(SP) 0x0031 00049 (append.go:22) LEAQ 3(BX), BP 0x0035 00053 (append.go:22) CMPQ BP, CX 0x0038 00056 (append.go:22) JGT $0, 131 0x003a 00058 (append.go:22) MOVB $1, (DX)(BX*1) 0x003e 00062 (append.go:22) MOVB $2, 1(DX)(BX*1) 0x0043 00067 (append.go:22) MOVB $3, 2(DX)(BX*1) 0x0048 00072 (append.go:22) MOVQ BP, "".x+8(SB) 0x004f 00079 (append.go:22) MOVQ CX, "".x+16(SB) 0x0056 00086 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x005c 00092 (append.go:22) TESTB AL, AL 0x005e 00094 (append.go:22) JNE $0, 108 0x0060 00096 (append.go:22) MOVQ DX, "".x(SB) 0x0067 00103 (append.go:23) ADDQ $72, SP 0x006b 00107 (append.go:23) RET 0x006c 00108 (append.go:22) LEAQ "".x(SB), CX 0x0073 00115 (append.go:22) MOVQ CX, (SP) 0x0077 00119 (append.go:22) MOVQ DX, 8(SP) 0x007c 00124 (append.go:22) PCDATA $0, $0 0x007c 00124 (append.go:22) CALL runtime.writebarrierptr(SB) 0x0081 00129 (append.go:23) JMP 103 0x0083 00131 (append.go:22) LEAQ type.[]uint8(SB), AX 0x008a 00138 (append.go:22) MOVQ AX, (SP) 0x008e 00142 (append.go:22) MOVQ DX, 8(SP) 0x0093 00147 (append.go:22) MOVQ BX, 16(SP) 0x0098 00152 (append.go:22) MOVQ CX, 24(SP) 0x009d 00157 (append.go:22) MOVQ BP, 32(SP) 0x00a2 00162 (append.go:22) PCDATA $0, $0 0x00a2 00162 (append.go:22) CALL runtime.growslice(SB) 0x00a7 00167 (append.go:22) MOVQ 40(SP), DX 0x00ac 00172 (append.go:22) MOVQ 48(SP), AX 0x00b1 00177 (append.go:22) MOVQ 56(SP), CX 0x00b6 00182 (append.go:22) ADDQ $3, AX 0x00ba 00186 (append.go:19) MOVQ "".autotmp_0+64(SP), BX 0x00bf 00191 (append.go:22) MOVQ AX, BP 0x00c2 00194 (append.go:22) JMP 58 0x00c7 00199 (append.go:22) NOP 0x00c7 00199 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00cc 00204 (append.go:21) JMP 0 After: "".p t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 191 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x+8(SB), DX 0x0025 00037 (append.go:19) MOVQ DX, "".autotmp_0+64(SP) 0x002a 00042 (append.go:19) MOVQ "".x(SB), BX 0x0031 00049 (append.go:22) LEAQ 3(DX), BP 0x0035 00053 (append.go:22) MOVQ BP, "".x+8(SB) 0x003c 00060 (append.go:22) CMPQ BP, CX 0x003f 00063 (append.go:22) JGT $0, 84 0x0041 00065 (append.go:22) MOVB $1, (BX)(DX*1) 0x0045 00069 (append.go:22) MOVB $2, 1(BX)(DX*1) 0x004a 00074 (append.go:22) MOVB $3, 2(BX)(DX*1) 0x004f 00079 (append.go:23) ADDQ $72, SP 0x0053 00083 (append.go:23) RET 0x0054 00084 (append.go:22) LEAQ type.[]uint8(SB), AX 0x005b 00091 (append.go:22) MOVQ AX, (SP) 0x005f 00095 (append.go:22) MOVQ BX, 8(SP) 0x0064 00100 (append.go:22) MOVQ DX, 16(SP) 0x0069 00105 (append.go:22) MOVQ CX, 24(SP) 0x006e 00110 (append.go:22) MOVQ BP, 32(SP) 0x0073 00115 (append.go:22) PCDATA $0, $0 0x0073 00115 (append.go:22) CALL runtime.growslice(SB) 0x0078 00120 (append.go:22) MOVQ 40(SP), CX 0x007d 00125 (append.go:22) MOVQ 56(SP), AX 0x0082 00130 (append.go:22) MOVQ AX, "".x+16(SB) 0x0089 00137 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x008f 00143 (append.go:22) TESTB AL, AL 0x0091 00145 (append.go:22) JNE $0, 168 0x0093 00147 (append.go:22) MOVQ CX, "".x(SB) 0x009a 00154 (append.go:22) MOVQ "".x(SB), BX 0x00a1 00161 (append.go:19) MOVQ "".autotmp_0+64(SP), DX 0x00a6 00166 (append.go:22) JMP 65 0x00a8 00168 (append.go:22) LEAQ "".x(SB), DX 0x00af 00175 (append.go:22) MOVQ DX, (SP) 0x00b3 00179 (append.go:22) MOVQ CX, 8(SP) 0x00b8 00184 (append.go:22) PCDATA $0, $0 0x00b8 00184 (append.go:22) CALL runtime.writebarrierptr(SB) 0x00bd 00189 (append.go:22) JMP 154 0x00bf 00191 (append.go:22) NOP 0x00bf 00191 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00c4 00196 (append.go:21) JMP 0 Change-Id: I77a41ad3a22557a4bb4654de7d6d24a029efe34a Reviewed-on: https://go-review.googlesource.com/21813 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:44:17 -07:00
// All literals with nonzero fields have already been
// rewritten during walk. Any that remain are just T{}
// or equivalents. Use the zero value.
if !isZero(rhs) {
cmd/compile: avoid write barrier in append fast path When we are writing the result of an append back to the same slice, we don’t need a write barrier on the fast path. This re-implements an optimization that was present in the old backend. Updates #14921 Fixes #14969 Sample code: var x []byte func p() { x = append(x, 1, 2, 3) } Before: "".p t=1 size=224 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 199 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x(SB), DX 0x0025 00037 (append.go:19) MOVQ "".x+8(SB), BX 0x002c 00044 (append.go:19) MOVQ BX, "".autotmp_0+64(SP) 0x0031 00049 (append.go:22) LEAQ 3(BX), BP 0x0035 00053 (append.go:22) CMPQ BP, CX 0x0038 00056 (append.go:22) JGT $0, 131 0x003a 00058 (append.go:22) MOVB $1, (DX)(BX*1) 0x003e 00062 (append.go:22) MOVB $2, 1(DX)(BX*1) 0x0043 00067 (append.go:22) MOVB $3, 2(DX)(BX*1) 0x0048 00072 (append.go:22) MOVQ BP, "".x+8(SB) 0x004f 00079 (append.go:22) MOVQ CX, "".x+16(SB) 0x0056 00086 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x005c 00092 (append.go:22) TESTB AL, AL 0x005e 00094 (append.go:22) JNE $0, 108 0x0060 00096 (append.go:22) MOVQ DX, "".x(SB) 0x0067 00103 (append.go:23) ADDQ $72, SP 0x006b 00107 (append.go:23) RET 0x006c 00108 (append.go:22) LEAQ "".x(SB), CX 0x0073 00115 (append.go:22) MOVQ CX, (SP) 0x0077 00119 (append.go:22) MOVQ DX, 8(SP) 0x007c 00124 (append.go:22) PCDATA $0, $0 0x007c 00124 (append.go:22) CALL runtime.writebarrierptr(SB) 0x0081 00129 (append.go:23) JMP 103 0x0083 00131 (append.go:22) LEAQ type.[]uint8(SB), AX 0x008a 00138 (append.go:22) MOVQ AX, (SP) 0x008e 00142 (append.go:22) MOVQ DX, 8(SP) 0x0093 00147 (append.go:22) MOVQ BX, 16(SP) 0x0098 00152 (append.go:22) MOVQ CX, 24(SP) 0x009d 00157 (append.go:22) MOVQ BP, 32(SP) 0x00a2 00162 (append.go:22) PCDATA $0, $0 0x00a2 00162 (append.go:22) CALL runtime.growslice(SB) 0x00a7 00167 (append.go:22) MOVQ 40(SP), DX 0x00ac 00172 (append.go:22) MOVQ 48(SP), AX 0x00b1 00177 (append.go:22) MOVQ 56(SP), CX 0x00b6 00182 (append.go:22) ADDQ $3, AX 0x00ba 00186 (append.go:19) MOVQ "".autotmp_0+64(SP), BX 0x00bf 00191 (append.go:22) MOVQ AX, BP 0x00c2 00194 (append.go:22) JMP 58 0x00c7 00199 (append.go:22) NOP 0x00c7 00199 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00cc 00204 (append.go:21) JMP 0 After: "".p t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 191 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x+8(SB), DX 0x0025 00037 (append.go:19) MOVQ DX, "".autotmp_0+64(SP) 0x002a 00042 (append.go:19) MOVQ "".x(SB), BX 0x0031 00049 (append.go:22) LEAQ 3(DX), BP 0x0035 00053 (append.go:22) MOVQ BP, "".x+8(SB) 0x003c 00060 (append.go:22) CMPQ BP, CX 0x003f 00063 (append.go:22) JGT $0, 84 0x0041 00065 (append.go:22) MOVB $1, (BX)(DX*1) 0x0045 00069 (append.go:22) MOVB $2, 1(BX)(DX*1) 0x004a 00074 (append.go:22) MOVB $3, 2(BX)(DX*1) 0x004f 00079 (append.go:23) ADDQ $72, SP 0x0053 00083 (append.go:23) RET 0x0054 00084 (append.go:22) LEAQ type.[]uint8(SB), AX 0x005b 00091 (append.go:22) MOVQ AX, (SP) 0x005f 00095 (append.go:22) MOVQ BX, 8(SP) 0x0064 00100 (append.go:22) MOVQ DX, 16(SP) 0x0069 00105 (append.go:22) MOVQ CX, 24(SP) 0x006e 00110 (append.go:22) MOVQ BP, 32(SP) 0x0073 00115 (append.go:22) PCDATA $0, $0 0x0073 00115 (append.go:22) CALL runtime.growslice(SB) 0x0078 00120 (append.go:22) MOVQ 40(SP), CX 0x007d 00125 (append.go:22) MOVQ 56(SP), AX 0x0082 00130 (append.go:22) MOVQ AX, "".x+16(SB) 0x0089 00137 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x008f 00143 (append.go:22) TESTB AL, AL 0x0091 00145 (append.go:22) JNE $0, 168 0x0093 00147 (append.go:22) MOVQ CX, "".x(SB) 0x009a 00154 (append.go:22) MOVQ "".x(SB), BX 0x00a1 00161 (append.go:19) MOVQ "".autotmp_0+64(SP), DX 0x00a6 00166 (append.go:22) JMP 65 0x00a8 00168 (append.go:22) LEAQ "".x(SB), DX 0x00af 00175 (append.go:22) MOVQ DX, (SP) 0x00b3 00179 (append.go:22) MOVQ CX, 8(SP) 0x00b8 00184 (append.go:22) PCDATA $0, $0 0x00b8 00184 (append.go:22) CALL runtime.writebarrierptr(SB) 0x00bd 00189 (append.go:22) JMP 154 0x00bf 00191 (append.go:22) NOP 0x00bf 00191 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00c4 00196 (append.go:21) JMP 0 Change-Id: I77a41ad3a22557a4bb4654de7d6d24a029efe34a Reviewed-on: https://go-review.googlesource.com/21813 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:44:17 -07:00
Fatalf("literal with nonzero value in SSA: %v", rhs)
}
rhs = nil
case OAPPEND:
// Check whether we're writing the result of an append back to the same slice.
// If so, we handle it specially to avoid write barriers on the fast
// (non-growth) path.
if !samesafeexpr(n.Left, rhs.List.First()) || Debug['N'] != 0 {
break
}
cmd/compile: avoid write barrier in append fast path When we are writing the result of an append back to the same slice, we don’t need a write barrier on the fast path. This re-implements an optimization that was present in the old backend. Updates #14921 Fixes #14969 Sample code: var x []byte func p() { x = append(x, 1, 2, 3) } Before: "".p t=1 size=224 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 199 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x(SB), DX 0x0025 00037 (append.go:19) MOVQ "".x+8(SB), BX 0x002c 00044 (append.go:19) MOVQ BX, "".autotmp_0+64(SP) 0x0031 00049 (append.go:22) LEAQ 3(BX), BP 0x0035 00053 (append.go:22) CMPQ BP, CX 0x0038 00056 (append.go:22) JGT $0, 131 0x003a 00058 (append.go:22) MOVB $1, (DX)(BX*1) 0x003e 00062 (append.go:22) MOVB $2, 1(DX)(BX*1) 0x0043 00067 (append.go:22) MOVB $3, 2(DX)(BX*1) 0x0048 00072 (append.go:22) MOVQ BP, "".x+8(SB) 0x004f 00079 (append.go:22) MOVQ CX, "".x+16(SB) 0x0056 00086 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x005c 00092 (append.go:22) TESTB AL, AL 0x005e 00094 (append.go:22) JNE $0, 108 0x0060 00096 (append.go:22) MOVQ DX, "".x(SB) 0x0067 00103 (append.go:23) ADDQ $72, SP 0x006b 00107 (append.go:23) RET 0x006c 00108 (append.go:22) LEAQ "".x(SB), CX 0x0073 00115 (append.go:22) MOVQ CX, (SP) 0x0077 00119 (append.go:22) MOVQ DX, 8(SP) 0x007c 00124 (append.go:22) PCDATA $0, $0 0x007c 00124 (append.go:22) CALL runtime.writebarrierptr(SB) 0x0081 00129 (append.go:23) JMP 103 0x0083 00131 (append.go:22) LEAQ type.[]uint8(SB), AX 0x008a 00138 (append.go:22) MOVQ AX, (SP) 0x008e 00142 (append.go:22) MOVQ DX, 8(SP) 0x0093 00147 (append.go:22) MOVQ BX, 16(SP) 0x0098 00152 (append.go:22) MOVQ CX, 24(SP) 0x009d 00157 (append.go:22) MOVQ BP, 32(SP) 0x00a2 00162 (append.go:22) PCDATA $0, $0 0x00a2 00162 (append.go:22) CALL runtime.growslice(SB) 0x00a7 00167 (append.go:22) MOVQ 40(SP), DX 0x00ac 00172 (append.go:22) MOVQ 48(SP), AX 0x00b1 00177 (append.go:22) MOVQ 56(SP), CX 0x00b6 00182 (append.go:22) ADDQ $3, AX 0x00ba 00186 (append.go:19) MOVQ "".autotmp_0+64(SP), BX 0x00bf 00191 (append.go:22) MOVQ AX, BP 0x00c2 00194 (append.go:22) JMP 58 0x00c7 00199 (append.go:22) NOP 0x00c7 00199 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00cc 00204 (append.go:21) JMP 0 After: "".p t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 191 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x+8(SB), DX 0x0025 00037 (append.go:19) MOVQ DX, "".autotmp_0+64(SP) 0x002a 00042 (append.go:19) MOVQ "".x(SB), BX 0x0031 00049 (append.go:22) LEAQ 3(DX), BP 0x0035 00053 (append.go:22) MOVQ BP, "".x+8(SB) 0x003c 00060 (append.go:22) CMPQ BP, CX 0x003f 00063 (append.go:22) JGT $0, 84 0x0041 00065 (append.go:22) MOVB $1, (BX)(DX*1) 0x0045 00069 (append.go:22) MOVB $2, 1(BX)(DX*1) 0x004a 00074 (append.go:22) MOVB $3, 2(BX)(DX*1) 0x004f 00079 (append.go:23) ADDQ $72, SP 0x0053 00083 (append.go:23) RET 0x0054 00084 (append.go:22) LEAQ type.[]uint8(SB), AX 0x005b 00091 (append.go:22) MOVQ AX, (SP) 0x005f 00095 (append.go:22) MOVQ BX, 8(SP) 0x0064 00100 (append.go:22) MOVQ DX, 16(SP) 0x0069 00105 (append.go:22) MOVQ CX, 24(SP) 0x006e 00110 (append.go:22) MOVQ BP, 32(SP) 0x0073 00115 (append.go:22) PCDATA $0, $0 0x0073 00115 (append.go:22) CALL runtime.growslice(SB) 0x0078 00120 (append.go:22) MOVQ 40(SP), CX 0x007d 00125 (append.go:22) MOVQ 56(SP), AX 0x0082 00130 (append.go:22) MOVQ AX, "".x+16(SB) 0x0089 00137 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x008f 00143 (append.go:22) TESTB AL, AL 0x0091 00145 (append.go:22) JNE $0, 168 0x0093 00147 (append.go:22) MOVQ CX, "".x(SB) 0x009a 00154 (append.go:22) MOVQ "".x(SB), BX 0x00a1 00161 (append.go:19) MOVQ "".autotmp_0+64(SP), DX 0x00a6 00166 (append.go:22) JMP 65 0x00a8 00168 (append.go:22) LEAQ "".x(SB), DX 0x00af 00175 (append.go:22) MOVQ DX, (SP) 0x00b3 00179 (append.go:22) MOVQ CX, 8(SP) 0x00b8 00184 (append.go:22) PCDATA $0, $0 0x00b8 00184 (append.go:22) CALL runtime.writebarrierptr(SB) 0x00bd 00189 (append.go:22) JMP 154 0x00bf 00191 (append.go:22) NOP 0x00bf 00191 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00c4 00196 (append.go:21) JMP 0 Change-Id: I77a41ad3a22557a4bb4654de7d6d24a029efe34a Reviewed-on: https://go-review.googlesource.com/21813 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:44:17 -07:00
// If the slice can be SSA'd, it'll be on the stack,
// so there will be no write barriers,
// so there's no need to attempt to prevent them.
if s.canSSA(n.Left) {
if Debug_append > 0 { // replicating old diagnostic message
Warnl(n.Pos, "append: len-only update (in local slice)")
}
break
}
if Debug_append > 0 {
Warnl(n.Pos, "append: len-only update")
cmd/compile: avoid write barrier in append fast path When we are writing the result of an append back to the same slice, we don’t need a write barrier on the fast path. This re-implements an optimization that was present in the old backend. Updates #14921 Fixes #14969 Sample code: var x []byte func p() { x = append(x, 1, 2, 3) } Before: "".p t=1 size=224 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 199 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x(SB), DX 0x0025 00037 (append.go:19) MOVQ "".x+8(SB), BX 0x002c 00044 (append.go:19) MOVQ BX, "".autotmp_0+64(SP) 0x0031 00049 (append.go:22) LEAQ 3(BX), BP 0x0035 00053 (append.go:22) CMPQ BP, CX 0x0038 00056 (append.go:22) JGT $0, 131 0x003a 00058 (append.go:22) MOVB $1, (DX)(BX*1) 0x003e 00062 (append.go:22) MOVB $2, 1(DX)(BX*1) 0x0043 00067 (append.go:22) MOVB $3, 2(DX)(BX*1) 0x0048 00072 (append.go:22) MOVQ BP, "".x+8(SB) 0x004f 00079 (append.go:22) MOVQ CX, "".x+16(SB) 0x0056 00086 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x005c 00092 (append.go:22) TESTB AL, AL 0x005e 00094 (append.go:22) JNE $0, 108 0x0060 00096 (append.go:22) MOVQ DX, "".x(SB) 0x0067 00103 (append.go:23) ADDQ $72, SP 0x006b 00107 (append.go:23) RET 0x006c 00108 (append.go:22) LEAQ "".x(SB), CX 0x0073 00115 (append.go:22) MOVQ CX, (SP) 0x0077 00119 (append.go:22) MOVQ DX, 8(SP) 0x007c 00124 (append.go:22) PCDATA $0, $0 0x007c 00124 (append.go:22) CALL runtime.writebarrierptr(SB) 0x0081 00129 (append.go:23) JMP 103 0x0083 00131 (append.go:22) LEAQ type.[]uint8(SB), AX 0x008a 00138 (append.go:22) MOVQ AX, (SP) 0x008e 00142 (append.go:22) MOVQ DX, 8(SP) 0x0093 00147 (append.go:22) MOVQ BX, 16(SP) 0x0098 00152 (append.go:22) MOVQ CX, 24(SP) 0x009d 00157 (append.go:22) MOVQ BP, 32(SP) 0x00a2 00162 (append.go:22) PCDATA $0, $0 0x00a2 00162 (append.go:22) CALL runtime.growslice(SB) 0x00a7 00167 (append.go:22) MOVQ 40(SP), DX 0x00ac 00172 (append.go:22) MOVQ 48(SP), AX 0x00b1 00177 (append.go:22) MOVQ 56(SP), CX 0x00b6 00182 (append.go:22) ADDQ $3, AX 0x00ba 00186 (append.go:19) MOVQ "".autotmp_0+64(SP), BX 0x00bf 00191 (append.go:22) MOVQ AX, BP 0x00c2 00194 (append.go:22) JMP 58 0x00c7 00199 (append.go:22) NOP 0x00c7 00199 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00cc 00204 (append.go:21) JMP 0 After: "".p t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 191 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x+8(SB), DX 0x0025 00037 (append.go:19) MOVQ DX, "".autotmp_0+64(SP) 0x002a 00042 (append.go:19) MOVQ "".x(SB), BX 0x0031 00049 (append.go:22) LEAQ 3(DX), BP 0x0035 00053 (append.go:22) MOVQ BP, "".x+8(SB) 0x003c 00060 (append.go:22) CMPQ BP, CX 0x003f 00063 (append.go:22) JGT $0, 84 0x0041 00065 (append.go:22) MOVB $1, (BX)(DX*1) 0x0045 00069 (append.go:22) MOVB $2, 1(BX)(DX*1) 0x004a 00074 (append.go:22) MOVB $3, 2(BX)(DX*1) 0x004f 00079 (append.go:23) ADDQ $72, SP 0x0053 00083 (append.go:23) RET 0x0054 00084 (append.go:22) LEAQ type.[]uint8(SB), AX 0x005b 00091 (append.go:22) MOVQ AX, (SP) 0x005f 00095 (append.go:22) MOVQ BX, 8(SP) 0x0064 00100 (append.go:22) MOVQ DX, 16(SP) 0x0069 00105 (append.go:22) MOVQ CX, 24(SP) 0x006e 00110 (append.go:22) MOVQ BP, 32(SP) 0x0073 00115 (append.go:22) PCDATA $0, $0 0x0073 00115 (append.go:22) CALL runtime.growslice(SB) 0x0078 00120 (append.go:22) MOVQ 40(SP), CX 0x007d 00125 (append.go:22) MOVQ 56(SP), AX 0x0082 00130 (append.go:22) MOVQ AX, "".x+16(SB) 0x0089 00137 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x008f 00143 (append.go:22) TESTB AL, AL 0x0091 00145 (append.go:22) JNE $0, 168 0x0093 00147 (append.go:22) MOVQ CX, "".x(SB) 0x009a 00154 (append.go:22) MOVQ "".x(SB), BX 0x00a1 00161 (append.go:19) MOVQ "".autotmp_0+64(SP), DX 0x00a6 00166 (append.go:22) JMP 65 0x00a8 00168 (append.go:22) LEAQ "".x(SB), DX 0x00af 00175 (append.go:22) MOVQ DX, (SP) 0x00b3 00179 (append.go:22) MOVQ CX, 8(SP) 0x00b8 00184 (append.go:22) PCDATA $0, $0 0x00b8 00184 (append.go:22) CALL runtime.writebarrierptr(SB) 0x00bd 00189 (append.go:22) JMP 154 0x00bf 00191 (append.go:22) NOP 0x00bf 00191 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00c4 00196 (append.go:21) JMP 0 Change-Id: I77a41ad3a22557a4bb4654de7d6d24a029efe34a Reviewed-on: https://go-review.googlesource.com/21813 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:44:17 -07:00
}
s.append(rhs, true)
return
}
}
if n.Left.isBlank() {
// _ = rhs
// Just evaluate rhs for side-effects.
if rhs != nil {
s.expr(rhs)
}
return
}
var t *types.Type
if n.Right != nil {
t = n.Right.Type
} else {
t = n.Left.Type
}
var r *ssa.Value
deref := !canSSAType(t)
if deref {
if rhs == nil {
r = nil // Signal assign to use OpZero.
} else {
r = s.addr(rhs, false)
}
} else {
if rhs == nil {
r = s.zeroVal(t)
} else {
r = s.expr(rhs)
}
}
var skip skipMask
if rhs != nil && (rhs.Op == OSLICE || rhs.Op == OSLICE3 || rhs.Op == OSLICESTR) && samesafeexpr(rhs.Left, n.Left) {
// We're assigning a slicing operation back to its source.
// Don't write back fields we aren't changing. See issue #14855.
i, j, k := rhs.SliceBounds()
if i != nil && (i.Op == OLITERAL && i.Val().Ctype() == CTINT && i.Int64() == 0) {
// [0:...] is the same as [:...]
i = nil
}
// TODO: detect defaults for len/cap also.
// Currently doesn't really work because (*p)[:len(*p)] appears here as:
// tmp = len(*p)
// (*p)[:tmp]
//if j != nil && (j.Op == OLEN && samesafeexpr(j.Left, n.Left)) {
// j = nil
//}
//if k != nil && (k.Op == OCAP && samesafeexpr(k.Left, n.Left)) {
// k = nil
//}
if i == nil {
skip |= skipPtr
if j == nil {
skip |= skipLen
}
if k == nil {
skip |= skipCap
}
}
}
s.assign(n.Left, r, deref, skip)
case OIF:
bEnd := s.f.NewBlock(ssa.BlockPlain)
var likely int8
if n.Likely() {
likely = 1
}
var bThen *ssa.Block
if n.Nbody.Len() != 0 {
bThen = s.f.NewBlock(ssa.BlockPlain)
} else {
bThen = bEnd
}
var bElse *ssa.Block
if n.Rlist.Len() != 0 {
bElse = s.f.NewBlock(ssa.BlockPlain)
} else {
bElse = bEnd
}
s.condBranch(n.Left, bThen, bElse, likely)
if n.Nbody.Len() != 0 {
s.startBlock(bThen)
s.stmtList(n.Nbody)
if b := s.endBlock(); b != nil {
b.AddEdgeTo(bEnd)
}
}
if n.Rlist.Len() != 0 {
s.startBlock(bElse)
s.stmtList(n.Rlist)
if b := s.endBlock(); b != nil {
b.AddEdgeTo(bEnd)
}
}
s.startBlock(bEnd)
case ORETURN:
s.stmtList(n.List)
b := s.exit()
cmd/compile: assign and preserve statement boundaries. A new pass run after ssa building (before any other optimization) identifies the "first" ssa node for each statement. Other "noise" nodes are tagged as being never appropriate for a statement boundary (e.g., VarKill, VarDef, Phi). Rewrite, deadcode, cse, and nilcheck are modified to move the statement boundaries forward whenever possible if a boundary-tagged ssa value is removed; never-boundary nodes are ignored in this search (some operations involving constants are also tagged as never-boundary and also ignored because they are likely to be moved or removed during optimization). Code generation treats all nodes except those explicitly marked as statement boundaries as "not statement" nodes, and floats statement boundaries to the beginning of each same-line run of instructions found within a basic block. Line number html conversion was modified to make statement boundary nodes a bit more obvious by prepending a "+". The code in fuse.go that glued together the value slices of two blocks produced a result that depended on the former capacities (not lengths) of the two slices. This causes differences in the 386 bootstrap, and also can sometimes put values into an order that does a worse job of preserving statement boundaries when values are removed. Portions of two delve tests that had caught problems were incorporated into ssa/debug_test.go. There are some opportunities to do better with optimized code, but the next-ing is not lying or overly jumpy. Over 4 CLs, compilebench geomean measured binary size increase of 3.5% and compile user time increase of 3.8% (this is after optimization to reuse a sparse map instead of creating multiple maps.) This CL worsens the optimized-debugging experience with Delve; we need to work with the delve team so that they can use the is_stmt marks that we're emitting now. The reference output changes from time to time depending on other changes in the compiler, sometimes better, sometimes worse. This CL now includes a test ensuring that 99+% of the lines in the Go command itself (a handy optimized binary) include is_stmt markers. Change-Id: I359c94e06843f1eb41f9da437bd614885aa9644a Reviewed-on: https://go-review.googlesource.com/102435 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>
2018-03-23 22:46:06 -04:00
b.Pos = s.lastPos.WithIsStmt()
case ORETJMP:
s.stmtList(n.List)
b := s.exit()
b.Kind = ssa.BlockRetJmp // override BlockRet
b.Aux = n.Sym.Linksym()
case OCONTINUE, OBREAK:
var to *ssa.Block
if n.Sym == nil {
// plain break/continue
cmd/compile: check labels and gotos before building SSA This CL introduces yet another compiler pass, which checks for correct control flow constructs prior to converting from AST to SSA form. It cannot be integrated with walk, since walk rewrites switch and select statements on the fly. To reduce code duplication, this CL also does some minor refactoring. With this pass in place, the AST to SSA converter can now stop generating SSA for any known-dead code. This minor savings pays for the minor cost of the new pass. Performance is almost a wash: name old time/op new time/op delta Template 206ms ± 4% 205ms ± 4% ~ (p=0.108 n=43+43) Unicode 84.0ms ± 4% 84.0ms ± 4% ~ (p=0.979 n=43+43) GoTypes 550ms ± 3% 553ms ± 3% ~ (p=0.065 n=40+41) Compiler 2.57s ± 4% 2.58s ± 2% ~ (p=0.103 n=44+41) SSA 3.94s ± 3% 3.93s ± 2% ~ (p=0.833 n=44+42) Flate 126ms ± 6% 125ms ± 4% ~ (p=0.941 n=43+39) GoParser 147ms ± 4% 148ms ± 3% ~ (p=0.164 n=42+39) Reflect 359ms ± 3% 357ms ± 5% ~ (p=0.241 n=43+44) Tar 106ms ± 5% 106ms ± 7% ~ (p=0.853 n=40+43) XML 202ms ± 3% 203ms ± 3% ~ (p=0.488 n=42+41) name old user-ns/op new user-ns/op delta Template 240M ± 4% 239M ± 4% ~ (p=0.844 n=42+43) Unicode 107M ± 5% 107M ± 4% ~ (p=0.332 n=40+43) GoTypes 735M ± 3% 731M ± 4% ~ (p=0.141 n=43+44) Compiler 3.51G ± 3% 3.52G ± 3% ~ (p=0.208 n=42+43) SSA 5.72G ± 4% 5.72G ± 3% ~ (p=0.928 n=44+42) Flate 151M ± 7% 150M ± 8% ~ (p=0.662 n=44+43) GoParser 181M ± 5% 181M ± 4% ~ (p=0.379 n=41+44) Reflect 447M ± 4% 445M ± 4% ~ (p=0.344 n=43+43) Tar 125M ± 7% 124M ± 6% ~ (p=0.353 n=43+43) XML 248M ± 4% 250M ± 6% ~ (p=0.158 n=44+44) name old alloc/op new alloc/op delta Template 40.3MB ± 0% 40.2MB ± 0% -0.27% (p=0.000 n=10+10) Unicode 30.3MB ± 0% 30.2MB ± 0% -0.10% (p=0.015 n=10+10) GoTypes 114MB ± 0% 114MB ± 0% -0.06% (p=0.000 n=7+9) Compiler 480MB ± 0% 481MB ± 0% +0.07% (p=0.000 n=10+10) SSA 864MB ± 0% 862MB ± 0% -0.25% (p=0.000 n=9+10) Flate 25.9MB ± 0% 25.9MB ± 0% ~ (p=0.123 n=10+10) GoParser 32.1MB ± 0% 32.1MB ± 0% ~ (p=0.631 n=10+10) Reflect 79.9MB ± 0% 79.6MB ± 0% -0.39% (p=0.000 n=10+9) Tar 27.1MB ± 0% 27.0MB ± 0% -0.18% (p=0.003 n=10+10) XML 42.6MB ± 0% 42.6MB ± 0% ~ (p=0.143 n=10+10) name old allocs/op new allocs/op delta Template 401k ± 0% 401k ± 1% ~ (p=0.353 n=10+10) Unicode 322k ± 0% 322k ± 0% ~ (p=0.739 n=10+10) GoTypes 1.18M ± 0% 1.18M ± 0% +0.25% (p=0.001 n=7+8) Compiler 4.51M ± 0% 4.53M ± 0% +0.37% (p=0.000 n=10+10) SSA 7.91M ± 0% 7.93M ± 0% +0.20% (p=0.000 n=9+10) Flate 244k ± 0% 245k ± 0% ~ (p=0.123 n=10+10) GoParser 323k ± 1% 324k ± 1% +0.40% (p=0.035 n=10+10) Reflect 1.01M ± 0% 1.02M ± 0% +0.37% (p=0.000 n=10+9) Tar 258k ± 1% 258k ± 1% ~ (p=0.661 n=10+9) XML 403k ± 0% 405k ± 0% +0.47% (p=0.004 n=10+10) Updates #15756 Updates #19250 Change-Id: I647bfbb745c35630447eb79dfcaa994b490ce942 Reviewed-on: https://go-review.googlesource.com/38159 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-14 11:05:03 -07:00
switch n.Op {
case OCONTINUE:
to = s.continueTo
case OBREAK:
to = s.breakTo
}
} else {
// labeled break/continue; look up the target
sym := n.Sym
lab := s.label(sym)
switch n.Op {
case OCONTINUE:
to = lab.continueTarget
case OBREAK:
to = lab.breakTarget
}
}
cmd/compile: check labels and gotos before building SSA This CL introduces yet another compiler pass, which checks for correct control flow constructs prior to converting from AST to SSA form. It cannot be integrated with walk, since walk rewrites switch and select statements on the fly. To reduce code duplication, this CL also does some minor refactoring. With this pass in place, the AST to SSA converter can now stop generating SSA for any known-dead code. This minor savings pays for the minor cost of the new pass. Performance is almost a wash: name old time/op new time/op delta Template 206ms ± 4% 205ms ± 4% ~ (p=0.108 n=43+43) Unicode 84.0ms ± 4% 84.0ms ± 4% ~ (p=0.979 n=43+43) GoTypes 550ms ± 3% 553ms ± 3% ~ (p=0.065 n=40+41) Compiler 2.57s ± 4% 2.58s ± 2% ~ (p=0.103 n=44+41) SSA 3.94s ± 3% 3.93s ± 2% ~ (p=0.833 n=44+42) Flate 126ms ± 6% 125ms ± 4% ~ (p=0.941 n=43+39) GoParser 147ms ± 4% 148ms ± 3% ~ (p=0.164 n=42+39) Reflect 359ms ± 3% 357ms ± 5% ~ (p=0.241 n=43+44) Tar 106ms ± 5% 106ms ± 7% ~ (p=0.853 n=40+43) XML 202ms ± 3% 203ms ± 3% ~ (p=0.488 n=42+41) name old user-ns/op new user-ns/op delta Template 240M ± 4% 239M ± 4% ~ (p=0.844 n=42+43) Unicode 107M ± 5% 107M ± 4% ~ (p=0.332 n=40+43) GoTypes 735M ± 3% 731M ± 4% ~ (p=0.141 n=43+44) Compiler 3.51G ± 3% 3.52G ± 3% ~ (p=0.208 n=42+43) SSA 5.72G ± 4% 5.72G ± 3% ~ (p=0.928 n=44+42) Flate 151M ± 7% 150M ± 8% ~ (p=0.662 n=44+43) GoParser 181M ± 5% 181M ± 4% ~ (p=0.379 n=41+44) Reflect 447M ± 4% 445M ± 4% ~ (p=0.344 n=43+43) Tar 125M ± 7% 124M ± 6% ~ (p=0.353 n=43+43) XML 248M ± 4% 250M ± 6% ~ (p=0.158 n=44+44) name old alloc/op new alloc/op delta Template 40.3MB ± 0% 40.2MB ± 0% -0.27% (p=0.000 n=10+10) Unicode 30.3MB ± 0% 30.2MB ± 0% -0.10% (p=0.015 n=10+10) GoTypes 114MB ± 0% 114MB ± 0% -0.06% (p=0.000 n=7+9) Compiler 480MB ± 0% 481MB ± 0% +0.07% (p=0.000 n=10+10) SSA 864MB ± 0% 862MB ± 0% -0.25% (p=0.000 n=9+10) Flate 25.9MB ± 0% 25.9MB ± 0% ~ (p=0.123 n=10+10) GoParser 32.1MB ± 0% 32.1MB ± 0% ~ (p=0.631 n=10+10) Reflect 79.9MB ± 0% 79.6MB ± 0% -0.39% (p=0.000 n=10+9) Tar 27.1MB ± 0% 27.0MB ± 0% -0.18% (p=0.003 n=10+10) XML 42.6MB ± 0% 42.6MB ± 0% ~ (p=0.143 n=10+10) name old allocs/op new allocs/op delta Template 401k ± 0% 401k ± 1% ~ (p=0.353 n=10+10) Unicode 322k ± 0% 322k ± 0% ~ (p=0.739 n=10+10) GoTypes 1.18M ± 0% 1.18M ± 0% +0.25% (p=0.001 n=7+8) Compiler 4.51M ± 0% 4.53M ± 0% +0.37% (p=0.000 n=10+10) SSA 7.91M ± 0% 7.93M ± 0% +0.20% (p=0.000 n=9+10) Flate 244k ± 0% 245k ± 0% ~ (p=0.123 n=10+10) GoParser 323k ± 1% 324k ± 1% +0.40% (p=0.035 n=10+10) Reflect 1.01M ± 0% 1.02M ± 0% +0.37% (p=0.000 n=10+9) Tar 258k ± 1% 258k ± 1% ~ (p=0.661 n=10+9) XML 403k ± 0% 405k ± 0% +0.47% (p=0.004 n=10+10) Updates #15756 Updates #19250 Change-Id: I647bfbb745c35630447eb79dfcaa994b490ce942 Reviewed-on: https://go-review.googlesource.com/38159 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-14 11:05:03 -07:00
b := s.endBlock()
cmd/compile: assign and preserve statement boundaries. A new pass run after ssa building (before any other optimization) identifies the "first" ssa node for each statement. Other "noise" nodes are tagged as being never appropriate for a statement boundary (e.g., VarKill, VarDef, Phi). Rewrite, deadcode, cse, and nilcheck are modified to move the statement boundaries forward whenever possible if a boundary-tagged ssa value is removed; never-boundary nodes are ignored in this search (some operations involving constants are also tagged as never-boundary and also ignored because they are likely to be moved or removed during optimization). Code generation treats all nodes except those explicitly marked as statement boundaries as "not statement" nodes, and floats statement boundaries to the beginning of each same-line run of instructions found within a basic block. Line number html conversion was modified to make statement boundary nodes a bit more obvious by prepending a "+". The code in fuse.go that glued together the value slices of two blocks produced a result that depended on the former capacities (not lengths) of the two slices. This causes differences in the 386 bootstrap, and also can sometimes put values into an order that does a worse job of preserving statement boundaries when values are removed. Portions of two delve tests that had caught problems were incorporated into ssa/debug_test.go. There are some opportunities to do better with optimized code, but the next-ing is not lying or overly jumpy. Over 4 CLs, compilebench geomean measured binary size increase of 3.5% and compile user time increase of 3.8% (this is after optimization to reuse a sparse map instead of creating multiple maps.) This CL worsens the optimized-debugging experience with Delve; we need to work with the delve team so that they can use the is_stmt marks that we're emitting now. The reference output changes from time to time depending on other changes in the compiler, sometimes better, sometimes worse. This CL now includes a test ensuring that 99+% of the lines in the Go command itself (a handy optimized binary) include is_stmt markers. Change-Id: I359c94e06843f1eb41f9da437bd614885aa9644a Reviewed-on: https://go-review.googlesource.com/102435 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>
2018-03-23 22:46:06 -04:00
b.Pos = s.lastPos.WithIsStmt() // Do this even if b is an empty block.
cmd/compile: check labels and gotos before building SSA This CL introduces yet another compiler pass, which checks for correct control flow constructs prior to converting from AST to SSA form. It cannot be integrated with walk, since walk rewrites switch and select statements on the fly. To reduce code duplication, this CL also does some minor refactoring. With this pass in place, the AST to SSA converter can now stop generating SSA for any known-dead code. This minor savings pays for the minor cost of the new pass. Performance is almost a wash: name old time/op new time/op delta Template 206ms ± 4% 205ms ± 4% ~ (p=0.108 n=43+43) Unicode 84.0ms ± 4% 84.0ms ± 4% ~ (p=0.979 n=43+43) GoTypes 550ms ± 3% 553ms ± 3% ~ (p=0.065 n=40+41) Compiler 2.57s ± 4% 2.58s ± 2% ~ (p=0.103 n=44+41) SSA 3.94s ± 3% 3.93s ± 2% ~ (p=0.833 n=44+42) Flate 126ms ± 6% 125ms ± 4% ~ (p=0.941 n=43+39) GoParser 147ms ± 4% 148ms ± 3% ~ (p=0.164 n=42+39) Reflect 359ms ± 3% 357ms ± 5% ~ (p=0.241 n=43+44) Tar 106ms ± 5% 106ms ± 7% ~ (p=0.853 n=40+43) XML 202ms ± 3% 203ms ± 3% ~ (p=0.488 n=42+41) name old user-ns/op new user-ns/op delta Template 240M ± 4% 239M ± 4% ~ (p=0.844 n=42+43) Unicode 107M ± 5% 107M ± 4% ~ (p=0.332 n=40+43) GoTypes 735M ± 3% 731M ± 4% ~ (p=0.141 n=43+44) Compiler 3.51G ± 3% 3.52G ± 3% ~ (p=0.208 n=42+43) SSA 5.72G ± 4% 5.72G ± 3% ~ (p=0.928 n=44+42) Flate 151M ± 7% 150M ± 8% ~ (p=0.662 n=44+43) GoParser 181M ± 5% 181M ± 4% ~ (p=0.379 n=41+44) Reflect 447M ± 4% 445M ± 4% ~ (p=0.344 n=43+43) Tar 125M ± 7% 124M ± 6% ~ (p=0.353 n=43+43) XML 248M ± 4% 250M ± 6% ~ (p=0.158 n=44+44) name old alloc/op new alloc/op delta Template 40.3MB ± 0% 40.2MB ± 0% -0.27% (p=0.000 n=10+10) Unicode 30.3MB ± 0% 30.2MB ± 0% -0.10% (p=0.015 n=10+10) GoTypes 114MB ± 0% 114MB ± 0% -0.06% (p=0.000 n=7+9) Compiler 480MB ± 0% 481MB ± 0% +0.07% (p=0.000 n=10+10) SSA 864MB ± 0% 862MB ± 0% -0.25% (p=0.000 n=9+10) Flate 25.9MB ± 0% 25.9MB ± 0% ~ (p=0.123 n=10+10) GoParser 32.1MB ± 0% 32.1MB ± 0% ~ (p=0.631 n=10+10) Reflect 79.9MB ± 0% 79.6MB ± 0% -0.39% (p=0.000 n=10+9) Tar 27.1MB ± 0% 27.0MB ± 0% -0.18% (p=0.003 n=10+10) XML 42.6MB ± 0% 42.6MB ± 0% ~ (p=0.143 n=10+10) name old allocs/op new allocs/op delta Template 401k ± 0% 401k ± 1% ~ (p=0.353 n=10+10) Unicode 322k ± 0% 322k ± 0% ~ (p=0.739 n=10+10) GoTypes 1.18M ± 0% 1.18M ± 0% +0.25% (p=0.001 n=7+8) Compiler 4.51M ± 0% 4.53M ± 0% +0.37% (p=0.000 n=10+10) SSA 7.91M ± 0% 7.93M ± 0% +0.20% (p=0.000 n=9+10) Flate 244k ± 0% 245k ± 0% ~ (p=0.123 n=10+10) GoParser 323k ± 1% 324k ± 1% +0.40% (p=0.035 n=10+10) Reflect 1.01M ± 0% 1.02M ± 0% +0.37% (p=0.000 n=10+9) Tar 258k ± 1% 258k ± 1% ~ (p=0.661 n=10+9) XML 403k ± 0% 405k ± 0% +0.47% (p=0.004 n=10+10) Updates #15756 Updates #19250 Change-Id: I647bfbb745c35630447eb79dfcaa994b490ce942 Reviewed-on: https://go-review.googlesource.com/38159 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-14 11:05:03 -07:00
b.AddEdgeTo(to)
case OFOR, OFORUNTIL:
// OFOR: for Ninit; Left; Right { Nbody }
cmd/compile: don't produce a past-the-end pointer in range loops Currently, range loops over slices and arrays are compiled roughly like: for i, x := range s { b } ⇓ for i, _n, _p := 0, len(s), &s[0]; i < _n; i, _p = i+1, _p + unsafe.Sizeof(s[0]) { b } ⇓ i, _n, _p := 0, len(s), &s[0] goto cond body: { b } i, _p = i+1, _p + unsafe.Sizeof(s[0]) cond: if i < _n { goto body } else { goto end } end: The problem with this lowering is that _p may temporarily point past the end of the allocation the moment before the loop terminates. Right now this isn't a problem because there's never a safe-point during this brief moment. We're about to introduce safe-points everywhere, so this bad pointer is going to be a problem. We could mark the increment as an unsafe block, but this inhibits reordering opportunities and could result in infrequent safe-points if the body is short. Instead, this CL fixes this by changing how we compile range loops to never produce this past-the-end pointer. It changes the lowering to roughly: i, _n, _p := 0, len(s), &s[0] if i < _n { goto body } else { goto end } top: _p += unsafe.Sizeof(s[0]) body: { b } i++ if i < _n { goto top } else { goto end } end: Notably, the increment is split into two parts: we increment the index before checking the condition, but increment the pointer only *after* the condition check has succeeded. The implementation builds on the OFORUNTIL construct that was introduced during the loop preemption experiments, since OFORUNTIL places the increment and condition after the loop body. To support the extra "late increment" step, we further define OFORUNTIL's "List" field to contain the late increment statements. This makes all of this a relatively small change. This depends on the improvements to the prove pass in CL 102603. With the current lowering, bounds-check elimination knows that i < _n in the body because the body block is dominated by the cond block. In the new lowering, deriving this fact requires detecting that i < _n on *both* paths into body and hence is true in body. CL 102603 made prove able to detect this. The code size effect of this is minimal. The cmd/go binary on linux/amd64 increases by 0.17%. Performance-wise, this actually appears to be a net win, though it's mostly noise: name old time/op new time/op delta BinaryTree17-12 2.80s ± 0% 2.61s ± 1% -6.88% (p=0.000 n=20+18) Fannkuch11-12 2.41s ± 0% 2.42s ± 0% +0.05% (p=0.005 n=20+20) FmtFprintfEmpty-12 41.6ns ± 5% 41.4ns ± 6% ~ (p=0.765 n=20+19) FmtFprintfString-12 69.4ns ± 3% 69.3ns ± 1% ~ (p=0.084 n=19+17) FmtFprintfInt-12 76.1ns ± 1% 77.3ns ± 1% +1.57% (p=0.000 n=19+19) FmtFprintfIntInt-12 122ns ± 2% 123ns ± 3% +0.95% (p=0.015 n=20+20) FmtFprintfPrefixedInt-12 153ns ± 2% 151ns ± 3% -1.27% (p=0.013 n=20+20) FmtFprintfFloat-12 215ns ± 0% 216ns ± 0% +0.47% (p=0.000 n=20+16) FmtManyArgs-12 486ns ± 1% 498ns ± 0% +2.40% (p=0.000 n=20+17) GobDecode-12 6.43ms ± 0% 6.50ms ± 0% +1.08% (p=0.000 n=18+19) GobEncode-12 5.43ms ± 1% 5.47ms ± 0% +0.76% (p=0.000 n=20+20) Gzip-12 218ms ± 1% 218ms ± 1% ~ (p=0.883 n=20+20) Gunzip-12 38.8ms ± 0% 38.9ms ± 0% ~ (p=0.644 n=19+19) HTTPClientServer-12 76.2µs ± 1% 76.4µs ± 2% ~ (p=0.218 n=20+20) JSONEncode-12 12.2ms ± 0% 12.3ms ± 1% +0.45% (p=0.000 n=19+19) JSONDecode-12 54.2ms ± 1% 53.3ms ± 0% -1.67% (p=0.000 n=20+20) Mandelbrot200-12 3.71ms ± 0% 3.71ms ± 0% ~ (p=0.143 n=19+20) GoParse-12 3.22ms ± 0% 3.19ms ± 1% -0.72% (p=0.000 n=20+20) RegexpMatchEasy0_32-12 76.7ns ± 1% 75.8ns ± 1% -1.19% (p=0.000 n=20+17) RegexpMatchEasy0_1K-12 245ns ± 1% 243ns ± 0% -0.72% (p=0.000 n=18+17) RegexpMatchEasy1_32-12 71.9ns ± 0% 71.7ns ± 1% -0.39% (p=0.006 n=12+18) RegexpMatchEasy1_1K-12 358ns ± 1% 354ns ± 1% -1.13% (p=0.000 n=20+19) RegexpMatchMedium_32-12 105ns ± 2% 105ns ± 1% -0.63% (p=0.007 n=19+20) RegexpMatchMedium_1K-12 31.9µs ± 1% 31.9µs ± 1% ~ (p=1.000 n=17+17) RegexpMatchHard_32-12 1.51µs ± 1% 1.52µs ± 2% +0.46% (p=0.042 n=18+18) RegexpMatchHard_1K-12 45.3µs ± 1% 45.5µs ± 2% +0.44% (p=0.029 n=18+19) Revcomp-12 388ms ± 1% 385ms ± 0% -0.57% (p=0.000 n=19+18) Template-12 63.0ms ± 1% 63.3ms ± 0% +0.50% (p=0.000 n=19+20) TimeParse-12 309ns ± 1% 307ns ± 0% -0.62% (p=0.000 n=20+20) TimeFormat-12 328ns ± 0% 333ns ± 0% +1.35% (p=0.000 n=19+19) [Geo mean] 47.0µs 46.9µs -0.20% (https://perf.golang.org/search?q=upload:20180326.1) For #10958. For #24543. Change-Id: Icbd52e711fdbe7938a1fea3e6baca1104b53ac3a Reviewed-on: https://go-review.googlesource.com/102604 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-03-22 12:04:51 -04:00
// cond (Left); body (Nbody); incr (Right)
//
// OFORUNTIL: for Ninit; Left; Right; List { Nbody }
// => body: { Nbody }; incr: Right; if Left { lateincr: List; goto body }; end:
bCond := s.f.NewBlock(ssa.BlockPlain)
bBody := s.f.NewBlock(ssa.BlockPlain)
bIncr := s.f.NewBlock(ssa.BlockPlain)
bEnd := s.f.NewBlock(ssa.BlockPlain)
// ensure empty for loops have correct position; issue #30167
bBody.Pos = n.Pos
// first, jump to condition test (OFOR) or body (OFORUNTIL)
b := s.endBlock()
if n.Op == OFOR {
b.AddEdgeTo(bCond)
// generate code to test condition
s.startBlock(bCond)
if n.Left != nil {
s.condBranch(n.Left, bBody, bEnd, 1)
} else {
b := s.endBlock()
b.Kind = ssa.BlockPlain
b.AddEdgeTo(bBody)
}
} else {
b.AddEdgeTo(bBody)
}
// set up for continue/break in body
prevContinue := s.continueTo
prevBreak := s.breakTo
s.continueTo = bIncr
s.breakTo = bEnd
lab := s.labeledNodes[n]
if lab != nil {
// labeled for loop
lab.continueTarget = bIncr
lab.breakTarget = bEnd
}
// generate body
s.startBlock(bBody)
s.stmtList(n.Nbody)
// tear down continue/break
s.continueTo = prevContinue
s.breakTo = prevBreak
if lab != nil {
lab.continueTarget = nil
lab.breakTarget = nil
}
// done with body, goto incr
if b := s.endBlock(); b != nil {
b.AddEdgeTo(bIncr)
}
cmd/compile: don't produce a past-the-end pointer in range loops Currently, range loops over slices and arrays are compiled roughly like: for i, x := range s { b } ⇓ for i, _n, _p := 0, len(s), &s[0]; i < _n; i, _p = i+1, _p + unsafe.Sizeof(s[0]) { b } ⇓ i, _n, _p := 0, len(s), &s[0] goto cond body: { b } i, _p = i+1, _p + unsafe.Sizeof(s[0]) cond: if i < _n { goto body } else { goto end } end: The problem with this lowering is that _p may temporarily point past the end of the allocation the moment before the loop terminates. Right now this isn't a problem because there's never a safe-point during this brief moment. We're about to introduce safe-points everywhere, so this bad pointer is going to be a problem. We could mark the increment as an unsafe block, but this inhibits reordering opportunities and could result in infrequent safe-points if the body is short. Instead, this CL fixes this by changing how we compile range loops to never produce this past-the-end pointer. It changes the lowering to roughly: i, _n, _p := 0, len(s), &s[0] if i < _n { goto body } else { goto end } top: _p += unsafe.Sizeof(s[0]) body: { b } i++ if i < _n { goto top } else { goto end } end: Notably, the increment is split into two parts: we increment the index before checking the condition, but increment the pointer only *after* the condition check has succeeded. The implementation builds on the OFORUNTIL construct that was introduced during the loop preemption experiments, since OFORUNTIL places the increment and condition after the loop body. To support the extra "late increment" step, we further define OFORUNTIL's "List" field to contain the late increment statements. This makes all of this a relatively small change. This depends on the improvements to the prove pass in CL 102603. With the current lowering, bounds-check elimination knows that i < _n in the body because the body block is dominated by the cond block. In the new lowering, deriving this fact requires detecting that i < _n on *both* paths into body and hence is true in body. CL 102603 made prove able to detect this. The code size effect of this is minimal. The cmd/go binary on linux/amd64 increases by 0.17%. Performance-wise, this actually appears to be a net win, though it's mostly noise: name old time/op new time/op delta BinaryTree17-12 2.80s ± 0% 2.61s ± 1% -6.88% (p=0.000 n=20+18) Fannkuch11-12 2.41s ± 0% 2.42s ± 0% +0.05% (p=0.005 n=20+20) FmtFprintfEmpty-12 41.6ns ± 5% 41.4ns ± 6% ~ (p=0.765 n=20+19) FmtFprintfString-12 69.4ns ± 3% 69.3ns ± 1% ~ (p=0.084 n=19+17) FmtFprintfInt-12 76.1ns ± 1% 77.3ns ± 1% +1.57% (p=0.000 n=19+19) FmtFprintfIntInt-12 122ns ± 2% 123ns ± 3% +0.95% (p=0.015 n=20+20) FmtFprintfPrefixedInt-12 153ns ± 2% 151ns ± 3% -1.27% (p=0.013 n=20+20) FmtFprintfFloat-12 215ns ± 0% 216ns ± 0% +0.47% (p=0.000 n=20+16) FmtManyArgs-12 486ns ± 1% 498ns ± 0% +2.40% (p=0.000 n=20+17) GobDecode-12 6.43ms ± 0% 6.50ms ± 0% +1.08% (p=0.000 n=18+19) GobEncode-12 5.43ms ± 1% 5.47ms ± 0% +0.76% (p=0.000 n=20+20) Gzip-12 218ms ± 1% 218ms ± 1% ~ (p=0.883 n=20+20) Gunzip-12 38.8ms ± 0% 38.9ms ± 0% ~ (p=0.644 n=19+19) HTTPClientServer-12 76.2µs ± 1% 76.4µs ± 2% ~ (p=0.218 n=20+20) JSONEncode-12 12.2ms ± 0% 12.3ms ± 1% +0.45% (p=0.000 n=19+19) JSONDecode-12 54.2ms ± 1% 53.3ms ± 0% -1.67% (p=0.000 n=20+20) Mandelbrot200-12 3.71ms ± 0% 3.71ms ± 0% ~ (p=0.143 n=19+20) GoParse-12 3.22ms ± 0% 3.19ms ± 1% -0.72% (p=0.000 n=20+20) RegexpMatchEasy0_32-12 76.7ns ± 1% 75.8ns ± 1% -1.19% (p=0.000 n=20+17) RegexpMatchEasy0_1K-12 245ns ± 1% 243ns ± 0% -0.72% (p=0.000 n=18+17) RegexpMatchEasy1_32-12 71.9ns ± 0% 71.7ns ± 1% -0.39% (p=0.006 n=12+18) RegexpMatchEasy1_1K-12 358ns ± 1% 354ns ± 1% -1.13% (p=0.000 n=20+19) RegexpMatchMedium_32-12 105ns ± 2% 105ns ± 1% -0.63% (p=0.007 n=19+20) RegexpMatchMedium_1K-12 31.9µs ± 1% 31.9µs ± 1% ~ (p=1.000 n=17+17) RegexpMatchHard_32-12 1.51µs ± 1% 1.52µs ± 2% +0.46% (p=0.042 n=18+18) RegexpMatchHard_1K-12 45.3µs ± 1% 45.5µs ± 2% +0.44% (p=0.029 n=18+19) Revcomp-12 388ms ± 1% 385ms ± 0% -0.57% (p=0.000 n=19+18) Template-12 63.0ms ± 1% 63.3ms ± 0% +0.50% (p=0.000 n=19+20) TimeParse-12 309ns ± 1% 307ns ± 0% -0.62% (p=0.000 n=20+20) TimeFormat-12 328ns ± 0% 333ns ± 0% +1.35% (p=0.000 n=19+19) [Geo mean] 47.0µs 46.9µs -0.20% (https://perf.golang.org/search?q=upload:20180326.1) For #10958. For #24543. Change-Id: Icbd52e711fdbe7938a1fea3e6baca1104b53ac3a Reviewed-on: https://go-review.googlesource.com/102604 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-03-22 12:04:51 -04:00
// generate incr (and, for OFORUNTIL, condition)
s.startBlock(bIncr)
if n.Right != nil {
s.stmt(n.Right)
}
cmd/compile: don't produce a past-the-end pointer in range loops Currently, range loops over slices and arrays are compiled roughly like: for i, x := range s { b } ⇓ for i, _n, _p := 0, len(s), &s[0]; i < _n; i, _p = i+1, _p + unsafe.Sizeof(s[0]) { b } ⇓ i, _n, _p := 0, len(s), &s[0] goto cond body: { b } i, _p = i+1, _p + unsafe.Sizeof(s[0]) cond: if i < _n { goto body } else { goto end } end: The problem with this lowering is that _p may temporarily point past the end of the allocation the moment before the loop terminates. Right now this isn't a problem because there's never a safe-point during this brief moment. We're about to introduce safe-points everywhere, so this bad pointer is going to be a problem. We could mark the increment as an unsafe block, but this inhibits reordering opportunities and could result in infrequent safe-points if the body is short. Instead, this CL fixes this by changing how we compile range loops to never produce this past-the-end pointer. It changes the lowering to roughly: i, _n, _p := 0, len(s), &s[0] if i < _n { goto body } else { goto end } top: _p += unsafe.Sizeof(s[0]) body: { b } i++ if i < _n { goto top } else { goto end } end: Notably, the increment is split into two parts: we increment the index before checking the condition, but increment the pointer only *after* the condition check has succeeded. The implementation builds on the OFORUNTIL construct that was introduced during the loop preemption experiments, since OFORUNTIL places the increment and condition after the loop body. To support the extra "late increment" step, we further define OFORUNTIL's "List" field to contain the late increment statements. This makes all of this a relatively small change. This depends on the improvements to the prove pass in CL 102603. With the current lowering, bounds-check elimination knows that i < _n in the body because the body block is dominated by the cond block. In the new lowering, deriving this fact requires detecting that i < _n on *both* paths into body and hence is true in body. CL 102603 made prove able to detect this. The code size effect of this is minimal. The cmd/go binary on linux/amd64 increases by 0.17%. Performance-wise, this actually appears to be a net win, though it's mostly noise: name old time/op new time/op delta BinaryTree17-12 2.80s ± 0% 2.61s ± 1% -6.88% (p=0.000 n=20+18) Fannkuch11-12 2.41s ± 0% 2.42s ± 0% +0.05% (p=0.005 n=20+20) FmtFprintfEmpty-12 41.6ns ± 5% 41.4ns ± 6% ~ (p=0.765 n=20+19) FmtFprintfString-12 69.4ns ± 3% 69.3ns ± 1% ~ (p=0.084 n=19+17) FmtFprintfInt-12 76.1ns ± 1% 77.3ns ± 1% +1.57% (p=0.000 n=19+19) FmtFprintfIntInt-12 122ns ± 2% 123ns ± 3% +0.95% (p=0.015 n=20+20) FmtFprintfPrefixedInt-12 153ns ± 2% 151ns ± 3% -1.27% (p=0.013 n=20+20) FmtFprintfFloat-12 215ns ± 0% 216ns ± 0% +0.47% (p=0.000 n=20+16) FmtManyArgs-12 486ns ± 1% 498ns ± 0% +2.40% (p=0.000 n=20+17) GobDecode-12 6.43ms ± 0% 6.50ms ± 0% +1.08% (p=0.000 n=18+19) GobEncode-12 5.43ms ± 1% 5.47ms ± 0% +0.76% (p=0.000 n=20+20) Gzip-12 218ms ± 1% 218ms ± 1% ~ (p=0.883 n=20+20) Gunzip-12 38.8ms ± 0% 38.9ms ± 0% ~ (p=0.644 n=19+19) HTTPClientServer-12 76.2µs ± 1% 76.4µs ± 2% ~ (p=0.218 n=20+20) JSONEncode-12 12.2ms ± 0% 12.3ms ± 1% +0.45% (p=0.000 n=19+19) JSONDecode-12 54.2ms ± 1% 53.3ms ± 0% -1.67% (p=0.000 n=20+20) Mandelbrot200-12 3.71ms ± 0% 3.71ms ± 0% ~ (p=0.143 n=19+20) GoParse-12 3.22ms ± 0% 3.19ms ± 1% -0.72% (p=0.000 n=20+20) RegexpMatchEasy0_32-12 76.7ns ± 1% 75.8ns ± 1% -1.19% (p=0.000 n=20+17) RegexpMatchEasy0_1K-12 245ns ± 1% 243ns ± 0% -0.72% (p=0.000 n=18+17) RegexpMatchEasy1_32-12 71.9ns ± 0% 71.7ns ± 1% -0.39% (p=0.006 n=12+18) RegexpMatchEasy1_1K-12 358ns ± 1% 354ns ± 1% -1.13% (p=0.000 n=20+19) RegexpMatchMedium_32-12 105ns ± 2% 105ns ± 1% -0.63% (p=0.007 n=19+20) RegexpMatchMedium_1K-12 31.9µs ± 1% 31.9µs ± 1% ~ (p=1.000 n=17+17) RegexpMatchHard_32-12 1.51µs ± 1% 1.52µs ± 2% +0.46% (p=0.042 n=18+18) RegexpMatchHard_1K-12 45.3µs ± 1% 45.5µs ± 2% +0.44% (p=0.029 n=18+19) Revcomp-12 388ms ± 1% 385ms ± 0% -0.57% (p=0.000 n=19+18) Template-12 63.0ms ± 1% 63.3ms ± 0% +0.50% (p=0.000 n=19+20) TimeParse-12 309ns ± 1% 307ns ± 0% -0.62% (p=0.000 n=20+20) TimeFormat-12 328ns ± 0% 333ns ± 0% +1.35% (p=0.000 n=19+19) [Geo mean] 47.0µs 46.9µs -0.20% (https://perf.golang.org/search?q=upload:20180326.1) For #10958. For #24543. Change-Id: Icbd52e711fdbe7938a1fea3e6baca1104b53ac3a Reviewed-on: https://go-review.googlesource.com/102604 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-03-22 12:04:51 -04:00
if n.Op == OFOR {
if b := s.endBlock(); b != nil {
b.AddEdgeTo(bCond)
// It can happen that bIncr ends in a block containing only VARKILL,
// and that muddles the debugging experience.
if n.Op != OFORUNTIL && b.Pos == src.NoXPos {
b.Pos = bCond.Pos
}
}
cmd/compile: don't produce a past-the-end pointer in range loops Currently, range loops over slices and arrays are compiled roughly like: for i, x := range s { b } ⇓ for i, _n, _p := 0, len(s), &s[0]; i < _n; i, _p = i+1, _p + unsafe.Sizeof(s[0]) { b } ⇓ i, _n, _p := 0, len(s), &s[0] goto cond body: { b } i, _p = i+1, _p + unsafe.Sizeof(s[0]) cond: if i < _n { goto body } else { goto end } end: The problem with this lowering is that _p may temporarily point past the end of the allocation the moment before the loop terminates. Right now this isn't a problem because there's never a safe-point during this brief moment. We're about to introduce safe-points everywhere, so this bad pointer is going to be a problem. We could mark the increment as an unsafe block, but this inhibits reordering opportunities and could result in infrequent safe-points if the body is short. Instead, this CL fixes this by changing how we compile range loops to never produce this past-the-end pointer. It changes the lowering to roughly: i, _n, _p := 0, len(s), &s[0] if i < _n { goto body } else { goto end } top: _p += unsafe.Sizeof(s[0]) body: { b } i++ if i < _n { goto top } else { goto end } end: Notably, the increment is split into two parts: we increment the index before checking the condition, but increment the pointer only *after* the condition check has succeeded. The implementation builds on the OFORUNTIL construct that was introduced during the loop preemption experiments, since OFORUNTIL places the increment and condition after the loop body. To support the extra "late increment" step, we further define OFORUNTIL's "List" field to contain the late increment statements. This makes all of this a relatively small change. This depends on the improvements to the prove pass in CL 102603. With the current lowering, bounds-check elimination knows that i < _n in the body because the body block is dominated by the cond block. In the new lowering, deriving this fact requires detecting that i < _n on *both* paths into body and hence is true in body. CL 102603 made prove able to detect this. The code size effect of this is minimal. The cmd/go binary on linux/amd64 increases by 0.17%. Performance-wise, this actually appears to be a net win, though it's mostly noise: name old time/op new time/op delta BinaryTree17-12 2.80s ± 0% 2.61s ± 1% -6.88% (p=0.000 n=20+18) Fannkuch11-12 2.41s ± 0% 2.42s ± 0% +0.05% (p=0.005 n=20+20) FmtFprintfEmpty-12 41.6ns ± 5% 41.4ns ± 6% ~ (p=0.765 n=20+19) FmtFprintfString-12 69.4ns ± 3% 69.3ns ± 1% ~ (p=0.084 n=19+17) FmtFprintfInt-12 76.1ns ± 1% 77.3ns ± 1% +1.57% (p=0.000 n=19+19) FmtFprintfIntInt-12 122ns ± 2% 123ns ± 3% +0.95% (p=0.015 n=20+20) FmtFprintfPrefixedInt-12 153ns ± 2% 151ns ± 3% -1.27% (p=0.013 n=20+20) FmtFprintfFloat-12 215ns ± 0% 216ns ± 0% +0.47% (p=0.000 n=20+16) FmtManyArgs-12 486ns ± 1% 498ns ± 0% +2.40% (p=0.000 n=20+17) GobDecode-12 6.43ms ± 0% 6.50ms ± 0% +1.08% (p=0.000 n=18+19) GobEncode-12 5.43ms ± 1% 5.47ms ± 0% +0.76% (p=0.000 n=20+20) Gzip-12 218ms ± 1% 218ms ± 1% ~ (p=0.883 n=20+20) Gunzip-12 38.8ms ± 0% 38.9ms ± 0% ~ (p=0.644 n=19+19) HTTPClientServer-12 76.2µs ± 1% 76.4µs ± 2% ~ (p=0.218 n=20+20) JSONEncode-12 12.2ms ± 0% 12.3ms ± 1% +0.45% (p=0.000 n=19+19) JSONDecode-12 54.2ms ± 1% 53.3ms ± 0% -1.67% (p=0.000 n=20+20) Mandelbrot200-12 3.71ms ± 0% 3.71ms ± 0% ~ (p=0.143 n=19+20) GoParse-12 3.22ms ± 0% 3.19ms ± 1% -0.72% (p=0.000 n=20+20) RegexpMatchEasy0_32-12 76.7ns ± 1% 75.8ns ± 1% -1.19% (p=0.000 n=20+17) RegexpMatchEasy0_1K-12 245ns ± 1% 243ns ± 0% -0.72% (p=0.000 n=18+17) RegexpMatchEasy1_32-12 71.9ns ± 0% 71.7ns ± 1% -0.39% (p=0.006 n=12+18) RegexpMatchEasy1_1K-12 358ns ± 1% 354ns ± 1% -1.13% (p=0.000 n=20+19) RegexpMatchMedium_32-12 105ns ± 2% 105ns ± 1% -0.63% (p=0.007 n=19+20) RegexpMatchMedium_1K-12 31.9µs ± 1% 31.9µs ± 1% ~ (p=1.000 n=17+17) RegexpMatchHard_32-12 1.51µs ± 1% 1.52µs ± 2% +0.46% (p=0.042 n=18+18) RegexpMatchHard_1K-12 45.3µs ± 1% 45.5µs ± 2% +0.44% (p=0.029 n=18+19) Revcomp-12 388ms ± 1% 385ms ± 0% -0.57% (p=0.000 n=19+18) Template-12 63.0ms ± 1% 63.3ms ± 0% +0.50% (p=0.000 n=19+20) TimeParse-12 309ns ± 1% 307ns ± 0% -0.62% (p=0.000 n=20+20) TimeFormat-12 328ns ± 0% 333ns ± 0% +1.35% (p=0.000 n=19+19) [Geo mean] 47.0µs 46.9µs -0.20% (https://perf.golang.org/search?q=upload:20180326.1) For #10958. For #24543. Change-Id: Icbd52e711fdbe7938a1fea3e6baca1104b53ac3a Reviewed-on: https://go-review.googlesource.com/102604 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-03-22 12:04:51 -04:00
} else {
// bCond is unused in OFORUNTIL, so repurpose it.
bLateIncr := bCond
// test condition
s.condBranch(n.Left, bLateIncr, bEnd, 1)
// generate late increment
s.startBlock(bLateIncr)
s.stmtList(n.List)
s.endBlock().AddEdgeTo(bBody)
}
s.startBlock(bEnd)
case OSWITCH, OSELECT:
// These have been mostly rewritten by the front end into their Nbody fields.
// Our main task is to correctly hook up any break statements.
bEnd := s.f.NewBlock(ssa.BlockPlain)
prevBreak := s.breakTo
s.breakTo = bEnd
lab := s.labeledNodes[n]
if lab != nil {
// labeled
lab.breakTarget = bEnd
}
// generate body code
s.stmtList(n.Nbody)
s.breakTo = prevBreak
if lab != nil {
lab.breakTarget = nil
}
// walk adds explicit OBREAK nodes to the end of all reachable code paths.
// If we still have a current block here, then mark it unreachable.
if s.curBlock != nil {
m := s.mem()
b := s.endBlock()
b.Kind = ssa.BlockExit
b.SetControl(m)
}
s.startBlock(bEnd)
case OVARDEF:
if !s.canSSA(n.Left) {
s.vars[&memVar] = s.newValue1Apos(ssa.OpVarDef, types.TypeMem, n.Left, s.mem(), false)
}
case OVARKILL:
// Insert a varkill op to record that a variable is no longer live.
// We only care about liveness info at call sites, so putting the
// varkill in the store chain is enough to keep it correctly ordered
// with respect to call ops.
if !s.canSSA(n.Left) {
s.vars[&memVar] = s.newValue1Apos(ssa.OpVarKill, types.TypeMem, n.Left, s.mem(), false)
}
case OVARLIVE:
// Insert a varlive op to record that a variable is still live.
cmd/compile: pack bool fields in Node, Name, Func and Type structs to bitsets This reduces compiler memory usage by up to 4% - see compilebench results below. name old time/op new time/op delta Template 245ms ± 4% 241ms ± 2% -1.88% (p=0.029 n=10+10) Unicode 126ms ± 3% 124ms ± 3% ~ (p=0.105 n=10+10) GoTypes 805ms ± 2% 813ms ± 3% ~ (p=0.515 n=8+10) Compiler 3.95s ± 2% 3.83s ± 1% -2.96% (p=0.000 n=9+10) MakeBash 47.4s ± 4% 46.6s ± 1% -1.59% (p=0.028 n=9+10) name old user-ns/op new user-ns/op delta Template 324M ± 5% 326M ± 3% ~ (p=0.935 n=10+10) Unicode 186M ± 5% 178M ±10% ~ (p=0.067 n=9+10) GoTypes 1.08G ± 7% 1.09G ± 4% ~ (p=0.956 n=10+10) Compiler 5.34G ± 4% 5.31G ± 1% ~ (p=0.501 n=10+8) name old alloc/op new alloc/op delta Template 41.0MB ± 0% 39.8MB ± 0% -3.03% (p=0.000 n=10+10) Unicode 32.3MB ± 0% 31.0MB ± 0% -4.13% (p=0.000 n=10+10) GoTypes 119MB ± 0% 116MB ± 0% -2.39% (p=0.000 n=10+10) Compiler 499MB ± 0% 487MB ± 0% -2.48% (p=0.000 n=10+10) name old allocs/op new allocs/op delta Template 380k ± 1% 379k ± 1% ~ (p=0.436 n=10+10) Unicode 324k ± 1% 324k ± 0% ~ (p=0.853 n=10+10) GoTypes 1.15M ± 0% 1.15M ± 0% ~ (p=0.481 n=10+10) Compiler 4.41M ± 0% 4.41M ± 0% -0.12% (p=0.007 n=10+10) name old text-bytes new text-bytes delta HelloSize 623k ± 0% 623k ± 0% ~ (all equal) CmdGoSize 6.64M ± 0% 6.64M ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 5.81k ± 0% 5.81k ± 0% ~ (all equal) CmdGoSize 238k ± 0% 238k ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 134k ± 0% 134k ± 0% ~ (all equal) CmdGoSize 152k ± 0% 152k ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 967k ± 0% 967k ± 0% ~ (all equal) CmdGoSize 10.2M ± 0% 10.2M ± 0% ~ (all equal) Change-Id: I1f40af738254892bd6c8ba2eb43390b175753d52 Reviewed-on: https://go-review.googlesource.com/37445 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-27 19:56:38 +02:00
if !n.Left.Addrtaken() {
s.Fatalf("VARLIVE variable %v must have Addrtaken set", n.Left)
}
switch n.Left.Class() {
case PAUTO, PPARAM, PPARAMOUT:
default:
s.Fatalf("VARLIVE variable %v must be Auto or Arg", n.Left)
}
s.vars[&memVar] = s.newValue1A(ssa.OpVarLive, types.TypeMem, n.Left, s.mem())
case OCHECKNIL:
p := s.expr(n.Left)
s.nilCheck(p)
case OINLMARK:
s.newValue1I(ssa.OpInlMark, types.TypeVoid, n.Xoffset, s.mem())
default:
s.Fatalf("unhandled stmt %v", n.Op)
}
}
// exit processes any code that needs to be generated just before returning.
// It returns a BlockRet block that ends the control flow. Its control value
// will be set to the final memory state.
func (s *state) exit() *ssa.Block {
if s.hasdefer {
s.rtcall(Deferreturn, true, nil)
}
// Run exit code. Typically, this code copies heap-allocated PPARAMOUT
// variables back to the stack.
s.stmtList(s.curfn.Func.Exit)
// Store SSAable PPARAMOUT variables back to stack locations.
for _, n := range s.returns {
addr := s.decladdrs[n]
val := s.variable(n, n.Type)
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
s.vars[&memVar] = s.newValue1A(ssa.OpVarDef, types.TypeMem, n, s.mem())
s.store(n.Type, addr, val)
// TODO: if val is ever spilled, we'd like to use the
// PPARAMOUT slot for spilling it. That won't happen
// currently.
}
// Do actual return.
m := s.mem()
b := s.endBlock()
b.Kind = ssa.BlockRet
b.SetControl(m)
return b
}
type opAndType struct {
op Op
etype types.EType
}
var opToSSA = map[opAndType]ssa.Op{
opAndType{OADD, TINT8}: ssa.OpAdd8,
opAndType{OADD, TUINT8}: ssa.OpAdd8,
opAndType{OADD, TINT16}: ssa.OpAdd16,
opAndType{OADD, TUINT16}: ssa.OpAdd16,
opAndType{OADD, TINT32}: ssa.OpAdd32,
opAndType{OADD, TUINT32}: ssa.OpAdd32,
opAndType{OADD, TINT64}: ssa.OpAdd64,
opAndType{OADD, TUINT64}: ssa.OpAdd64,
opAndType{OADD, TFLOAT32}: ssa.OpAdd32F,
opAndType{OADD, TFLOAT64}: ssa.OpAdd64F,
opAndType{OSUB, TINT8}: ssa.OpSub8,
opAndType{OSUB, TUINT8}: ssa.OpSub8,
opAndType{OSUB, TINT16}: ssa.OpSub16,
opAndType{OSUB, TUINT16}: ssa.OpSub16,
opAndType{OSUB, TINT32}: ssa.OpSub32,
opAndType{OSUB, TUINT32}: ssa.OpSub32,
opAndType{OSUB, TINT64}: ssa.OpSub64,
opAndType{OSUB, TUINT64}: ssa.OpSub64,
opAndType{OSUB, TFLOAT32}: ssa.OpSub32F,
opAndType{OSUB, TFLOAT64}: ssa.OpSub64F,
opAndType{ONOT, TBOOL}: ssa.OpNot,
cmd/compile: bulk rename This change does a bulk rename of several identifiers in the compiler. See #27167 and https://docs.google.com/document/d/19_ExiylD9MRfeAjKIfEsMU1_RGhuxB9sA0b5Zv7byVI/ for context and for discussion of these particular renames. Commands run to generate this change: gorename -from '"cmd/compile/internal/gc".OPROC' -to OGO gorename -from '"cmd/compile/internal/gc".OCOM' -to OBITNOT gorename -from '"cmd/compile/internal/gc".OMINUS' -to ONEG gorename -from '"cmd/compile/internal/gc".OIND' -to ODEREF gorename -from '"cmd/compile/internal/gc".OARRAYBYTESTR' -to OBYTES2STR gorename -from '"cmd/compile/internal/gc".OARRAYBYTESTRTMP' -to OBYTES2STRTMP gorename -from '"cmd/compile/internal/gc".OARRAYRUNESTR' -to ORUNES2STR gorename -from '"cmd/compile/internal/gc".OSTRARRAYBYTE' -to OSTR2BYTES gorename -from '"cmd/compile/internal/gc".OSTRARRAYBYTETMP' -to OSTR2BYTESTMP gorename -from '"cmd/compile/internal/gc".OSTRARRAYRUNE' -to OSTR2RUNES gorename -from '"cmd/compile/internal/gc".Etop' -to ctxStmt gorename -from '"cmd/compile/internal/gc".Erv' -to ctxExpr gorename -from '"cmd/compile/internal/gc".Ecall' -to ctxCallee gorename -from '"cmd/compile/internal/gc".Efnstruct' -to ctxMultiOK gorename -from '"cmd/compile/internal/gc".Easgn' -to ctxAssign gorename -from '"cmd/compile/internal/gc".Ecomplit' -to ctxCompLit Not altered: parameters and local variables (mostly in typecheck.go) named top, which should probably now be called ctx (and which should probably have a named type). Also not altered: Field called Top in gc.Func. gorename -from '"cmd/compile/internal/gc".Node.Isddd' -to IsDDD gorename -from '"cmd/compile/internal/gc".Node.SetIsddd' -to SetIsDDD gorename -from '"cmd/compile/internal/gc".nodeIsddd' -to nodeIsDDD gorename -from '"cmd/compile/internal/types".Field.Isddd' -to IsDDD gorename -from '"cmd/compile/internal/types".Field.SetIsddd' -to SetIsDDD gorename -from '"cmd/compile/internal/types".fieldIsddd' -to fieldIsDDD Not altered: function gc.hasddd, params and local variables called isddd Also not altered: fmt.go prints nodes using "isddd(%v)". cd cmd/compile/internal/gc; go generate I then manually found impacted comments using exact string match and fixed them up by hand. The comment changes were trivial. Passes toolstash-check. Fixes #27167. If this experiment is deemed a success, we will open a new tracking issue for renames to do at the end of the 1.13 cycles. Change-Id: I2dc541533d2ab0d06cb3d31d65df205ecfb151e8 Reviewed-on: https://go-review.googlesource.com/c/150140 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-11-18 08:34:38 -08:00
opAndType{ONEG, TINT8}: ssa.OpNeg8,
opAndType{ONEG, TUINT8}: ssa.OpNeg8,
opAndType{ONEG, TINT16}: ssa.OpNeg16,
opAndType{ONEG, TUINT16}: ssa.OpNeg16,
opAndType{ONEG, TINT32}: ssa.OpNeg32,
opAndType{ONEG, TUINT32}: ssa.OpNeg32,
opAndType{ONEG, TINT64}: ssa.OpNeg64,
opAndType{ONEG, TUINT64}: ssa.OpNeg64,
opAndType{ONEG, TFLOAT32}: ssa.OpNeg32F,
opAndType{ONEG, TFLOAT64}: ssa.OpNeg64F,
opAndType{OBITNOT, TINT8}: ssa.OpCom8,
opAndType{OBITNOT, TUINT8}: ssa.OpCom8,
opAndType{OBITNOT, TINT16}: ssa.OpCom16,
opAndType{OBITNOT, TUINT16}: ssa.OpCom16,
opAndType{OBITNOT, TINT32}: ssa.OpCom32,
opAndType{OBITNOT, TUINT32}: ssa.OpCom32,
opAndType{OBITNOT, TINT64}: ssa.OpCom64,
opAndType{OBITNOT, TUINT64}: ssa.OpCom64,
opAndType{OIMAG, TCOMPLEX64}: ssa.OpComplexImag,
opAndType{OIMAG, TCOMPLEX128}: ssa.OpComplexImag,
opAndType{OREAL, TCOMPLEX64}: ssa.OpComplexReal,
opAndType{OREAL, TCOMPLEX128}: ssa.OpComplexReal,
opAndType{OMUL, TINT8}: ssa.OpMul8,
opAndType{OMUL, TUINT8}: ssa.OpMul8,
opAndType{OMUL, TINT16}: ssa.OpMul16,
opAndType{OMUL, TUINT16}: ssa.OpMul16,
opAndType{OMUL, TINT32}: ssa.OpMul32,
opAndType{OMUL, TUINT32}: ssa.OpMul32,
opAndType{OMUL, TINT64}: ssa.OpMul64,
opAndType{OMUL, TUINT64}: ssa.OpMul64,
opAndType{OMUL, TFLOAT32}: ssa.OpMul32F,
opAndType{OMUL, TFLOAT64}: ssa.OpMul64F,
opAndType{ODIV, TFLOAT32}: ssa.OpDiv32F,
opAndType{ODIV, TFLOAT64}: ssa.OpDiv64F,
opAndType{ODIV, TINT8}: ssa.OpDiv8,
opAndType{ODIV, TUINT8}: ssa.OpDiv8u,
opAndType{ODIV, TINT16}: ssa.OpDiv16,
opAndType{ODIV, TUINT16}: ssa.OpDiv16u,
opAndType{ODIV, TINT32}: ssa.OpDiv32,
opAndType{ODIV, TUINT32}: ssa.OpDiv32u,
opAndType{ODIV, TINT64}: ssa.OpDiv64,
opAndType{ODIV, TUINT64}: ssa.OpDiv64u,
opAndType{OMOD, TINT8}: ssa.OpMod8,
opAndType{OMOD, TUINT8}: ssa.OpMod8u,
opAndType{OMOD, TINT16}: ssa.OpMod16,
opAndType{OMOD, TUINT16}: ssa.OpMod16u,
opAndType{OMOD, TINT32}: ssa.OpMod32,
opAndType{OMOD, TUINT32}: ssa.OpMod32u,
opAndType{OMOD, TINT64}: ssa.OpMod64,
opAndType{OMOD, TUINT64}: ssa.OpMod64u,
opAndType{OAND, TINT8}: ssa.OpAnd8,
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
opAndType{OAND, TUINT8}: ssa.OpAnd8,
opAndType{OAND, TINT16}: ssa.OpAnd16,
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
opAndType{OAND, TUINT16}: ssa.OpAnd16,
opAndType{OAND, TINT32}: ssa.OpAnd32,
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
opAndType{OAND, TUINT32}: ssa.OpAnd32,
opAndType{OAND, TINT64}: ssa.OpAnd64,
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
opAndType{OAND, TUINT64}: ssa.OpAnd64,
opAndType{OOR, TINT8}: ssa.OpOr8,
opAndType{OOR, TUINT8}: ssa.OpOr8,
opAndType{OOR, TINT16}: ssa.OpOr16,
opAndType{OOR, TUINT16}: ssa.OpOr16,
opAndType{OOR, TINT32}: ssa.OpOr32,
opAndType{OOR, TUINT32}: ssa.OpOr32,
opAndType{OOR, TINT64}: ssa.OpOr64,
opAndType{OOR, TUINT64}: ssa.OpOr64,
opAndType{OXOR, TINT8}: ssa.OpXor8,
opAndType{OXOR, TUINT8}: ssa.OpXor8,
opAndType{OXOR, TINT16}: ssa.OpXor16,
opAndType{OXOR, TUINT16}: ssa.OpXor16,
opAndType{OXOR, TINT32}: ssa.OpXor32,
opAndType{OXOR, TUINT32}: ssa.OpXor32,
opAndType{OXOR, TINT64}: ssa.OpXor64,
opAndType{OXOR, TUINT64}: ssa.OpXor64,
opAndType{OEQ, TBOOL}: ssa.OpEqB,
opAndType{OEQ, TINT8}: ssa.OpEq8,
opAndType{OEQ, TUINT8}: ssa.OpEq8,
opAndType{OEQ, TINT16}: ssa.OpEq16,
opAndType{OEQ, TUINT16}: ssa.OpEq16,
opAndType{OEQ, TINT32}: ssa.OpEq32,
opAndType{OEQ, TUINT32}: ssa.OpEq32,
opAndType{OEQ, TINT64}: ssa.OpEq64,
opAndType{OEQ, TUINT64}: ssa.OpEq64,
opAndType{OEQ, TINTER}: ssa.OpEqInter,
opAndType{OEQ, TSLICE}: ssa.OpEqSlice,
opAndType{OEQ, TFUNC}: ssa.OpEqPtr,
opAndType{OEQ, TMAP}: ssa.OpEqPtr,
opAndType{OEQ, TCHAN}: ssa.OpEqPtr,
opAndType{OEQ, TPTR}: ssa.OpEqPtr,
opAndType{OEQ, TUINTPTR}: ssa.OpEqPtr,
opAndType{OEQ, TUNSAFEPTR}: ssa.OpEqPtr,
opAndType{OEQ, TFLOAT64}: ssa.OpEq64F,
opAndType{OEQ, TFLOAT32}: ssa.OpEq32F,
opAndType{ONE, TBOOL}: ssa.OpNeqB,
opAndType{ONE, TINT8}: ssa.OpNeq8,
opAndType{ONE, TUINT8}: ssa.OpNeq8,
opAndType{ONE, TINT16}: ssa.OpNeq16,
opAndType{ONE, TUINT16}: ssa.OpNeq16,
opAndType{ONE, TINT32}: ssa.OpNeq32,
opAndType{ONE, TUINT32}: ssa.OpNeq32,
opAndType{ONE, TINT64}: ssa.OpNeq64,
opAndType{ONE, TUINT64}: ssa.OpNeq64,
opAndType{ONE, TINTER}: ssa.OpNeqInter,
opAndType{ONE, TSLICE}: ssa.OpNeqSlice,
opAndType{ONE, TFUNC}: ssa.OpNeqPtr,
opAndType{ONE, TMAP}: ssa.OpNeqPtr,
opAndType{ONE, TCHAN}: ssa.OpNeqPtr,
opAndType{ONE, TPTR}: ssa.OpNeqPtr,
opAndType{ONE, TUINTPTR}: ssa.OpNeqPtr,
opAndType{ONE, TUNSAFEPTR}: ssa.OpNeqPtr,
opAndType{ONE, TFLOAT64}: ssa.OpNeq64F,
opAndType{ONE, TFLOAT32}: ssa.OpNeq32F,
opAndType{OLT, TINT8}: ssa.OpLess8,
opAndType{OLT, TUINT8}: ssa.OpLess8U,
opAndType{OLT, TINT16}: ssa.OpLess16,
opAndType{OLT, TUINT16}: ssa.OpLess16U,
opAndType{OLT, TINT32}: ssa.OpLess32,
opAndType{OLT, TUINT32}: ssa.OpLess32U,
opAndType{OLT, TINT64}: ssa.OpLess64,
opAndType{OLT, TUINT64}: ssa.OpLess64U,
opAndType{OLT, TFLOAT64}: ssa.OpLess64F,
opAndType{OLT, TFLOAT32}: ssa.OpLess32F,
opAndType{OGT, TINT8}: ssa.OpGreater8,
opAndType{OGT, TUINT8}: ssa.OpGreater8U,
opAndType{OGT, TINT16}: ssa.OpGreater16,
opAndType{OGT, TUINT16}: ssa.OpGreater16U,
opAndType{OGT, TINT32}: ssa.OpGreater32,
opAndType{OGT, TUINT32}: ssa.OpGreater32U,
opAndType{OGT, TINT64}: ssa.OpGreater64,
opAndType{OGT, TUINT64}: ssa.OpGreater64U,
opAndType{OGT, TFLOAT64}: ssa.OpGreater64F,
opAndType{OGT, TFLOAT32}: ssa.OpGreater32F,
opAndType{OLE, TINT8}: ssa.OpLeq8,
opAndType{OLE, TUINT8}: ssa.OpLeq8U,
opAndType{OLE, TINT16}: ssa.OpLeq16,
opAndType{OLE, TUINT16}: ssa.OpLeq16U,
opAndType{OLE, TINT32}: ssa.OpLeq32,
opAndType{OLE, TUINT32}: ssa.OpLeq32U,
opAndType{OLE, TINT64}: ssa.OpLeq64,
opAndType{OLE, TUINT64}: ssa.OpLeq64U,
opAndType{OLE, TFLOAT64}: ssa.OpLeq64F,
opAndType{OLE, TFLOAT32}: ssa.OpLeq32F,
opAndType{OGE, TINT8}: ssa.OpGeq8,
opAndType{OGE, TUINT8}: ssa.OpGeq8U,
opAndType{OGE, TINT16}: ssa.OpGeq16,
opAndType{OGE, TUINT16}: ssa.OpGeq16U,
opAndType{OGE, TINT32}: ssa.OpGeq32,
opAndType{OGE, TUINT32}: ssa.OpGeq32U,
opAndType{OGE, TINT64}: ssa.OpGeq64,
opAndType{OGE, TUINT64}: ssa.OpGeq64U,
opAndType{OGE, TFLOAT64}: ssa.OpGeq64F,
opAndType{OGE, TFLOAT32}: ssa.OpGeq32F,
}
func (s *state) concreteEtype(t *types.Type) types.EType {
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
e := t.Etype
switch e {
default:
return e
case TINT:
if s.config.PtrSize == 8 {
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
return TINT64
}
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
return TINT32
case TUINT:
if s.config.PtrSize == 8 {
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
return TUINT64
}
return TUINT32
case TUINTPTR:
if s.config.PtrSize == 8 {
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
return TUINT64
}
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
return TUINT32
}
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
}
func (s *state) ssaOp(op Op, t *types.Type) ssa.Op {
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
etype := s.concreteEtype(t)
x, ok := opToSSA[opAndType{op, etype}]
if !ok {
s.Fatalf("unhandled binary op %v %s", op, etype)
}
return x
}
func floatForComplex(t *types.Type) *types.Type {
if t.Size() == 8 {
return types.Types[TFLOAT32]
} else {
return types.Types[TFLOAT64]
}
}
type opAndTwoTypes struct {
op Op
etype1 types.EType
etype2 types.EType
}
type twoTypes struct {
etype1 types.EType
etype2 types.EType
}
type twoOpsAndType struct {
op1 ssa.Op
op2 ssa.Op
intermediateType types.EType
}
var fpConvOpToSSA = map[twoTypes]twoOpsAndType{
twoTypes{TINT8, TFLOAT32}: twoOpsAndType{ssa.OpSignExt8to32, ssa.OpCvt32to32F, TINT32},
twoTypes{TINT16, TFLOAT32}: twoOpsAndType{ssa.OpSignExt16to32, ssa.OpCvt32to32F, TINT32},
twoTypes{TINT32, TFLOAT32}: twoOpsAndType{ssa.OpCopy, ssa.OpCvt32to32F, TINT32},
twoTypes{TINT64, TFLOAT32}: twoOpsAndType{ssa.OpCopy, ssa.OpCvt64to32F, TINT64},
twoTypes{TINT8, TFLOAT64}: twoOpsAndType{ssa.OpSignExt8to32, ssa.OpCvt32to64F, TINT32},
twoTypes{TINT16, TFLOAT64}: twoOpsAndType{ssa.OpSignExt16to32, ssa.OpCvt32to64F, TINT32},
twoTypes{TINT32, TFLOAT64}: twoOpsAndType{ssa.OpCopy, ssa.OpCvt32to64F, TINT32},
twoTypes{TINT64, TFLOAT64}: twoOpsAndType{ssa.OpCopy, ssa.OpCvt64to64F, TINT64},
twoTypes{TFLOAT32, TINT8}: twoOpsAndType{ssa.OpCvt32Fto32, ssa.OpTrunc32to8, TINT32},
twoTypes{TFLOAT32, TINT16}: twoOpsAndType{ssa.OpCvt32Fto32, ssa.OpTrunc32to16, TINT32},
twoTypes{TFLOAT32, TINT32}: twoOpsAndType{ssa.OpCvt32Fto32, ssa.OpCopy, TINT32},
twoTypes{TFLOAT32, TINT64}: twoOpsAndType{ssa.OpCvt32Fto64, ssa.OpCopy, TINT64},
twoTypes{TFLOAT64, TINT8}: twoOpsAndType{ssa.OpCvt64Fto32, ssa.OpTrunc32to8, TINT32},
twoTypes{TFLOAT64, TINT16}: twoOpsAndType{ssa.OpCvt64Fto32, ssa.OpTrunc32to16, TINT32},
twoTypes{TFLOAT64, TINT32}: twoOpsAndType{ssa.OpCvt64Fto32, ssa.OpCopy, TINT32},
twoTypes{TFLOAT64, TINT64}: twoOpsAndType{ssa.OpCvt64Fto64, ssa.OpCopy, TINT64},
// unsigned
twoTypes{TUINT8, TFLOAT32}: twoOpsAndType{ssa.OpZeroExt8to32, ssa.OpCvt32to32F, TINT32},
twoTypes{TUINT16, TFLOAT32}: twoOpsAndType{ssa.OpZeroExt16to32, ssa.OpCvt32to32F, TINT32},
twoTypes{TUINT32, TFLOAT32}: twoOpsAndType{ssa.OpZeroExt32to64, ssa.OpCvt64to32F, TINT64}, // go wide to dodge unsigned
twoTypes{TUINT64, TFLOAT32}: twoOpsAndType{ssa.OpCopy, ssa.OpInvalid, TUINT64}, // Cvt64Uto32F, branchy code expansion instead
twoTypes{TUINT8, TFLOAT64}: twoOpsAndType{ssa.OpZeroExt8to32, ssa.OpCvt32to64F, TINT32},
twoTypes{TUINT16, TFLOAT64}: twoOpsAndType{ssa.OpZeroExt16to32, ssa.OpCvt32to64F, TINT32},
twoTypes{TUINT32, TFLOAT64}: twoOpsAndType{ssa.OpZeroExt32to64, ssa.OpCvt64to64F, TINT64}, // go wide to dodge unsigned
twoTypes{TUINT64, TFLOAT64}: twoOpsAndType{ssa.OpCopy, ssa.OpInvalid, TUINT64}, // Cvt64Uto64F, branchy code expansion instead
twoTypes{TFLOAT32, TUINT8}: twoOpsAndType{ssa.OpCvt32Fto32, ssa.OpTrunc32to8, TINT32},
twoTypes{TFLOAT32, TUINT16}: twoOpsAndType{ssa.OpCvt32Fto32, ssa.OpTrunc32to16, TINT32},
twoTypes{TFLOAT32, TUINT32}: twoOpsAndType{ssa.OpCvt32Fto64, ssa.OpTrunc64to32, TINT64}, // go wide to dodge unsigned
twoTypes{TFLOAT32, TUINT64}: twoOpsAndType{ssa.OpInvalid, ssa.OpCopy, TUINT64}, // Cvt32Fto64U, branchy code expansion instead
twoTypes{TFLOAT64, TUINT8}: twoOpsAndType{ssa.OpCvt64Fto32, ssa.OpTrunc32to8, TINT32},
twoTypes{TFLOAT64, TUINT16}: twoOpsAndType{ssa.OpCvt64Fto32, ssa.OpTrunc32to16, TINT32},
twoTypes{TFLOAT64, TUINT32}: twoOpsAndType{ssa.OpCvt64Fto64, ssa.OpTrunc64to32, TINT64}, // go wide to dodge unsigned
twoTypes{TFLOAT64, TUINT64}: twoOpsAndType{ssa.OpInvalid, ssa.OpCopy, TUINT64}, // Cvt64Fto64U, branchy code expansion instead
// float
twoTypes{TFLOAT64, TFLOAT32}: twoOpsAndType{ssa.OpCvt64Fto32F, ssa.OpCopy, TFLOAT32},
twoTypes{TFLOAT64, TFLOAT64}: twoOpsAndType{ssa.OpRound64F, ssa.OpCopy, TFLOAT64},
twoTypes{TFLOAT32, TFLOAT32}: twoOpsAndType{ssa.OpRound32F, ssa.OpCopy, TFLOAT32},
twoTypes{TFLOAT32, TFLOAT64}: twoOpsAndType{ssa.OpCvt32Fto64F, ssa.OpCopy, TFLOAT64},
}
// this map is used only for 32-bit arch, and only includes the difference
// on 32-bit arch, don't use int64<->float conversion for uint32
var fpConvOpToSSA32 = map[twoTypes]twoOpsAndType{
twoTypes{TUINT32, TFLOAT32}: twoOpsAndType{ssa.OpCopy, ssa.OpCvt32Uto32F, TUINT32},
twoTypes{TUINT32, TFLOAT64}: twoOpsAndType{ssa.OpCopy, ssa.OpCvt32Uto64F, TUINT32},
twoTypes{TFLOAT32, TUINT32}: twoOpsAndType{ssa.OpCvt32Fto32U, ssa.OpCopy, TUINT32},
twoTypes{TFLOAT64, TUINT32}: twoOpsAndType{ssa.OpCvt64Fto32U, ssa.OpCopy, TUINT32},
}
// uint64<->float conversions, only on machines that have intructions for that
var uint64fpConvOpToSSA = map[twoTypes]twoOpsAndType{
twoTypes{TUINT64, TFLOAT32}: twoOpsAndType{ssa.OpCopy, ssa.OpCvt64Uto32F, TUINT64},
twoTypes{TUINT64, TFLOAT64}: twoOpsAndType{ssa.OpCopy, ssa.OpCvt64Uto64F, TUINT64},
twoTypes{TFLOAT32, TUINT64}: twoOpsAndType{ssa.OpCvt32Fto64U, ssa.OpCopy, TUINT64},
twoTypes{TFLOAT64, TUINT64}: twoOpsAndType{ssa.OpCvt64Fto64U, ssa.OpCopy, TUINT64},
}
var shiftOpToSSA = map[opAndTwoTypes]ssa.Op{
opAndTwoTypes{OLSH, TINT8, TUINT8}: ssa.OpLsh8x8,
opAndTwoTypes{OLSH, TUINT8, TUINT8}: ssa.OpLsh8x8,
opAndTwoTypes{OLSH, TINT8, TUINT16}: ssa.OpLsh8x16,
opAndTwoTypes{OLSH, TUINT8, TUINT16}: ssa.OpLsh8x16,
opAndTwoTypes{OLSH, TINT8, TUINT32}: ssa.OpLsh8x32,
opAndTwoTypes{OLSH, TUINT8, TUINT32}: ssa.OpLsh8x32,
opAndTwoTypes{OLSH, TINT8, TUINT64}: ssa.OpLsh8x64,
opAndTwoTypes{OLSH, TUINT8, TUINT64}: ssa.OpLsh8x64,
opAndTwoTypes{OLSH, TINT16, TUINT8}: ssa.OpLsh16x8,
opAndTwoTypes{OLSH, TUINT16, TUINT8}: ssa.OpLsh16x8,
opAndTwoTypes{OLSH, TINT16, TUINT16}: ssa.OpLsh16x16,
opAndTwoTypes{OLSH, TUINT16, TUINT16}: ssa.OpLsh16x16,
opAndTwoTypes{OLSH, TINT16, TUINT32}: ssa.OpLsh16x32,
opAndTwoTypes{OLSH, TUINT16, TUINT32}: ssa.OpLsh16x32,
opAndTwoTypes{OLSH, TINT16, TUINT64}: ssa.OpLsh16x64,
opAndTwoTypes{OLSH, TUINT16, TUINT64}: ssa.OpLsh16x64,
opAndTwoTypes{OLSH, TINT32, TUINT8}: ssa.OpLsh32x8,
opAndTwoTypes{OLSH, TUINT32, TUINT8}: ssa.OpLsh32x8,
opAndTwoTypes{OLSH, TINT32, TUINT16}: ssa.OpLsh32x16,
opAndTwoTypes{OLSH, TUINT32, TUINT16}: ssa.OpLsh32x16,
opAndTwoTypes{OLSH, TINT32, TUINT32}: ssa.OpLsh32x32,
opAndTwoTypes{OLSH, TUINT32, TUINT32}: ssa.OpLsh32x32,
opAndTwoTypes{OLSH, TINT32, TUINT64}: ssa.OpLsh32x64,
opAndTwoTypes{OLSH, TUINT32, TUINT64}: ssa.OpLsh32x64,
opAndTwoTypes{OLSH, TINT64, TUINT8}: ssa.OpLsh64x8,
opAndTwoTypes{OLSH, TUINT64, TUINT8}: ssa.OpLsh64x8,
opAndTwoTypes{OLSH, TINT64, TUINT16}: ssa.OpLsh64x16,
opAndTwoTypes{OLSH, TUINT64, TUINT16}: ssa.OpLsh64x16,
opAndTwoTypes{OLSH, TINT64, TUINT32}: ssa.OpLsh64x32,
opAndTwoTypes{OLSH, TUINT64, TUINT32}: ssa.OpLsh64x32,
opAndTwoTypes{OLSH, TINT64, TUINT64}: ssa.OpLsh64x64,
opAndTwoTypes{OLSH, TUINT64, TUINT64}: ssa.OpLsh64x64,
opAndTwoTypes{ORSH, TINT8, TUINT8}: ssa.OpRsh8x8,
opAndTwoTypes{ORSH, TUINT8, TUINT8}: ssa.OpRsh8Ux8,
opAndTwoTypes{ORSH, TINT8, TUINT16}: ssa.OpRsh8x16,
opAndTwoTypes{ORSH, TUINT8, TUINT16}: ssa.OpRsh8Ux16,
opAndTwoTypes{ORSH, TINT8, TUINT32}: ssa.OpRsh8x32,
opAndTwoTypes{ORSH, TUINT8, TUINT32}: ssa.OpRsh8Ux32,
opAndTwoTypes{ORSH, TINT8, TUINT64}: ssa.OpRsh8x64,
opAndTwoTypes{ORSH, TUINT8, TUINT64}: ssa.OpRsh8Ux64,
opAndTwoTypes{ORSH, TINT16, TUINT8}: ssa.OpRsh16x8,
opAndTwoTypes{ORSH, TUINT16, TUINT8}: ssa.OpRsh16Ux8,
opAndTwoTypes{ORSH, TINT16, TUINT16}: ssa.OpRsh16x16,
opAndTwoTypes{ORSH, TUINT16, TUINT16}: ssa.OpRsh16Ux16,
opAndTwoTypes{ORSH, TINT16, TUINT32}: ssa.OpRsh16x32,
opAndTwoTypes{ORSH, TUINT16, TUINT32}: ssa.OpRsh16Ux32,
opAndTwoTypes{ORSH, TINT16, TUINT64}: ssa.OpRsh16x64,
opAndTwoTypes{ORSH, TUINT16, TUINT64}: ssa.OpRsh16Ux64,
opAndTwoTypes{ORSH, TINT32, TUINT8}: ssa.OpRsh32x8,
opAndTwoTypes{ORSH, TUINT32, TUINT8}: ssa.OpRsh32Ux8,
opAndTwoTypes{ORSH, TINT32, TUINT16}: ssa.OpRsh32x16,
opAndTwoTypes{ORSH, TUINT32, TUINT16}: ssa.OpRsh32Ux16,
opAndTwoTypes{ORSH, TINT32, TUINT32}: ssa.OpRsh32x32,
opAndTwoTypes{ORSH, TUINT32, TUINT32}: ssa.OpRsh32Ux32,
opAndTwoTypes{ORSH, TINT32, TUINT64}: ssa.OpRsh32x64,
opAndTwoTypes{ORSH, TUINT32, TUINT64}: ssa.OpRsh32Ux64,
opAndTwoTypes{ORSH, TINT64, TUINT8}: ssa.OpRsh64x8,
opAndTwoTypes{ORSH, TUINT64, TUINT8}: ssa.OpRsh64Ux8,
opAndTwoTypes{ORSH, TINT64, TUINT16}: ssa.OpRsh64x16,
opAndTwoTypes{ORSH, TUINT64, TUINT16}: ssa.OpRsh64Ux16,
opAndTwoTypes{ORSH, TINT64, TUINT32}: ssa.OpRsh64x32,
opAndTwoTypes{ORSH, TUINT64, TUINT32}: ssa.OpRsh64Ux32,
opAndTwoTypes{ORSH, TINT64, TUINT64}: ssa.OpRsh64x64,
opAndTwoTypes{ORSH, TUINT64, TUINT64}: ssa.OpRsh64Ux64,
}
func (s *state) ssaShiftOp(op Op, t *types.Type, u *types.Type) ssa.Op {
etype1 := s.concreteEtype(t)
etype2 := s.concreteEtype(u)
x, ok := shiftOpToSSA[opAndTwoTypes{op, etype1, etype2}]
if !ok {
s.Fatalf("unhandled shift op %v etype=%s/%s", op, etype1, etype2)
}
return x
}
// expr converts the expression n to ssa, adds it to s and returns the ssa result.
func (s *state) expr(n *Node) *ssa.Value {
if !(n.Op == ONAME || n.Op == OLITERAL && n.Sym != nil) {
// ONAMEs and named OLITERALs have the line number
// of the decl, not the use. See issue 14742.
s.pushLine(n.Pos)
defer s.popLine()
}
s.stmtList(n.Ninit)
switch n.Op {
cmd/compile: bulk rename This change does a bulk rename of several identifiers in the compiler. See #27167 and https://docs.google.com/document/d/19_ExiylD9MRfeAjKIfEsMU1_RGhuxB9sA0b5Zv7byVI/ for context and for discussion of these particular renames. Commands run to generate this change: gorename -from '"cmd/compile/internal/gc".OPROC' -to OGO gorename -from '"cmd/compile/internal/gc".OCOM' -to OBITNOT gorename -from '"cmd/compile/internal/gc".OMINUS' -to ONEG gorename -from '"cmd/compile/internal/gc".OIND' -to ODEREF gorename -from '"cmd/compile/internal/gc".OARRAYBYTESTR' -to OBYTES2STR gorename -from '"cmd/compile/internal/gc".OARRAYBYTESTRTMP' -to OBYTES2STRTMP gorename -from '"cmd/compile/internal/gc".OARRAYRUNESTR' -to ORUNES2STR gorename -from '"cmd/compile/internal/gc".OSTRARRAYBYTE' -to OSTR2BYTES gorename -from '"cmd/compile/internal/gc".OSTRARRAYBYTETMP' -to OSTR2BYTESTMP gorename -from '"cmd/compile/internal/gc".OSTRARRAYRUNE' -to OSTR2RUNES gorename -from '"cmd/compile/internal/gc".Etop' -to ctxStmt gorename -from '"cmd/compile/internal/gc".Erv' -to ctxExpr gorename -from '"cmd/compile/internal/gc".Ecall' -to ctxCallee gorename -from '"cmd/compile/internal/gc".Efnstruct' -to ctxMultiOK gorename -from '"cmd/compile/internal/gc".Easgn' -to ctxAssign gorename -from '"cmd/compile/internal/gc".Ecomplit' -to ctxCompLit Not altered: parameters and local variables (mostly in typecheck.go) named top, which should probably now be called ctx (and which should probably have a named type). Also not altered: Field called Top in gc.Func. gorename -from '"cmd/compile/internal/gc".Node.Isddd' -to IsDDD gorename -from '"cmd/compile/internal/gc".Node.SetIsddd' -to SetIsDDD gorename -from '"cmd/compile/internal/gc".nodeIsddd' -to nodeIsDDD gorename -from '"cmd/compile/internal/types".Field.Isddd' -to IsDDD gorename -from '"cmd/compile/internal/types".Field.SetIsddd' -to SetIsDDD gorename -from '"cmd/compile/internal/types".fieldIsddd' -to fieldIsDDD Not altered: function gc.hasddd, params and local variables called isddd Also not altered: fmt.go prints nodes using "isddd(%v)". cd cmd/compile/internal/gc; go generate I then manually found impacted comments using exact string match and fixed them up by hand. The comment changes were trivial. Passes toolstash-check. Fixes #27167. If this experiment is deemed a success, we will open a new tracking issue for renames to do at the end of the 1.13 cycles. Change-Id: I2dc541533d2ab0d06cb3d31d65df205ecfb151e8 Reviewed-on: https://go-review.googlesource.com/c/150140 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-11-18 08:34:38 -08:00
case OBYTES2STRTMP:
slice := s.expr(n.Left)
ptr := s.newValue1(ssa.OpSlicePtr, s.f.Config.Types.BytePtr, slice)
len := s.newValue1(ssa.OpSliceLen, types.Types[TINT], slice)
return s.newValue2(ssa.OpStringMake, n.Type, ptr, len)
cmd/compile: bulk rename This change does a bulk rename of several identifiers in the compiler. See #27167 and https://docs.google.com/document/d/19_ExiylD9MRfeAjKIfEsMU1_RGhuxB9sA0b5Zv7byVI/ for context and for discussion of these particular renames. Commands run to generate this change: gorename -from '"cmd/compile/internal/gc".OPROC' -to OGO gorename -from '"cmd/compile/internal/gc".OCOM' -to OBITNOT gorename -from '"cmd/compile/internal/gc".OMINUS' -to ONEG gorename -from '"cmd/compile/internal/gc".OIND' -to ODEREF gorename -from '"cmd/compile/internal/gc".OARRAYBYTESTR' -to OBYTES2STR gorename -from '"cmd/compile/internal/gc".OARRAYBYTESTRTMP' -to OBYTES2STRTMP gorename -from '"cmd/compile/internal/gc".OARRAYRUNESTR' -to ORUNES2STR gorename -from '"cmd/compile/internal/gc".OSTRARRAYBYTE' -to OSTR2BYTES gorename -from '"cmd/compile/internal/gc".OSTRARRAYBYTETMP' -to OSTR2BYTESTMP gorename -from '"cmd/compile/internal/gc".OSTRARRAYRUNE' -to OSTR2RUNES gorename -from '"cmd/compile/internal/gc".Etop' -to ctxStmt gorename -from '"cmd/compile/internal/gc".Erv' -to ctxExpr gorename -from '"cmd/compile/internal/gc".Ecall' -to ctxCallee gorename -from '"cmd/compile/internal/gc".Efnstruct' -to ctxMultiOK gorename -from '"cmd/compile/internal/gc".Easgn' -to ctxAssign gorename -from '"cmd/compile/internal/gc".Ecomplit' -to ctxCompLit Not altered: parameters and local variables (mostly in typecheck.go) named top, which should probably now be called ctx (and which should probably have a named type). Also not altered: Field called Top in gc.Func. gorename -from '"cmd/compile/internal/gc".Node.Isddd' -to IsDDD gorename -from '"cmd/compile/internal/gc".Node.SetIsddd' -to SetIsDDD gorename -from '"cmd/compile/internal/gc".nodeIsddd' -to nodeIsDDD gorename -from '"cmd/compile/internal/types".Field.Isddd' -to IsDDD gorename -from '"cmd/compile/internal/types".Field.SetIsddd' -to SetIsDDD gorename -from '"cmd/compile/internal/types".fieldIsddd' -to fieldIsDDD Not altered: function gc.hasddd, params and local variables called isddd Also not altered: fmt.go prints nodes using "isddd(%v)". cd cmd/compile/internal/gc; go generate I then manually found impacted comments using exact string match and fixed them up by hand. The comment changes were trivial. Passes toolstash-check. Fixes #27167. If this experiment is deemed a success, we will open a new tracking issue for renames to do at the end of the 1.13 cycles. Change-Id: I2dc541533d2ab0d06cb3d31d65df205ecfb151e8 Reviewed-on: https://go-review.googlesource.com/c/150140 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-11-18 08:34:38 -08:00
case OSTR2BYTESTMP:
str := s.expr(n.Left)
ptr := s.newValue1(ssa.OpStringPtr, s.f.Config.Types.BytePtr, str)
len := s.newValue1(ssa.OpStringLen, types.Types[TINT], str)
return s.newValue3(ssa.OpSliceMake, n.Type, ptr, len, len)
case OCFUNC:
aux := n.Left.Sym.Linksym()
return s.entryNewValue1A(ssa.OpAddr, n.Type, aux, s.sb)
case ONAME:
cmd/compile: move Node.Class to flags Put it at position zero, since it is fairly hot. This shrinks gc.Node into a smaller size class on 64 bit systems. name old time/op new time/op delta Template 193ms ± 5% 192ms ± 3% ~ (p=0.353 n=94+93) Unicode 86.1ms ± 5% 85.0ms ± 4% -1.23% (p=0.000 n=95+98) GoTypes 546ms ± 3% 544ms ± 4% -0.40% (p=0.007 n=94+97) Compiler 2.56s ± 3% 2.54s ± 3% -0.67% (p=0.000 n=99+97) SSA 5.13s ± 2% 5.10s ± 3% -0.55% (p=0.000 n=94+98) Flate 122ms ± 6% 121ms ± 4% -0.75% (p=0.002 n=97+95) GoParser 144ms ± 5% 144ms ± 4% ~ (p=0.298 n=98+97) Reflect 348ms ± 4% 349ms ± 4% ~ (p=0.350 n=98+97) Tar 105ms ± 5% 104ms ± 5% ~ (p=0.154 n=96+98) XML 200ms ± 5% 198ms ± 4% -0.71% (p=0.015 n=97+98) [Geo mean] 330ms 328ms -0.52% name old user-time/op new user-time/op delta Template 229ms ±11% 224ms ± 7% -2.16% (p=0.001 n=100+87) Unicode 109ms ± 5% 109ms ± 6% ~ (p=0.897 n=96+91) GoTypes 712ms ± 4% 709ms ± 4% ~ (p=0.085 n=96+98) Compiler 3.41s ± 3% 3.36s ± 3% -1.43% (p=0.000 n=98+98) SSA 7.46s ± 3% 7.31s ± 3% -2.02% (p=0.000 n=100+99) Flate 145ms ± 6% 143ms ± 6% -1.11% (p=0.001 n=99+97) GoParser 177ms ± 5% 176ms ± 5% -0.78% (p=0.018 n=95+95) Reflect 432ms ± 7% 435ms ± 9% ~ (p=0.296 n=100+100) Tar 121ms ± 7% 121ms ± 5% ~ (p=0.072 n=100+95) XML 241ms ± 4% 239ms ± 5% ~ (p=0.085 n=97+99) [Geo mean] 413ms 410ms -0.73% name old alloc/op new alloc/op delta Template 38.4MB ± 0% 37.7MB ± 0% -1.85% (p=0.008 n=5+5) Unicode 30.1MB ± 0% 28.8MB ± 0% -4.09% (p=0.008 n=5+5) GoTypes 112MB ± 0% 110MB ± 0% -1.69% (p=0.008 n=5+5) Compiler 470MB ± 0% 461MB ± 0% -1.91% (p=0.008 n=5+5) SSA 1.13GB ± 0% 1.11GB ± 0% -1.70% (p=0.008 n=5+5) Flate 25.0MB ± 0% 24.6MB ± 0% -1.67% (p=0.008 n=5+5) GoParser 31.6MB ± 0% 31.1MB ± 0% -1.66% (p=0.008 n=5+5) Reflect 77.1MB ± 0% 75.8MB ± 0% -1.69% (p=0.008 n=5+5) Tar 26.3MB ± 0% 25.7MB ± 0% -2.06% (p=0.008 n=5+5) XML 41.9MB ± 0% 41.1MB ± 0% -1.93% (p=0.008 n=5+5) [Geo mean] 73.5MB 72.0MB -2.03% name old allocs/op new allocs/op delta Template 383k ± 0% 383k ± 0% ~ (p=0.690 n=5+5) Unicode 343k ± 0% 343k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.16M ± 0% 1.16M ± 0% ~ (p=0.310 n=5+5) Compiler 4.43M ± 0% 4.42M ± 0% -0.17% (p=0.008 n=5+5) SSA 9.85M ± 0% 9.85M ± 0% ~ (p=0.310 n=5+5) Flate 236k ± 0% 236k ± 1% ~ (p=0.841 n=5+5) GoParser 320k ± 0% 320k ± 0% ~ (p=0.421 n=5+5) Reflect 988k ± 0% 987k ± 0% ~ (p=0.690 n=5+5) Tar 252k ± 0% 251k ± 0% ~ (p=0.095 n=5+5) XML 399k ± 0% 399k ± 0% ~ (p=1.000 n=5+5) [Geo mean] 741k 740k -0.07% Change-Id: I9e952b58a98e30a12494304db9ce50d0a85e459c Reviewed-on: https://go-review.googlesource.com/41797 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
2017-04-25 18:14:12 -07:00
if n.Class() == PFUNC {
// "value" of a function is the address of the function's closure
sym := funcsym(n.Sym).Linksym()
return s.entryNewValue1A(ssa.OpAddr, types.NewPtr(n.Type), sym, s.sb)
}
if s.canSSA(n) {
return s.variable(n, n.Type)
}
addr := s.addr(n, false)
return s.load(n.Type, addr)
case OCLOSUREVAR:
addr := s.addr(n, false)
return s.load(n.Type, addr)
case OLITERAL:
switch u := n.Val().U.(type) {
case *Mpint:
i := u.Int64()
switch n.Type.Size() {
case 1:
return s.constInt8(n.Type, int8(i))
case 2:
return s.constInt16(n.Type, int16(i))
case 4:
return s.constInt32(n.Type, int32(i))
case 8:
return s.constInt64(n.Type, i)
default:
s.Fatalf("bad integer size %d", n.Type.Size())
return nil
}
case string:
if u == "" {
return s.constEmptyString(n.Type)
}
return s.entryNewValue0A(ssa.OpConstString, n.Type, u)
case bool:
return s.constBool(u)
case *NilVal:
t := n.Type
switch {
case t.IsSlice():
return s.constSlice(t)
case t.IsInterface():
return s.constInterface(t)
default:
return s.constNil(t)
}
case *Mpflt:
switch n.Type.Size() {
case 4:
return s.constFloat32(n.Type, u.Float32())
case 8:
return s.constFloat64(n.Type, u.Float64())
default:
s.Fatalf("bad float size %d", n.Type.Size())
return nil
}
case *Mpcplx:
r := &u.Real
i := &u.Imag
switch n.Type.Size() {
case 8:
pt := types.Types[TFLOAT32]
return s.newValue2(ssa.OpComplexMake, n.Type,
s.constFloat32(pt, r.Float32()),
s.constFloat32(pt, i.Float32()))
case 16:
pt := types.Types[TFLOAT64]
return s.newValue2(ssa.OpComplexMake, n.Type,
s.constFloat64(pt, r.Float64()),
s.constFloat64(pt, i.Float64()))
default:
s.Fatalf("bad float size %d", n.Type.Size())
return nil
}
default:
s.Fatalf("unhandled OLITERAL %v", n.Val().Ctype())
return nil
}
case OCONVNOP:
to := n.Type
from := n.Left.Type
// Assume everything will work out, so set up our return value.
// Anything interesting that happens from here is a fatal.
x := s.expr(n.Left)
// Special case for not confusing GC and liveness.
// We don't want pointers accidentally classified
// as not-pointers or vice-versa because of copy
// elision.
if to.IsPtrShaped() != from.IsPtrShaped() {
return s.newValue2(ssa.OpConvert, to, x, s.mem())
}
v := s.newValue1(ssa.OpCopy, to, x) // ensure that v has the right type
// CONVNOP closure
if to.Etype == TFUNC && from.IsPtrShaped() {
return v
}
// named <--> unnamed type or typed <--> untyped const
if from.Etype == to.Etype {
return v
}
// unsafe.Pointer <--> *T
if to.Etype == TUNSAFEPTR && from.IsPtrShaped() || from.Etype == TUNSAFEPTR && to.IsPtrShaped() {
return v
}
// map <--> *hmap
if to.Etype == TMAP && from.IsPtr() &&
to.MapType().Hmap == from.Elem() {
return v
}
dowidth(from)
dowidth(to)
if from.Width != to.Width {
s.Fatalf("CONVNOP width mismatch %v (%d) -> %v (%d)\n", from, from.Width, to, to.Width)
return nil
}
if etypesign(from.Etype) != etypesign(to.Etype) {
s.Fatalf("CONVNOP sign mismatch %v (%s) -> %v (%s)\n", from, from.Etype, to, to.Etype)
return nil
}
if instrumenting {
// These appear to be fine, but they fail the
// integer constraint below, so okay them here.
// Sample non-integer conversion: map[string]string -> *uint8
return v
}
if etypesign(from.Etype) == 0 {
s.Fatalf("CONVNOP unrecognized non-integer %v -> %v\n", from, to)
return nil
}
// integer, same width, same sign
return v
case OCONV:
x := s.expr(n.Left)
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
ft := n.Left.Type // from type
tt := n.Type // to type
if ft.IsBoolean() && tt.IsKind(TUINT8) {
// Bool -> uint8 is generated internally when indexing into runtime.staticbyte.
return s.newValue1(ssa.OpCopy, n.Type, x)
}
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
if ft.IsInteger() && tt.IsInteger() {
var op ssa.Op
if tt.Size() == ft.Size() {
op = ssa.OpCopy
} else if tt.Size() < ft.Size() {
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
// truncation
switch 10*ft.Size() + tt.Size() {
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
case 21:
op = ssa.OpTrunc16to8
case 41:
op = ssa.OpTrunc32to8
case 42:
op = ssa.OpTrunc32to16
case 81:
op = ssa.OpTrunc64to8
case 82:
op = ssa.OpTrunc64to16
case 84:
op = ssa.OpTrunc64to32
default:
s.Fatalf("weird integer truncation %v -> %v", ft, tt)
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
}
} else if ft.IsSigned() {
// sign extension
switch 10*ft.Size() + tt.Size() {
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
case 12:
op = ssa.OpSignExt8to16
case 14:
op = ssa.OpSignExt8to32
case 18:
op = ssa.OpSignExt8to64
case 24:
op = ssa.OpSignExt16to32
case 28:
op = ssa.OpSignExt16to64
case 48:
op = ssa.OpSignExt32to64
default:
s.Fatalf("bad integer sign extension %v -> %v", ft, tt)
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
}
} else {
// zero extension
switch 10*ft.Size() + tt.Size() {
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
case 12:
op = ssa.OpZeroExt8to16
case 14:
op = ssa.OpZeroExt8to32
case 18:
op = ssa.OpZeroExt8to64
case 24:
op = ssa.OpZeroExt16to32
case 28:
op = ssa.OpZeroExt16to64
case 48:
op = ssa.OpZeroExt32to64
default:
s.Fatalf("weird integer sign extension %v -> %v", ft, tt)
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
}
}
return s.newValue1(op, n.Type, x)
}
if ft.IsFloat() || tt.IsFloat() {
conv, ok := fpConvOpToSSA[twoTypes{s.concreteEtype(ft), s.concreteEtype(tt)}]
if s.config.RegSize == 4 && thearch.LinkArch.Family != sys.MIPS && !s.softFloat {
if conv1, ok1 := fpConvOpToSSA32[twoTypes{s.concreteEtype(ft), s.concreteEtype(tt)}]; ok1 {
conv = conv1
}
}
if thearch.LinkArch.Family == sys.ARM64 || thearch.LinkArch.Family == sys.Wasm || s.softFloat {
if conv1, ok1 := uint64fpConvOpToSSA[twoTypes{s.concreteEtype(ft), s.concreteEtype(tt)}]; ok1 {
conv = conv1
}
}
if thearch.LinkArch.Family == sys.MIPS && !s.softFloat {
if ft.Size() == 4 && ft.IsInteger() && !ft.IsSigned() {
// tt is float32 or float64, and ft is also unsigned
if tt.Size() == 4 {
return s.uint32Tofloat32(n, x, ft, tt)
}
if tt.Size() == 8 {
return s.uint32Tofloat64(n, x, ft, tt)
}
} else if tt.Size() == 4 && tt.IsInteger() && !tt.IsSigned() {
// ft is float32 or float64, and tt is unsigned integer
if ft.Size() == 4 {
return s.float32ToUint32(n, x, ft, tt)
}
if ft.Size() == 8 {
return s.float64ToUint32(n, x, ft, tt)
}
}
}
if !ok {
s.Fatalf("weird float conversion %v -> %v", ft, tt)
}
op1, op2, it := conv.op1, conv.op2, conv.intermediateType
if op1 != ssa.OpInvalid && op2 != ssa.OpInvalid {
// normal case, not tripping over unsigned 64
if op1 == ssa.OpCopy {
if op2 == ssa.OpCopy {
return x
}
return s.newValueOrSfCall1(op2, n.Type, x)
}
if op2 == ssa.OpCopy {
return s.newValueOrSfCall1(op1, n.Type, x)
}
return s.newValueOrSfCall1(op2, n.Type, s.newValueOrSfCall1(op1, types.Types[it], x))
}
// Tricky 64-bit unsigned cases.
if ft.IsInteger() {
// tt is float32 or float64, and ft is also unsigned
if tt.Size() == 4 {
return s.uint64Tofloat32(n, x, ft, tt)
}
if tt.Size() == 8 {
return s.uint64Tofloat64(n, x, ft, tt)
}
s.Fatalf("weird unsigned integer to float conversion %v -> %v", ft, tt)
}
// ft is float32 or float64, and tt is unsigned integer
if ft.Size() == 4 {
return s.float32ToUint64(n, x, ft, tt)
}
if ft.Size() == 8 {
return s.float64ToUint64(n, x, ft, tt)
}
s.Fatalf("weird float to unsigned integer conversion %v -> %v", ft, tt)
return nil
}
if ft.IsComplex() && tt.IsComplex() {
var op ssa.Op
if ft.Size() == tt.Size() {
switch ft.Size() {
case 8:
op = ssa.OpRound32F
case 16:
op = ssa.OpRound64F
default:
s.Fatalf("weird complex conversion %v -> %v", ft, tt)
}
} else if ft.Size() == 8 && tt.Size() == 16 {
op = ssa.OpCvt32Fto64F
} else if ft.Size() == 16 && tt.Size() == 8 {
op = ssa.OpCvt64Fto32F
} else {
s.Fatalf("weird complex conversion %v -> %v", ft, tt)
}
ftp := floatForComplex(ft)
ttp := floatForComplex(tt)
return s.newValue2(ssa.OpComplexMake, tt,
s.newValueOrSfCall1(op, ttp, s.newValue1(ssa.OpComplexReal, ftp, x)),
s.newValueOrSfCall1(op, ttp, s.newValue1(ssa.OpComplexImag, ftp, x)))
}
s.Fatalf("unhandled OCONV %s -> %s", n.Left.Type.Etype, n.Type.Etype)
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
return nil
case ODOTTYPE:
res, _ := s.dottype(n, false)
return res
// binary ops
case OLT, OEQ, ONE, OLE, OGE, OGT:
a := s.expr(n.Left)
b := s.expr(n.Right)
if n.Left.Type.IsComplex() {
pt := floatForComplex(n.Left.Type)
op := s.ssaOp(OEQ, pt)
r := s.newValueOrSfCall2(op, types.Types[TBOOL], s.newValue1(ssa.OpComplexReal, pt, a), s.newValue1(ssa.OpComplexReal, pt, b))
i := s.newValueOrSfCall2(op, types.Types[TBOOL], s.newValue1(ssa.OpComplexImag, pt, a), s.newValue1(ssa.OpComplexImag, pt, b))
c := s.newValue2(ssa.OpAndB, types.Types[TBOOL], r, i)
switch n.Op {
case OEQ:
return c
case ONE:
return s.newValue1(ssa.OpNot, types.Types[TBOOL], c)
default:
s.Fatalf("ordered complex compare %v", n.Op)
}
}
if n.Left.Type.IsFloat() {
return s.newValueOrSfCall2(s.ssaOp(n.Op, n.Left.Type), types.Types[TBOOL], a, b)
}
return s.newValue2(s.ssaOp(n.Op, n.Left.Type), types.Types[TBOOL], a, b)
case OMUL:
a := s.expr(n.Left)
b := s.expr(n.Right)
if n.Type.IsComplex() {
mulop := ssa.OpMul64F
addop := ssa.OpAdd64F
subop := ssa.OpSub64F
pt := floatForComplex(n.Type) // Could be Float32 or Float64
wt := types.Types[TFLOAT64] // Compute in Float64 to minimize cancelation error
areal := s.newValue1(ssa.OpComplexReal, pt, a)
breal := s.newValue1(ssa.OpComplexReal, pt, b)
aimag := s.newValue1(ssa.OpComplexImag, pt, a)
bimag := s.newValue1(ssa.OpComplexImag, pt, b)
if pt != wt { // Widen for calculation
areal = s.newValueOrSfCall1(ssa.OpCvt32Fto64F, wt, areal)
breal = s.newValueOrSfCall1(ssa.OpCvt32Fto64F, wt, breal)
aimag = s.newValueOrSfCall1(ssa.OpCvt32Fto64F, wt, aimag)
bimag = s.newValueOrSfCall1(ssa.OpCvt32Fto64F, wt, bimag)
}
xreal := s.newValueOrSfCall2(subop, wt, s.newValueOrSfCall2(mulop, wt, areal, breal), s.newValueOrSfCall2(mulop, wt, aimag, bimag))
ximag := s.newValueOrSfCall2(addop, wt, s.newValueOrSfCall2(mulop, wt, areal, bimag), s.newValueOrSfCall2(mulop, wt, aimag, breal))
if pt != wt { // Narrow to store back
xreal = s.newValueOrSfCall1(ssa.OpCvt64Fto32F, pt, xreal)
ximag = s.newValueOrSfCall1(ssa.OpCvt64Fto32F, pt, ximag)
}
return s.newValue2(ssa.OpComplexMake, n.Type, xreal, ximag)
}
if n.Type.IsFloat() {
return s.newValueOrSfCall2(s.ssaOp(n.Op, n.Type), a.Type, a, b)
}
return s.newValue2(s.ssaOp(n.Op, n.Type), a.Type, a, b)
case ODIV:
a := s.expr(n.Left)
b := s.expr(n.Right)
if n.Type.IsComplex() {
// TODO this is not executed because the front-end substitutes a runtime call.
// That probably ought to change; with modest optimization the widen/narrow
// conversions could all be elided in larger expression trees.
mulop := ssa.OpMul64F
addop := ssa.OpAdd64F
subop := ssa.OpSub64F
divop := ssa.OpDiv64F
pt := floatForComplex(n.Type) // Could be Float32 or Float64
wt := types.Types[TFLOAT64] // Compute in Float64 to minimize cancelation error
areal := s.newValue1(ssa.OpComplexReal, pt, a)
breal := s.newValue1(ssa.OpComplexReal, pt, b)
aimag := s.newValue1(ssa.OpComplexImag, pt, a)
bimag := s.newValue1(ssa.OpComplexImag, pt, b)
if pt != wt { // Widen for calculation
areal = s.newValueOrSfCall1(ssa.OpCvt32Fto64F, wt, areal)
breal = s.newValueOrSfCall1(ssa.OpCvt32Fto64F, wt, breal)
aimag = s.newValueOrSfCall1(ssa.OpCvt32Fto64F, wt, aimag)
bimag = s.newValueOrSfCall1(ssa.OpCvt32Fto64F, wt, bimag)
}
denom := s.newValueOrSfCall2(addop, wt, s.newValueOrSfCall2(mulop, wt, breal, breal), s.newValueOrSfCall2(mulop, wt, bimag, bimag))
xreal := s.newValueOrSfCall2(addop, wt, s.newValueOrSfCall2(mulop, wt, areal, breal), s.newValueOrSfCall2(mulop, wt, aimag, bimag))
ximag := s.newValueOrSfCall2(subop, wt, s.newValueOrSfCall2(mulop, wt, aimag, breal), s.newValueOrSfCall2(mulop, wt, areal, bimag))
// TODO not sure if this is best done in wide precision or narrow
// Double-rounding might be an issue.
// Note that the pre-SSA implementation does the entire calculation
// in wide format, so wide is compatible.
xreal = s.newValueOrSfCall2(divop, wt, xreal, denom)
ximag = s.newValueOrSfCall2(divop, wt, ximag, denom)
if pt != wt { // Narrow to store back
xreal = s.newValueOrSfCall1(ssa.OpCvt64Fto32F, pt, xreal)
ximag = s.newValueOrSfCall1(ssa.OpCvt64Fto32F, pt, ximag)
}
return s.newValue2(ssa.OpComplexMake, n.Type, xreal, ximag)
}
if n.Type.IsFloat() {
return s.newValueOrSfCall2(s.ssaOp(n.Op, n.Type), a.Type, a, b)
}
return s.intDivide(n, a, b)
case OMOD:
a := s.expr(n.Left)
b := s.expr(n.Right)
return s.intDivide(n, a, b)
case OADD, OSUB:
a := s.expr(n.Left)
b := s.expr(n.Right)
if n.Type.IsComplex() {
pt := floatForComplex(n.Type)
op := s.ssaOp(n.Op, pt)
return s.newValue2(ssa.OpComplexMake, n.Type,
s.newValueOrSfCall2(op, pt, s.newValue1(ssa.OpComplexReal, pt, a), s.newValue1(ssa.OpComplexReal, pt, b)),
s.newValueOrSfCall2(op, pt, s.newValue1(ssa.OpComplexImag, pt, a), s.newValue1(ssa.OpComplexImag, pt, b)))
}
if n.Type.IsFloat() {
return s.newValueOrSfCall2(s.ssaOp(n.Op, n.Type), a.Type, a, b)
}
return s.newValue2(s.ssaOp(n.Op, n.Type), a.Type, a, b)
case OAND, OOR, OXOR:
a := s.expr(n.Left)
b := s.expr(n.Right)
return s.newValue2(s.ssaOp(n.Op, n.Type), a.Type, a, b)
case OLSH, ORSH:
a := s.expr(n.Left)
b := s.expr(n.Right)
bt := b.Type
if bt.IsSigned() {
cmp := s.newValue2(s.ssaOp(OGE, bt), types.Types[TBOOL], b, s.zeroVal(bt))
s.check(cmp, panicshift)
bt = bt.ToUnsigned()
}
return s.newValue2(s.ssaShiftOp(n.Op, n.Type, bt), a.Type, a, b)
case OANDAND, OOROR:
// To implement OANDAND (and OOROR), we introduce a
// new temporary variable to hold the result. The
// variable is associated with the OANDAND node in the
// s.vars table (normally variables are only
// associated with ONAME nodes). We convert
// A && B
// to
// var = A
// if var {
// var = B
// }
// Using var in the subsequent block introduces the
// necessary phi variable.
el := s.expr(n.Left)
s.vars[n] = el
b := s.endBlock()
b.Kind = ssa.BlockIf
b.SetControl(el)
// In theory, we should set b.Likely here based on context.
// However, gc only gives us likeliness hints
// in a single place, for plain OIF statements,
// and passing around context is finnicky, so don't bother for now.
bRight := s.f.NewBlock(ssa.BlockPlain)
bResult := s.f.NewBlock(ssa.BlockPlain)
if n.Op == OANDAND {
b.AddEdgeTo(bRight)
b.AddEdgeTo(bResult)
} else if n.Op == OOROR {
b.AddEdgeTo(bResult)
b.AddEdgeTo(bRight)
}
s.startBlock(bRight)
er := s.expr(n.Right)
s.vars[n] = er
b = s.endBlock()
b.AddEdgeTo(bResult)
s.startBlock(bResult)
return s.variable(n, types.Types[TBOOL])
case OCOMPLEX:
r := s.expr(n.Left)
i := s.expr(n.Right)
return s.newValue2(ssa.OpComplexMake, n.Type, r, i)
// unary ops
cmd/compile: bulk rename This change does a bulk rename of several identifiers in the compiler. See #27167 and https://docs.google.com/document/d/19_ExiylD9MRfeAjKIfEsMU1_RGhuxB9sA0b5Zv7byVI/ for context and for discussion of these particular renames. Commands run to generate this change: gorename -from '"cmd/compile/internal/gc".OPROC' -to OGO gorename -from '"cmd/compile/internal/gc".OCOM' -to OBITNOT gorename -from '"cmd/compile/internal/gc".OMINUS' -to ONEG gorename -from '"cmd/compile/internal/gc".OIND' -to ODEREF gorename -from '"cmd/compile/internal/gc".OARRAYBYTESTR' -to OBYTES2STR gorename -from '"cmd/compile/internal/gc".OARRAYBYTESTRTMP' -to OBYTES2STRTMP gorename -from '"cmd/compile/internal/gc".OARRAYRUNESTR' -to ORUNES2STR gorename -from '"cmd/compile/internal/gc".OSTRARRAYBYTE' -to OSTR2BYTES gorename -from '"cmd/compile/internal/gc".OSTRARRAYBYTETMP' -to OSTR2BYTESTMP gorename -from '"cmd/compile/internal/gc".OSTRARRAYRUNE' -to OSTR2RUNES gorename -from '"cmd/compile/internal/gc".Etop' -to ctxStmt gorename -from '"cmd/compile/internal/gc".Erv' -to ctxExpr gorename -from '"cmd/compile/internal/gc".Ecall' -to ctxCallee gorename -from '"cmd/compile/internal/gc".Efnstruct' -to ctxMultiOK gorename -from '"cmd/compile/internal/gc".Easgn' -to ctxAssign gorename -from '"cmd/compile/internal/gc".Ecomplit' -to ctxCompLit Not altered: parameters and local variables (mostly in typecheck.go) named top, which should probably now be called ctx (and which should probably have a named type). Also not altered: Field called Top in gc.Func. gorename -from '"cmd/compile/internal/gc".Node.Isddd' -to IsDDD gorename -from '"cmd/compile/internal/gc".Node.SetIsddd' -to SetIsDDD gorename -from '"cmd/compile/internal/gc".nodeIsddd' -to nodeIsDDD gorename -from '"cmd/compile/internal/types".Field.Isddd' -to IsDDD gorename -from '"cmd/compile/internal/types".Field.SetIsddd' -to SetIsDDD gorename -from '"cmd/compile/internal/types".fieldIsddd' -to fieldIsDDD Not altered: function gc.hasddd, params and local variables called isddd Also not altered: fmt.go prints nodes using "isddd(%v)". cd cmd/compile/internal/gc; go generate I then manually found impacted comments using exact string match and fixed them up by hand. The comment changes were trivial. Passes toolstash-check. Fixes #27167. If this experiment is deemed a success, we will open a new tracking issue for renames to do at the end of the 1.13 cycles. Change-Id: I2dc541533d2ab0d06cb3d31d65df205ecfb151e8 Reviewed-on: https://go-review.googlesource.com/c/150140 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-11-18 08:34:38 -08:00
case ONEG:
a := s.expr(n.Left)
if n.Type.IsComplex() {
tp := floatForComplex(n.Type)
negop := s.ssaOp(n.Op, tp)
return s.newValue2(ssa.OpComplexMake, n.Type,
s.newValue1(negop, tp, s.newValue1(ssa.OpComplexReal, tp, a)),
s.newValue1(negop, tp, s.newValue1(ssa.OpComplexImag, tp, a)))
}
return s.newValue1(s.ssaOp(n.Op, n.Type), a.Type, a)
cmd/compile: bulk rename This change does a bulk rename of several identifiers in the compiler. See #27167 and https://docs.google.com/document/d/19_ExiylD9MRfeAjKIfEsMU1_RGhuxB9sA0b5Zv7byVI/ for context and for discussion of these particular renames. Commands run to generate this change: gorename -from '"cmd/compile/internal/gc".OPROC' -to OGO gorename -from '"cmd/compile/internal/gc".OCOM' -to OBITNOT gorename -from '"cmd/compile/internal/gc".OMINUS' -to ONEG gorename -from '"cmd/compile/internal/gc".OIND' -to ODEREF gorename -from '"cmd/compile/internal/gc".OARRAYBYTESTR' -to OBYTES2STR gorename -from '"cmd/compile/internal/gc".OARRAYBYTESTRTMP' -to OBYTES2STRTMP gorename -from '"cmd/compile/internal/gc".OARRAYRUNESTR' -to ORUNES2STR gorename -from '"cmd/compile/internal/gc".OSTRARRAYBYTE' -to OSTR2BYTES gorename -from '"cmd/compile/internal/gc".OSTRARRAYBYTETMP' -to OSTR2BYTESTMP gorename -from '"cmd/compile/internal/gc".OSTRARRAYRUNE' -to OSTR2RUNES gorename -from '"cmd/compile/internal/gc".Etop' -to ctxStmt gorename -from '"cmd/compile/internal/gc".Erv' -to ctxExpr gorename -from '"cmd/compile/internal/gc".Ecall' -to ctxCallee gorename -from '"cmd/compile/internal/gc".Efnstruct' -to ctxMultiOK gorename -from '"cmd/compile/internal/gc".Easgn' -to ctxAssign gorename -from '"cmd/compile/internal/gc".Ecomplit' -to ctxCompLit Not altered: parameters and local variables (mostly in typecheck.go) named top, which should probably now be called ctx (and which should probably have a named type). Also not altered: Field called Top in gc.Func. gorename -from '"cmd/compile/internal/gc".Node.Isddd' -to IsDDD gorename -from '"cmd/compile/internal/gc".Node.SetIsddd' -to SetIsDDD gorename -from '"cmd/compile/internal/gc".nodeIsddd' -to nodeIsDDD gorename -from '"cmd/compile/internal/types".Field.Isddd' -to IsDDD gorename -from '"cmd/compile/internal/types".Field.SetIsddd' -to SetIsDDD gorename -from '"cmd/compile/internal/types".fieldIsddd' -to fieldIsDDD Not altered: function gc.hasddd, params and local variables called isddd Also not altered: fmt.go prints nodes using "isddd(%v)". cd cmd/compile/internal/gc; go generate I then manually found impacted comments using exact string match and fixed them up by hand. The comment changes were trivial. Passes toolstash-check. Fixes #27167. If this experiment is deemed a success, we will open a new tracking issue for renames to do at the end of the 1.13 cycles. Change-Id: I2dc541533d2ab0d06cb3d31d65df205ecfb151e8 Reviewed-on: https://go-review.googlesource.com/c/150140 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-11-18 08:34:38 -08:00
case ONOT, OBITNOT:
a := s.expr(n.Left)
return s.newValue1(s.ssaOp(n.Op, n.Type), a.Type, a)
case OIMAG, OREAL:
a := s.expr(n.Left)
return s.newValue1(s.ssaOp(n.Op, n.Left.Type), n.Type, a)
case OPLUS:
return s.expr(n.Left)
case OADDR:
cmd/compile: pack bool fields in Node, Name, Func and Type structs to bitsets This reduces compiler memory usage by up to 4% - see compilebench results below. name old time/op new time/op delta Template 245ms ± 4% 241ms ± 2% -1.88% (p=0.029 n=10+10) Unicode 126ms ± 3% 124ms ± 3% ~ (p=0.105 n=10+10) GoTypes 805ms ± 2% 813ms ± 3% ~ (p=0.515 n=8+10) Compiler 3.95s ± 2% 3.83s ± 1% -2.96% (p=0.000 n=9+10) MakeBash 47.4s ± 4% 46.6s ± 1% -1.59% (p=0.028 n=9+10) name old user-ns/op new user-ns/op delta Template 324M ± 5% 326M ± 3% ~ (p=0.935 n=10+10) Unicode 186M ± 5% 178M ±10% ~ (p=0.067 n=9+10) GoTypes 1.08G ± 7% 1.09G ± 4% ~ (p=0.956 n=10+10) Compiler 5.34G ± 4% 5.31G ± 1% ~ (p=0.501 n=10+8) name old alloc/op new alloc/op delta Template 41.0MB ± 0% 39.8MB ± 0% -3.03% (p=0.000 n=10+10) Unicode 32.3MB ± 0% 31.0MB ± 0% -4.13% (p=0.000 n=10+10) GoTypes 119MB ± 0% 116MB ± 0% -2.39% (p=0.000 n=10+10) Compiler 499MB ± 0% 487MB ± 0% -2.48% (p=0.000 n=10+10) name old allocs/op new allocs/op delta Template 380k ± 1% 379k ± 1% ~ (p=0.436 n=10+10) Unicode 324k ± 1% 324k ± 0% ~ (p=0.853 n=10+10) GoTypes 1.15M ± 0% 1.15M ± 0% ~ (p=0.481 n=10+10) Compiler 4.41M ± 0% 4.41M ± 0% -0.12% (p=0.007 n=10+10) name old text-bytes new text-bytes delta HelloSize 623k ± 0% 623k ± 0% ~ (all equal) CmdGoSize 6.64M ± 0% 6.64M ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 5.81k ± 0% 5.81k ± 0% ~ (all equal) CmdGoSize 238k ± 0% 238k ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 134k ± 0% 134k ± 0% ~ (all equal) CmdGoSize 152k ± 0% 152k ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 967k ± 0% 967k ± 0% ~ (all equal) CmdGoSize 10.2M ± 0% 10.2M ± 0% ~ (all equal) Change-Id: I1f40af738254892bd6c8ba2eb43390b175753d52 Reviewed-on: https://go-review.googlesource.com/37445 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-27 19:56:38 +02:00
return s.addr(n.Left, n.Bounded())
case OINDREGSP:
addr := s.constOffPtrSP(types.NewPtr(n.Type), n.Xoffset)
return s.load(n.Type, addr)
cmd/compile: bulk rename This change does a bulk rename of several identifiers in the compiler. See #27167 and https://docs.google.com/document/d/19_ExiylD9MRfeAjKIfEsMU1_RGhuxB9sA0b5Zv7byVI/ for context and for discussion of these particular renames. Commands run to generate this change: gorename -from '"cmd/compile/internal/gc".OPROC' -to OGO gorename -from '"cmd/compile/internal/gc".OCOM' -to OBITNOT gorename -from '"cmd/compile/internal/gc".OMINUS' -to ONEG gorename -from '"cmd/compile/internal/gc".OIND' -to ODEREF gorename -from '"cmd/compile/internal/gc".OARRAYBYTESTR' -to OBYTES2STR gorename -from '"cmd/compile/internal/gc".OARRAYBYTESTRTMP' -to OBYTES2STRTMP gorename -from '"cmd/compile/internal/gc".OARRAYRUNESTR' -to ORUNES2STR gorename -from '"cmd/compile/internal/gc".OSTRARRAYBYTE' -to OSTR2BYTES gorename -from '"cmd/compile/internal/gc".OSTRARRAYBYTETMP' -to OSTR2BYTESTMP gorename -from '"cmd/compile/internal/gc".OSTRARRAYRUNE' -to OSTR2RUNES gorename -from '"cmd/compile/internal/gc".Etop' -to ctxStmt gorename -from '"cmd/compile/internal/gc".Erv' -to ctxExpr gorename -from '"cmd/compile/internal/gc".Ecall' -to ctxCallee gorename -from '"cmd/compile/internal/gc".Efnstruct' -to ctxMultiOK gorename -from '"cmd/compile/internal/gc".Easgn' -to ctxAssign gorename -from '"cmd/compile/internal/gc".Ecomplit' -to ctxCompLit Not altered: parameters and local variables (mostly in typecheck.go) named top, which should probably now be called ctx (and which should probably have a named type). Also not altered: Field called Top in gc.Func. gorename -from '"cmd/compile/internal/gc".Node.Isddd' -to IsDDD gorename -from '"cmd/compile/internal/gc".Node.SetIsddd' -to SetIsDDD gorename -from '"cmd/compile/internal/gc".nodeIsddd' -to nodeIsDDD gorename -from '"cmd/compile/internal/types".Field.Isddd' -to IsDDD gorename -from '"cmd/compile/internal/types".Field.SetIsddd' -to SetIsDDD gorename -from '"cmd/compile/internal/types".fieldIsddd' -to fieldIsDDD Not altered: function gc.hasddd, params and local variables called isddd Also not altered: fmt.go prints nodes using "isddd(%v)". cd cmd/compile/internal/gc; go generate I then manually found impacted comments using exact string match and fixed them up by hand. The comment changes were trivial. Passes toolstash-check. Fixes #27167. If this experiment is deemed a success, we will open a new tracking issue for renames to do at the end of the 1.13 cycles. Change-Id: I2dc541533d2ab0d06cb3d31d65df205ecfb151e8 Reviewed-on: https://go-review.googlesource.com/c/150140 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-11-18 08:34:38 -08:00
case ODEREF:
p := s.exprPtr(n.Left, false, n.Pos)
return s.load(n.Type, p)
case ODOT:
if n.Left.Op == OSTRUCTLIT {
// All literals with nonzero fields have already been
// rewritten during walk. Any that remain are just T{}
// or equivalents. Use the zero value.
if !isZero(n.Left) {
Fatalf("literal with nonzero value in SSA: %v", n.Left)
}
return s.zeroVal(n.Type)
}
// If n is addressable and can't be represented in
// SSA, then load just the selected field. This
// prevents false memory dependencies in race/msan
// instrumentation.
if islvalue(n) && !s.canSSA(n) {
p := s.addr(n, false)
return s.load(n.Type, p)
}
v := s.expr(n.Left)
return s.newValue1I(ssa.OpStructSelect, n.Type, int64(fieldIdx(n)), v)
case ODOTPTR:
p := s.exprPtr(n.Left, false, n.Pos)
p = s.newValue1I(ssa.OpOffPtr, types.NewPtr(n.Type), n.Xoffset, p)
return s.load(n.Type, p)
case OINDEX:
switch {
case n.Left.Type.IsString():
cmd/compile: pack bool fields in Node, Name, Func and Type structs to bitsets This reduces compiler memory usage by up to 4% - see compilebench results below. name old time/op new time/op delta Template 245ms ± 4% 241ms ± 2% -1.88% (p=0.029 n=10+10) Unicode 126ms ± 3% 124ms ± 3% ~ (p=0.105 n=10+10) GoTypes 805ms ± 2% 813ms ± 3% ~ (p=0.515 n=8+10) Compiler 3.95s ± 2% 3.83s ± 1% -2.96% (p=0.000 n=9+10) MakeBash 47.4s ± 4% 46.6s ± 1% -1.59% (p=0.028 n=9+10) name old user-ns/op new user-ns/op delta Template 324M ± 5% 326M ± 3% ~ (p=0.935 n=10+10) Unicode 186M ± 5% 178M ±10% ~ (p=0.067 n=9+10) GoTypes 1.08G ± 7% 1.09G ± 4% ~ (p=0.956 n=10+10) Compiler 5.34G ± 4% 5.31G ± 1% ~ (p=0.501 n=10+8) name old alloc/op new alloc/op delta Template 41.0MB ± 0% 39.8MB ± 0% -3.03% (p=0.000 n=10+10) Unicode 32.3MB ± 0% 31.0MB ± 0% -4.13% (p=0.000 n=10+10) GoTypes 119MB ± 0% 116MB ± 0% -2.39% (p=0.000 n=10+10) Compiler 499MB ± 0% 487MB ± 0% -2.48% (p=0.000 n=10+10) name old allocs/op new allocs/op delta Template 380k ± 1% 379k ± 1% ~ (p=0.436 n=10+10) Unicode 324k ± 1% 324k ± 0% ~ (p=0.853 n=10+10) GoTypes 1.15M ± 0% 1.15M ± 0% ~ (p=0.481 n=10+10) Compiler 4.41M ± 0% 4.41M ± 0% -0.12% (p=0.007 n=10+10) name old text-bytes new text-bytes delta HelloSize 623k ± 0% 623k ± 0% ~ (all equal) CmdGoSize 6.64M ± 0% 6.64M ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 5.81k ± 0% 5.81k ± 0% ~ (all equal) CmdGoSize 238k ± 0% 238k ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 134k ± 0% 134k ± 0% ~ (all equal) CmdGoSize 152k ± 0% 152k ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 967k ± 0% 967k ± 0% ~ (all equal) CmdGoSize 10.2M ± 0% 10.2M ± 0% ~ (all equal) Change-Id: I1f40af738254892bd6c8ba2eb43390b175753d52 Reviewed-on: https://go-review.googlesource.com/37445 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-27 19:56:38 +02:00
if n.Bounded() && Isconst(n.Left, CTSTR) && Isconst(n.Right, CTINT) {
// Replace "abc"[1] with 'b'.
// Delayed until now because "abc"[1] is not an ideal constant.
// See test/fixedbugs/issue11370.go.
return s.newValue0I(ssa.OpConst8, types.Types[TUINT8], int64(int8(n.Left.Val().U.(string)[n.Right.Int64()])))
}
a := s.expr(n.Left)
i := s.expr(n.Right)
len := s.newValue1(ssa.OpStringLen, types.Types[TINT], a)
i = s.boundsCheck(i, len, ssa.BoundsIndex, n.Bounded())
ptrtyp := s.f.Config.Types.BytePtr
ptr := s.newValue1(ssa.OpStringPtr, ptrtyp, a)
if Isconst(n.Right, CTINT) {
ptr = s.newValue1I(ssa.OpOffPtr, ptrtyp, n.Right.Int64(), ptr)
} else {
ptr = s.newValue2(ssa.OpAddPtr, ptrtyp, ptr, i)
}
return s.load(types.Types[TUINT8], ptr)
case n.Left.Type.IsSlice():
p := s.addr(n, false)
return s.load(n.Left.Type.Elem(), p)
case n.Left.Type.IsArray():
if canSSAType(n.Left.Type) {
// SSA can handle arrays of length at most 1.
bound := n.Left.Type.NumElem()
a := s.expr(n.Left)
i := s.expr(n.Right)
if bound == 0 {
// Bounds check will never succeed. Might as well
// use constants for the bounds check.
z := s.constInt(types.Types[TINT], 0)
s.boundsCheck(z, z, ssa.BoundsIndex, false)
// The return value won't be live, return junk.
return s.newValue0(ssa.OpUnknown, n.Type)
}
len := s.constInt(types.Types[TINT], bound)
i = s.boundsCheck(i, len, ssa.BoundsIndex, n.Bounded())
return s.newValue1I(ssa.OpArraySelect, n.Type, 0, a)
}
p := s.addr(n, false)
return s.load(n.Left.Type.Elem(), p)
default:
s.Fatalf("bad type for index %v", n.Left.Type)
return nil
}
case OLEN, OCAP:
switch {
case n.Left.Type.IsSlice():
op := ssa.OpSliceLen
if n.Op == OCAP {
op = ssa.OpSliceCap
}
return s.newValue1(op, types.Types[TINT], s.expr(n.Left))
case n.Left.Type.IsString(): // string; not reachable for OCAP
return s.newValue1(ssa.OpStringLen, types.Types[TINT], s.expr(n.Left))
case n.Left.Type.IsMap(), n.Left.Type.IsChan():
return s.referenceTypeBuiltin(n, s.expr(n.Left))
default: // array
return s.constInt(types.Types[TINT], n.Left.Type.NumElem())
}
case OSPTR:
a := s.expr(n.Left)
if n.Left.Type.IsSlice() {
return s.newValue1(ssa.OpSlicePtr, n.Type, a)
} else {
return s.newValue1(ssa.OpStringPtr, n.Type, a)
}
case OITAB:
a := s.expr(n.Left)
return s.newValue1(ssa.OpITab, n.Type, a)
case OIDATA:
a := s.expr(n.Left)
return s.newValue1(ssa.OpIData, n.Type, a)
case OEFACE:
tab := s.expr(n.Left)
data := s.expr(n.Right)
return s.newValue2(ssa.OpIMake, n.Type, tab, data)
case OSLICEHEADER:
p := s.expr(n.Left)
l := s.expr(n.List.First())
c := s.expr(n.List.Second())
return s.newValue3(ssa.OpSliceMake, n.Type, p, l, c)
case OSLICE, OSLICEARR, OSLICE3, OSLICE3ARR:
v := s.expr(n.Left)
var i, j, k *ssa.Value
low, high, max := n.SliceBounds()
if low != nil {
i = s.expr(low)
}
if high != nil {
j = s.expr(high)
}
if max != nil {
k = s.expr(max)
}
p, l, c := s.slice(v, i, j, k, n.Bounded())
return s.newValue3(ssa.OpSliceMake, n.Type, p, l, c)
case OSLICESTR:
v := s.expr(n.Left)
var i, j *ssa.Value
low, high, _ := n.SliceBounds()
if low != nil {
i = s.expr(low)
}
if high != nil {
j = s.expr(high)
}
p, l, _ := s.slice(v, i, j, nil, n.Bounded())
return s.newValue2(ssa.OpStringMake, n.Type, p, l)
case OCALLFUNC:
if isIntrinsicCall(n) {
return s.intrinsicCall(n)
}
fallthrough
case OCALLINTER, OCALLMETH:
a := s.call(n, callNormal)
return s.load(n.Type, a)
case OGETG:
return s.newValue1(ssa.OpGetG, n.Type, s.mem())
case OAPPEND:
cmd/compile: avoid write barrier in append fast path When we are writing the result of an append back to the same slice, we don’t need a write barrier on the fast path. This re-implements an optimization that was present in the old backend. Updates #14921 Fixes #14969 Sample code: var x []byte func p() { x = append(x, 1, 2, 3) } Before: "".p t=1 size=224 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 199 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x(SB), DX 0x0025 00037 (append.go:19) MOVQ "".x+8(SB), BX 0x002c 00044 (append.go:19) MOVQ BX, "".autotmp_0+64(SP) 0x0031 00049 (append.go:22) LEAQ 3(BX), BP 0x0035 00053 (append.go:22) CMPQ BP, CX 0x0038 00056 (append.go:22) JGT $0, 131 0x003a 00058 (append.go:22) MOVB $1, (DX)(BX*1) 0x003e 00062 (append.go:22) MOVB $2, 1(DX)(BX*1) 0x0043 00067 (append.go:22) MOVB $3, 2(DX)(BX*1) 0x0048 00072 (append.go:22) MOVQ BP, "".x+8(SB) 0x004f 00079 (append.go:22) MOVQ CX, "".x+16(SB) 0x0056 00086 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x005c 00092 (append.go:22) TESTB AL, AL 0x005e 00094 (append.go:22) JNE $0, 108 0x0060 00096 (append.go:22) MOVQ DX, "".x(SB) 0x0067 00103 (append.go:23) ADDQ $72, SP 0x006b 00107 (append.go:23) RET 0x006c 00108 (append.go:22) LEAQ "".x(SB), CX 0x0073 00115 (append.go:22) MOVQ CX, (SP) 0x0077 00119 (append.go:22) MOVQ DX, 8(SP) 0x007c 00124 (append.go:22) PCDATA $0, $0 0x007c 00124 (append.go:22) CALL runtime.writebarrierptr(SB) 0x0081 00129 (append.go:23) JMP 103 0x0083 00131 (append.go:22) LEAQ type.[]uint8(SB), AX 0x008a 00138 (append.go:22) MOVQ AX, (SP) 0x008e 00142 (append.go:22) MOVQ DX, 8(SP) 0x0093 00147 (append.go:22) MOVQ BX, 16(SP) 0x0098 00152 (append.go:22) MOVQ CX, 24(SP) 0x009d 00157 (append.go:22) MOVQ BP, 32(SP) 0x00a2 00162 (append.go:22) PCDATA $0, $0 0x00a2 00162 (append.go:22) CALL runtime.growslice(SB) 0x00a7 00167 (append.go:22) MOVQ 40(SP), DX 0x00ac 00172 (append.go:22) MOVQ 48(SP), AX 0x00b1 00177 (append.go:22) MOVQ 56(SP), CX 0x00b6 00182 (append.go:22) ADDQ $3, AX 0x00ba 00186 (append.go:19) MOVQ "".autotmp_0+64(SP), BX 0x00bf 00191 (append.go:22) MOVQ AX, BP 0x00c2 00194 (append.go:22) JMP 58 0x00c7 00199 (append.go:22) NOP 0x00c7 00199 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00cc 00204 (append.go:21) JMP 0 After: "".p t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 191 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x+8(SB), DX 0x0025 00037 (append.go:19) MOVQ DX, "".autotmp_0+64(SP) 0x002a 00042 (append.go:19) MOVQ "".x(SB), BX 0x0031 00049 (append.go:22) LEAQ 3(DX), BP 0x0035 00053 (append.go:22) MOVQ BP, "".x+8(SB) 0x003c 00060 (append.go:22) CMPQ BP, CX 0x003f 00063 (append.go:22) JGT $0, 84 0x0041 00065 (append.go:22) MOVB $1, (BX)(DX*1) 0x0045 00069 (append.go:22) MOVB $2, 1(BX)(DX*1) 0x004a 00074 (append.go:22) MOVB $3, 2(BX)(DX*1) 0x004f 00079 (append.go:23) ADDQ $72, SP 0x0053 00083 (append.go:23) RET 0x0054 00084 (append.go:22) LEAQ type.[]uint8(SB), AX 0x005b 00091 (append.go:22) MOVQ AX, (SP) 0x005f 00095 (append.go:22) MOVQ BX, 8(SP) 0x0064 00100 (append.go:22) MOVQ DX, 16(SP) 0x0069 00105 (append.go:22) MOVQ CX, 24(SP) 0x006e 00110 (append.go:22) MOVQ BP, 32(SP) 0x0073 00115 (append.go:22) PCDATA $0, $0 0x0073 00115 (append.go:22) CALL runtime.growslice(SB) 0x0078 00120 (append.go:22) MOVQ 40(SP), CX 0x007d 00125 (append.go:22) MOVQ 56(SP), AX 0x0082 00130 (append.go:22) MOVQ AX, "".x+16(SB) 0x0089 00137 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x008f 00143 (append.go:22) TESTB AL, AL 0x0091 00145 (append.go:22) JNE $0, 168 0x0093 00147 (append.go:22) MOVQ CX, "".x(SB) 0x009a 00154 (append.go:22) MOVQ "".x(SB), BX 0x00a1 00161 (append.go:19) MOVQ "".autotmp_0+64(SP), DX 0x00a6 00166 (append.go:22) JMP 65 0x00a8 00168 (append.go:22) LEAQ "".x(SB), DX 0x00af 00175 (append.go:22) MOVQ DX, (SP) 0x00b3 00179 (append.go:22) MOVQ CX, 8(SP) 0x00b8 00184 (append.go:22) PCDATA $0, $0 0x00b8 00184 (append.go:22) CALL runtime.writebarrierptr(SB) 0x00bd 00189 (append.go:22) JMP 154 0x00bf 00191 (append.go:22) NOP 0x00bf 00191 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00c4 00196 (append.go:21) JMP 0 Change-Id: I77a41ad3a22557a4bb4654de7d6d24a029efe34a Reviewed-on: https://go-review.googlesource.com/21813 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:44:17 -07:00
return s.append(n, false)
case OSTRUCTLIT, OARRAYLIT:
// All literals with nonzero fields have already been
// rewritten during walk. Any that remain are just T{}
// or equivalents. Use the zero value.
if !isZero(n) {
Fatalf("literal with nonzero value in SSA: %v", n)
}
return s.zeroVal(n.Type)
cmd/compile: port callnew to ssa conversion This is part of a general effort to shrink walk. In an ideal world, we'd have an SSA op for allocation, but we don't yet have a good mechanism for introducing function calling during SSA compilation. In the meantime, SSA conversion is a better place for it. This also makes it easier to introduce new optimizations; instead of doing the typecheck walk dance, we can simply write what we want the backend to do. I introduced a new opcode in this change because: (a) It avoids a class of bugs involving correctly detecting whether this ONEW is a "before walk" ONEW or an "after walk" ONEW. It also means that using ONEW or ONEWOBJ in the wrong context will generally result in a faster failure. (b) Opcodes are cheap. (c) It provides a better place to put documentation. This change also is also marginally more performant: name old alloc/op new alloc/op delta Template 39.1MB ± 0% 39.0MB ± 0% -0.14% (p=0.008 n=5+5) Unicode 28.4MB ± 0% 28.4MB ± 0% ~ (p=0.421 n=5+5) GoTypes 132MB ± 0% 132MB ± 0% -0.23% (p=0.008 n=5+5) Compiler 608MB ± 0% 607MB ± 0% -0.25% (p=0.008 n=5+5) SSA 2.04GB ± 0% 2.04GB ± 0% -0.01% (p=0.008 n=5+5) Flate 24.4MB ± 0% 24.3MB ± 0% -0.13% (p=0.008 n=5+5) GoParser 29.3MB ± 0% 29.1MB ± 0% -0.54% (p=0.008 n=5+5) Reflect 84.8MB ± 0% 84.7MB ± 0% -0.21% (p=0.008 n=5+5) Tar 36.7MB ± 0% 36.6MB ± 0% -0.10% (p=0.008 n=5+5) XML 48.7MB ± 0% 48.6MB ± 0% -0.24% (p=0.008 n=5+5) [Geo mean] 85.0MB 84.8MB -0.19% name old allocs/op new allocs/op delta Template 383k ± 0% 382k ± 0% -0.26% (p=0.008 n=5+5) Unicode 341k ± 0% 341k ± 0% ~ (p=0.579 n=5+5) GoTypes 1.37M ± 0% 1.36M ± 0% -0.39% (p=0.008 n=5+5) Compiler 5.59M ± 0% 5.56M ± 0% -0.49% (p=0.008 n=5+5) SSA 16.9M ± 0% 16.9M ± 0% -0.03% (p=0.008 n=5+5) Flate 238k ± 0% 238k ± 0% -0.23% (p=0.008 n=5+5) GoParser 306k ± 0% 303k ± 0% -0.93% (p=0.008 n=5+5) Reflect 990k ± 0% 987k ± 0% -0.33% (p=0.008 n=5+5) Tar 356k ± 0% 355k ± 0% -0.20% (p=0.008 n=5+5) XML 444k ± 0% 442k ± 0% -0.45% (p=0.008 n=5+5) [Geo mean] 848k 845k -0.33% Change-Id: I2c36003a7cbf71b53857b7de734852b698f49310 Reviewed-on: https://go-review.googlesource.com/c/go/+/167957 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2019-03-17 07:14:12 -07:00
case ONEWOBJ:
if n.Type.Elem().Size() == 0 {
return s.newValue1A(ssa.OpAddr, n.Type, zerobaseSym, s.sb)
}
typ := s.expr(n.Left)
vv := s.rtcall(newobject, true, []*types.Type{n.Type}, typ)
return vv[0]
default:
s.Fatalf("unhandled expr %v", n.Op)
return nil
}
}
cmd/compile: avoid write barrier in append fast path When we are writing the result of an append back to the same slice, we don’t need a write barrier on the fast path. This re-implements an optimization that was present in the old backend. Updates #14921 Fixes #14969 Sample code: var x []byte func p() { x = append(x, 1, 2, 3) } Before: "".p t=1 size=224 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 199 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x(SB), DX 0x0025 00037 (append.go:19) MOVQ "".x+8(SB), BX 0x002c 00044 (append.go:19) MOVQ BX, "".autotmp_0+64(SP) 0x0031 00049 (append.go:22) LEAQ 3(BX), BP 0x0035 00053 (append.go:22) CMPQ BP, CX 0x0038 00056 (append.go:22) JGT $0, 131 0x003a 00058 (append.go:22) MOVB $1, (DX)(BX*1) 0x003e 00062 (append.go:22) MOVB $2, 1(DX)(BX*1) 0x0043 00067 (append.go:22) MOVB $3, 2(DX)(BX*1) 0x0048 00072 (append.go:22) MOVQ BP, "".x+8(SB) 0x004f 00079 (append.go:22) MOVQ CX, "".x+16(SB) 0x0056 00086 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x005c 00092 (append.go:22) TESTB AL, AL 0x005e 00094 (append.go:22) JNE $0, 108 0x0060 00096 (append.go:22) MOVQ DX, "".x(SB) 0x0067 00103 (append.go:23) ADDQ $72, SP 0x006b 00107 (append.go:23) RET 0x006c 00108 (append.go:22) LEAQ "".x(SB), CX 0x0073 00115 (append.go:22) MOVQ CX, (SP) 0x0077 00119 (append.go:22) MOVQ DX, 8(SP) 0x007c 00124 (append.go:22) PCDATA $0, $0 0x007c 00124 (append.go:22) CALL runtime.writebarrierptr(SB) 0x0081 00129 (append.go:23) JMP 103 0x0083 00131 (append.go:22) LEAQ type.[]uint8(SB), AX 0x008a 00138 (append.go:22) MOVQ AX, (SP) 0x008e 00142 (append.go:22) MOVQ DX, 8(SP) 0x0093 00147 (append.go:22) MOVQ BX, 16(SP) 0x0098 00152 (append.go:22) MOVQ CX, 24(SP) 0x009d 00157 (append.go:22) MOVQ BP, 32(SP) 0x00a2 00162 (append.go:22) PCDATA $0, $0 0x00a2 00162 (append.go:22) CALL runtime.growslice(SB) 0x00a7 00167 (append.go:22) MOVQ 40(SP), DX 0x00ac 00172 (append.go:22) MOVQ 48(SP), AX 0x00b1 00177 (append.go:22) MOVQ 56(SP), CX 0x00b6 00182 (append.go:22) ADDQ $3, AX 0x00ba 00186 (append.go:19) MOVQ "".autotmp_0+64(SP), BX 0x00bf 00191 (append.go:22) MOVQ AX, BP 0x00c2 00194 (append.go:22) JMP 58 0x00c7 00199 (append.go:22) NOP 0x00c7 00199 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00cc 00204 (append.go:21) JMP 0 After: "".p t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 191 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x+8(SB), DX 0x0025 00037 (append.go:19) MOVQ DX, "".autotmp_0+64(SP) 0x002a 00042 (append.go:19) MOVQ "".x(SB), BX 0x0031 00049 (append.go:22) LEAQ 3(DX), BP 0x0035 00053 (append.go:22) MOVQ BP, "".x+8(SB) 0x003c 00060 (append.go:22) CMPQ BP, CX 0x003f 00063 (append.go:22) JGT $0, 84 0x0041 00065 (append.go:22) MOVB $1, (BX)(DX*1) 0x0045 00069 (append.go:22) MOVB $2, 1(BX)(DX*1) 0x004a 00074 (append.go:22) MOVB $3, 2(BX)(DX*1) 0x004f 00079 (append.go:23) ADDQ $72, SP 0x0053 00083 (append.go:23) RET 0x0054 00084 (append.go:22) LEAQ type.[]uint8(SB), AX 0x005b 00091 (append.go:22) MOVQ AX, (SP) 0x005f 00095 (append.go:22) MOVQ BX, 8(SP) 0x0064 00100 (append.go:22) MOVQ DX, 16(SP) 0x0069 00105 (append.go:22) MOVQ CX, 24(SP) 0x006e 00110 (append.go:22) MOVQ BP, 32(SP) 0x0073 00115 (append.go:22) PCDATA $0, $0 0x0073 00115 (append.go:22) CALL runtime.growslice(SB) 0x0078 00120 (append.go:22) MOVQ 40(SP), CX 0x007d 00125 (append.go:22) MOVQ 56(SP), AX 0x0082 00130 (append.go:22) MOVQ AX, "".x+16(SB) 0x0089 00137 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x008f 00143 (append.go:22) TESTB AL, AL 0x0091 00145 (append.go:22) JNE $0, 168 0x0093 00147 (append.go:22) MOVQ CX, "".x(SB) 0x009a 00154 (append.go:22) MOVQ "".x(SB), BX 0x00a1 00161 (append.go:19) MOVQ "".autotmp_0+64(SP), DX 0x00a6 00166 (append.go:22) JMP 65 0x00a8 00168 (append.go:22) LEAQ "".x(SB), DX 0x00af 00175 (append.go:22) MOVQ DX, (SP) 0x00b3 00179 (append.go:22) MOVQ CX, 8(SP) 0x00b8 00184 (append.go:22) PCDATA $0, $0 0x00b8 00184 (append.go:22) CALL runtime.writebarrierptr(SB) 0x00bd 00189 (append.go:22) JMP 154 0x00bf 00191 (append.go:22) NOP 0x00bf 00191 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00c4 00196 (append.go:21) JMP 0 Change-Id: I77a41ad3a22557a4bb4654de7d6d24a029efe34a Reviewed-on: https://go-review.googlesource.com/21813 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:44:17 -07:00
// append converts an OAPPEND node to SSA.
// If inplace is false, it converts the OAPPEND expression n to an ssa.Value,
// adds it to s, and returns the Value.
// If inplace is true, it writes the result of the OAPPEND expression n
// back to the slice being appended to, and returns nil.
// inplace MUST be set to false if the slice can be SSA'd.
func (s *state) append(n *Node, inplace bool) *ssa.Value {
// If inplace is false, process as expression "append(s, e1, e2, e3)":
//
cmd/compile: avoid a spill in append fast path Instead of spilling newlen, recalculate it. This removes a spill from the fast path, at the cost of a cheap recalculation on the (rare) growth path. This uses 8 bytes less of stack space. It generates two more bytes of code, but that is due to suboptimal register allocation; see far below. Runtime append microbenchmarks are all over the map, presumably due to incidental code movement. Sample code: func s(b []byte) []byte { b = append(b, 1, 2, 3) return b } Before: "".s t=1 size=160 args=0x30 locals=0x48 0x0000 00000 (append.go:8) TEXT "".s(SB), $72-48 0x0000 00000 (append.go:8) MOVQ (TLS), CX 0x0009 00009 (append.go:8) CMPQ SP, 16(CX) 0x000d 00013 (append.go:8) JLS 149 0x0013 00019 (append.go:8) SUBQ $72, SP 0x0017 00023 (append.go:8) FUNCDATA $0, gclocals·6432f8c6a0d23fa7bee6c5d96f21a92a(SB) 0x0017 00023 (append.go:8) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:9) MOVQ "".b+88(FP), CX 0x001c 00028 (append.go:9) LEAQ 3(CX), DX 0x0020 00032 (append.go:9) MOVQ DX, "".autotmp_0+64(SP) 0x0025 00037 (append.go:9) MOVQ "".b+96(FP), BX 0x002a 00042 (append.go:9) CMPQ DX, BX 0x002d 00045 (append.go:9) JGT $0, 86 0x002f 00047 (append.go:8) MOVQ "".b+80(FP), AX 0x0034 00052 (append.go:9) MOVB $1, (AX)(CX*1) 0x0038 00056 (append.go:9) MOVB $2, 1(AX)(CX*1) 0x003d 00061 (append.go:9) MOVB $3, 2(AX)(CX*1) 0x0042 00066 (append.go:10) MOVQ AX, "".~r1+104(FP) 0x0047 00071 (append.go:10) MOVQ DX, "".~r1+112(FP) 0x004c 00076 (append.go:10) MOVQ BX, "".~r1+120(FP) 0x0051 00081 (append.go:10) ADDQ $72, SP 0x0055 00085 (append.go:10) RET 0x0056 00086 (append.go:9) LEAQ type.[]uint8(SB), AX 0x005d 00093 (append.go:9) MOVQ AX, (SP) 0x0061 00097 (append.go:9) MOVQ "".b+80(FP), BP 0x0066 00102 (append.go:9) MOVQ BP, 8(SP) 0x006b 00107 (append.go:9) MOVQ CX, 16(SP) 0x0070 00112 (append.go:9) MOVQ BX, 24(SP) 0x0075 00117 (append.go:9) MOVQ DX, 32(SP) 0x007a 00122 (append.go:9) PCDATA $0, $0 0x007a 00122 (append.go:9) CALL runtime.growslice(SB) 0x007f 00127 (append.go:9) MOVQ 40(SP), AX 0x0084 00132 (append.go:9) MOVQ 56(SP), BX 0x0089 00137 (append.go:8) MOVQ "".b+88(FP), CX 0x008e 00142 (append.go:9) MOVQ "".autotmp_0+64(SP), DX 0x0093 00147 (append.go:9) JMP 52 0x0095 00149 (append.go:9) NOP 0x0095 00149 (append.go:8) CALL runtime.morestack_noctxt(SB) 0x009a 00154 (append.go:8) JMP 0 After: "".s t=1 size=176 args=0x30 locals=0x40 0x0000 00000 (append.go:8) TEXT "".s(SB), $64-48 0x0000 00000 (append.go:8) MOVQ (TLS), CX 0x0009 00009 (append.go:8) CMPQ SP, 16(CX) 0x000d 00013 (append.go:8) JLS 151 0x0013 00019 (append.go:8) SUBQ $64, SP 0x0017 00023 (append.go:8) FUNCDATA $0, gclocals·6432f8c6a0d23fa7bee6c5d96f21a92a(SB) 0x0017 00023 (append.go:8) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:9) MOVQ "".b+80(FP), CX 0x001c 00028 (append.go:9) LEAQ 3(CX), DX 0x0020 00032 (append.go:9) MOVQ "".b+88(FP), BX 0x0025 00037 (append.go:9) CMPQ DX, BX 0x0028 00040 (append.go:9) JGT $0, 81 0x002a 00042 (append.go:8) MOVQ "".b+72(FP), AX 0x002f 00047 (append.go:9) MOVB $1, (AX)(CX*1) 0x0033 00051 (append.go:9) MOVB $2, 1(AX)(CX*1) 0x0038 00056 (append.go:9) MOVB $3, 2(AX)(CX*1) 0x003d 00061 (append.go:10) MOVQ AX, "".~r1+96(FP) 0x0042 00066 (append.go:10) MOVQ DX, "".~r1+104(FP) 0x0047 00071 (append.go:10) MOVQ BX, "".~r1+112(FP) 0x004c 00076 (append.go:10) ADDQ $64, SP 0x0050 00080 (append.go:10) RET 0x0051 00081 (append.go:9) LEAQ type.[]uint8(SB), AX 0x0058 00088 (append.go:9) MOVQ AX, (SP) 0x005c 00092 (append.go:9) MOVQ "".b+72(FP), BP 0x0061 00097 (append.go:9) MOVQ BP, 8(SP) 0x0066 00102 (append.go:9) MOVQ CX, 16(SP) 0x006b 00107 (append.go:9) MOVQ BX, 24(SP) 0x0070 00112 (append.go:9) MOVQ DX, 32(SP) 0x0075 00117 (append.go:9) PCDATA $0, $0 0x0075 00117 (append.go:9) CALL runtime.growslice(SB) 0x007a 00122 (append.go:9) MOVQ 40(SP), AX 0x007f 00127 (append.go:9) MOVQ 48(SP), CX 0x0084 00132 (append.go:9) MOVQ 56(SP), BX 0x0089 00137 (append.go:9) ADDQ $3, CX 0x008d 00141 (append.go:9) MOVQ CX, DX 0x0090 00144 (append.go:8) MOVQ "".b+80(FP), CX 0x0095 00149 (append.go:9) JMP 47 0x0097 00151 (append.go:9) NOP 0x0097 00151 (append.go:8) CALL runtime.morestack_noctxt(SB) 0x009c 00156 (append.go:8) JMP 0 Observe that in the following sequence, we should use DX directly instead of using CX as a temporary register, which would make the new code a strict improvement on the old: 0x007f 00127 (append.go:9) MOVQ 48(SP), CX 0x0084 00132 (append.go:9) MOVQ 56(SP), BX 0x0089 00137 (append.go:9) ADDQ $3, CX 0x008d 00141 (append.go:9) MOVQ CX, DX 0x0090 00144 (append.go:8) MOVQ "".b+80(FP), CX Change-Id: I4ee50b18fa53865901d2d7f86c2cbb54c6fa6924 Reviewed-on: https://go-review.googlesource.com/21812 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:08:00 -07:00
// ptr, len, cap := s
// newlen := len + 3
cmd/compile: avoid write barrier in append fast path When we are writing the result of an append back to the same slice, we don’t need a write barrier on the fast path. This re-implements an optimization that was present in the old backend. Updates #14921 Fixes #14969 Sample code: var x []byte func p() { x = append(x, 1, 2, 3) } Before: "".p t=1 size=224 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 199 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x(SB), DX 0x0025 00037 (append.go:19) MOVQ "".x+8(SB), BX 0x002c 00044 (append.go:19) MOVQ BX, "".autotmp_0+64(SP) 0x0031 00049 (append.go:22) LEAQ 3(BX), BP 0x0035 00053 (append.go:22) CMPQ BP, CX 0x0038 00056 (append.go:22) JGT $0, 131 0x003a 00058 (append.go:22) MOVB $1, (DX)(BX*1) 0x003e 00062 (append.go:22) MOVB $2, 1(DX)(BX*1) 0x0043 00067 (append.go:22) MOVB $3, 2(DX)(BX*1) 0x0048 00072 (append.go:22) MOVQ BP, "".x+8(SB) 0x004f 00079 (append.go:22) MOVQ CX, "".x+16(SB) 0x0056 00086 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x005c 00092 (append.go:22) TESTB AL, AL 0x005e 00094 (append.go:22) JNE $0, 108 0x0060 00096 (append.go:22) MOVQ DX, "".x(SB) 0x0067 00103 (append.go:23) ADDQ $72, SP 0x006b 00107 (append.go:23) RET 0x006c 00108 (append.go:22) LEAQ "".x(SB), CX 0x0073 00115 (append.go:22) MOVQ CX, (SP) 0x0077 00119 (append.go:22) MOVQ DX, 8(SP) 0x007c 00124 (append.go:22) PCDATA $0, $0 0x007c 00124 (append.go:22) CALL runtime.writebarrierptr(SB) 0x0081 00129 (append.go:23) JMP 103 0x0083 00131 (append.go:22) LEAQ type.[]uint8(SB), AX 0x008a 00138 (append.go:22) MOVQ AX, (SP) 0x008e 00142 (append.go:22) MOVQ DX, 8(SP) 0x0093 00147 (append.go:22) MOVQ BX, 16(SP) 0x0098 00152 (append.go:22) MOVQ CX, 24(SP) 0x009d 00157 (append.go:22) MOVQ BP, 32(SP) 0x00a2 00162 (append.go:22) PCDATA $0, $0 0x00a2 00162 (append.go:22) CALL runtime.growslice(SB) 0x00a7 00167 (append.go:22) MOVQ 40(SP), DX 0x00ac 00172 (append.go:22) MOVQ 48(SP), AX 0x00b1 00177 (append.go:22) MOVQ 56(SP), CX 0x00b6 00182 (append.go:22) ADDQ $3, AX 0x00ba 00186 (append.go:19) MOVQ "".autotmp_0+64(SP), BX 0x00bf 00191 (append.go:22) MOVQ AX, BP 0x00c2 00194 (append.go:22) JMP 58 0x00c7 00199 (append.go:22) NOP 0x00c7 00199 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00cc 00204 (append.go:21) JMP 0 After: "".p t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 191 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x+8(SB), DX 0x0025 00037 (append.go:19) MOVQ DX, "".autotmp_0+64(SP) 0x002a 00042 (append.go:19) MOVQ "".x(SB), BX 0x0031 00049 (append.go:22) LEAQ 3(DX), BP 0x0035 00053 (append.go:22) MOVQ BP, "".x+8(SB) 0x003c 00060 (append.go:22) CMPQ BP, CX 0x003f 00063 (append.go:22) JGT $0, 84 0x0041 00065 (append.go:22) MOVB $1, (BX)(DX*1) 0x0045 00069 (append.go:22) MOVB $2, 1(BX)(DX*1) 0x004a 00074 (append.go:22) MOVB $3, 2(BX)(DX*1) 0x004f 00079 (append.go:23) ADDQ $72, SP 0x0053 00083 (append.go:23) RET 0x0054 00084 (append.go:22) LEAQ type.[]uint8(SB), AX 0x005b 00091 (append.go:22) MOVQ AX, (SP) 0x005f 00095 (append.go:22) MOVQ BX, 8(SP) 0x0064 00100 (append.go:22) MOVQ DX, 16(SP) 0x0069 00105 (append.go:22) MOVQ CX, 24(SP) 0x006e 00110 (append.go:22) MOVQ BP, 32(SP) 0x0073 00115 (append.go:22) PCDATA $0, $0 0x0073 00115 (append.go:22) CALL runtime.growslice(SB) 0x0078 00120 (append.go:22) MOVQ 40(SP), CX 0x007d 00125 (append.go:22) MOVQ 56(SP), AX 0x0082 00130 (append.go:22) MOVQ AX, "".x+16(SB) 0x0089 00137 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x008f 00143 (append.go:22) TESTB AL, AL 0x0091 00145 (append.go:22) JNE $0, 168 0x0093 00147 (append.go:22) MOVQ CX, "".x(SB) 0x009a 00154 (append.go:22) MOVQ "".x(SB), BX 0x00a1 00161 (append.go:19) MOVQ "".autotmp_0+64(SP), DX 0x00a6 00166 (append.go:22) JMP 65 0x00a8 00168 (append.go:22) LEAQ "".x(SB), DX 0x00af 00175 (append.go:22) MOVQ DX, (SP) 0x00b3 00179 (append.go:22) MOVQ CX, 8(SP) 0x00b8 00184 (append.go:22) PCDATA $0, $0 0x00b8 00184 (append.go:22) CALL runtime.writebarrierptr(SB) 0x00bd 00189 (append.go:22) JMP 154 0x00bf 00191 (append.go:22) NOP 0x00bf 00191 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00c4 00196 (append.go:21) JMP 0 Change-Id: I77a41ad3a22557a4bb4654de7d6d24a029efe34a Reviewed-on: https://go-review.googlesource.com/21813 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:44:17 -07:00
// if newlen > cap {
cmd/compile: avoid a spill in append fast path Instead of spilling newlen, recalculate it. This removes a spill from the fast path, at the cost of a cheap recalculation on the (rare) growth path. This uses 8 bytes less of stack space. It generates two more bytes of code, but that is due to suboptimal register allocation; see far below. Runtime append microbenchmarks are all over the map, presumably due to incidental code movement. Sample code: func s(b []byte) []byte { b = append(b, 1, 2, 3) return b } Before: "".s t=1 size=160 args=0x30 locals=0x48 0x0000 00000 (append.go:8) TEXT "".s(SB), $72-48 0x0000 00000 (append.go:8) MOVQ (TLS), CX 0x0009 00009 (append.go:8) CMPQ SP, 16(CX) 0x000d 00013 (append.go:8) JLS 149 0x0013 00019 (append.go:8) SUBQ $72, SP 0x0017 00023 (append.go:8) FUNCDATA $0, gclocals·6432f8c6a0d23fa7bee6c5d96f21a92a(SB) 0x0017 00023 (append.go:8) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:9) MOVQ "".b+88(FP), CX 0x001c 00028 (append.go:9) LEAQ 3(CX), DX 0x0020 00032 (append.go:9) MOVQ DX, "".autotmp_0+64(SP) 0x0025 00037 (append.go:9) MOVQ "".b+96(FP), BX 0x002a 00042 (append.go:9) CMPQ DX, BX 0x002d 00045 (append.go:9) JGT $0, 86 0x002f 00047 (append.go:8) MOVQ "".b+80(FP), AX 0x0034 00052 (append.go:9) MOVB $1, (AX)(CX*1) 0x0038 00056 (append.go:9) MOVB $2, 1(AX)(CX*1) 0x003d 00061 (append.go:9) MOVB $3, 2(AX)(CX*1) 0x0042 00066 (append.go:10) MOVQ AX, "".~r1+104(FP) 0x0047 00071 (append.go:10) MOVQ DX, "".~r1+112(FP) 0x004c 00076 (append.go:10) MOVQ BX, "".~r1+120(FP) 0x0051 00081 (append.go:10) ADDQ $72, SP 0x0055 00085 (append.go:10) RET 0x0056 00086 (append.go:9) LEAQ type.[]uint8(SB), AX 0x005d 00093 (append.go:9) MOVQ AX, (SP) 0x0061 00097 (append.go:9) MOVQ "".b+80(FP), BP 0x0066 00102 (append.go:9) MOVQ BP, 8(SP) 0x006b 00107 (append.go:9) MOVQ CX, 16(SP) 0x0070 00112 (append.go:9) MOVQ BX, 24(SP) 0x0075 00117 (append.go:9) MOVQ DX, 32(SP) 0x007a 00122 (append.go:9) PCDATA $0, $0 0x007a 00122 (append.go:9) CALL runtime.growslice(SB) 0x007f 00127 (append.go:9) MOVQ 40(SP), AX 0x0084 00132 (append.go:9) MOVQ 56(SP), BX 0x0089 00137 (append.go:8) MOVQ "".b+88(FP), CX 0x008e 00142 (append.go:9) MOVQ "".autotmp_0+64(SP), DX 0x0093 00147 (append.go:9) JMP 52 0x0095 00149 (append.go:9) NOP 0x0095 00149 (append.go:8) CALL runtime.morestack_noctxt(SB) 0x009a 00154 (append.go:8) JMP 0 After: "".s t=1 size=176 args=0x30 locals=0x40 0x0000 00000 (append.go:8) TEXT "".s(SB), $64-48 0x0000 00000 (append.go:8) MOVQ (TLS), CX 0x0009 00009 (append.go:8) CMPQ SP, 16(CX) 0x000d 00013 (append.go:8) JLS 151 0x0013 00019 (append.go:8) SUBQ $64, SP 0x0017 00023 (append.go:8) FUNCDATA $0, gclocals·6432f8c6a0d23fa7bee6c5d96f21a92a(SB) 0x0017 00023 (append.go:8) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:9) MOVQ "".b+80(FP), CX 0x001c 00028 (append.go:9) LEAQ 3(CX), DX 0x0020 00032 (append.go:9) MOVQ "".b+88(FP), BX 0x0025 00037 (append.go:9) CMPQ DX, BX 0x0028 00040 (append.go:9) JGT $0, 81 0x002a 00042 (append.go:8) MOVQ "".b+72(FP), AX 0x002f 00047 (append.go:9) MOVB $1, (AX)(CX*1) 0x0033 00051 (append.go:9) MOVB $2, 1(AX)(CX*1) 0x0038 00056 (append.go:9) MOVB $3, 2(AX)(CX*1) 0x003d 00061 (append.go:10) MOVQ AX, "".~r1+96(FP) 0x0042 00066 (append.go:10) MOVQ DX, "".~r1+104(FP) 0x0047 00071 (append.go:10) MOVQ BX, "".~r1+112(FP) 0x004c 00076 (append.go:10) ADDQ $64, SP 0x0050 00080 (append.go:10) RET 0x0051 00081 (append.go:9) LEAQ type.[]uint8(SB), AX 0x0058 00088 (append.go:9) MOVQ AX, (SP) 0x005c 00092 (append.go:9) MOVQ "".b+72(FP), BP 0x0061 00097 (append.go:9) MOVQ BP, 8(SP) 0x0066 00102 (append.go:9) MOVQ CX, 16(SP) 0x006b 00107 (append.go:9) MOVQ BX, 24(SP) 0x0070 00112 (append.go:9) MOVQ DX, 32(SP) 0x0075 00117 (append.go:9) PCDATA $0, $0 0x0075 00117 (append.go:9) CALL runtime.growslice(SB) 0x007a 00122 (append.go:9) MOVQ 40(SP), AX 0x007f 00127 (append.go:9) MOVQ 48(SP), CX 0x0084 00132 (append.go:9) MOVQ 56(SP), BX 0x0089 00137 (append.go:9) ADDQ $3, CX 0x008d 00141 (append.go:9) MOVQ CX, DX 0x0090 00144 (append.go:8) MOVQ "".b+80(FP), CX 0x0095 00149 (append.go:9) JMP 47 0x0097 00151 (append.go:9) NOP 0x0097 00151 (append.go:8) CALL runtime.morestack_noctxt(SB) 0x009c 00156 (append.go:8) JMP 0 Observe that in the following sequence, we should use DX directly instead of using CX as a temporary register, which would make the new code a strict improvement on the old: 0x007f 00127 (append.go:9) MOVQ 48(SP), CX 0x0084 00132 (append.go:9) MOVQ 56(SP), BX 0x0089 00137 (append.go:9) ADDQ $3, CX 0x008d 00141 (append.go:9) MOVQ CX, DX 0x0090 00144 (append.go:8) MOVQ "".b+80(FP), CX Change-Id: I4ee50b18fa53865901d2d7f86c2cbb54c6fa6924 Reviewed-on: https://go-review.googlesource.com/21812 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:08:00 -07:00
// ptr, len, cap = growslice(s, newlen)
// newlen = len + 3 // recalculate to avoid a spill
// }
cmd/compile: avoid write barrier in append fast path When we are writing the result of an append back to the same slice, we don’t need a write barrier on the fast path. This re-implements an optimization that was present in the old backend. Updates #14921 Fixes #14969 Sample code: var x []byte func p() { x = append(x, 1, 2, 3) } Before: "".p t=1 size=224 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 199 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x(SB), DX 0x0025 00037 (append.go:19) MOVQ "".x+8(SB), BX 0x002c 00044 (append.go:19) MOVQ BX, "".autotmp_0+64(SP) 0x0031 00049 (append.go:22) LEAQ 3(BX), BP 0x0035 00053 (append.go:22) CMPQ BP, CX 0x0038 00056 (append.go:22) JGT $0, 131 0x003a 00058 (append.go:22) MOVB $1, (DX)(BX*1) 0x003e 00062 (append.go:22) MOVB $2, 1(DX)(BX*1) 0x0043 00067 (append.go:22) MOVB $3, 2(DX)(BX*1) 0x0048 00072 (append.go:22) MOVQ BP, "".x+8(SB) 0x004f 00079 (append.go:22) MOVQ CX, "".x+16(SB) 0x0056 00086 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x005c 00092 (append.go:22) TESTB AL, AL 0x005e 00094 (append.go:22) JNE $0, 108 0x0060 00096 (append.go:22) MOVQ DX, "".x(SB) 0x0067 00103 (append.go:23) ADDQ $72, SP 0x006b 00107 (append.go:23) RET 0x006c 00108 (append.go:22) LEAQ "".x(SB), CX 0x0073 00115 (append.go:22) MOVQ CX, (SP) 0x0077 00119 (append.go:22) MOVQ DX, 8(SP) 0x007c 00124 (append.go:22) PCDATA $0, $0 0x007c 00124 (append.go:22) CALL runtime.writebarrierptr(SB) 0x0081 00129 (append.go:23) JMP 103 0x0083 00131 (append.go:22) LEAQ type.[]uint8(SB), AX 0x008a 00138 (append.go:22) MOVQ AX, (SP) 0x008e 00142 (append.go:22) MOVQ DX, 8(SP) 0x0093 00147 (append.go:22) MOVQ BX, 16(SP) 0x0098 00152 (append.go:22) MOVQ CX, 24(SP) 0x009d 00157 (append.go:22) MOVQ BP, 32(SP) 0x00a2 00162 (append.go:22) PCDATA $0, $0 0x00a2 00162 (append.go:22) CALL runtime.growslice(SB) 0x00a7 00167 (append.go:22) MOVQ 40(SP), DX 0x00ac 00172 (append.go:22) MOVQ 48(SP), AX 0x00b1 00177 (append.go:22) MOVQ 56(SP), CX 0x00b6 00182 (append.go:22) ADDQ $3, AX 0x00ba 00186 (append.go:19) MOVQ "".autotmp_0+64(SP), BX 0x00bf 00191 (append.go:22) MOVQ AX, BP 0x00c2 00194 (append.go:22) JMP 58 0x00c7 00199 (append.go:22) NOP 0x00c7 00199 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00cc 00204 (append.go:21) JMP 0 After: "".p t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 191 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x+8(SB), DX 0x0025 00037 (append.go:19) MOVQ DX, "".autotmp_0+64(SP) 0x002a 00042 (append.go:19) MOVQ "".x(SB), BX 0x0031 00049 (append.go:22) LEAQ 3(DX), BP 0x0035 00053 (append.go:22) MOVQ BP, "".x+8(SB) 0x003c 00060 (append.go:22) CMPQ BP, CX 0x003f 00063 (append.go:22) JGT $0, 84 0x0041 00065 (append.go:22) MOVB $1, (BX)(DX*1) 0x0045 00069 (append.go:22) MOVB $2, 1(BX)(DX*1) 0x004a 00074 (append.go:22) MOVB $3, 2(BX)(DX*1) 0x004f 00079 (append.go:23) ADDQ $72, SP 0x0053 00083 (append.go:23) RET 0x0054 00084 (append.go:22) LEAQ type.[]uint8(SB), AX 0x005b 00091 (append.go:22) MOVQ AX, (SP) 0x005f 00095 (append.go:22) MOVQ BX, 8(SP) 0x0064 00100 (append.go:22) MOVQ DX, 16(SP) 0x0069 00105 (append.go:22) MOVQ CX, 24(SP) 0x006e 00110 (append.go:22) MOVQ BP, 32(SP) 0x0073 00115 (append.go:22) PCDATA $0, $0 0x0073 00115 (append.go:22) CALL runtime.growslice(SB) 0x0078 00120 (append.go:22) MOVQ 40(SP), CX 0x007d 00125 (append.go:22) MOVQ 56(SP), AX 0x0082 00130 (append.go:22) MOVQ AX, "".x+16(SB) 0x0089 00137 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x008f 00143 (append.go:22) TESTB AL, AL 0x0091 00145 (append.go:22) JNE $0, 168 0x0093 00147 (append.go:22) MOVQ CX, "".x(SB) 0x009a 00154 (append.go:22) MOVQ "".x(SB), BX 0x00a1 00161 (append.go:19) MOVQ "".autotmp_0+64(SP), DX 0x00a6 00166 (append.go:22) JMP 65 0x00a8 00168 (append.go:22) LEAQ "".x(SB), DX 0x00af 00175 (append.go:22) MOVQ DX, (SP) 0x00b3 00179 (append.go:22) MOVQ CX, 8(SP) 0x00b8 00184 (append.go:22) PCDATA $0, $0 0x00b8 00184 (append.go:22) CALL runtime.writebarrierptr(SB) 0x00bd 00189 (append.go:22) JMP 154 0x00bf 00191 (append.go:22) NOP 0x00bf 00191 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00c4 00196 (append.go:21) JMP 0 Change-Id: I77a41ad3a22557a4bb4654de7d6d24a029efe34a Reviewed-on: https://go-review.googlesource.com/21813 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:44:17 -07:00
// // with write barriers, if needed:
// *(ptr+len) = e1
// *(ptr+len+1) = e2
// *(ptr+len+2) = e3
// return makeslice(ptr, newlen, cap)
//
//
// If inplace is true, process as statement "s = append(s, e1, e2, e3)":
//
// a := &s
// ptr, len, cap := s
// newlen := len + 3
// if uint(newlen) > uint(cap) {
cmd/compile: re-enable in-place append optimization CL 21891 was too clever in its attempts to avoid spills. Storing newlen too early caused uses of append in the runtime itself to receive an inconsistent view of a slice, leading to corruption. This CL makes the generate code much more similar to the old backend. It spills more than before, but those spills have been contained to the grow path. It recalculates newlen unnecessarily on the fast path, but that's measurably cheaper than spilling it. CL 21891 caused runtime failures in 6 of 2000 runs of net/http and crypto/x509 in my test setup. This CL has gone 6000 runs without a failure. Benchmarks going from master to this CL: name old time/op new time/op delta AppendInPlace/NoGrow/Byte-8 439ns ± 2% 436ns ± 2% -0.72% (p=0.001 n=28+27) AppendInPlace/NoGrow/1Ptr-8 901ns ± 0% 856ns ± 0% -4.95% (p=0.000 n=26+29) AppendInPlace/NoGrow/2Ptr-8 2.15µs ± 1% 1.95µs ± 0% -9.07% (p=0.000 n=28+30) AppendInPlace/NoGrow/3Ptr-8 2.66µs ± 0% 2.45µs ± 0% -7.93% (p=0.000 n=29+26) AppendInPlace/NoGrow/4Ptr-8 3.24µs ± 1% 3.02µs ± 1% -6.75% (p=0.000 n=28+30) AppendInPlace/Grow/Byte-8 269ns ± 1% 271ns ± 1% +0.84% (p=0.000 n=30+29) AppendInPlace/Grow/1Ptr-8 275ns ± 1% 280ns ± 1% +1.75% (p=0.000 n=30+30) AppendInPlace/Grow/2Ptr-8 384ns ± 0% 391ns ± 0% +1.94% (p=0.000 n=27+30) AppendInPlace/Grow/3Ptr-8 455ns ± 0% 462ns ± 0% +1.43% (p=0.000 n=29+29) AppendInPlace/Grow/4Ptr-8 478ns ± 0% 479ns ± 0% +0.23% (p=0.000 n=30+27) However, for the large no-grow cases, there is still more work to be done. Going from this CL to the non-SSA backend: name old time/op new time/op delta AppendInPlace/NoGrow/Byte-8 436ns ± 2% 436ns ± 2% ~ (p=0.967 n=27+29) AppendInPlace/NoGrow/1Ptr-8 856ns ± 0% 884ns ± 0% +3.28% (p=0.000 n=29+26) AppendInPlace/NoGrow/2Ptr-8 1.95µs ± 0% 1.56µs ± 0% -20.28% (p=0.000 n=30+29) AppendInPlace/NoGrow/3Ptr-8 2.45µs ± 0% 1.89µs ± 0% -22.88% (p=0.000 n=26+28) AppendInPlace/NoGrow/4Ptr-8 3.02µs ± 1% 2.56µs ± 1% -15.35% (p=0.000 n=30+28) AppendInPlace/Grow/Byte-8 271ns ± 1% 283ns ± 1% +4.56% (p=0.000 n=29+29) AppendInPlace/Grow/1Ptr-8 280ns ± 1% 288ns ± 1% +2.99% (p=0.000 n=30+30) AppendInPlace/Grow/2Ptr-8 391ns ± 0% 409ns ± 0% +4.66% (p=0.000 n=30+29) AppendInPlace/Grow/3Ptr-8 462ns ± 0% 481ns ± 0% +4.13% (p=0.000 n=29+30) AppendInPlace/Grow/4Ptr-8 479ns ± 0% 502ns ± 0% +4.81% (p=0.000 n=27+26) New generated code: var x []byte func a() { x = append(x, 1) } "".a t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (a.go:5) TEXT "".a(SB), $72-0 0x0000 00000 (a.go:5) MOVQ (TLS), CX 0x0009 00009 (a.go:5) CMPQ SP, 16(CX) 0x000d 00013 (a.go:5) JLS 190 0x0013 00019 (a.go:5) SUBQ $72, SP 0x0017 00023 (a.go:5) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (a.go:5) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (a.go:6) MOVQ "".x+16(SB), CX 0x001e 00030 (a.go:6) MOVQ "".x+8(SB), DX 0x0025 00037 (a.go:6) MOVQ "".x(SB), BX 0x002c 00044 (a.go:6) LEAQ 1(DX), BP 0x0030 00048 (a.go:6) CMPQ BP, CX 0x0033 00051 (a.go:6) JGT $0, 73 0x0035 00053 (a.go:6) LEAQ 1(DX), AX 0x0039 00057 (a.go:6) MOVQ AX, "".x+8(SB) 0x0040 00064 (a.go:6) MOVB $1, (BX)(DX*1) 0x0044 00068 (a.go:7) ADDQ $72, SP 0x0048 00072 (a.go:7) RET 0x0049 00073 (a.go:6) LEAQ type.[]uint8(SB), AX 0x0050 00080 (a.go:6) MOVQ AX, (SP) 0x0054 00084 (a.go:6) MOVQ BX, 8(SP) 0x0059 00089 (a.go:6) MOVQ DX, 16(SP) 0x005e 00094 (a.go:6) MOVQ CX, 24(SP) 0x0063 00099 (a.go:6) MOVQ BP, 32(SP) 0x0068 00104 (a.go:6) PCDATA $0, $0 0x0068 00104 (a.go:6) CALL runtime.growslice(SB) 0x006d 00109 (a.go:6) MOVQ 40(SP), CX 0x0072 00114 (a.go:6) MOVQ 48(SP), DX 0x0077 00119 (a.go:6) MOVQ DX, "".autotmp_0+64(SP) 0x007c 00124 (a.go:6) MOVQ 56(SP), BX 0x0081 00129 (a.go:6) MOVQ BX, "".x+16(SB) 0x0088 00136 (a.go:6) MOVL runtime.writeBarrier(SB), AX 0x008e 00142 (a.go:6) TESTB AL, AL 0x0090 00144 (a.go:6) JNE $0, 162 0x0092 00146 (a.go:6) MOVQ CX, "".x(SB) 0x0099 00153 (a.go:6) MOVQ "".x(SB), BX 0x00a0 00160 (a.go:6) JMP 53 0x00a2 00162 (a.go:6) LEAQ "".x(SB), BX 0x00a9 00169 (a.go:6) MOVQ BX, (SP) 0x00ad 00173 (a.go:6) MOVQ CX, 8(SP) 0x00b2 00178 (a.go:6) PCDATA $0, $0 0x00b2 00178 (a.go:6) CALL runtime.writebarrierptr(SB) 0x00b7 00183 (a.go:6) MOVQ "".autotmp_0+64(SP), DX 0x00bc 00188 (a.go:6) JMP 153 0x00be 00190 (a.go:6) NOP 0x00be 00190 (a.go:5) CALL runtime.morestack_noctxt(SB) 0x00c3 00195 (a.go:5) JMP 0 Fixes #14969 again Change-Id: Ia50463b1f506011aad0718a4fef1d4738e43c32d Reviewed-on: https://go-review.googlesource.com/22197 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-18 09:40:30 -07:00
// newptr, len, newcap = growslice(ptr, len, cap, newlen)
// vardef(a) // if necessary, advise liveness we are writing a new a
cmd/compile: avoid write barrier in append fast path When we are writing the result of an append back to the same slice, we don’t need a write barrier on the fast path. This re-implements an optimization that was present in the old backend. Updates #14921 Fixes #14969 Sample code: var x []byte func p() { x = append(x, 1, 2, 3) } Before: "".p t=1 size=224 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 199 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x(SB), DX 0x0025 00037 (append.go:19) MOVQ "".x+8(SB), BX 0x002c 00044 (append.go:19) MOVQ BX, "".autotmp_0+64(SP) 0x0031 00049 (append.go:22) LEAQ 3(BX), BP 0x0035 00053 (append.go:22) CMPQ BP, CX 0x0038 00056 (append.go:22) JGT $0, 131 0x003a 00058 (append.go:22) MOVB $1, (DX)(BX*1) 0x003e 00062 (append.go:22) MOVB $2, 1(DX)(BX*1) 0x0043 00067 (append.go:22) MOVB $3, 2(DX)(BX*1) 0x0048 00072 (append.go:22) MOVQ BP, "".x+8(SB) 0x004f 00079 (append.go:22) MOVQ CX, "".x+16(SB) 0x0056 00086 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x005c 00092 (append.go:22) TESTB AL, AL 0x005e 00094 (append.go:22) JNE $0, 108 0x0060 00096 (append.go:22) MOVQ DX, "".x(SB) 0x0067 00103 (append.go:23) ADDQ $72, SP 0x006b 00107 (append.go:23) RET 0x006c 00108 (append.go:22) LEAQ "".x(SB), CX 0x0073 00115 (append.go:22) MOVQ CX, (SP) 0x0077 00119 (append.go:22) MOVQ DX, 8(SP) 0x007c 00124 (append.go:22) PCDATA $0, $0 0x007c 00124 (append.go:22) CALL runtime.writebarrierptr(SB) 0x0081 00129 (append.go:23) JMP 103 0x0083 00131 (append.go:22) LEAQ type.[]uint8(SB), AX 0x008a 00138 (append.go:22) MOVQ AX, (SP) 0x008e 00142 (append.go:22) MOVQ DX, 8(SP) 0x0093 00147 (append.go:22) MOVQ BX, 16(SP) 0x0098 00152 (append.go:22) MOVQ CX, 24(SP) 0x009d 00157 (append.go:22) MOVQ BP, 32(SP) 0x00a2 00162 (append.go:22) PCDATA $0, $0 0x00a2 00162 (append.go:22) CALL runtime.growslice(SB) 0x00a7 00167 (append.go:22) MOVQ 40(SP), DX 0x00ac 00172 (append.go:22) MOVQ 48(SP), AX 0x00b1 00177 (append.go:22) MOVQ 56(SP), CX 0x00b6 00182 (append.go:22) ADDQ $3, AX 0x00ba 00186 (append.go:19) MOVQ "".autotmp_0+64(SP), BX 0x00bf 00191 (append.go:22) MOVQ AX, BP 0x00c2 00194 (append.go:22) JMP 58 0x00c7 00199 (append.go:22) NOP 0x00c7 00199 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00cc 00204 (append.go:21) JMP 0 After: "".p t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 191 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x+8(SB), DX 0x0025 00037 (append.go:19) MOVQ DX, "".autotmp_0+64(SP) 0x002a 00042 (append.go:19) MOVQ "".x(SB), BX 0x0031 00049 (append.go:22) LEAQ 3(DX), BP 0x0035 00053 (append.go:22) MOVQ BP, "".x+8(SB) 0x003c 00060 (append.go:22) CMPQ BP, CX 0x003f 00063 (append.go:22) JGT $0, 84 0x0041 00065 (append.go:22) MOVB $1, (BX)(DX*1) 0x0045 00069 (append.go:22) MOVB $2, 1(BX)(DX*1) 0x004a 00074 (append.go:22) MOVB $3, 2(BX)(DX*1) 0x004f 00079 (append.go:23) ADDQ $72, SP 0x0053 00083 (append.go:23) RET 0x0054 00084 (append.go:22) LEAQ type.[]uint8(SB), AX 0x005b 00091 (append.go:22) MOVQ AX, (SP) 0x005f 00095 (append.go:22) MOVQ BX, 8(SP) 0x0064 00100 (append.go:22) MOVQ DX, 16(SP) 0x0069 00105 (append.go:22) MOVQ CX, 24(SP) 0x006e 00110 (append.go:22) MOVQ BP, 32(SP) 0x0073 00115 (append.go:22) PCDATA $0, $0 0x0073 00115 (append.go:22) CALL runtime.growslice(SB) 0x0078 00120 (append.go:22) MOVQ 40(SP), CX 0x007d 00125 (append.go:22) MOVQ 56(SP), AX 0x0082 00130 (append.go:22) MOVQ AX, "".x+16(SB) 0x0089 00137 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x008f 00143 (append.go:22) TESTB AL, AL 0x0091 00145 (append.go:22) JNE $0, 168 0x0093 00147 (append.go:22) MOVQ CX, "".x(SB) 0x009a 00154 (append.go:22) MOVQ "".x(SB), BX 0x00a1 00161 (append.go:19) MOVQ "".autotmp_0+64(SP), DX 0x00a6 00166 (append.go:22) JMP 65 0x00a8 00168 (append.go:22) LEAQ "".x(SB), DX 0x00af 00175 (append.go:22) MOVQ DX, (SP) 0x00b3 00179 (append.go:22) MOVQ CX, 8(SP) 0x00b8 00184 (append.go:22) PCDATA $0, $0 0x00b8 00184 (append.go:22) CALL runtime.writebarrierptr(SB) 0x00bd 00189 (append.go:22) JMP 154 0x00bf 00191 (append.go:22) NOP 0x00bf 00191 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00c4 00196 (append.go:21) JMP 0 Change-Id: I77a41ad3a22557a4bb4654de7d6d24a029efe34a Reviewed-on: https://go-review.googlesource.com/21813 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:44:17 -07:00
// *a.cap = newcap // write before ptr to avoid a spill
// *a.ptr = newptr // with write barrier
// }
cmd/compile: re-enable in-place append optimization CL 21891 was too clever in its attempts to avoid spills. Storing newlen too early caused uses of append in the runtime itself to receive an inconsistent view of a slice, leading to corruption. This CL makes the generate code much more similar to the old backend. It spills more than before, but those spills have been contained to the grow path. It recalculates newlen unnecessarily on the fast path, but that's measurably cheaper than spilling it. CL 21891 caused runtime failures in 6 of 2000 runs of net/http and crypto/x509 in my test setup. This CL has gone 6000 runs without a failure. Benchmarks going from master to this CL: name old time/op new time/op delta AppendInPlace/NoGrow/Byte-8 439ns ± 2% 436ns ± 2% -0.72% (p=0.001 n=28+27) AppendInPlace/NoGrow/1Ptr-8 901ns ± 0% 856ns ± 0% -4.95% (p=0.000 n=26+29) AppendInPlace/NoGrow/2Ptr-8 2.15µs ± 1% 1.95µs ± 0% -9.07% (p=0.000 n=28+30) AppendInPlace/NoGrow/3Ptr-8 2.66µs ± 0% 2.45µs ± 0% -7.93% (p=0.000 n=29+26) AppendInPlace/NoGrow/4Ptr-8 3.24µs ± 1% 3.02µs ± 1% -6.75% (p=0.000 n=28+30) AppendInPlace/Grow/Byte-8 269ns ± 1% 271ns ± 1% +0.84% (p=0.000 n=30+29) AppendInPlace/Grow/1Ptr-8 275ns ± 1% 280ns ± 1% +1.75% (p=0.000 n=30+30) AppendInPlace/Grow/2Ptr-8 384ns ± 0% 391ns ± 0% +1.94% (p=0.000 n=27+30) AppendInPlace/Grow/3Ptr-8 455ns ± 0% 462ns ± 0% +1.43% (p=0.000 n=29+29) AppendInPlace/Grow/4Ptr-8 478ns ± 0% 479ns ± 0% +0.23% (p=0.000 n=30+27) However, for the large no-grow cases, there is still more work to be done. Going from this CL to the non-SSA backend: name old time/op new time/op delta AppendInPlace/NoGrow/Byte-8 436ns ± 2% 436ns ± 2% ~ (p=0.967 n=27+29) AppendInPlace/NoGrow/1Ptr-8 856ns ± 0% 884ns ± 0% +3.28% (p=0.000 n=29+26) AppendInPlace/NoGrow/2Ptr-8 1.95µs ± 0% 1.56µs ± 0% -20.28% (p=0.000 n=30+29) AppendInPlace/NoGrow/3Ptr-8 2.45µs ± 0% 1.89µs ± 0% -22.88% (p=0.000 n=26+28) AppendInPlace/NoGrow/4Ptr-8 3.02µs ± 1% 2.56µs ± 1% -15.35% (p=0.000 n=30+28) AppendInPlace/Grow/Byte-8 271ns ± 1% 283ns ± 1% +4.56% (p=0.000 n=29+29) AppendInPlace/Grow/1Ptr-8 280ns ± 1% 288ns ± 1% +2.99% (p=0.000 n=30+30) AppendInPlace/Grow/2Ptr-8 391ns ± 0% 409ns ± 0% +4.66% (p=0.000 n=30+29) AppendInPlace/Grow/3Ptr-8 462ns ± 0% 481ns ± 0% +4.13% (p=0.000 n=29+30) AppendInPlace/Grow/4Ptr-8 479ns ± 0% 502ns ± 0% +4.81% (p=0.000 n=27+26) New generated code: var x []byte func a() { x = append(x, 1) } "".a t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (a.go:5) TEXT "".a(SB), $72-0 0x0000 00000 (a.go:5) MOVQ (TLS), CX 0x0009 00009 (a.go:5) CMPQ SP, 16(CX) 0x000d 00013 (a.go:5) JLS 190 0x0013 00019 (a.go:5) SUBQ $72, SP 0x0017 00023 (a.go:5) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (a.go:5) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (a.go:6) MOVQ "".x+16(SB), CX 0x001e 00030 (a.go:6) MOVQ "".x+8(SB), DX 0x0025 00037 (a.go:6) MOVQ "".x(SB), BX 0x002c 00044 (a.go:6) LEAQ 1(DX), BP 0x0030 00048 (a.go:6) CMPQ BP, CX 0x0033 00051 (a.go:6) JGT $0, 73 0x0035 00053 (a.go:6) LEAQ 1(DX), AX 0x0039 00057 (a.go:6) MOVQ AX, "".x+8(SB) 0x0040 00064 (a.go:6) MOVB $1, (BX)(DX*1) 0x0044 00068 (a.go:7) ADDQ $72, SP 0x0048 00072 (a.go:7) RET 0x0049 00073 (a.go:6) LEAQ type.[]uint8(SB), AX 0x0050 00080 (a.go:6) MOVQ AX, (SP) 0x0054 00084 (a.go:6) MOVQ BX, 8(SP) 0x0059 00089 (a.go:6) MOVQ DX, 16(SP) 0x005e 00094 (a.go:6) MOVQ CX, 24(SP) 0x0063 00099 (a.go:6) MOVQ BP, 32(SP) 0x0068 00104 (a.go:6) PCDATA $0, $0 0x0068 00104 (a.go:6) CALL runtime.growslice(SB) 0x006d 00109 (a.go:6) MOVQ 40(SP), CX 0x0072 00114 (a.go:6) MOVQ 48(SP), DX 0x0077 00119 (a.go:6) MOVQ DX, "".autotmp_0+64(SP) 0x007c 00124 (a.go:6) MOVQ 56(SP), BX 0x0081 00129 (a.go:6) MOVQ BX, "".x+16(SB) 0x0088 00136 (a.go:6) MOVL runtime.writeBarrier(SB), AX 0x008e 00142 (a.go:6) TESTB AL, AL 0x0090 00144 (a.go:6) JNE $0, 162 0x0092 00146 (a.go:6) MOVQ CX, "".x(SB) 0x0099 00153 (a.go:6) MOVQ "".x(SB), BX 0x00a0 00160 (a.go:6) JMP 53 0x00a2 00162 (a.go:6) LEAQ "".x(SB), BX 0x00a9 00169 (a.go:6) MOVQ BX, (SP) 0x00ad 00173 (a.go:6) MOVQ CX, 8(SP) 0x00b2 00178 (a.go:6) PCDATA $0, $0 0x00b2 00178 (a.go:6) CALL runtime.writebarrierptr(SB) 0x00b7 00183 (a.go:6) MOVQ "".autotmp_0+64(SP), DX 0x00bc 00188 (a.go:6) JMP 153 0x00be 00190 (a.go:6) NOP 0x00be 00190 (a.go:5) CALL runtime.morestack_noctxt(SB) 0x00c3 00195 (a.go:5) JMP 0 Fixes #14969 again Change-Id: Ia50463b1f506011aad0718a4fef1d4738e43c32d Reviewed-on: https://go-review.googlesource.com/22197 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-18 09:40:30 -07:00
// newlen = len + 3 // recalculate to avoid a spill
// *a.len = newlen
cmd/compile: avoid write barrier in append fast path When we are writing the result of an append back to the same slice, we don’t need a write barrier on the fast path. This re-implements an optimization that was present in the old backend. Updates #14921 Fixes #14969 Sample code: var x []byte func p() { x = append(x, 1, 2, 3) } Before: "".p t=1 size=224 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 199 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x(SB), DX 0x0025 00037 (append.go:19) MOVQ "".x+8(SB), BX 0x002c 00044 (append.go:19) MOVQ BX, "".autotmp_0+64(SP) 0x0031 00049 (append.go:22) LEAQ 3(BX), BP 0x0035 00053 (append.go:22) CMPQ BP, CX 0x0038 00056 (append.go:22) JGT $0, 131 0x003a 00058 (append.go:22) MOVB $1, (DX)(BX*1) 0x003e 00062 (append.go:22) MOVB $2, 1(DX)(BX*1) 0x0043 00067 (append.go:22) MOVB $3, 2(DX)(BX*1) 0x0048 00072 (append.go:22) MOVQ BP, "".x+8(SB) 0x004f 00079 (append.go:22) MOVQ CX, "".x+16(SB) 0x0056 00086 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x005c 00092 (append.go:22) TESTB AL, AL 0x005e 00094 (append.go:22) JNE $0, 108 0x0060 00096 (append.go:22) MOVQ DX, "".x(SB) 0x0067 00103 (append.go:23) ADDQ $72, SP 0x006b 00107 (append.go:23) RET 0x006c 00108 (append.go:22) LEAQ "".x(SB), CX 0x0073 00115 (append.go:22) MOVQ CX, (SP) 0x0077 00119 (append.go:22) MOVQ DX, 8(SP) 0x007c 00124 (append.go:22) PCDATA $0, $0 0x007c 00124 (append.go:22) CALL runtime.writebarrierptr(SB) 0x0081 00129 (append.go:23) JMP 103 0x0083 00131 (append.go:22) LEAQ type.[]uint8(SB), AX 0x008a 00138 (append.go:22) MOVQ AX, (SP) 0x008e 00142 (append.go:22) MOVQ DX, 8(SP) 0x0093 00147 (append.go:22) MOVQ BX, 16(SP) 0x0098 00152 (append.go:22) MOVQ CX, 24(SP) 0x009d 00157 (append.go:22) MOVQ BP, 32(SP) 0x00a2 00162 (append.go:22) PCDATA $0, $0 0x00a2 00162 (append.go:22) CALL runtime.growslice(SB) 0x00a7 00167 (append.go:22) MOVQ 40(SP), DX 0x00ac 00172 (append.go:22) MOVQ 48(SP), AX 0x00b1 00177 (append.go:22) MOVQ 56(SP), CX 0x00b6 00182 (append.go:22) ADDQ $3, AX 0x00ba 00186 (append.go:19) MOVQ "".autotmp_0+64(SP), BX 0x00bf 00191 (append.go:22) MOVQ AX, BP 0x00c2 00194 (append.go:22) JMP 58 0x00c7 00199 (append.go:22) NOP 0x00c7 00199 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00cc 00204 (append.go:21) JMP 0 After: "".p t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 191 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x+8(SB), DX 0x0025 00037 (append.go:19) MOVQ DX, "".autotmp_0+64(SP) 0x002a 00042 (append.go:19) MOVQ "".x(SB), BX 0x0031 00049 (append.go:22) LEAQ 3(DX), BP 0x0035 00053 (append.go:22) MOVQ BP, "".x+8(SB) 0x003c 00060 (append.go:22) CMPQ BP, CX 0x003f 00063 (append.go:22) JGT $0, 84 0x0041 00065 (append.go:22) MOVB $1, (BX)(DX*1) 0x0045 00069 (append.go:22) MOVB $2, 1(BX)(DX*1) 0x004a 00074 (append.go:22) MOVB $3, 2(BX)(DX*1) 0x004f 00079 (append.go:23) ADDQ $72, SP 0x0053 00083 (append.go:23) RET 0x0054 00084 (append.go:22) LEAQ type.[]uint8(SB), AX 0x005b 00091 (append.go:22) MOVQ AX, (SP) 0x005f 00095 (append.go:22) MOVQ BX, 8(SP) 0x0064 00100 (append.go:22) MOVQ DX, 16(SP) 0x0069 00105 (append.go:22) MOVQ CX, 24(SP) 0x006e 00110 (append.go:22) MOVQ BP, 32(SP) 0x0073 00115 (append.go:22) PCDATA $0, $0 0x0073 00115 (append.go:22) CALL runtime.growslice(SB) 0x0078 00120 (append.go:22) MOVQ 40(SP), CX 0x007d 00125 (append.go:22) MOVQ 56(SP), AX 0x0082 00130 (append.go:22) MOVQ AX, "".x+16(SB) 0x0089 00137 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x008f 00143 (append.go:22) TESTB AL, AL 0x0091 00145 (append.go:22) JNE $0, 168 0x0093 00147 (append.go:22) MOVQ CX, "".x(SB) 0x009a 00154 (append.go:22) MOVQ "".x(SB), BX 0x00a1 00161 (append.go:19) MOVQ "".autotmp_0+64(SP), DX 0x00a6 00166 (append.go:22) JMP 65 0x00a8 00168 (append.go:22) LEAQ "".x(SB), DX 0x00af 00175 (append.go:22) MOVQ DX, (SP) 0x00b3 00179 (append.go:22) MOVQ CX, 8(SP) 0x00b8 00184 (append.go:22) PCDATA $0, $0 0x00b8 00184 (append.go:22) CALL runtime.writebarrierptr(SB) 0x00bd 00189 (append.go:22) JMP 154 0x00bf 00191 (append.go:22) NOP 0x00bf 00191 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00c4 00196 (append.go:21) JMP 0 Change-Id: I77a41ad3a22557a4bb4654de7d6d24a029efe34a Reviewed-on: https://go-review.googlesource.com/21813 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:44:17 -07:00
// // with write barriers, if needed:
// *(ptr+len) = e1
// *(ptr+len+1) = e2
// *(ptr+len+2) = e3
et := n.Type.Elem()
pt := types.NewPtr(et)
// Evaluate slice
cmd/compile: avoid write barrier in append fast path When we are writing the result of an append back to the same slice, we don’t need a write barrier on the fast path. This re-implements an optimization that was present in the old backend. Updates #14921 Fixes #14969 Sample code: var x []byte func p() { x = append(x, 1, 2, 3) } Before: "".p t=1 size=224 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 199 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x(SB), DX 0x0025 00037 (append.go:19) MOVQ "".x+8(SB), BX 0x002c 00044 (append.go:19) MOVQ BX, "".autotmp_0+64(SP) 0x0031 00049 (append.go:22) LEAQ 3(BX), BP 0x0035 00053 (append.go:22) CMPQ BP, CX 0x0038 00056 (append.go:22) JGT $0, 131 0x003a 00058 (append.go:22) MOVB $1, (DX)(BX*1) 0x003e 00062 (append.go:22) MOVB $2, 1(DX)(BX*1) 0x0043 00067 (append.go:22) MOVB $3, 2(DX)(BX*1) 0x0048 00072 (append.go:22) MOVQ BP, "".x+8(SB) 0x004f 00079 (append.go:22) MOVQ CX, "".x+16(SB) 0x0056 00086 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x005c 00092 (append.go:22) TESTB AL, AL 0x005e 00094 (append.go:22) JNE $0, 108 0x0060 00096 (append.go:22) MOVQ DX, "".x(SB) 0x0067 00103 (append.go:23) ADDQ $72, SP 0x006b 00107 (append.go:23) RET 0x006c 00108 (append.go:22) LEAQ "".x(SB), CX 0x0073 00115 (append.go:22) MOVQ CX, (SP) 0x0077 00119 (append.go:22) MOVQ DX, 8(SP) 0x007c 00124 (append.go:22) PCDATA $0, $0 0x007c 00124 (append.go:22) CALL runtime.writebarrierptr(SB) 0x0081 00129 (append.go:23) JMP 103 0x0083 00131 (append.go:22) LEAQ type.[]uint8(SB), AX 0x008a 00138 (append.go:22) MOVQ AX, (SP) 0x008e 00142 (append.go:22) MOVQ DX, 8(SP) 0x0093 00147 (append.go:22) MOVQ BX, 16(SP) 0x0098 00152 (append.go:22) MOVQ CX, 24(SP) 0x009d 00157 (append.go:22) MOVQ BP, 32(SP) 0x00a2 00162 (append.go:22) PCDATA $0, $0 0x00a2 00162 (append.go:22) CALL runtime.growslice(SB) 0x00a7 00167 (append.go:22) MOVQ 40(SP), DX 0x00ac 00172 (append.go:22) MOVQ 48(SP), AX 0x00b1 00177 (append.go:22) MOVQ 56(SP), CX 0x00b6 00182 (append.go:22) ADDQ $3, AX 0x00ba 00186 (append.go:19) MOVQ "".autotmp_0+64(SP), BX 0x00bf 00191 (append.go:22) MOVQ AX, BP 0x00c2 00194 (append.go:22) JMP 58 0x00c7 00199 (append.go:22) NOP 0x00c7 00199 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00cc 00204 (append.go:21) JMP 0 After: "".p t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 191 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x+8(SB), DX 0x0025 00037 (append.go:19) MOVQ DX, "".autotmp_0+64(SP) 0x002a 00042 (append.go:19) MOVQ "".x(SB), BX 0x0031 00049 (append.go:22) LEAQ 3(DX), BP 0x0035 00053 (append.go:22) MOVQ BP, "".x+8(SB) 0x003c 00060 (append.go:22) CMPQ BP, CX 0x003f 00063 (append.go:22) JGT $0, 84 0x0041 00065 (append.go:22) MOVB $1, (BX)(DX*1) 0x0045 00069 (append.go:22) MOVB $2, 1(BX)(DX*1) 0x004a 00074 (append.go:22) MOVB $3, 2(BX)(DX*1) 0x004f 00079 (append.go:23) ADDQ $72, SP 0x0053 00083 (append.go:23) RET 0x0054 00084 (append.go:22) LEAQ type.[]uint8(SB), AX 0x005b 00091 (append.go:22) MOVQ AX, (SP) 0x005f 00095 (append.go:22) MOVQ BX, 8(SP) 0x0064 00100 (append.go:22) MOVQ DX, 16(SP) 0x0069 00105 (append.go:22) MOVQ CX, 24(SP) 0x006e 00110 (append.go:22) MOVQ BP, 32(SP) 0x0073 00115 (append.go:22) PCDATA $0, $0 0x0073 00115 (append.go:22) CALL runtime.growslice(SB) 0x0078 00120 (append.go:22) MOVQ 40(SP), CX 0x007d 00125 (append.go:22) MOVQ 56(SP), AX 0x0082 00130 (append.go:22) MOVQ AX, "".x+16(SB) 0x0089 00137 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x008f 00143 (append.go:22) TESTB AL, AL 0x0091 00145 (append.go:22) JNE $0, 168 0x0093 00147 (append.go:22) MOVQ CX, "".x(SB) 0x009a 00154 (append.go:22) MOVQ "".x(SB), BX 0x00a1 00161 (append.go:19) MOVQ "".autotmp_0+64(SP), DX 0x00a6 00166 (append.go:22) JMP 65 0x00a8 00168 (append.go:22) LEAQ "".x(SB), DX 0x00af 00175 (append.go:22) MOVQ DX, (SP) 0x00b3 00179 (append.go:22) MOVQ CX, 8(SP) 0x00b8 00184 (append.go:22) PCDATA $0, $0 0x00b8 00184 (append.go:22) CALL runtime.writebarrierptr(SB) 0x00bd 00189 (append.go:22) JMP 154 0x00bf 00191 (append.go:22) NOP 0x00bf 00191 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00c4 00196 (append.go:21) JMP 0 Change-Id: I77a41ad3a22557a4bb4654de7d6d24a029efe34a Reviewed-on: https://go-review.googlesource.com/21813 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:44:17 -07:00
sn := n.List.First() // the slice node is the first in the list
var slice, addr *ssa.Value
if inplace {
addr = s.addr(sn, false)
slice = s.load(n.Type, addr)
cmd/compile: avoid write barrier in append fast path When we are writing the result of an append back to the same slice, we don’t need a write barrier on the fast path. This re-implements an optimization that was present in the old backend. Updates #14921 Fixes #14969 Sample code: var x []byte func p() { x = append(x, 1, 2, 3) } Before: "".p t=1 size=224 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 199 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x(SB), DX 0x0025 00037 (append.go:19) MOVQ "".x+8(SB), BX 0x002c 00044 (append.go:19) MOVQ BX, "".autotmp_0+64(SP) 0x0031 00049 (append.go:22) LEAQ 3(BX), BP 0x0035 00053 (append.go:22) CMPQ BP, CX 0x0038 00056 (append.go:22) JGT $0, 131 0x003a 00058 (append.go:22) MOVB $1, (DX)(BX*1) 0x003e 00062 (append.go:22) MOVB $2, 1(DX)(BX*1) 0x0043 00067 (append.go:22) MOVB $3, 2(DX)(BX*1) 0x0048 00072 (append.go:22) MOVQ BP, "".x+8(SB) 0x004f 00079 (append.go:22) MOVQ CX, "".x+16(SB) 0x0056 00086 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x005c 00092 (append.go:22) TESTB AL, AL 0x005e 00094 (append.go:22) JNE $0, 108 0x0060 00096 (append.go:22) MOVQ DX, "".x(SB) 0x0067 00103 (append.go:23) ADDQ $72, SP 0x006b 00107 (append.go:23) RET 0x006c 00108 (append.go:22) LEAQ "".x(SB), CX 0x0073 00115 (append.go:22) MOVQ CX, (SP) 0x0077 00119 (append.go:22) MOVQ DX, 8(SP) 0x007c 00124 (append.go:22) PCDATA $0, $0 0x007c 00124 (append.go:22) CALL runtime.writebarrierptr(SB) 0x0081 00129 (append.go:23) JMP 103 0x0083 00131 (append.go:22) LEAQ type.[]uint8(SB), AX 0x008a 00138 (append.go:22) MOVQ AX, (SP) 0x008e 00142 (append.go:22) MOVQ DX, 8(SP) 0x0093 00147 (append.go:22) MOVQ BX, 16(SP) 0x0098 00152 (append.go:22) MOVQ CX, 24(SP) 0x009d 00157 (append.go:22) MOVQ BP, 32(SP) 0x00a2 00162 (append.go:22) PCDATA $0, $0 0x00a2 00162 (append.go:22) CALL runtime.growslice(SB) 0x00a7 00167 (append.go:22) MOVQ 40(SP), DX 0x00ac 00172 (append.go:22) MOVQ 48(SP), AX 0x00b1 00177 (append.go:22) MOVQ 56(SP), CX 0x00b6 00182 (append.go:22) ADDQ $3, AX 0x00ba 00186 (append.go:19) MOVQ "".autotmp_0+64(SP), BX 0x00bf 00191 (append.go:22) MOVQ AX, BP 0x00c2 00194 (append.go:22) JMP 58 0x00c7 00199 (append.go:22) NOP 0x00c7 00199 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00cc 00204 (append.go:21) JMP 0 After: "".p t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 191 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x+8(SB), DX 0x0025 00037 (append.go:19) MOVQ DX, "".autotmp_0+64(SP) 0x002a 00042 (append.go:19) MOVQ "".x(SB), BX 0x0031 00049 (append.go:22) LEAQ 3(DX), BP 0x0035 00053 (append.go:22) MOVQ BP, "".x+8(SB) 0x003c 00060 (append.go:22) CMPQ BP, CX 0x003f 00063 (append.go:22) JGT $0, 84 0x0041 00065 (append.go:22) MOVB $1, (BX)(DX*1) 0x0045 00069 (append.go:22) MOVB $2, 1(BX)(DX*1) 0x004a 00074 (append.go:22) MOVB $3, 2(BX)(DX*1) 0x004f 00079 (append.go:23) ADDQ $72, SP 0x0053 00083 (append.go:23) RET 0x0054 00084 (append.go:22) LEAQ type.[]uint8(SB), AX 0x005b 00091 (append.go:22) MOVQ AX, (SP) 0x005f 00095 (append.go:22) MOVQ BX, 8(SP) 0x0064 00100 (append.go:22) MOVQ DX, 16(SP) 0x0069 00105 (append.go:22) MOVQ CX, 24(SP) 0x006e 00110 (append.go:22) MOVQ BP, 32(SP) 0x0073 00115 (append.go:22) PCDATA $0, $0 0x0073 00115 (append.go:22) CALL runtime.growslice(SB) 0x0078 00120 (append.go:22) MOVQ 40(SP), CX 0x007d 00125 (append.go:22) MOVQ 56(SP), AX 0x0082 00130 (append.go:22) MOVQ AX, "".x+16(SB) 0x0089 00137 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x008f 00143 (append.go:22) TESTB AL, AL 0x0091 00145 (append.go:22) JNE $0, 168 0x0093 00147 (append.go:22) MOVQ CX, "".x(SB) 0x009a 00154 (append.go:22) MOVQ "".x(SB), BX 0x00a1 00161 (append.go:19) MOVQ "".autotmp_0+64(SP), DX 0x00a6 00166 (append.go:22) JMP 65 0x00a8 00168 (append.go:22) LEAQ "".x(SB), DX 0x00af 00175 (append.go:22) MOVQ DX, (SP) 0x00b3 00179 (append.go:22) MOVQ CX, 8(SP) 0x00b8 00184 (append.go:22) PCDATA $0, $0 0x00b8 00184 (append.go:22) CALL runtime.writebarrierptr(SB) 0x00bd 00189 (append.go:22) JMP 154 0x00bf 00191 (append.go:22) NOP 0x00bf 00191 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00c4 00196 (append.go:21) JMP 0 Change-Id: I77a41ad3a22557a4bb4654de7d6d24a029efe34a Reviewed-on: https://go-review.googlesource.com/21813 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:44:17 -07:00
} else {
slice = s.expr(sn)
}
// Allocate new blocks
grow := s.f.NewBlock(ssa.BlockPlain)
assign := s.f.NewBlock(ssa.BlockPlain)
// Decide if we need to grow
nargs := int64(n.List.Len() - 1)
p := s.newValue1(ssa.OpSlicePtr, pt, slice)
l := s.newValue1(ssa.OpSliceLen, types.Types[TINT], slice)
c := s.newValue1(ssa.OpSliceCap, types.Types[TINT], slice)
nl := s.newValue2(s.ssaOp(OADD, types.Types[TINT]), types.Types[TINT], l, s.constInt(types.Types[TINT], nargs))
cmd/compile: avoid write barrier in append fast path When we are writing the result of an append back to the same slice, we don’t need a write barrier on the fast path. This re-implements an optimization that was present in the old backend. Updates #14921 Fixes #14969 Sample code: var x []byte func p() { x = append(x, 1, 2, 3) } Before: "".p t=1 size=224 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 199 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x(SB), DX 0x0025 00037 (append.go:19) MOVQ "".x+8(SB), BX 0x002c 00044 (append.go:19) MOVQ BX, "".autotmp_0+64(SP) 0x0031 00049 (append.go:22) LEAQ 3(BX), BP 0x0035 00053 (append.go:22) CMPQ BP, CX 0x0038 00056 (append.go:22) JGT $0, 131 0x003a 00058 (append.go:22) MOVB $1, (DX)(BX*1) 0x003e 00062 (append.go:22) MOVB $2, 1(DX)(BX*1) 0x0043 00067 (append.go:22) MOVB $3, 2(DX)(BX*1) 0x0048 00072 (append.go:22) MOVQ BP, "".x+8(SB) 0x004f 00079 (append.go:22) MOVQ CX, "".x+16(SB) 0x0056 00086 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x005c 00092 (append.go:22) TESTB AL, AL 0x005e 00094 (append.go:22) JNE $0, 108 0x0060 00096 (append.go:22) MOVQ DX, "".x(SB) 0x0067 00103 (append.go:23) ADDQ $72, SP 0x006b 00107 (append.go:23) RET 0x006c 00108 (append.go:22) LEAQ "".x(SB), CX 0x0073 00115 (append.go:22) MOVQ CX, (SP) 0x0077 00119 (append.go:22) MOVQ DX, 8(SP) 0x007c 00124 (append.go:22) PCDATA $0, $0 0x007c 00124 (append.go:22) CALL runtime.writebarrierptr(SB) 0x0081 00129 (append.go:23) JMP 103 0x0083 00131 (append.go:22) LEAQ type.[]uint8(SB), AX 0x008a 00138 (append.go:22) MOVQ AX, (SP) 0x008e 00142 (append.go:22) MOVQ DX, 8(SP) 0x0093 00147 (append.go:22) MOVQ BX, 16(SP) 0x0098 00152 (append.go:22) MOVQ CX, 24(SP) 0x009d 00157 (append.go:22) MOVQ BP, 32(SP) 0x00a2 00162 (append.go:22) PCDATA $0, $0 0x00a2 00162 (append.go:22) CALL runtime.growslice(SB) 0x00a7 00167 (append.go:22) MOVQ 40(SP), DX 0x00ac 00172 (append.go:22) MOVQ 48(SP), AX 0x00b1 00177 (append.go:22) MOVQ 56(SP), CX 0x00b6 00182 (append.go:22) ADDQ $3, AX 0x00ba 00186 (append.go:19) MOVQ "".autotmp_0+64(SP), BX 0x00bf 00191 (append.go:22) MOVQ AX, BP 0x00c2 00194 (append.go:22) JMP 58 0x00c7 00199 (append.go:22) NOP 0x00c7 00199 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00cc 00204 (append.go:21) JMP 0 After: "".p t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 191 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x+8(SB), DX 0x0025 00037 (append.go:19) MOVQ DX, "".autotmp_0+64(SP) 0x002a 00042 (append.go:19) MOVQ "".x(SB), BX 0x0031 00049 (append.go:22) LEAQ 3(DX), BP 0x0035 00053 (append.go:22) MOVQ BP, "".x+8(SB) 0x003c 00060 (append.go:22) CMPQ BP, CX 0x003f 00063 (append.go:22) JGT $0, 84 0x0041 00065 (append.go:22) MOVB $1, (BX)(DX*1) 0x0045 00069 (append.go:22) MOVB $2, 1(BX)(DX*1) 0x004a 00074 (append.go:22) MOVB $3, 2(BX)(DX*1) 0x004f 00079 (append.go:23) ADDQ $72, SP 0x0053 00083 (append.go:23) RET 0x0054 00084 (append.go:22) LEAQ type.[]uint8(SB), AX 0x005b 00091 (append.go:22) MOVQ AX, (SP) 0x005f 00095 (append.go:22) MOVQ BX, 8(SP) 0x0064 00100 (append.go:22) MOVQ DX, 16(SP) 0x0069 00105 (append.go:22) MOVQ CX, 24(SP) 0x006e 00110 (append.go:22) MOVQ BP, 32(SP) 0x0073 00115 (append.go:22) PCDATA $0, $0 0x0073 00115 (append.go:22) CALL runtime.growslice(SB) 0x0078 00120 (append.go:22) MOVQ 40(SP), CX 0x007d 00125 (append.go:22) MOVQ 56(SP), AX 0x0082 00130 (append.go:22) MOVQ AX, "".x+16(SB) 0x0089 00137 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x008f 00143 (append.go:22) TESTB AL, AL 0x0091 00145 (append.go:22) JNE $0, 168 0x0093 00147 (append.go:22) MOVQ CX, "".x(SB) 0x009a 00154 (append.go:22) MOVQ "".x(SB), BX 0x00a1 00161 (append.go:19) MOVQ "".autotmp_0+64(SP), DX 0x00a6 00166 (append.go:22) JMP 65 0x00a8 00168 (append.go:22) LEAQ "".x(SB), DX 0x00af 00175 (append.go:22) MOVQ DX, (SP) 0x00b3 00179 (append.go:22) MOVQ CX, 8(SP) 0x00b8 00184 (append.go:22) PCDATA $0, $0 0x00b8 00184 (append.go:22) CALL runtime.writebarrierptr(SB) 0x00bd 00189 (append.go:22) JMP 154 0x00bf 00191 (append.go:22) NOP 0x00bf 00191 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00c4 00196 (append.go:21) JMP 0 Change-Id: I77a41ad3a22557a4bb4654de7d6d24a029efe34a Reviewed-on: https://go-review.googlesource.com/21813 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:44:17 -07:00
cmp := s.newValue2(s.ssaOp(OGT, types.Types[TUINT]), types.Types[TBOOL], nl, c)
s.vars[&ptrVar] = p
cmd/compile: avoid write barrier in append fast path When we are writing the result of an append back to the same slice, we don’t need a write barrier on the fast path. This re-implements an optimization that was present in the old backend. Updates #14921 Fixes #14969 Sample code: var x []byte func p() { x = append(x, 1, 2, 3) } Before: "".p t=1 size=224 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 199 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x(SB), DX 0x0025 00037 (append.go:19) MOVQ "".x+8(SB), BX 0x002c 00044 (append.go:19) MOVQ BX, "".autotmp_0+64(SP) 0x0031 00049 (append.go:22) LEAQ 3(BX), BP 0x0035 00053 (append.go:22) CMPQ BP, CX 0x0038 00056 (append.go:22) JGT $0, 131 0x003a 00058 (append.go:22) MOVB $1, (DX)(BX*1) 0x003e 00062 (append.go:22) MOVB $2, 1(DX)(BX*1) 0x0043 00067 (append.go:22) MOVB $3, 2(DX)(BX*1) 0x0048 00072 (append.go:22) MOVQ BP, "".x+8(SB) 0x004f 00079 (append.go:22) MOVQ CX, "".x+16(SB) 0x0056 00086 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x005c 00092 (append.go:22) TESTB AL, AL 0x005e 00094 (append.go:22) JNE $0, 108 0x0060 00096 (append.go:22) MOVQ DX, "".x(SB) 0x0067 00103 (append.go:23) ADDQ $72, SP 0x006b 00107 (append.go:23) RET 0x006c 00108 (append.go:22) LEAQ "".x(SB), CX 0x0073 00115 (append.go:22) MOVQ CX, (SP) 0x0077 00119 (append.go:22) MOVQ DX, 8(SP) 0x007c 00124 (append.go:22) PCDATA $0, $0 0x007c 00124 (append.go:22) CALL runtime.writebarrierptr(SB) 0x0081 00129 (append.go:23) JMP 103 0x0083 00131 (append.go:22) LEAQ type.[]uint8(SB), AX 0x008a 00138 (append.go:22) MOVQ AX, (SP) 0x008e 00142 (append.go:22) MOVQ DX, 8(SP) 0x0093 00147 (append.go:22) MOVQ BX, 16(SP) 0x0098 00152 (append.go:22) MOVQ CX, 24(SP) 0x009d 00157 (append.go:22) MOVQ BP, 32(SP) 0x00a2 00162 (append.go:22) PCDATA $0, $0 0x00a2 00162 (append.go:22) CALL runtime.growslice(SB) 0x00a7 00167 (append.go:22) MOVQ 40(SP), DX 0x00ac 00172 (append.go:22) MOVQ 48(SP), AX 0x00b1 00177 (append.go:22) MOVQ 56(SP), CX 0x00b6 00182 (append.go:22) ADDQ $3, AX 0x00ba 00186 (append.go:19) MOVQ "".autotmp_0+64(SP), BX 0x00bf 00191 (append.go:22) MOVQ AX, BP 0x00c2 00194 (append.go:22) JMP 58 0x00c7 00199 (append.go:22) NOP 0x00c7 00199 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00cc 00204 (append.go:21) JMP 0 After: "".p t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 191 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x+8(SB), DX 0x0025 00037 (append.go:19) MOVQ DX, "".autotmp_0+64(SP) 0x002a 00042 (append.go:19) MOVQ "".x(SB), BX 0x0031 00049 (append.go:22) LEAQ 3(DX), BP 0x0035 00053 (append.go:22) MOVQ BP, "".x+8(SB) 0x003c 00060 (append.go:22) CMPQ BP, CX 0x003f 00063 (append.go:22) JGT $0, 84 0x0041 00065 (append.go:22) MOVB $1, (BX)(DX*1) 0x0045 00069 (append.go:22) MOVB $2, 1(BX)(DX*1) 0x004a 00074 (append.go:22) MOVB $3, 2(BX)(DX*1) 0x004f 00079 (append.go:23) ADDQ $72, SP 0x0053 00083 (append.go:23) RET 0x0054 00084 (append.go:22) LEAQ type.[]uint8(SB), AX 0x005b 00091 (append.go:22) MOVQ AX, (SP) 0x005f 00095 (append.go:22) MOVQ BX, 8(SP) 0x0064 00100 (append.go:22) MOVQ DX, 16(SP) 0x0069 00105 (append.go:22) MOVQ CX, 24(SP) 0x006e 00110 (append.go:22) MOVQ BP, 32(SP) 0x0073 00115 (append.go:22) PCDATA $0, $0 0x0073 00115 (append.go:22) CALL runtime.growslice(SB) 0x0078 00120 (append.go:22) MOVQ 40(SP), CX 0x007d 00125 (append.go:22) MOVQ 56(SP), AX 0x0082 00130 (append.go:22) MOVQ AX, "".x+16(SB) 0x0089 00137 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x008f 00143 (append.go:22) TESTB AL, AL 0x0091 00145 (append.go:22) JNE $0, 168 0x0093 00147 (append.go:22) MOVQ CX, "".x(SB) 0x009a 00154 (append.go:22) MOVQ "".x(SB), BX 0x00a1 00161 (append.go:19) MOVQ "".autotmp_0+64(SP), DX 0x00a6 00166 (append.go:22) JMP 65 0x00a8 00168 (append.go:22) LEAQ "".x(SB), DX 0x00af 00175 (append.go:22) MOVQ DX, (SP) 0x00b3 00179 (append.go:22) MOVQ CX, 8(SP) 0x00b8 00184 (append.go:22) PCDATA $0, $0 0x00b8 00184 (append.go:22) CALL runtime.writebarrierptr(SB) 0x00bd 00189 (append.go:22) JMP 154 0x00bf 00191 (append.go:22) NOP 0x00bf 00191 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00c4 00196 (append.go:21) JMP 0 Change-Id: I77a41ad3a22557a4bb4654de7d6d24a029efe34a Reviewed-on: https://go-review.googlesource.com/21813 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:44:17 -07:00
if !inplace {
s.vars[&newlenVar] = nl
s.vars[&capVar] = c
cmd/compile: re-enable in-place append optimization CL 21891 was too clever in its attempts to avoid spills. Storing newlen too early caused uses of append in the runtime itself to receive an inconsistent view of a slice, leading to corruption. This CL makes the generate code much more similar to the old backend. It spills more than before, but those spills have been contained to the grow path. It recalculates newlen unnecessarily on the fast path, but that's measurably cheaper than spilling it. CL 21891 caused runtime failures in 6 of 2000 runs of net/http and crypto/x509 in my test setup. This CL has gone 6000 runs without a failure. Benchmarks going from master to this CL: name old time/op new time/op delta AppendInPlace/NoGrow/Byte-8 439ns ± 2% 436ns ± 2% -0.72% (p=0.001 n=28+27) AppendInPlace/NoGrow/1Ptr-8 901ns ± 0% 856ns ± 0% -4.95% (p=0.000 n=26+29) AppendInPlace/NoGrow/2Ptr-8 2.15µs ± 1% 1.95µs ± 0% -9.07% (p=0.000 n=28+30) AppendInPlace/NoGrow/3Ptr-8 2.66µs ± 0% 2.45µs ± 0% -7.93% (p=0.000 n=29+26) AppendInPlace/NoGrow/4Ptr-8 3.24µs ± 1% 3.02µs ± 1% -6.75% (p=0.000 n=28+30) AppendInPlace/Grow/Byte-8 269ns ± 1% 271ns ± 1% +0.84% (p=0.000 n=30+29) AppendInPlace/Grow/1Ptr-8 275ns ± 1% 280ns ± 1% +1.75% (p=0.000 n=30+30) AppendInPlace/Grow/2Ptr-8 384ns ± 0% 391ns ± 0% +1.94% (p=0.000 n=27+30) AppendInPlace/Grow/3Ptr-8 455ns ± 0% 462ns ± 0% +1.43% (p=0.000 n=29+29) AppendInPlace/Grow/4Ptr-8 478ns ± 0% 479ns ± 0% +0.23% (p=0.000 n=30+27) However, for the large no-grow cases, there is still more work to be done. Going from this CL to the non-SSA backend: name old time/op new time/op delta AppendInPlace/NoGrow/Byte-8 436ns ± 2% 436ns ± 2% ~ (p=0.967 n=27+29) AppendInPlace/NoGrow/1Ptr-8 856ns ± 0% 884ns ± 0% +3.28% (p=0.000 n=29+26) AppendInPlace/NoGrow/2Ptr-8 1.95µs ± 0% 1.56µs ± 0% -20.28% (p=0.000 n=30+29) AppendInPlace/NoGrow/3Ptr-8 2.45µs ± 0% 1.89µs ± 0% -22.88% (p=0.000 n=26+28) AppendInPlace/NoGrow/4Ptr-8 3.02µs ± 1% 2.56µs ± 1% -15.35% (p=0.000 n=30+28) AppendInPlace/Grow/Byte-8 271ns ± 1% 283ns ± 1% +4.56% (p=0.000 n=29+29) AppendInPlace/Grow/1Ptr-8 280ns ± 1% 288ns ± 1% +2.99% (p=0.000 n=30+30) AppendInPlace/Grow/2Ptr-8 391ns ± 0% 409ns ± 0% +4.66% (p=0.000 n=30+29) AppendInPlace/Grow/3Ptr-8 462ns ± 0% 481ns ± 0% +4.13% (p=0.000 n=29+30) AppendInPlace/Grow/4Ptr-8 479ns ± 0% 502ns ± 0% +4.81% (p=0.000 n=27+26) New generated code: var x []byte func a() { x = append(x, 1) } "".a t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (a.go:5) TEXT "".a(SB), $72-0 0x0000 00000 (a.go:5) MOVQ (TLS), CX 0x0009 00009 (a.go:5) CMPQ SP, 16(CX) 0x000d 00013 (a.go:5) JLS 190 0x0013 00019 (a.go:5) SUBQ $72, SP 0x0017 00023 (a.go:5) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (a.go:5) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (a.go:6) MOVQ "".x+16(SB), CX 0x001e 00030 (a.go:6) MOVQ "".x+8(SB), DX 0x0025 00037 (a.go:6) MOVQ "".x(SB), BX 0x002c 00044 (a.go:6) LEAQ 1(DX), BP 0x0030 00048 (a.go:6) CMPQ BP, CX 0x0033 00051 (a.go:6) JGT $0, 73 0x0035 00053 (a.go:6) LEAQ 1(DX), AX 0x0039 00057 (a.go:6) MOVQ AX, "".x+8(SB) 0x0040 00064 (a.go:6) MOVB $1, (BX)(DX*1) 0x0044 00068 (a.go:7) ADDQ $72, SP 0x0048 00072 (a.go:7) RET 0x0049 00073 (a.go:6) LEAQ type.[]uint8(SB), AX 0x0050 00080 (a.go:6) MOVQ AX, (SP) 0x0054 00084 (a.go:6) MOVQ BX, 8(SP) 0x0059 00089 (a.go:6) MOVQ DX, 16(SP) 0x005e 00094 (a.go:6) MOVQ CX, 24(SP) 0x0063 00099 (a.go:6) MOVQ BP, 32(SP) 0x0068 00104 (a.go:6) PCDATA $0, $0 0x0068 00104 (a.go:6) CALL runtime.growslice(SB) 0x006d 00109 (a.go:6) MOVQ 40(SP), CX 0x0072 00114 (a.go:6) MOVQ 48(SP), DX 0x0077 00119 (a.go:6) MOVQ DX, "".autotmp_0+64(SP) 0x007c 00124 (a.go:6) MOVQ 56(SP), BX 0x0081 00129 (a.go:6) MOVQ BX, "".x+16(SB) 0x0088 00136 (a.go:6) MOVL runtime.writeBarrier(SB), AX 0x008e 00142 (a.go:6) TESTB AL, AL 0x0090 00144 (a.go:6) JNE $0, 162 0x0092 00146 (a.go:6) MOVQ CX, "".x(SB) 0x0099 00153 (a.go:6) MOVQ "".x(SB), BX 0x00a0 00160 (a.go:6) JMP 53 0x00a2 00162 (a.go:6) LEAQ "".x(SB), BX 0x00a9 00169 (a.go:6) MOVQ BX, (SP) 0x00ad 00173 (a.go:6) MOVQ CX, 8(SP) 0x00b2 00178 (a.go:6) PCDATA $0, $0 0x00b2 00178 (a.go:6) CALL runtime.writebarrierptr(SB) 0x00b7 00183 (a.go:6) MOVQ "".autotmp_0+64(SP), DX 0x00bc 00188 (a.go:6) JMP 153 0x00be 00190 (a.go:6) NOP 0x00be 00190 (a.go:5) CALL runtime.morestack_noctxt(SB) 0x00c3 00195 (a.go:5) JMP 0 Fixes #14969 again Change-Id: Ia50463b1f506011aad0718a4fef1d4738e43c32d Reviewed-on: https://go-review.googlesource.com/22197 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-18 09:40:30 -07:00
} else {
s.vars[&lenVar] = l
cmd/compile: avoid write barrier in append fast path When we are writing the result of an append back to the same slice, we don’t need a write barrier on the fast path. This re-implements an optimization that was present in the old backend. Updates #14921 Fixes #14969 Sample code: var x []byte func p() { x = append(x, 1, 2, 3) } Before: "".p t=1 size=224 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 199 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x(SB), DX 0x0025 00037 (append.go:19) MOVQ "".x+8(SB), BX 0x002c 00044 (append.go:19) MOVQ BX, "".autotmp_0+64(SP) 0x0031 00049 (append.go:22) LEAQ 3(BX), BP 0x0035 00053 (append.go:22) CMPQ BP, CX 0x0038 00056 (append.go:22) JGT $0, 131 0x003a 00058 (append.go:22) MOVB $1, (DX)(BX*1) 0x003e 00062 (append.go:22) MOVB $2, 1(DX)(BX*1) 0x0043 00067 (append.go:22) MOVB $3, 2(DX)(BX*1) 0x0048 00072 (append.go:22) MOVQ BP, "".x+8(SB) 0x004f 00079 (append.go:22) MOVQ CX, "".x+16(SB) 0x0056 00086 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x005c 00092 (append.go:22) TESTB AL, AL 0x005e 00094 (append.go:22) JNE $0, 108 0x0060 00096 (append.go:22) MOVQ DX, "".x(SB) 0x0067 00103 (append.go:23) ADDQ $72, SP 0x006b 00107 (append.go:23) RET 0x006c 00108 (append.go:22) LEAQ "".x(SB), CX 0x0073 00115 (append.go:22) MOVQ CX, (SP) 0x0077 00119 (append.go:22) MOVQ DX, 8(SP) 0x007c 00124 (append.go:22) PCDATA $0, $0 0x007c 00124 (append.go:22) CALL runtime.writebarrierptr(SB) 0x0081 00129 (append.go:23) JMP 103 0x0083 00131 (append.go:22) LEAQ type.[]uint8(SB), AX 0x008a 00138 (append.go:22) MOVQ AX, (SP) 0x008e 00142 (append.go:22) MOVQ DX, 8(SP) 0x0093 00147 (append.go:22) MOVQ BX, 16(SP) 0x0098 00152 (append.go:22) MOVQ CX, 24(SP) 0x009d 00157 (append.go:22) MOVQ BP, 32(SP) 0x00a2 00162 (append.go:22) PCDATA $0, $0 0x00a2 00162 (append.go:22) CALL runtime.growslice(SB) 0x00a7 00167 (append.go:22) MOVQ 40(SP), DX 0x00ac 00172 (append.go:22) MOVQ 48(SP), AX 0x00b1 00177 (append.go:22) MOVQ 56(SP), CX 0x00b6 00182 (append.go:22) ADDQ $3, AX 0x00ba 00186 (append.go:19) MOVQ "".autotmp_0+64(SP), BX 0x00bf 00191 (append.go:22) MOVQ AX, BP 0x00c2 00194 (append.go:22) JMP 58 0x00c7 00199 (append.go:22) NOP 0x00c7 00199 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00cc 00204 (append.go:21) JMP 0 After: "".p t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 191 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x+8(SB), DX 0x0025 00037 (append.go:19) MOVQ DX, "".autotmp_0+64(SP) 0x002a 00042 (append.go:19) MOVQ "".x(SB), BX 0x0031 00049 (append.go:22) LEAQ 3(DX), BP 0x0035 00053 (append.go:22) MOVQ BP, "".x+8(SB) 0x003c 00060 (append.go:22) CMPQ BP, CX 0x003f 00063 (append.go:22) JGT $0, 84 0x0041 00065 (append.go:22) MOVB $1, (BX)(DX*1) 0x0045 00069 (append.go:22) MOVB $2, 1(BX)(DX*1) 0x004a 00074 (append.go:22) MOVB $3, 2(BX)(DX*1) 0x004f 00079 (append.go:23) ADDQ $72, SP 0x0053 00083 (append.go:23) RET 0x0054 00084 (append.go:22) LEAQ type.[]uint8(SB), AX 0x005b 00091 (append.go:22) MOVQ AX, (SP) 0x005f 00095 (append.go:22) MOVQ BX, 8(SP) 0x0064 00100 (append.go:22) MOVQ DX, 16(SP) 0x0069 00105 (append.go:22) MOVQ CX, 24(SP) 0x006e 00110 (append.go:22) MOVQ BP, 32(SP) 0x0073 00115 (append.go:22) PCDATA $0, $0 0x0073 00115 (append.go:22) CALL runtime.growslice(SB) 0x0078 00120 (append.go:22) MOVQ 40(SP), CX 0x007d 00125 (append.go:22) MOVQ 56(SP), AX 0x0082 00130 (append.go:22) MOVQ AX, "".x+16(SB) 0x0089 00137 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x008f 00143 (append.go:22) TESTB AL, AL 0x0091 00145 (append.go:22) JNE $0, 168 0x0093 00147 (append.go:22) MOVQ CX, "".x(SB) 0x009a 00154 (append.go:22) MOVQ "".x(SB), BX 0x00a1 00161 (append.go:19) MOVQ "".autotmp_0+64(SP), DX 0x00a6 00166 (append.go:22) JMP 65 0x00a8 00168 (append.go:22) LEAQ "".x(SB), DX 0x00af 00175 (append.go:22) MOVQ DX, (SP) 0x00b3 00179 (append.go:22) MOVQ CX, 8(SP) 0x00b8 00184 (append.go:22) PCDATA $0, $0 0x00b8 00184 (append.go:22) CALL runtime.writebarrierptr(SB) 0x00bd 00189 (append.go:22) JMP 154 0x00bf 00191 (append.go:22) NOP 0x00bf 00191 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00c4 00196 (append.go:21) JMP 0 Change-Id: I77a41ad3a22557a4bb4654de7d6d24a029efe34a Reviewed-on: https://go-review.googlesource.com/21813 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:44:17 -07:00
}
b := s.endBlock()
b.Kind = ssa.BlockIf
b.Likely = ssa.BranchUnlikely
b.SetControl(cmp)
b.AddEdgeTo(grow)
b.AddEdgeTo(assign)
// Call growslice
s.startBlock(grow)
taddr := s.expr(n.Left)
r := s.rtcall(growslice, true, []*types.Type{pt, types.Types[TINT], types.Types[TINT]}, taddr, p, l, c, nl)
cmd/compile: avoid write barrier in append fast path When we are writing the result of an append back to the same slice, we don’t need a write barrier on the fast path. This re-implements an optimization that was present in the old backend. Updates #14921 Fixes #14969 Sample code: var x []byte func p() { x = append(x, 1, 2, 3) } Before: "".p t=1 size=224 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 199 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x(SB), DX 0x0025 00037 (append.go:19) MOVQ "".x+8(SB), BX 0x002c 00044 (append.go:19) MOVQ BX, "".autotmp_0+64(SP) 0x0031 00049 (append.go:22) LEAQ 3(BX), BP 0x0035 00053 (append.go:22) CMPQ BP, CX 0x0038 00056 (append.go:22) JGT $0, 131 0x003a 00058 (append.go:22) MOVB $1, (DX)(BX*1) 0x003e 00062 (append.go:22) MOVB $2, 1(DX)(BX*1) 0x0043 00067 (append.go:22) MOVB $3, 2(DX)(BX*1) 0x0048 00072 (append.go:22) MOVQ BP, "".x+8(SB) 0x004f 00079 (append.go:22) MOVQ CX, "".x+16(SB) 0x0056 00086 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x005c 00092 (append.go:22) TESTB AL, AL 0x005e 00094 (append.go:22) JNE $0, 108 0x0060 00096 (append.go:22) MOVQ DX, "".x(SB) 0x0067 00103 (append.go:23) ADDQ $72, SP 0x006b 00107 (append.go:23) RET 0x006c 00108 (append.go:22) LEAQ "".x(SB), CX 0x0073 00115 (append.go:22) MOVQ CX, (SP) 0x0077 00119 (append.go:22) MOVQ DX, 8(SP) 0x007c 00124 (append.go:22) PCDATA $0, $0 0x007c 00124 (append.go:22) CALL runtime.writebarrierptr(SB) 0x0081 00129 (append.go:23) JMP 103 0x0083 00131 (append.go:22) LEAQ type.[]uint8(SB), AX 0x008a 00138 (append.go:22) MOVQ AX, (SP) 0x008e 00142 (append.go:22) MOVQ DX, 8(SP) 0x0093 00147 (append.go:22) MOVQ BX, 16(SP) 0x0098 00152 (append.go:22) MOVQ CX, 24(SP) 0x009d 00157 (append.go:22) MOVQ BP, 32(SP) 0x00a2 00162 (append.go:22) PCDATA $0, $0 0x00a2 00162 (append.go:22) CALL runtime.growslice(SB) 0x00a7 00167 (append.go:22) MOVQ 40(SP), DX 0x00ac 00172 (append.go:22) MOVQ 48(SP), AX 0x00b1 00177 (append.go:22) MOVQ 56(SP), CX 0x00b6 00182 (append.go:22) ADDQ $3, AX 0x00ba 00186 (append.go:19) MOVQ "".autotmp_0+64(SP), BX 0x00bf 00191 (append.go:22) MOVQ AX, BP 0x00c2 00194 (append.go:22) JMP 58 0x00c7 00199 (append.go:22) NOP 0x00c7 00199 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00cc 00204 (append.go:21) JMP 0 After: "".p t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 191 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x+8(SB), DX 0x0025 00037 (append.go:19) MOVQ DX, "".autotmp_0+64(SP) 0x002a 00042 (append.go:19) MOVQ "".x(SB), BX 0x0031 00049 (append.go:22) LEAQ 3(DX), BP 0x0035 00053 (append.go:22) MOVQ BP, "".x+8(SB) 0x003c 00060 (append.go:22) CMPQ BP, CX 0x003f 00063 (append.go:22) JGT $0, 84 0x0041 00065 (append.go:22) MOVB $1, (BX)(DX*1) 0x0045 00069 (append.go:22) MOVB $2, 1(BX)(DX*1) 0x004a 00074 (append.go:22) MOVB $3, 2(BX)(DX*1) 0x004f 00079 (append.go:23) ADDQ $72, SP 0x0053 00083 (append.go:23) RET 0x0054 00084 (append.go:22) LEAQ type.[]uint8(SB), AX 0x005b 00091 (append.go:22) MOVQ AX, (SP) 0x005f 00095 (append.go:22) MOVQ BX, 8(SP) 0x0064 00100 (append.go:22) MOVQ DX, 16(SP) 0x0069 00105 (append.go:22) MOVQ CX, 24(SP) 0x006e 00110 (append.go:22) MOVQ BP, 32(SP) 0x0073 00115 (append.go:22) PCDATA $0, $0 0x0073 00115 (append.go:22) CALL runtime.growslice(SB) 0x0078 00120 (append.go:22) MOVQ 40(SP), CX 0x007d 00125 (append.go:22) MOVQ 56(SP), AX 0x0082 00130 (append.go:22) MOVQ AX, "".x+16(SB) 0x0089 00137 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x008f 00143 (append.go:22) TESTB AL, AL 0x0091 00145 (append.go:22) JNE $0, 168 0x0093 00147 (append.go:22) MOVQ CX, "".x(SB) 0x009a 00154 (append.go:22) MOVQ "".x(SB), BX 0x00a1 00161 (append.go:19) MOVQ "".autotmp_0+64(SP), DX 0x00a6 00166 (append.go:22) JMP 65 0x00a8 00168 (append.go:22) LEAQ "".x(SB), DX 0x00af 00175 (append.go:22) MOVQ DX, (SP) 0x00b3 00179 (append.go:22) MOVQ CX, 8(SP) 0x00b8 00184 (append.go:22) PCDATA $0, $0 0x00b8 00184 (append.go:22) CALL runtime.writebarrierptr(SB) 0x00bd 00189 (append.go:22) JMP 154 0x00bf 00191 (append.go:22) NOP 0x00bf 00191 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00c4 00196 (append.go:21) JMP 0 Change-Id: I77a41ad3a22557a4bb4654de7d6d24a029efe34a Reviewed-on: https://go-review.googlesource.com/21813 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:44:17 -07:00
if inplace {
if sn.Op == ONAME && sn.Class() != PEXTERN {
cmd/compile: re-enable in-place append optimization CL 21891 was too clever in its attempts to avoid spills. Storing newlen too early caused uses of append in the runtime itself to receive an inconsistent view of a slice, leading to corruption. This CL makes the generate code much more similar to the old backend. It spills more than before, but those spills have been contained to the grow path. It recalculates newlen unnecessarily on the fast path, but that's measurably cheaper than spilling it. CL 21891 caused runtime failures in 6 of 2000 runs of net/http and crypto/x509 in my test setup. This CL has gone 6000 runs without a failure. Benchmarks going from master to this CL: name old time/op new time/op delta AppendInPlace/NoGrow/Byte-8 439ns ± 2% 436ns ± 2% -0.72% (p=0.001 n=28+27) AppendInPlace/NoGrow/1Ptr-8 901ns ± 0% 856ns ± 0% -4.95% (p=0.000 n=26+29) AppendInPlace/NoGrow/2Ptr-8 2.15µs ± 1% 1.95µs ± 0% -9.07% (p=0.000 n=28+30) AppendInPlace/NoGrow/3Ptr-8 2.66µs ± 0% 2.45µs ± 0% -7.93% (p=0.000 n=29+26) AppendInPlace/NoGrow/4Ptr-8 3.24µs ± 1% 3.02µs ± 1% -6.75% (p=0.000 n=28+30) AppendInPlace/Grow/Byte-8 269ns ± 1% 271ns ± 1% +0.84% (p=0.000 n=30+29) AppendInPlace/Grow/1Ptr-8 275ns ± 1% 280ns ± 1% +1.75% (p=0.000 n=30+30) AppendInPlace/Grow/2Ptr-8 384ns ± 0% 391ns ± 0% +1.94% (p=0.000 n=27+30) AppendInPlace/Grow/3Ptr-8 455ns ± 0% 462ns ± 0% +1.43% (p=0.000 n=29+29) AppendInPlace/Grow/4Ptr-8 478ns ± 0% 479ns ± 0% +0.23% (p=0.000 n=30+27) However, for the large no-grow cases, there is still more work to be done. Going from this CL to the non-SSA backend: name old time/op new time/op delta AppendInPlace/NoGrow/Byte-8 436ns ± 2% 436ns ± 2% ~ (p=0.967 n=27+29) AppendInPlace/NoGrow/1Ptr-8 856ns ± 0% 884ns ± 0% +3.28% (p=0.000 n=29+26) AppendInPlace/NoGrow/2Ptr-8 1.95µs ± 0% 1.56µs ± 0% -20.28% (p=0.000 n=30+29) AppendInPlace/NoGrow/3Ptr-8 2.45µs ± 0% 1.89µs ± 0% -22.88% (p=0.000 n=26+28) AppendInPlace/NoGrow/4Ptr-8 3.02µs ± 1% 2.56µs ± 1% -15.35% (p=0.000 n=30+28) AppendInPlace/Grow/Byte-8 271ns ± 1% 283ns ± 1% +4.56% (p=0.000 n=29+29) AppendInPlace/Grow/1Ptr-8 280ns ± 1% 288ns ± 1% +2.99% (p=0.000 n=30+30) AppendInPlace/Grow/2Ptr-8 391ns ± 0% 409ns ± 0% +4.66% (p=0.000 n=30+29) AppendInPlace/Grow/3Ptr-8 462ns ± 0% 481ns ± 0% +4.13% (p=0.000 n=29+30) AppendInPlace/Grow/4Ptr-8 479ns ± 0% 502ns ± 0% +4.81% (p=0.000 n=27+26) New generated code: var x []byte func a() { x = append(x, 1) } "".a t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (a.go:5) TEXT "".a(SB), $72-0 0x0000 00000 (a.go:5) MOVQ (TLS), CX 0x0009 00009 (a.go:5) CMPQ SP, 16(CX) 0x000d 00013 (a.go:5) JLS 190 0x0013 00019 (a.go:5) SUBQ $72, SP 0x0017 00023 (a.go:5) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (a.go:5) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (a.go:6) MOVQ "".x+16(SB), CX 0x001e 00030 (a.go:6) MOVQ "".x+8(SB), DX 0x0025 00037 (a.go:6) MOVQ "".x(SB), BX 0x002c 00044 (a.go:6) LEAQ 1(DX), BP 0x0030 00048 (a.go:6) CMPQ BP, CX 0x0033 00051 (a.go:6) JGT $0, 73 0x0035 00053 (a.go:6) LEAQ 1(DX), AX 0x0039 00057 (a.go:6) MOVQ AX, "".x+8(SB) 0x0040 00064 (a.go:6) MOVB $1, (BX)(DX*1) 0x0044 00068 (a.go:7) ADDQ $72, SP 0x0048 00072 (a.go:7) RET 0x0049 00073 (a.go:6) LEAQ type.[]uint8(SB), AX 0x0050 00080 (a.go:6) MOVQ AX, (SP) 0x0054 00084 (a.go:6) MOVQ BX, 8(SP) 0x0059 00089 (a.go:6) MOVQ DX, 16(SP) 0x005e 00094 (a.go:6) MOVQ CX, 24(SP) 0x0063 00099 (a.go:6) MOVQ BP, 32(SP) 0x0068 00104 (a.go:6) PCDATA $0, $0 0x0068 00104 (a.go:6) CALL runtime.growslice(SB) 0x006d 00109 (a.go:6) MOVQ 40(SP), CX 0x0072 00114 (a.go:6) MOVQ 48(SP), DX 0x0077 00119 (a.go:6) MOVQ DX, "".autotmp_0+64(SP) 0x007c 00124 (a.go:6) MOVQ 56(SP), BX 0x0081 00129 (a.go:6) MOVQ BX, "".x+16(SB) 0x0088 00136 (a.go:6) MOVL runtime.writeBarrier(SB), AX 0x008e 00142 (a.go:6) TESTB AL, AL 0x0090 00144 (a.go:6) JNE $0, 162 0x0092 00146 (a.go:6) MOVQ CX, "".x(SB) 0x0099 00153 (a.go:6) MOVQ "".x(SB), BX 0x00a0 00160 (a.go:6) JMP 53 0x00a2 00162 (a.go:6) LEAQ "".x(SB), BX 0x00a9 00169 (a.go:6) MOVQ BX, (SP) 0x00ad 00173 (a.go:6) MOVQ CX, 8(SP) 0x00b2 00178 (a.go:6) PCDATA $0, $0 0x00b2 00178 (a.go:6) CALL runtime.writebarrierptr(SB) 0x00b7 00183 (a.go:6) MOVQ "".autotmp_0+64(SP), DX 0x00bc 00188 (a.go:6) JMP 153 0x00be 00190 (a.go:6) NOP 0x00be 00190 (a.go:5) CALL runtime.morestack_noctxt(SB) 0x00c3 00195 (a.go:5) JMP 0 Fixes #14969 again Change-Id: Ia50463b1f506011aad0718a4fef1d4738e43c32d Reviewed-on: https://go-review.googlesource.com/22197 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-18 09:40:30 -07:00
// Tell liveness we're about to build a new slice
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
s.vars[&memVar] = s.newValue1A(ssa.OpVarDef, types.TypeMem, sn, s.mem())
cmd/compile: re-enable in-place append optimization CL 21891 was too clever in its attempts to avoid spills. Storing newlen too early caused uses of append in the runtime itself to receive an inconsistent view of a slice, leading to corruption. This CL makes the generate code much more similar to the old backend. It spills more than before, but those spills have been contained to the grow path. It recalculates newlen unnecessarily on the fast path, but that's measurably cheaper than spilling it. CL 21891 caused runtime failures in 6 of 2000 runs of net/http and crypto/x509 in my test setup. This CL has gone 6000 runs without a failure. Benchmarks going from master to this CL: name old time/op new time/op delta AppendInPlace/NoGrow/Byte-8 439ns ± 2% 436ns ± 2% -0.72% (p=0.001 n=28+27) AppendInPlace/NoGrow/1Ptr-8 901ns ± 0% 856ns ± 0% -4.95% (p=0.000 n=26+29) AppendInPlace/NoGrow/2Ptr-8 2.15µs ± 1% 1.95µs ± 0% -9.07% (p=0.000 n=28+30) AppendInPlace/NoGrow/3Ptr-8 2.66µs ± 0% 2.45µs ± 0% -7.93% (p=0.000 n=29+26) AppendInPlace/NoGrow/4Ptr-8 3.24µs ± 1% 3.02µs ± 1% -6.75% (p=0.000 n=28+30) AppendInPlace/Grow/Byte-8 269ns ± 1% 271ns ± 1% +0.84% (p=0.000 n=30+29) AppendInPlace/Grow/1Ptr-8 275ns ± 1% 280ns ± 1% +1.75% (p=0.000 n=30+30) AppendInPlace/Grow/2Ptr-8 384ns ± 0% 391ns ± 0% +1.94% (p=0.000 n=27+30) AppendInPlace/Grow/3Ptr-8 455ns ± 0% 462ns ± 0% +1.43% (p=0.000 n=29+29) AppendInPlace/Grow/4Ptr-8 478ns ± 0% 479ns ± 0% +0.23% (p=0.000 n=30+27) However, for the large no-grow cases, there is still more work to be done. Going from this CL to the non-SSA backend: name old time/op new time/op delta AppendInPlace/NoGrow/Byte-8 436ns ± 2% 436ns ± 2% ~ (p=0.967 n=27+29) AppendInPlace/NoGrow/1Ptr-8 856ns ± 0% 884ns ± 0% +3.28% (p=0.000 n=29+26) AppendInPlace/NoGrow/2Ptr-8 1.95µs ± 0% 1.56µs ± 0% -20.28% (p=0.000 n=30+29) AppendInPlace/NoGrow/3Ptr-8 2.45µs ± 0% 1.89µs ± 0% -22.88% (p=0.000 n=26+28) AppendInPlace/NoGrow/4Ptr-8 3.02µs ± 1% 2.56µs ± 1% -15.35% (p=0.000 n=30+28) AppendInPlace/Grow/Byte-8 271ns ± 1% 283ns ± 1% +4.56% (p=0.000 n=29+29) AppendInPlace/Grow/1Ptr-8 280ns ± 1% 288ns ± 1% +2.99% (p=0.000 n=30+30) AppendInPlace/Grow/2Ptr-8 391ns ± 0% 409ns ± 0% +4.66% (p=0.000 n=30+29) AppendInPlace/Grow/3Ptr-8 462ns ± 0% 481ns ± 0% +4.13% (p=0.000 n=29+30) AppendInPlace/Grow/4Ptr-8 479ns ± 0% 502ns ± 0% +4.81% (p=0.000 n=27+26) New generated code: var x []byte func a() { x = append(x, 1) } "".a t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (a.go:5) TEXT "".a(SB), $72-0 0x0000 00000 (a.go:5) MOVQ (TLS), CX 0x0009 00009 (a.go:5) CMPQ SP, 16(CX) 0x000d 00013 (a.go:5) JLS 190 0x0013 00019 (a.go:5) SUBQ $72, SP 0x0017 00023 (a.go:5) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (a.go:5) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (a.go:6) MOVQ "".x+16(SB), CX 0x001e 00030 (a.go:6) MOVQ "".x+8(SB), DX 0x0025 00037 (a.go:6) MOVQ "".x(SB), BX 0x002c 00044 (a.go:6) LEAQ 1(DX), BP 0x0030 00048 (a.go:6) CMPQ BP, CX 0x0033 00051 (a.go:6) JGT $0, 73 0x0035 00053 (a.go:6) LEAQ 1(DX), AX 0x0039 00057 (a.go:6) MOVQ AX, "".x+8(SB) 0x0040 00064 (a.go:6) MOVB $1, (BX)(DX*1) 0x0044 00068 (a.go:7) ADDQ $72, SP 0x0048 00072 (a.go:7) RET 0x0049 00073 (a.go:6) LEAQ type.[]uint8(SB), AX 0x0050 00080 (a.go:6) MOVQ AX, (SP) 0x0054 00084 (a.go:6) MOVQ BX, 8(SP) 0x0059 00089 (a.go:6) MOVQ DX, 16(SP) 0x005e 00094 (a.go:6) MOVQ CX, 24(SP) 0x0063 00099 (a.go:6) MOVQ BP, 32(SP) 0x0068 00104 (a.go:6) PCDATA $0, $0 0x0068 00104 (a.go:6) CALL runtime.growslice(SB) 0x006d 00109 (a.go:6) MOVQ 40(SP), CX 0x0072 00114 (a.go:6) MOVQ 48(SP), DX 0x0077 00119 (a.go:6) MOVQ DX, "".autotmp_0+64(SP) 0x007c 00124 (a.go:6) MOVQ 56(SP), BX 0x0081 00129 (a.go:6) MOVQ BX, "".x+16(SB) 0x0088 00136 (a.go:6) MOVL runtime.writeBarrier(SB), AX 0x008e 00142 (a.go:6) TESTB AL, AL 0x0090 00144 (a.go:6) JNE $0, 162 0x0092 00146 (a.go:6) MOVQ CX, "".x(SB) 0x0099 00153 (a.go:6) MOVQ "".x(SB), BX 0x00a0 00160 (a.go:6) JMP 53 0x00a2 00162 (a.go:6) LEAQ "".x(SB), BX 0x00a9 00169 (a.go:6) MOVQ BX, (SP) 0x00ad 00173 (a.go:6) MOVQ CX, 8(SP) 0x00b2 00178 (a.go:6) PCDATA $0, $0 0x00b2 00178 (a.go:6) CALL runtime.writebarrierptr(SB) 0x00b7 00183 (a.go:6) MOVQ "".autotmp_0+64(SP), DX 0x00bc 00188 (a.go:6) JMP 153 0x00be 00190 (a.go:6) NOP 0x00be 00190 (a.go:5) CALL runtime.morestack_noctxt(SB) 0x00c3 00195 (a.go:5) JMP 0 Fixes #14969 again Change-Id: Ia50463b1f506011aad0718a4fef1d4738e43c32d Reviewed-on: https://go-review.googlesource.com/22197 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-18 09:40:30 -07:00
}
capaddr := s.newValue1I(ssa.OpOffPtr, s.f.Config.Types.IntPtr, int64(array_cap), addr)
s.store(types.Types[TINT], capaddr, r[2])
s.store(pt, addr, r[0])
cmd/compile: avoid write barrier in append fast path When we are writing the result of an append back to the same slice, we don’t need a write barrier on the fast path. This re-implements an optimization that was present in the old backend. Updates #14921 Fixes #14969 Sample code: var x []byte func p() { x = append(x, 1, 2, 3) } Before: "".p t=1 size=224 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 199 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x(SB), DX 0x0025 00037 (append.go:19) MOVQ "".x+8(SB), BX 0x002c 00044 (append.go:19) MOVQ BX, "".autotmp_0+64(SP) 0x0031 00049 (append.go:22) LEAQ 3(BX), BP 0x0035 00053 (append.go:22) CMPQ BP, CX 0x0038 00056 (append.go:22) JGT $0, 131 0x003a 00058 (append.go:22) MOVB $1, (DX)(BX*1) 0x003e 00062 (append.go:22) MOVB $2, 1(DX)(BX*1) 0x0043 00067 (append.go:22) MOVB $3, 2(DX)(BX*1) 0x0048 00072 (append.go:22) MOVQ BP, "".x+8(SB) 0x004f 00079 (append.go:22) MOVQ CX, "".x+16(SB) 0x0056 00086 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x005c 00092 (append.go:22) TESTB AL, AL 0x005e 00094 (append.go:22) JNE $0, 108 0x0060 00096 (append.go:22) MOVQ DX, "".x(SB) 0x0067 00103 (append.go:23) ADDQ $72, SP 0x006b 00107 (append.go:23) RET 0x006c 00108 (append.go:22) LEAQ "".x(SB), CX 0x0073 00115 (append.go:22) MOVQ CX, (SP) 0x0077 00119 (append.go:22) MOVQ DX, 8(SP) 0x007c 00124 (append.go:22) PCDATA $0, $0 0x007c 00124 (append.go:22) CALL runtime.writebarrierptr(SB) 0x0081 00129 (append.go:23) JMP 103 0x0083 00131 (append.go:22) LEAQ type.[]uint8(SB), AX 0x008a 00138 (append.go:22) MOVQ AX, (SP) 0x008e 00142 (append.go:22) MOVQ DX, 8(SP) 0x0093 00147 (append.go:22) MOVQ BX, 16(SP) 0x0098 00152 (append.go:22) MOVQ CX, 24(SP) 0x009d 00157 (append.go:22) MOVQ BP, 32(SP) 0x00a2 00162 (append.go:22) PCDATA $0, $0 0x00a2 00162 (append.go:22) CALL runtime.growslice(SB) 0x00a7 00167 (append.go:22) MOVQ 40(SP), DX 0x00ac 00172 (append.go:22) MOVQ 48(SP), AX 0x00b1 00177 (append.go:22) MOVQ 56(SP), CX 0x00b6 00182 (append.go:22) ADDQ $3, AX 0x00ba 00186 (append.go:19) MOVQ "".autotmp_0+64(SP), BX 0x00bf 00191 (append.go:22) MOVQ AX, BP 0x00c2 00194 (append.go:22) JMP 58 0x00c7 00199 (append.go:22) NOP 0x00c7 00199 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00cc 00204 (append.go:21) JMP 0 After: "".p t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 191 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x+8(SB), DX 0x0025 00037 (append.go:19) MOVQ DX, "".autotmp_0+64(SP) 0x002a 00042 (append.go:19) MOVQ "".x(SB), BX 0x0031 00049 (append.go:22) LEAQ 3(DX), BP 0x0035 00053 (append.go:22) MOVQ BP, "".x+8(SB) 0x003c 00060 (append.go:22) CMPQ BP, CX 0x003f 00063 (append.go:22) JGT $0, 84 0x0041 00065 (append.go:22) MOVB $1, (BX)(DX*1) 0x0045 00069 (append.go:22) MOVB $2, 1(BX)(DX*1) 0x004a 00074 (append.go:22) MOVB $3, 2(BX)(DX*1) 0x004f 00079 (append.go:23) ADDQ $72, SP 0x0053 00083 (append.go:23) RET 0x0054 00084 (append.go:22) LEAQ type.[]uint8(SB), AX 0x005b 00091 (append.go:22) MOVQ AX, (SP) 0x005f 00095 (append.go:22) MOVQ BX, 8(SP) 0x0064 00100 (append.go:22) MOVQ DX, 16(SP) 0x0069 00105 (append.go:22) MOVQ CX, 24(SP) 0x006e 00110 (append.go:22) MOVQ BP, 32(SP) 0x0073 00115 (append.go:22) PCDATA $0, $0 0x0073 00115 (append.go:22) CALL runtime.growslice(SB) 0x0078 00120 (append.go:22) MOVQ 40(SP), CX 0x007d 00125 (append.go:22) MOVQ 56(SP), AX 0x0082 00130 (append.go:22) MOVQ AX, "".x+16(SB) 0x0089 00137 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x008f 00143 (append.go:22) TESTB AL, AL 0x0091 00145 (append.go:22) JNE $0, 168 0x0093 00147 (append.go:22) MOVQ CX, "".x(SB) 0x009a 00154 (append.go:22) MOVQ "".x(SB), BX 0x00a1 00161 (append.go:19) MOVQ "".autotmp_0+64(SP), DX 0x00a6 00166 (append.go:22) JMP 65 0x00a8 00168 (append.go:22) LEAQ "".x(SB), DX 0x00af 00175 (append.go:22) MOVQ DX, (SP) 0x00b3 00179 (append.go:22) MOVQ CX, 8(SP) 0x00b8 00184 (append.go:22) PCDATA $0, $0 0x00b8 00184 (append.go:22) CALL runtime.writebarrierptr(SB) 0x00bd 00189 (append.go:22) JMP 154 0x00bf 00191 (append.go:22) NOP 0x00bf 00191 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00c4 00196 (append.go:21) JMP 0 Change-Id: I77a41ad3a22557a4bb4654de7d6d24a029efe34a Reviewed-on: https://go-review.googlesource.com/21813 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:44:17 -07:00
// load the value we just stored to avoid having to spill it
s.vars[&ptrVar] = s.load(pt, addr)
cmd/compile: re-enable in-place append optimization CL 21891 was too clever in its attempts to avoid spills. Storing newlen too early caused uses of append in the runtime itself to receive an inconsistent view of a slice, leading to corruption. This CL makes the generate code much more similar to the old backend. It spills more than before, but those spills have been contained to the grow path. It recalculates newlen unnecessarily on the fast path, but that's measurably cheaper than spilling it. CL 21891 caused runtime failures in 6 of 2000 runs of net/http and crypto/x509 in my test setup. This CL has gone 6000 runs without a failure. Benchmarks going from master to this CL: name old time/op new time/op delta AppendInPlace/NoGrow/Byte-8 439ns ± 2% 436ns ± 2% -0.72% (p=0.001 n=28+27) AppendInPlace/NoGrow/1Ptr-8 901ns ± 0% 856ns ± 0% -4.95% (p=0.000 n=26+29) AppendInPlace/NoGrow/2Ptr-8 2.15µs ± 1% 1.95µs ± 0% -9.07% (p=0.000 n=28+30) AppendInPlace/NoGrow/3Ptr-8 2.66µs ± 0% 2.45µs ± 0% -7.93% (p=0.000 n=29+26) AppendInPlace/NoGrow/4Ptr-8 3.24µs ± 1% 3.02µs ± 1% -6.75% (p=0.000 n=28+30) AppendInPlace/Grow/Byte-8 269ns ± 1% 271ns ± 1% +0.84% (p=0.000 n=30+29) AppendInPlace/Grow/1Ptr-8 275ns ± 1% 280ns ± 1% +1.75% (p=0.000 n=30+30) AppendInPlace/Grow/2Ptr-8 384ns ± 0% 391ns ± 0% +1.94% (p=0.000 n=27+30) AppendInPlace/Grow/3Ptr-8 455ns ± 0% 462ns ± 0% +1.43% (p=0.000 n=29+29) AppendInPlace/Grow/4Ptr-8 478ns ± 0% 479ns ± 0% +0.23% (p=0.000 n=30+27) However, for the large no-grow cases, there is still more work to be done. Going from this CL to the non-SSA backend: name old time/op new time/op delta AppendInPlace/NoGrow/Byte-8 436ns ± 2% 436ns ± 2% ~ (p=0.967 n=27+29) AppendInPlace/NoGrow/1Ptr-8 856ns ± 0% 884ns ± 0% +3.28% (p=0.000 n=29+26) AppendInPlace/NoGrow/2Ptr-8 1.95µs ± 0% 1.56µs ± 0% -20.28% (p=0.000 n=30+29) AppendInPlace/NoGrow/3Ptr-8 2.45µs ± 0% 1.89µs ± 0% -22.88% (p=0.000 n=26+28) AppendInPlace/NoGrow/4Ptr-8 3.02µs ± 1% 2.56µs ± 1% -15.35% (p=0.000 n=30+28) AppendInPlace/Grow/Byte-8 271ns ± 1% 283ns ± 1% +4.56% (p=0.000 n=29+29) AppendInPlace/Grow/1Ptr-8 280ns ± 1% 288ns ± 1% +2.99% (p=0.000 n=30+30) AppendInPlace/Grow/2Ptr-8 391ns ± 0% 409ns ± 0% +4.66% (p=0.000 n=30+29) AppendInPlace/Grow/3Ptr-8 462ns ± 0% 481ns ± 0% +4.13% (p=0.000 n=29+30) AppendInPlace/Grow/4Ptr-8 479ns ± 0% 502ns ± 0% +4.81% (p=0.000 n=27+26) New generated code: var x []byte func a() { x = append(x, 1) } "".a t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (a.go:5) TEXT "".a(SB), $72-0 0x0000 00000 (a.go:5) MOVQ (TLS), CX 0x0009 00009 (a.go:5) CMPQ SP, 16(CX) 0x000d 00013 (a.go:5) JLS 190 0x0013 00019 (a.go:5) SUBQ $72, SP 0x0017 00023 (a.go:5) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (a.go:5) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (a.go:6) MOVQ "".x+16(SB), CX 0x001e 00030 (a.go:6) MOVQ "".x+8(SB), DX 0x0025 00037 (a.go:6) MOVQ "".x(SB), BX 0x002c 00044 (a.go:6) LEAQ 1(DX), BP 0x0030 00048 (a.go:6) CMPQ BP, CX 0x0033 00051 (a.go:6) JGT $0, 73 0x0035 00053 (a.go:6) LEAQ 1(DX), AX 0x0039 00057 (a.go:6) MOVQ AX, "".x+8(SB) 0x0040 00064 (a.go:6) MOVB $1, (BX)(DX*1) 0x0044 00068 (a.go:7) ADDQ $72, SP 0x0048 00072 (a.go:7) RET 0x0049 00073 (a.go:6) LEAQ type.[]uint8(SB), AX 0x0050 00080 (a.go:6) MOVQ AX, (SP) 0x0054 00084 (a.go:6) MOVQ BX, 8(SP) 0x0059 00089 (a.go:6) MOVQ DX, 16(SP) 0x005e 00094 (a.go:6) MOVQ CX, 24(SP) 0x0063 00099 (a.go:6) MOVQ BP, 32(SP) 0x0068 00104 (a.go:6) PCDATA $0, $0 0x0068 00104 (a.go:6) CALL runtime.growslice(SB) 0x006d 00109 (a.go:6) MOVQ 40(SP), CX 0x0072 00114 (a.go:6) MOVQ 48(SP), DX 0x0077 00119 (a.go:6) MOVQ DX, "".autotmp_0+64(SP) 0x007c 00124 (a.go:6) MOVQ 56(SP), BX 0x0081 00129 (a.go:6) MOVQ BX, "".x+16(SB) 0x0088 00136 (a.go:6) MOVL runtime.writeBarrier(SB), AX 0x008e 00142 (a.go:6) TESTB AL, AL 0x0090 00144 (a.go:6) JNE $0, 162 0x0092 00146 (a.go:6) MOVQ CX, "".x(SB) 0x0099 00153 (a.go:6) MOVQ "".x(SB), BX 0x00a0 00160 (a.go:6) JMP 53 0x00a2 00162 (a.go:6) LEAQ "".x(SB), BX 0x00a9 00169 (a.go:6) MOVQ BX, (SP) 0x00ad 00173 (a.go:6) MOVQ CX, 8(SP) 0x00b2 00178 (a.go:6) PCDATA $0, $0 0x00b2 00178 (a.go:6) CALL runtime.writebarrierptr(SB) 0x00b7 00183 (a.go:6) MOVQ "".autotmp_0+64(SP), DX 0x00bc 00188 (a.go:6) JMP 153 0x00be 00190 (a.go:6) NOP 0x00be 00190 (a.go:5) CALL runtime.morestack_noctxt(SB) 0x00c3 00195 (a.go:5) JMP 0 Fixes #14969 again Change-Id: Ia50463b1f506011aad0718a4fef1d4738e43c32d Reviewed-on: https://go-review.googlesource.com/22197 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-18 09:40:30 -07:00
s.vars[&lenVar] = r[1] // avoid a spill in the fast path
cmd/compile: avoid write barrier in append fast path When we are writing the result of an append back to the same slice, we don’t need a write barrier on the fast path. This re-implements an optimization that was present in the old backend. Updates #14921 Fixes #14969 Sample code: var x []byte func p() { x = append(x, 1, 2, 3) } Before: "".p t=1 size=224 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 199 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x(SB), DX 0x0025 00037 (append.go:19) MOVQ "".x+8(SB), BX 0x002c 00044 (append.go:19) MOVQ BX, "".autotmp_0+64(SP) 0x0031 00049 (append.go:22) LEAQ 3(BX), BP 0x0035 00053 (append.go:22) CMPQ BP, CX 0x0038 00056 (append.go:22) JGT $0, 131 0x003a 00058 (append.go:22) MOVB $1, (DX)(BX*1) 0x003e 00062 (append.go:22) MOVB $2, 1(DX)(BX*1) 0x0043 00067 (append.go:22) MOVB $3, 2(DX)(BX*1) 0x0048 00072 (append.go:22) MOVQ BP, "".x+8(SB) 0x004f 00079 (append.go:22) MOVQ CX, "".x+16(SB) 0x0056 00086 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x005c 00092 (append.go:22) TESTB AL, AL 0x005e 00094 (append.go:22) JNE $0, 108 0x0060 00096 (append.go:22) MOVQ DX, "".x(SB) 0x0067 00103 (append.go:23) ADDQ $72, SP 0x006b 00107 (append.go:23) RET 0x006c 00108 (append.go:22) LEAQ "".x(SB), CX 0x0073 00115 (append.go:22) MOVQ CX, (SP) 0x0077 00119 (append.go:22) MOVQ DX, 8(SP) 0x007c 00124 (append.go:22) PCDATA $0, $0 0x007c 00124 (append.go:22) CALL runtime.writebarrierptr(SB) 0x0081 00129 (append.go:23) JMP 103 0x0083 00131 (append.go:22) LEAQ type.[]uint8(SB), AX 0x008a 00138 (append.go:22) MOVQ AX, (SP) 0x008e 00142 (append.go:22) MOVQ DX, 8(SP) 0x0093 00147 (append.go:22) MOVQ BX, 16(SP) 0x0098 00152 (append.go:22) MOVQ CX, 24(SP) 0x009d 00157 (append.go:22) MOVQ BP, 32(SP) 0x00a2 00162 (append.go:22) PCDATA $0, $0 0x00a2 00162 (append.go:22) CALL runtime.growslice(SB) 0x00a7 00167 (append.go:22) MOVQ 40(SP), DX 0x00ac 00172 (append.go:22) MOVQ 48(SP), AX 0x00b1 00177 (append.go:22) MOVQ 56(SP), CX 0x00b6 00182 (append.go:22) ADDQ $3, AX 0x00ba 00186 (append.go:19) MOVQ "".autotmp_0+64(SP), BX 0x00bf 00191 (append.go:22) MOVQ AX, BP 0x00c2 00194 (append.go:22) JMP 58 0x00c7 00199 (append.go:22) NOP 0x00c7 00199 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00cc 00204 (append.go:21) JMP 0 After: "".p t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 191 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x+8(SB), DX 0x0025 00037 (append.go:19) MOVQ DX, "".autotmp_0+64(SP) 0x002a 00042 (append.go:19) MOVQ "".x(SB), BX 0x0031 00049 (append.go:22) LEAQ 3(DX), BP 0x0035 00053 (append.go:22) MOVQ BP, "".x+8(SB) 0x003c 00060 (append.go:22) CMPQ BP, CX 0x003f 00063 (append.go:22) JGT $0, 84 0x0041 00065 (append.go:22) MOVB $1, (BX)(DX*1) 0x0045 00069 (append.go:22) MOVB $2, 1(BX)(DX*1) 0x004a 00074 (append.go:22) MOVB $3, 2(BX)(DX*1) 0x004f 00079 (append.go:23) ADDQ $72, SP 0x0053 00083 (append.go:23) RET 0x0054 00084 (append.go:22) LEAQ type.[]uint8(SB), AX 0x005b 00091 (append.go:22) MOVQ AX, (SP) 0x005f 00095 (append.go:22) MOVQ BX, 8(SP) 0x0064 00100 (append.go:22) MOVQ DX, 16(SP) 0x0069 00105 (append.go:22) MOVQ CX, 24(SP) 0x006e 00110 (append.go:22) MOVQ BP, 32(SP) 0x0073 00115 (append.go:22) PCDATA $0, $0 0x0073 00115 (append.go:22) CALL runtime.growslice(SB) 0x0078 00120 (append.go:22) MOVQ 40(SP), CX 0x007d 00125 (append.go:22) MOVQ 56(SP), AX 0x0082 00130 (append.go:22) MOVQ AX, "".x+16(SB) 0x0089 00137 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x008f 00143 (append.go:22) TESTB AL, AL 0x0091 00145 (append.go:22) JNE $0, 168 0x0093 00147 (append.go:22) MOVQ CX, "".x(SB) 0x009a 00154 (append.go:22) MOVQ "".x(SB), BX 0x00a1 00161 (append.go:19) MOVQ "".autotmp_0+64(SP), DX 0x00a6 00166 (append.go:22) JMP 65 0x00a8 00168 (append.go:22) LEAQ "".x(SB), DX 0x00af 00175 (append.go:22) MOVQ DX, (SP) 0x00b3 00179 (append.go:22) MOVQ CX, 8(SP) 0x00b8 00184 (append.go:22) PCDATA $0, $0 0x00b8 00184 (append.go:22) CALL runtime.writebarrierptr(SB) 0x00bd 00189 (append.go:22) JMP 154 0x00bf 00191 (append.go:22) NOP 0x00bf 00191 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00c4 00196 (append.go:21) JMP 0 Change-Id: I77a41ad3a22557a4bb4654de7d6d24a029efe34a Reviewed-on: https://go-review.googlesource.com/21813 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:44:17 -07:00
} else {
s.vars[&ptrVar] = r[0]
s.vars[&newlenVar] = s.newValue2(s.ssaOp(OADD, types.Types[TINT]), types.Types[TINT], r[1], s.constInt(types.Types[TINT], nargs))
cmd/compile: avoid write barrier in append fast path When we are writing the result of an append back to the same slice, we don’t need a write barrier on the fast path. This re-implements an optimization that was present in the old backend. Updates #14921 Fixes #14969 Sample code: var x []byte func p() { x = append(x, 1, 2, 3) } Before: "".p t=1 size=224 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 199 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x(SB), DX 0x0025 00037 (append.go:19) MOVQ "".x+8(SB), BX 0x002c 00044 (append.go:19) MOVQ BX, "".autotmp_0+64(SP) 0x0031 00049 (append.go:22) LEAQ 3(BX), BP 0x0035 00053 (append.go:22) CMPQ BP, CX 0x0038 00056 (append.go:22) JGT $0, 131 0x003a 00058 (append.go:22) MOVB $1, (DX)(BX*1) 0x003e 00062 (append.go:22) MOVB $2, 1(DX)(BX*1) 0x0043 00067 (append.go:22) MOVB $3, 2(DX)(BX*1) 0x0048 00072 (append.go:22) MOVQ BP, "".x+8(SB) 0x004f 00079 (append.go:22) MOVQ CX, "".x+16(SB) 0x0056 00086 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x005c 00092 (append.go:22) TESTB AL, AL 0x005e 00094 (append.go:22) JNE $0, 108 0x0060 00096 (append.go:22) MOVQ DX, "".x(SB) 0x0067 00103 (append.go:23) ADDQ $72, SP 0x006b 00107 (append.go:23) RET 0x006c 00108 (append.go:22) LEAQ "".x(SB), CX 0x0073 00115 (append.go:22) MOVQ CX, (SP) 0x0077 00119 (append.go:22) MOVQ DX, 8(SP) 0x007c 00124 (append.go:22) PCDATA $0, $0 0x007c 00124 (append.go:22) CALL runtime.writebarrierptr(SB) 0x0081 00129 (append.go:23) JMP 103 0x0083 00131 (append.go:22) LEAQ type.[]uint8(SB), AX 0x008a 00138 (append.go:22) MOVQ AX, (SP) 0x008e 00142 (append.go:22) MOVQ DX, 8(SP) 0x0093 00147 (append.go:22) MOVQ BX, 16(SP) 0x0098 00152 (append.go:22) MOVQ CX, 24(SP) 0x009d 00157 (append.go:22) MOVQ BP, 32(SP) 0x00a2 00162 (append.go:22) PCDATA $0, $0 0x00a2 00162 (append.go:22) CALL runtime.growslice(SB) 0x00a7 00167 (append.go:22) MOVQ 40(SP), DX 0x00ac 00172 (append.go:22) MOVQ 48(SP), AX 0x00b1 00177 (append.go:22) MOVQ 56(SP), CX 0x00b6 00182 (append.go:22) ADDQ $3, AX 0x00ba 00186 (append.go:19) MOVQ "".autotmp_0+64(SP), BX 0x00bf 00191 (append.go:22) MOVQ AX, BP 0x00c2 00194 (append.go:22) JMP 58 0x00c7 00199 (append.go:22) NOP 0x00c7 00199 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00cc 00204 (append.go:21) JMP 0 After: "".p t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 191 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x+8(SB), DX 0x0025 00037 (append.go:19) MOVQ DX, "".autotmp_0+64(SP) 0x002a 00042 (append.go:19) MOVQ "".x(SB), BX 0x0031 00049 (append.go:22) LEAQ 3(DX), BP 0x0035 00053 (append.go:22) MOVQ BP, "".x+8(SB) 0x003c 00060 (append.go:22) CMPQ BP, CX 0x003f 00063 (append.go:22) JGT $0, 84 0x0041 00065 (append.go:22) MOVB $1, (BX)(DX*1) 0x0045 00069 (append.go:22) MOVB $2, 1(BX)(DX*1) 0x004a 00074 (append.go:22) MOVB $3, 2(BX)(DX*1) 0x004f 00079 (append.go:23) ADDQ $72, SP 0x0053 00083 (append.go:23) RET 0x0054 00084 (append.go:22) LEAQ type.[]uint8(SB), AX 0x005b 00091 (append.go:22) MOVQ AX, (SP) 0x005f 00095 (append.go:22) MOVQ BX, 8(SP) 0x0064 00100 (append.go:22) MOVQ DX, 16(SP) 0x0069 00105 (append.go:22) MOVQ CX, 24(SP) 0x006e 00110 (append.go:22) MOVQ BP, 32(SP) 0x0073 00115 (append.go:22) PCDATA $0, $0 0x0073 00115 (append.go:22) CALL runtime.growslice(SB) 0x0078 00120 (append.go:22) MOVQ 40(SP), CX 0x007d 00125 (append.go:22) MOVQ 56(SP), AX 0x0082 00130 (append.go:22) MOVQ AX, "".x+16(SB) 0x0089 00137 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x008f 00143 (append.go:22) TESTB AL, AL 0x0091 00145 (append.go:22) JNE $0, 168 0x0093 00147 (append.go:22) MOVQ CX, "".x(SB) 0x009a 00154 (append.go:22) MOVQ "".x(SB), BX 0x00a1 00161 (append.go:19) MOVQ "".autotmp_0+64(SP), DX 0x00a6 00166 (append.go:22) JMP 65 0x00a8 00168 (append.go:22) LEAQ "".x(SB), DX 0x00af 00175 (append.go:22) MOVQ DX, (SP) 0x00b3 00179 (append.go:22) MOVQ CX, 8(SP) 0x00b8 00184 (append.go:22) PCDATA $0, $0 0x00b8 00184 (append.go:22) CALL runtime.writebarrierptr(SB) 0x00bd 00189 (append.go:22) JMP 154 0x00bf 00191 (append.go:22) NOP 0x00bf 00191 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00c4 00196 (append.go:21) JMP 0 Change-Id: I77a41ad3a22557a4bb4654de7d6d24a029efe34a Reviewed-on: https://go-review.googlesource.com/21813 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:44:17 -07:00
s.vars[&capVar] = r[2]
}
b = s.endBlock()
b.AddEdgeTo(assign)
// assign new elements to slots
s.startBlock(assign)
cmd/compile: re-enable in-place append optimization CL 21891 was too clever in its attempts to avoid spills. Storing newlen too early caused uses of append in the runtime itself to receive an inconsistent view of a slice, leading to corruption. This CL makes the generate code much more similar to the old backend. It spills more than before, but those spills have been contained to the grow path. It recalculates newlen unnecessarily on the fast path, but that's measurably cheaper than spilling it. CL 21891 caused runtime failures in 6 of 2000 runs of net/http and crypto/x509 in my test setup. This CL has gone 6000 runs without a failure. Benchmarks going from master to this CL: name old time/op new time/op delta AppendInPlace/NoGrow/Byte-8 439ns ± 2% 436ns ± 2% -0.72% (p=0.001 n=28+27) AppendInPlace/NoGrow/1Ptr-8 901ns ± 0% 856ns ± 0% -4.95% (p=0.000 n=26+29) AppendInPlace/NoGrow/2Ptr-8 2.15µs ± 1% 1.95µs ± 0% -9.07% (p=0.000 n=28+30) AppendInPlace/NoGrow/3Ptr-8 2.66µs ± 0% 2.45µs ± 0% -7.93% (p=0.000 n=29+26) AppendInPlace/NoGrow/4Ptr-8 3.24µs ± 1% 3.02µs ± 1% -6.75% (p=0.000 n=28+30) AppendInPlace/Grow/Byte-8 269ns ± 1% 271ns ± 1% +0.84% (p=0.000 n=30+29) AppendInPlace/Grow/1Ptr-8 275ns ± 1% 280ns ± 1% +1.75% (p=0.000 n=30+30) AppendInPlace/Grow/2Ptr-8 384ns ± 0% 391ns ± 0% +1.94% (p=0.000 n=27+30) AppendInPlace/Grow/3Ptr-8 455ns ± 0% 462ns ± 0% +1.43% (p=0.000 n=29+29) AppendInPlace/Grow/4Ptr-8 478ns ± 0% 479ns ± 0% +0.23% (p=0.000 n=30+27) However, for the large no-grow cases, there is still more work to be done. Going from this CL to the non-SSA backend: name old time/op new time/op delta AppendInPlace/NoGrow/Byte-8 436ns ± 2% 436ns ± 2% ~ (p=0.967 n=27+29) AppendInPlace/NoGrow/1Ptr-8 856ns ± 0% 884ns ± 0% +3.28% (p=0.000 n=29+26) AppendInPlace/NoGrow/2Ptr-8 1.95µs ± 0% 1.56µs ± 0% -20.28% (p=0.000 n=30+29) AppendInPlace/NoGrow/3Ptr-8 2.45µs ± 0% 1.89µs ± 0% -22.88% (p=0.000 n=26+28) AppendInPlace/NoGrow/4Ptr-8 3.02µs ± 1% 2.56µs ± 1% -15.35% (p=0.000 n=30+28) AppendInPlace/Grow/Byte-8 271ns ± 1% 283ns ± 1% +4.56% (p=0.000 n=29+29) AppendInPlace/Grow/1Ptr-8 280ns ± 1% 288ns ± 1% +2.99% (p=0.000 n=30+30) AppendInPlace/Grow/2Ptr-8 391ns ± 0% 409ns ± 0% +4.66% (p=0.000 n=30+29) AppendInPlace/Grow/3Ptr-8 462ns ± 0% 481ns ± 0% +4.13% (p=0.000 n=29+30) AppendInPlace/Grow/4Ptr-8 479ns ± 0% 502ns ± 0% +4.81% (p=0.000 n=27+26) New generated code: var x []byte func a() { x = append(x, 1) } "".a t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (a.go:5) TEXT "".a(SB), $72-0 0x0000 00000 (a.go:5) MOVQ (TLS), CX 0x0009 00009 (a.go:5) CMPQ SP, 16(CX) 0x000d 00013 (a.go:5) JLS 190 0x0013 00019 (a.go:5) SUBQ $72, SP 0x0017 00023 (a.go:5) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (a.go:5) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (a.go:6) MOVQ "".x+16(SB), CX 0x001e 00030 (a.go:6) MOVQ "".x+8(SB), DX 0x0025 00037 (a.go:6) MOVQ "".x(SB), BX 0x002c 00044 (a.go:6) LEAQ 1(DX), BP 0x0030 00048 (a.go:6) CMPQ BP, CX 0x0033 00051 (a.go:6) JGT $0, 73 0x0035 00053 (a.go:6) LEAQ 1(DX), AX 0x0039 00057 (a.go:6) MOVQ AX, "".x+8(SB) 0x0040 00064 (a.go:6) MOVB $1, (BX)(DX*1) 0x0044 00068 (a.go:7) ADDQ $72, SP 0x0048 00072 (a.go:7) RET 0x0049 00073 (a.go:6) LEAQ type.[]uint8(SB), AX 0x0050 00080 (a.go:6) MOVQ AX, (SP) 0x0054 00084 (a.go:6) MOVQ BX, 8(SP) 0x0059 00089 (a.go:6) MOVQ DX, 16(SP) 0x005e 00094 (a.go:6) MOVQ CX, 24(SP) 0x0063 00099 (a.go:6) MOVQ BP, 32(SP) 0x0068 00104 (a.go:6) PCDATA $0, $0 0x0068 00104 (a.go:6) CALL runtime.growslice(SB) 0x006d 00109 (a.go:6) MOVQ 40(SP), CX 0x0072 00114 (a.go:6) MOVQ 48(SP), DX 0x0077 00119 (a.go:6) MOVQ DX, "".autotmp_0+64(SP) 0x007c 00124 (a.go:6) MOVQ 56(SP), BX 0x0081 00129 (a.go:6) MOVQ BX, "".x+16(SB) 0x0088 00136 (a.go:6) MOVL runtime.writeBarrier(SB), AX 0x008e 00142 (a.go:6) TESTB AL, AL 0x0090 00144 (a.go:6) JNE $0, 162 0x0092 00146 (a.go:6) MOVQ CX, "".x(SB) 0x0099 00153 (a.go:6) MOVQ "".x(SB), BX 0x00a0 00160 (a.go:6) JMP 53 0x00a2 00162 (a.go:6) LEAQ "".x(SB), BX 0x00a9 00169 (a.go:6) MOVQ BX, (SP) 0x00ad 00173 (a.go:6) MOVQ CX, 8(SP) 0x00b2 00178 (a.go:6) PCDATA $0, $0 0x00b2 00178 (a.go:6) CALL runtime.writebarrierptr(SB) 0x00b7 00183 (a.go:6) MOVQ "".autotmp_0+64(SP), DX 0x00bc 00188 (a.go:6) JMP 153 0x00be 00190 (a.go:6) NOP 0x00be 00190 (a.go:5) CALL runtime.morestack_noctxt(SB) 0x00c3 00195 (a.go:5) JMP 0 Fixes #14969 again Change-Id: Ia50463b1f506011aad0718a4fef1d4738e43c32d Reviewed-on: https://go-review.googlesource.com/22197 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-18 09:40:30 -07:00
if inplace {
l = s.variable(&lenVar, types.Types[TINT]) // generates phi for len
nl = s.newValue2(s.ssaOp(OADD, types.Types[TINT]), types.Types[TINT], l, s.constInt(types.Types[TINT], nargs))
lenaddr := s.newValue1I(ssa.OpOffPtr, s.f.Config.Types.IntPtr, int64(array_nel), addr)
s.store(types.Types[TINT], lenaddr, nl)
cmd/compile: re-enable in-place append optimization CL 21891 was too clever in its attempts to avoid spills. Storing newlen too early caused uses of append in the runtime itself to receive an inconsistent view of a slice, leading to corruption. This CL makes the generate code much more similar to the old backend. It spills more than before, but those spills have been contained to the grow path. It recalculates newlen unnecessarily on the fast path, but that's measurably cheaper than spilling it. CL 21891 caused runtime failures in 6 of 2000 runs of net/http and crypto/x509 in my test setup. This CL has gone 6000 runs without a failure. Benchmarks going from master to this CL: name old time/op new time/op delta AppendInPlace/NoGrow/Byte-8 439ns ± 2% 436ns ± 2% -0.72% (p=0.001 n=28+27) AppendInPlace/NoGrow/1Ptr-8 901ns ± 0% 856ns ± 0% -4.95% (p=0.000 n=26+29) AppendInPlace/NoGrow/2Ptr-8 2.15µs ± 1% 1.95µs ± 0% -9.07% (p=0.000 n=28+30) AppendInPlace/NoGrow/3Ptr-8 2.66µs ± 0% 2.45µs ± 0% -7.93% (p=0.000 n=29+26) AppendInPlace/NoGrow/4Ptr-8 3.24µs ± 1% 3.02µs ± 1% -6.75% (p=0.000 n=28+30) AppendInPlace/Grow/Byte-8 269ns ± 1% 271ns ± 1% +0.84% (p=0.000 n=30+29) AppendInPlace/Grow/1Ptr-8 275ns ± 1% 280ns ± 1% +1.75% (p=0.000 n=30+30) AppendInPlace/Grow/2Ptr-8 384ns ± 0% 391ns ± 0% +1.94% (p=0.000 n=27+30) AppendInPlace/Grow/3Ptr-8 455ns ± 0% 462ns ± 0% +1.43% (p=0.000 n=29+29) AppendInPlace/Grow/4Ptr-8 478ns ± 0% 479ns ± 0% +0.23% (p=0.000 n=30+27) However, for the large no-grow cases, there is still more work to be done. Going from this CL to the non-SSA backend: name old time/op new time/op delta AppendInPlace/NoGrow/Byte-8 436ns ± 2% 436ns ± 2% ~ (p=0.967 n=27+29) AppendInPlace/NoGrow/1Ptr-8 856ns ± 0% 884ns ± 0% +3.28% (p=0.000 n=29+26) AppendInPlace/NoGrow/2Ptr-8 1.95µs ± 0% 1.56µs ± 0% -20.28% (p=0.000 n=30+29) AppendInPlace/NoGrow/3Ptr-8 2.45µs ± 0% 1.89µs ± 0% -22.88% (p=0.000 n=26+28) AppendInPlace/NoGrow/4Ptr-8 3.02µs ± 1% 2.56µs ± 1% -15.35% (p=0.000 n=30+28) AppendInPlace/Grow/Byte-8 271ns ± 1% 283ns ± 1% +4.56% (p=0.000 n=29+29) AppendInPlace/Grow/1Ptr-8 280ns ± 1% 288ns ± 1% +2.99% (p=0.000 n=30+30) AppendInPlace/Grow/2Ptr-8 391ns ± 0% 409ns ± 0% +4.66% (p=0.000 n=30+29) AppendInPlace/Grow/3Ptr-8 462ns ± 0% 481ns ± 0% +4.13% (p=0.000 n=29+30) AppendInPlace/Grow/4Ptr-8 479ns ± 0% 502ns ± 0% +4.81% (p=0.000 n=27+26) New generated code: var x []byte func a() { x = append(x, 1) } "".a t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (a.go:5) TEXT "".a(SB), $72-0 0x0000 00000 (a.go:5) MOVQ (TLS), CX 0x0009 00009 (a.go:5) CMPQ SP, 16(CX) 0x000d 00013 (a.go:5) JLS 190 0x0013 00019 (a.go:5) SUBQ $72, SP 0x0017 00023 (a.go:5) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (a.go:5) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (a.go:6) MOVQ "".x+16(SB), CX 0x001e 00030 (a.go:6) MOVQ "".x+8(SB), DX 0x0025 00037 (a.go:6) MOVQ "".x(SB), BX 0x002c 00044 (a.go:6) LEAQ 1(DX), BP 0x0030 00048 (a.go:6) CMPQ BP, CX 0x0033 00051 (a.go:6) JGT $0, 73 0x0035 00053 (a.go:6) LEAQ 1(DX), AX 0x0039 00057 (a.go:6) MOVQ AX, "".x+8(SB) 0x0040 00064 (a.go:6) MOVB $1, (BX)(DX*1) 0x0044 00068 (a.go:7) ADDQ $72, SP 0x0048 00072 (a.go:7) RET 0x0049 00073 (a.go:6) LEAQ type.[]uint8(SB), AX 0x0050 00080 (a.go:6) MOVQ AX, (SP) 0x0054 00084 (a.go:6) MOVQ BX, 8(SP) 0x0059 00089 (a.go:6) MOVQ DX, 16(SP) 0x005e 00094 (a.go:6) MOVQ CX, 24(SP) 0x0063 00099 (a.go:6) MOVQ BP, 32(SP) 0x0068 00104 (a.go:6) PCDATA $0, $0 0x0068 00104 (a.go:6) CALL runtime.growslice(SB) 0x006d 00109 (a.go:6) MOVQ 40(SP), CX 0x0072 00114 (a.go:6) MOVQ 48(SP), DX 0x0077 00119 (a.go:6) MOVQ DX, "".autotmp_0+64(SP) 0x007c 00124 (a.go:6) MOVQ 56(SP), BX 0x0081 00129 (a.go:6) MOVQ BX, "".x+16(SB) 0x0088 00136 (a.go:6) MOVL runtime.writeBarrier(SB), AX 0x008e 00142 (a.go:6) TESTB AL, AL 0x0090 00144 (a.go:6) JNE $0, 162 0x0092 00146 (a.go:6) MOVQ CX, "".x(SB) 0x0099 00153 (a.go:6) MOVQ "".x(SB), BX 0x00a0 00160 (a.go:6) JMP 53 0x00a2 00162 (a.go:6) LEAQ "".x(SB), BX 0x00a9 00169 (a.go:6) MOVQ BX, (SP) 0x00ad 00173 (a.go:6) MOVQ CX, 8(SP) 0x00b2 00178 (a.go:6) PCDATA $0, $0 0x00b2 00178 (a.go:6) CALL runtime.writebarrierptr(SB) 0x00b7 00183 (a.go:6) MOVQ "".autotmp_0+64(SP), DX 0x00bc 00188 (a.go:6) JMP 153 0x00be 00190 (a.go:6) NOP 0x00be 00190 (a.go:5) CALL runtime.morestack_noctxt(SB) 0x00c3 00195 (a.go:5) JMP 0 Fixes #14969 again Change-Id: Ia50463b1f506011aad0718a4fef1d4738e43c32d Reviewed-on: https://go-review.googlesource.com/22197 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-18 09:40:30 -07:00
}
// Evaluate args
type argRec struct {
// if store is true, we're appending the value v. If false, we're appending the
// value at *v.
v *ssa.Value
store bool
}
args := make([]argRec, 0, nargs)
for _, n := range n.List.Slice()[1:] {
if canSSAType(n.Type) {
args = append(args, argRec{v: s.expr(n), store: true})
} else {
v := s.addr(n, false)
args = append(args, argRec{v: v})
}
}
cmd/compile: avoid write barrier in append fast path When we are writing the result of an append back to the same slice, we don’t need a write barrier on the fast path. This re-implements an optimization that was present in the old backend. Updates #14921 Fixes #14969 Sample code: var x []byte func p() { x = append(x, 1, 2, 3) } Before: "".p t=1 size=224 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 199 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x(SB), DX 0x0025 00037 (append.go:19) MOVQ "".x+8(SB), BX 0x002c 00044 (append.go:19) MOVQ BX, "".autotmp_0+64(SP) 0x0031 00049 (append.go:22) LEAQ 3(BX), BP 0x0035 00053 (append.go:22) CMPQ BP, CX 0x0038 00056 (append.go:22) JGT $0, 131 0x003a 00058 (append.go:22) MOVB $1, (DX)(BX*1) 0x003e 00062 (append.go:22) MOVB $2, 1(DX)(BX*1) 0x0043 00067 (append.go:22) MOVB $3, 2(DX)(BX*1) 0x0048 00072 (append.go:22) MOVQ BP, "".x+8(SB) 0x004f 00079 (append.go:22) MOVQ CX, "".x+16(SB) 0x0056 00086 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x005c 00092 (append.go:22) TESTB AL, AL 0x005e 00094 (append.go:22) JNE $0, 108 0x0060 00096 (append.go:22) MOVQ DX, "".x(SB) 0x0067 00103 (append.go:23) ADDQ $72, SP 0x006b 00107 (append.go:23) RET 0x006c 00108 (append.go:22) LEAQ "".x(SB), CX 0x0073 00115 (append.go:22) MOVQ CX, (SP) 0x0077 00119 (append.go:22) MOVQ DX, 8(SP) 0x007c 00124 (append.go:22) PCDATA $0, $0 0x007c 00124 (append.go:22) CALL runtime.writebarrierptr(SB) 0x0081 00129 (append.go:23) JMP 103 0x0083 00131 (append.go:22) LEAQ type.[]uint8(SB), AX 0x008a 00138 (append.go:22) MOVQ AX, (SP) 0x008e 00142 (append.go:22) MOVQ DX, 8(SP) 0x0093 00147 (append.go:22) MOVQ BX, 16(SP) 0x0098 00152 (append.go:22) MOVQ CX, 24(SP) 0x009d 00157 (append.go:22) MOVQ BP, 32(SP) 0x00a2 00162 (append.go:22) PCDATA $0, $0 0x00a2 00162 (append.go:22) CALL runtime.growslice(SB) 0x00a7 00167 (append.go:22) MOVQ 40(SP), DX 0x00ac 00172 (append.go:22) MOVQ 48(SP), AX 0x00b1 00177 (append.go:22) MOVQ 56(SP), CX 0x00b6 00182 (append.go:22) ADDQ $3, AX 0x00ba 00186 (append.go:19) MOVQ "".autotmp_0+64(SP), BX 0x00bf 00191 (append.go:22) MOVQ AX, BP 0x00c2 00194 (append.go:22) JMP 58 0x00c7 00199 (append.go:22) NOP 0x00c7 00199 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00cc 00204 (append.go:21) JMP 0 After: "".p t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 191 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x+8(SB), DX 0x0025 00037 (append.go:19) MOVQ DX, "".autotmp_0+64(SP) 0x002a 00042 (append.go:19) MOVQ "".x(SB), BX 0x0031 00049 (append.go:22) LEAQ 3(DX), BP 0x0035 00053 (append.go:22) MOVQ BP, "".x+8(SB) 0x003c 00060 (append.go:22) CMPQ BP, CX 0x003f 00063 (append.go:22) JGT $0, 84 0x0041 00065 (append.go:22) MOVB $1, (BX)(DX*1) 0x0045 00069 (append.go:22) MOVB $2, 1(BX)(DX*1) 0x004a 00074 (append.go:22) MOVB $3, 2(BX)(DX*1) 0x004f 00079 (append.go:23) ADDQ $72, SP 0x0053 00083 (append.go:23) RET 0x0054 00084 (append.go:22) LEAQ type.[]uint8(SB), AX 0x005b 00091 (append.go:22) MOVQ AX, (SP) 0x005f 00095 (append.go:22) MOVQ BX, 8(SP) 0x0064 00100 (append.go:22) MOVQ DX, 16(SP) 0x0069 00105 (append.go:22) MOVQ CX, 24(SP) 0x006e 00110 (append.go:22) MOVQ BP, 32(SP) 0x0073 00115 (append.go:22) PCDATA $0, $0 0x0073 00115 (append.go:22) CALL runtime.growslice(SB) 0x0078 00120 (append.go:22) MOVQ 40(SP), CX 0x007d 00125 (append.go:22) MOVQ 56(SP), AX 0x0082 00130 (append.go:22) MOVQ AX, "".x+16(SB) 0x0089 00137 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x008f 00143 (append.go:22) TESTB AL, AL 0x0091 00145 (append.go:22) JNE $0, 168 0x0093 00147 (append.go:22) MOVQ CX, "".x(SB) 0x009a 00154 (append.go:22) MOVQ "".x(SB), BX 0x00a1 00161 (append.go:19) MOVQ "".autotmp_0+64(SP), DX 0x00a6 00166 (append.go:22) JMP 65 0x00a8 00168 (append.go:22) LEAQ "".x(SB), DX 0x00af 00175 (append.go:22) MOVQ DX, (SP) 0x00b3 00179 (append.go:22) MOVQ CX, 8(SP) 0x00b8 00184 (append.go:22) PCDATA $0, $0 0x00b8 00184 (append.go:22) CALL runtime.writebarrierptr(SB) 0x00bd 00189 (append.go:22) JMP 154 0x00bf 00191 (append.go:22) NOP 0x00bf 00191 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00c4 00196 (append.go:21) JMP 0 Change-Id: I77a41ad3a22557a4bb4654de7d6d24a029efe34a Reviewed-on: https://go-review.googlesource.com/21813 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:44:17 -07:00
p = s.variable(&ptrVar, pt) // generates phi for ptr
if !inplace {
nl = s.variable(&newlenVar, types.Types[TINT]) // generates phi for nl
c = s.variable(&capVar, types.Types[TINT]) // generates phi for cap
cmd/compile: avoid write barrier in append fast path When we are writing the result of an append back to the same slice, we don’t need a write barrier on the fast path. This re-implements an optimization that was present in the old backend. Updates #14921 Fixes #14969 Sample code: var x []byte func p() { x = append(x, 1, 2, 3) } Before: "".p t=1 size=224 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 199 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x(SB), DX 0x0025 00037 (append.go:19) MOVQ "".x+8(SB), BX 0x002c 00044 (append.go:19) MOVQ BX, "".autotmp_0+64(SP) 0x0031 00049 (append.go:22) LEAQ 3(BX), BP 0x0035 00053 (append.go:22) CMPQ BP, CX 0x0038 00056 (append.go:22) JGT $0, 131 0x003a 00058 (append.go:22) MOVB $1, (DX)(BX*1) 0x003e 00062 (append.go:22) MOVB $2, 1(DX)(BX*1) 0x0043 00067 (append.go:22) MOVB $3, 2(DX)(BX*1) 0x0048 00072 (append.go:22) MOVQ BP, "".x+8(SB) 0x004f 00079 (append.go:22) MOVQ CX, "".x+16(SB) 0x0056 00086 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x005c 00092 (append.go:22) TESTB AL, AL 0x005e 00094 (append.go:22) JNE $0, 108 0x0060 00096 (append.go:22) MOVQ DX, "".x(SB) 0x0067 00103 (append.go:23) ADDQ $72, SP 0x006b 00107 (append.go:23) RET 0x006c 00108 (append.go:22) LEAQ "".x(SB), CX 0x0073 00115 (append.go:22) MOVQ CX, (SP) 0x0077 00119 (append.go:22) MOVQ DX, 8(SP) 0x007c 00124 (append.go:22) PCDATA $0, $0 0x007c 00124 (append.go:22) CALL runtime.writebarrierptr(SB) 0x0081 00129 (append.go:23) JMP 103 0x0083 00131 (append.go:22) LEAQ type.[]uint8(SB), AX 0x008a 00138 (append.go:22) MOVQ AX, (SP) 0x008e 00142 (append.go:22) MOVQ DX, 8(SP) 0x0093 00147 (append.go:22) MOVQ BX, 16(SP) 0x0098 00152 (append.go:22) MOVQ CX, 24(SP) 0x009d 00157 (append.go:22) MOVQ BP, 32(SP) 0x00a2 00162 (append.go:22) PCDATA $0, $0 0x00a2 00162 (append.go:22) CALL runtime.growslice(SB) 0x00a7 00167 (append.go:22) MOVQ 40(SP), DX 0x00ac 00172 (append.go:22) MOVQ 48(SP), AX 0x00b1 00177 (append.go:22) MOVQ 56(SP), CX 0x00b6 00182 (append.go:22) ADDQ $3, AX 0x00ba 00186 (append.go:19) MOVQ "".autotmp_0+64(SP), BX 0x00bf 00191 (append.go:22) MOVQ AX, BP 0x00c2 00194 (append.go:22) JMP 58 0x00c7 00199 (append.go:22) NOP 0x00c7 00199 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00cc 00204 (append.go:21) JMP 0 After: "".p t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 191 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x+8(SB), DX 0x0025 00037 (append.go:19) MOVQ DX, "".autotmp_0+64(SP) 0x002a 00042 (append.go:19) MOVQ "".x(SB), BX 0x0031 00049 (append.go:22) LEAQ 3(DX), BP 0x0035 00053 (append.go:22) MOVQ BP, "".x+8(SB) 0x003c 00060 (append.go:22) CMPQ BP, CX 0x003f 00063 (append.go:22) JGT $0, 84 0x0041 00065 (append.go:22) MOVB $1, (BX)(DX*1) 0x0045 00069 (append.go:22) MOVB $2, 1(BX)(DX*1) 0x004a 00074 (append.go:22) MOVB $3, 2(BX)(DX*1) 0x004f 00079 (append.go:23) ADDQ $72, SP 0x0053 00083 (append.go:23) RET 0x0054 00084 (append.go:22) LEAQ type.[]uint8(SB), AX 0x005b 00091 (append.go:22) MOVQ AX, (SP) 0x005f 00095 (append.go:22) MOVQ BX, 8(SP) 0x0064 00100 (append.go:22) MOVQ DX, 16(SP) 0x0069 00105 (append.go:22) MOVQ CX, 24(SP) 0x006e 00110 (append.go:22) MOVQ BP, 32(SP) 0x0073 00115 (append.go:22) PCDATA $0, $0 0x0073 00115 (append.go:22) CALL runtime.growslice(SB) 0x0078 00120 (append.go:22) MOVQ 40(SP), CX 0x007d 00125 (append.go:22) MOVQ 56(SP), AX 0x0082 00130 (append.go:22) MOVQ AX, "".x+16(SB) 0x0089 00137 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x008f 00143 (append.go:22) TESTB AL, AL 0x0091 00145 (append.go:22) JNE $0, 168 0x0093 00147 (append.go:22) MOVQ CX, "".x(SB) 0x009a 00154 (append.go:22) MOVQ "".x(SB), BX 0x00a1 00161 (append.go:19) MOVQ "".autotmp_0+64(SP), DX 0x00a6 00166 (append.go:22) JMP 65 0x00a8 00168 (append.go:22) LEAQ "".x(SB), DX 0x00af 00175 (append.go:22) MOVQ DX, (SP) 0x00b3 00179 (append.go:22) MOVQ CX, 8(SP) 0x00b8 00184 (append.go:22) PCDATA $0, $0 0x00b8 00184 (append.go:22) CALL runtime.writebarrierptr(SB) 0x00bd 00189 (append.go:22) JMP 154 0x00bf 00191 (append.go:22) NOP 0x00bf 00191 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00c4 00196 (append.go:21) JMP 0 Change-Id: I77a41ad3a22557a4bb4654de7d6d24a029efe34a Reviewed-on: https://go-review.googlesource.com/21813 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:44:17 -07:00
}
p2 := s.newValue2(ssa.OpPtrIndex, pt, p, l)
for i, arg := range args {
addr := s.newValue2(ssa.OpPtrIndex, pt, p2, s.constInt(types.Types[TINT], int64(i)))
if arg.store {
s.storeType(et, addr, arg.v, 0, true)
} else {
s.move(et, addr, arg.v)
}
}
delete(s.vars, &ptrVar)
cmd/compile: avoid write barrier in append fast path When we are writing the result of an append back to the same slice, we don’t need a write barrier on the fast path. This re-implements an optimization that was present in the old backend. Updates #14921 Fixes #14969 Sample code: var x []byte func p() { x = append(x, 1, 2, 3) } Before: "".p t=1 size=224 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 199 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x(SB), DX 0x0025 00037 (append.go:19) MOVQ "".x+8(SB), BX 0x002c 00044 (append.go:19) MOVQ BX, "".autotmp_0+64(SP) 0x0031 00049 (append.go:22) LEAQ 3(BX), BP 0x0035 00053 (append.go:22) CMPQ BP, CX 0x0038 00056 (append.go:22) JGT $0, 131 0x003a 00058 (append.go:22) MOVB $1, (DX)(BX*1) 0x003e 00062 (append.go:22) MOVB $2, 1(DX)(BX*1) 0x0043 00067 (append.go:22) MOVB $3, 2(DX)(BX*1) 0x0048 00072 (append.go:22) MOVQ BP, "".x+8(SB) 0x004f 00079 (append.go:22) MOVQ CX, "".x+16(SB) 0x0056 00086 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x005c 00092 (append.go:22) TESTB AL, AL 0x005e 00094 (append.go:22) JNE $0, 108 0x0060 00096 (append.go:22) MOVQ DX, "".x(SB) 0x0067 00103 (append.go:23) ADDQ $72, SP 0x006b 00107 (append.go:23) RET 0x006c 00108 (append.go:22) LEAQ "".x(SB), CX 0x0073 00115 (append.go:22) MOVQ CX, (SP) 0x0077 00119 (append.go:22) MOVQ DX, 8(SP) 0x007c 00124 (append.go:22) PCDATA $0, $0 0x007c 00124 (append.go:22) CALL runtime.writebarrierptr(SB) 0x0081 00129 (append.go:23) JMP 103 0x0083 00131 (append.go:22) LEAQ type.[]uint8(SB), AX 0x008a 00138 (append.go:22) MOVQ AX, (SP) 0x008e 00142 (append.go:22) MOVQ DX, 8(SP) 0x0093 00147 (append.go:22) MOVQ BX, 16(SP) 0x0098 00152 (append.go:22) MOVQ CX, 24(SP) 0x009d 00157 (append.go:22) MOVQ BP, 32(SP) 0x00a2 00162 (append.go:22) PCDATA $0, $0 0x00a2 00162 (append.go:22) CALL runtime.growslice(SB) 0x00a7 00167 (append.go:22) MOVQ 40(SP), DX 0x00ac 00172 (append.go:22) MOVQ 48(SP), AX 0x00b1 00177 (append.go:22) MOVQ 56(SP), CX 0x00b6 00182 (append.go:22) ADDQ $3, AX 0x00ba 00186 (append.go:19) MOVQ "".autotmp_0+64(SP), BX 0x00bf 00191 (append.go:22) MOVQ AX, BP 0x00c2 00194 (append.go:22) JMP 58 0x00c7 00199 (append.go:22) NOP 0x00c7 00199 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00cc 00204 (append.go:21) JMP 0 After: "".p t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 191 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x+8(SB), DX 0x0025 00037 (append.go:19) MOVQ DX, "".autotmp_0+64(SP) 0x002a 00042 (append.go:19) MOVQ "".x(SB), BX 0x0031 00049 (append.go:22) LEAQ 3(DX), BP 0x0035 00053 (append.go:22) MOVQ BP, "".x+8(SB) 0x003c 00060 (append.go:22) CMPQ BP, CX 0x003f 00063 (append.go:22) JGT $0, 84 0x0041 00065 (append.go:22) MOVB $1, (BX)(DX*1) 0x0045 00069 (append.go:22) MOVB $2, 1(BX)(DX*1) 0x004a 00074 (append.go:22) MOVB $3, 2(BX)(DX*1) 0x004f 00079 (append.go:23) ADDQ $72, SP 0x0053 00083 (append.go:23) RET 0x0054 00084 (append.go:22) LEAQ type.[]uint8(SB), AX 0x005b 00091 (append.go:22) MOVQ AX, (SP) 0x005f 00095 (append.go:22) MOVQ BX, 8(SP) 0x0064 00100 (append.go:22) MOVQ DX, 16(SP) 0x0069 00105 (append.go:22) MOVQ CX, 24(SP) 0x006e 00110 (append.go:22) MOVQ BP, 32(SP) 0x0073 00115 (append.go:22) PCDATA $0, $0 0x0073 00115 (append.go:22) CALL runtime.growslice(SB) 0x0078 00120 (append.go:22) MOVQ 40(SP), CX 0x007d 00125 (append.go:22) MOVQ 56(SP), AX 0x0082 00130 (append.go:22) MOVQ AX, "".x+16(SB) 0x0089 00137 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x008f 00143 (append.go:22) TESTB AL, AL 0x0091 00145 (append.go:22) JNE $0, 168 0x0093 00147 (append.go:22) MOVQ CX, "".x(SB) 0x009a 00154 (append.go:22) MOVQ "".x(SB), BX 0x00a1 00161 (append.go:19) MOVQ "".autotmp_0+64(SP), DX 0x00a6 00166 (append.go:22) JMP 65 0x00a8 00168 (append.go:22) LEAQ "".x(SB), DX 0x00af 00175 (append.go:22) MOVQ DX, (SP) 0x00b3 00179 (append.go:22) MOVQ CX, 8(SP) 0x00b8 00184 (append.go:22) PCDATA $0, $0 0x00b8 00184 (append.go:22) CALL runtime.writebarrierptr(SB) 0x00bd 00189 (append.go:22) JMP 154 0x00bf 00191 (append.go:22) NOP 0x00bf 00191 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00c4 00196 (append.go:21) JMP 0 Change-Id: I77a41ad3a22557a4bb4654de7d6d24a029efe34a Reviewed-on: https://go-review.googlesource.com/21813 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:44:17 -07:00
if inplace {
cmd/compile: re-enable in-place append optimization CL 21891 was too clever in its attempts to avoid spills. Storing newlen too early caused uses of append in the runtime itself to receive an inconsistent view of a slice, leading to corruption. This CL makes the generate code much more similar to the old backend. It spills more than before, but those spills have been contained to the grow path. It recalculates newlen unnecessarily on the fast path, but that's measurably cheaper than spilling it. CL 21891 caused runtime failures in 6 of 2000 runs of net/http and crypto/x509 in my test setup. This CL has gone 6000 runs without a failure. Benchmarks going from master to this CL: name old time/op new time/op delta AppendInPlace/NoGrow/Byte-8 439ns ± 2% 436ns ± 2% -0.72% (p=0.001 n=28+27) AppendInPlace/NoGrow/1Ptr-8 901ns ± 0% 856ns ± 0% -4.95% (p=0.000 n=26+29) AppendInPlace/NoGrow/2Ptr-8 2.15µs ± 1% 1.95µs ± 0% -9.07% (p=0.000 n=28+30) AppendInPlace/NoGrow/3Ptr-8 2.66µs ± 0% 2.45µs ± 0% -7.93% (p=0.000 n=29+26) AppendInPlace/NoGrow/4Ptr-8 3.24µs ± 1% 3.02µs ± 1% -6.75% (p=0.000 n=28+30) AppendInPlace/Grow/Byte-8 269ns ± 1% 271ns ± 1% +0.84% (p=0.000 n=30+29) AppendInPlace/Grow/1Ptr-8 275ns ± 1% 280ns ± 1% +1.75% (p=0.000 n=30+30) AppendInPlace/Grow/2Ptr-8 384ns ± 0% 391ns ± 0% +1.94% (p=0.000 n=27+30) AppendInPlace/Grow/3Ptr-8 455ns ± 0% 462ns ± 0% +1.43% (p=0.000 n=29+29) AppendInPlace/Grow/4Ptr-8 478ns ± 0% 479ns ± 0% +0.23% (p=0.000 n=30+27) However, for the large no-grow cases, there is still more work to be done. Going from this CL to the non-SSA backend: name old time/op new time/op delta AppendInPlace/NoGrow/Byte-8 436ns ± 2% 436ns ± 2% ~ (p=0.967 n=27+29) AppendInPlace/NoGrow/1Ptr-8 856ns ± 0% 884ns ± 0% +3.28% (p=0.000 n=29+26) AppendInPlace/NoGrow/2Ptr-8 1.95µs ± 0% 1.56µs ± 0% -20.28% (p=0.000 n=30+29) AppendInPlace/NoGrow/3Ptr-8 2.45µs ± 0% 1.89µs ± 0% -22.88% (p=0.000 n=26+28) AppendInPlace/NoGrow/4Ptr-8 3.02µs ± 1% 2.56µs ± 1% -15.35% (p=0.000 n=30+28) AppendInPlace/Grow/Byte-8 271ns ± 1% 283ns ± 1% +4.56% (p=0.000 n=29+29) AppendInPlace/Grow/1Ptr-8 280ns ± 1% 288ns ± 1% +2.99% (p=0.000 n=30+30) AppendInPlace/Grow/2Ptr-8 391ns ± 0% 409ns ± 0% +4.66% (p=0.000 n=30+29) AppendInPlace/Grow/3Ptr-8 462ns ± 0% 481ns ± 0% +4.13% (p=0.000 n=29+30) AppendInPlace/Grow/4Ptr-8 479ns ± 0% 502ns ± 0% +4.81% (p=0.000 n=27+26) New generated code: var x []byte func a() { x = append(x, 1) } "".a t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (a.go:5) TEXT "".a(SB), $72-0 0x0000 00000 (a.go:5) MOVQ (TLS), CX 0x0009 00009 (a.go:5) CMPQ SP, 16(CX) 0x000d 00013 (a.go:5) JLS 190 0x0013 00019 (a.go:5) SUBQ $72, SP 0x0017 00023 (a.go:5) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (a.go:5) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (a.go:6) MOVQ "".x+16(SB), CX 0x001e 00030 (a.go:6) MOVQ "".x+8(SB), DX 0x0025 00037 (a.go:6) MOVQ "".x(SB), BX 0x002c 00044 (a.go:6) LEAQ 1(DX), BP 0x0030 00048 (a.go:6) CMPQ BP, CX 0x0033 00051 (a.go:6) JGT $0, 73 0x0035 00053 (a.go:6) LEAQ 1(DX), AX 0x0039 00057 (a.go:6) MOVQ AX, "".x+8(SB) 0x0040 00064 (a.go:6) MOVB $1, (BX)(DX*1) 0x0044 00068 (a.go:7) ADDQ $72, SP 0x0048 00072 (a.go:7) RET 0x0049 00073 (a.go:6) LEAQ type.[]uint8(SB), AX 0x0050 00080 (a.go:6) MOVQ AX, (SP) 0x0054 00084 (a.go:6) MOVQ BX, 8(SP) 0x0059 00089 (a.go:6) MOVQ DX, 16(SP) 0x005e 00094 (a.go:6) MOVQ CX, 24(SP) 0x0063 00099 (a.go:6) MOVQ BP, 32(SP) 0x0068 00104 (a.go:6) PCDATA $0, $0 0x0068 00104 (a.go:6) CALL runtime.growslice(SB) 0x006d 00109 (a.go:6) MOVQ 40(SP), CX 0x0072 00114 (a.go:6) MOVQ 48(SP), DX 0x0077 00119 (a.go:6) MOVQ DX, "".autotmp_0+64(SP) 0x007c 00124 (a.go:6) MOVQ 56(SP), BX 0x0081 00129 (a.go:6) MOVQ BX, "".x+16(SB) 0x0088 00136 (a.go:6) MOVL runtime.writeBarrier(SB), AX 0x008e 00142 (a.go:6) TESTB AL, AL 0x0090 00144 (a.go:6) JNE $0, 162 0x0092 00146 (a.go:6) MOVQ CX, "".x(SB) 0x0099 00153 (a.go:6) MOVQ "".x(SB), BX 0x00a0 00160 (a.go:6) JMP 53 0x00a2 00162 (a.go:6) LEAQ "".x(SB), BX 0x00a9 00169 (a.go:6) MOVQ BX, (SP) 0x00ad 00173 (a.go:6) MOVQ CX, 8(SP) 0x00b2 00178 (a.go:6) PCDATA $0, $0 0x00b2 00178 (a.go:6) CALL runtime.writebarrierptr(SB) 0x00b7 00183 (a.go:6) MOVQ "".autotmp_0+64(SP), DX 0x00bc 00188 (a.go:6) JMP 153 0x00be 00190 (a.go:6) NOP 0x00be 00190 (a.go:5) CALL runtime.morestack_noctxt(SB) 0x00c3 00195 (a.go:5) JMP 0 Fixes #14969 again Change-Id: Ia50463b1f506011aad0718a4fef1d4738e43c32d Reviewed-on: https://go-review.googlesource.com/22197 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-18 09:40:30 -07:00
delete(s.vars, &lenVar)
cmd/compile: avoid write barrier in append fast path When we are writing the result of an append back to the same slice, we don’t need a write barrier on the fast path. This re-implements an optimization that was present in the old backend. Updates #14921 Fixes #14969 Sample code: var x []byte func p() { x = append(x, 1, 2, 3) } Before: "".p t=1 size=224 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 199 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x(SB), DX 0x0025 00037 (append.go:19) MOVQ "".x+8(SB), BX 0x002c 00044 (append.go:19) MOVQ BX, "".autotmp_0+64(SP) 0x0031 00049 (append.go:22) LEAQ 3(BX), BP 0x0035 00053 (append.go:22) CMPQ BP, CX 0x0038 00056 (append.go:22) JGT $0, 131 0x003a 00058 (append.go:22) MOVB $1, (DX)(BX*1) 0x003e 00062 (append.go:22) MOVB $2, 1(DX)(BX*1) 0x0043 00067 (append.go:22) MOVB $3, 2(DX)(BX*1) 0x0048 00072 (append.go:22) MOVQ BP, "".x+8(SB) 0x004f 00079 (append.go:22) MOVQ CX, "".x+16(SB) 0x0056 00086 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x005c 00092 (append.go:22) TESTB AL, AL 0x005e 00094 (append.go:22) JNE $0, 108 0x0060 00096 (append.go:22) MOVQ DX, "".x(SB) 0x0067 00103 (append.go:23) ADDQ $72, SP 0x006b 00107 (append.go:23) RET 0x006c 00108 (append.go:22) LEAQ "".x(SB), CX 0x0073 00115 (append.go:22) MOVQ CX, (SP) 0x0077 00119 (append.go:22) MOVQ DX, 8(SP) 0x007c 00124 (append.go:22) PCDATA $0, $0 0x007c 00124 (append.go:22) CALL runtime.writebarrierptr(SB) 0x0081 00129 (append.go:23) JMP 103 0x0083 00131 (append.go:22) LEAQ type.[]uint8(SB), AX 0x008a 00138 (append.go:22) MOVQ AX, (SP) 0x008e 00142 (append.go:22) MOVQ DX, 8(SP) 0x0093 00147 (append.go:22) MOVQ BX, 16(SP) 0x0098 00152 (append.go:22) MOVQ CX, 24(SP) 0x009d 00157 (append.go:22) MOVQ BP, 32(SP) 0x00a2 00162 (append.go:22) PCDATA $0, $0 0x00a2 00162 (append.go:22) CALL runtime.growslice(SB) 0x00a7 00167 (append.go:22) MOVQ 40(SP), DX 0x00ac 00172 (append.go:22) MOVQ 48(SP), AX 0x00b1 00177 (append.go:22) MOVQ 56(SP), CX 0x00b6 00182 (append.go:22) ADDQ $3, AX 0x00ba 00186 (append.go:19) MOVQ "".autotmp_0+64(SP), BX 0x00bf 00191 (append.go:22) MOVQ AX, BP 0x00c2 00194 (append.go:22) JMP 58 0x00c7 00199 (append.go:22) NOP 0x00c7 00199 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00cc 00204 (append.go:21) JMP 0 After: "".p t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 191 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x+8(SB), DX 0x0025 00037 (append.go:19) MOVQ DX, "".autotmp_0+64(SP) 0x002a 00042 (append.go:19) MOVQ "".x(SB), BX 0x0031 00049 (append.go:22) LEAQ 3(DX), BP 0x0035 00053 (append.go:22) MOVQ BP, "".x+8(SB) 0x003c 00060 (append.go:22) CMPQ BP, CX 0x003f 00063 (append.go:22) JGT $0, 84 0x0041 00065 (append.go:22) MOVB $1, (BX)(DX*1) 0x0045 00069 (append.go:22) MOVB $2, 1(BX)(DX*1) 0x004a 00074 (append.go:22) MOVB $3, 2(BX)(DX*1) 0x004f 00079 (append.go:23) ADDQ $72, SP 0x0053 00083 (append.go:23) RET 0x0054 00084 (append.go:22) LEAQ type.[]uint8(SB), AX 0x005b 00091 (append.go:22) MOVQ AX, (SP) 0x005f 00095 (append.go:22) MOVQ BX, 8(SP) 0x0064 00100 (append.go:22) MOVQ DX, 16(SP) 0x0069 00105 (append.go:22) MOVQ CX, 24(SP) 0x006e 00110 (append.go:22) MOVQ BP, 32(SP) 0x0073 00115 (append.go:22) PCDATA $0, $0 0x0073 00115 (append.go:22) CALL runtime.growslice(SB) 0x0078 00120 (append.go:22) MOVQ 40(SP), CX 0x007d 00125 (append.go:22) MOVQ 56(SP), AX 0x0082 00130 (append.go:22) MOVQ AX, "".x+16(SB) 0x0089 00137 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x008f 00143 (append.go:22) TESTB AL, AL 0x0091 00145 (append.go:22) JNE $0, 168 0x0093 00147 (append.go:22) MOVQ CX, "".x(SB) 0x009a 00154 (append.go:22) MOVQ "".x(SB), BX 0x00a1 00161 (append.go:19) MOVQ "".autotmp_0+64(SP), DX 0x00a6 00166 (append.go:22) JMP 65 0x00a8 00168 (append.go:22) LEAQ "".x(SB), DX 0x00af 00175 (append.go:22) MOVQ DX, (SP) 0x00b3 00179 (append.go:22) MOVQ CX, 8(SP) 0x00b8 00184 (append.go:22) PCDATA $0, $0 0x00b8 00184 (append.go:22) CALL runtime.writebarrierptr(SB) 0x00bd 00189 (append.go:22) JMP 154 0x00bf 00191 (append.go:22) NOP 0x00bf 00191 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00c4 00196 (append.go:21) JMP 0 Change-Id: I77a41ad3a22557a4bb4654de7d6d24a029efe34a Reviewed-on: https://go-review.googlesource.com/21813 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:44:17 -07:00
return nil
}
cmd/compile: avoid a spill in append fast path Instead of spilling newlen, recalculate it. This removes a spill from the fast path, at the cost of a cheap recalculation on the (rare) growth path. This uses 8 bytes less of stack space. It generates two more bytes of code, but that is due to suboptimal register allocation; see far below. Runtime append microbenchmarks are all over the map, presumably due to incidental code movement. Sample code: func s(b []byte) []byte { b = append(b, 1, 2, 3) return b } Before: "".s t=1 size=160 args=0x30 locals=0x48 0x0000 00000 (append.go:8) TEXT "".s(SB), $72-48 0x0000 00000 (append.go:8) MOVQ (TLS), CX 0x0009 00009 (append.go:8) CMPQ SP, 16(CX) 0x000d 00013 (append.go:8) JLS 149 0x0013 00019 (append.go:8) SUBQ $72, SP 0x0017 00023 (append.go:8) FUNCDATA $0, gclocals·6432f8c6a0d23fa7bee6c5d96f21a92a(SB) 0x0017 00023 (append.go:8) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:9) MOVQ "".b+88(FP), CX 0x001c 00028 (append.go:9) LEAQ 3(CX), DX 0x0020 00032 (append.go:9) MOVQ DX, "".autotmp_0+64(SP) 0x0025 00037 (append.go:9) MOVQ "".b+96(FP), BX 0x002a 00042 (append.go:9) CMPQ DX, BX 0x002d 00045 (append.go:9) JGT $0, 86 0x002f 00047 (append.go:8) MOVQ "".b+80(FP), AX 0x0034 00052 (append.go:9) MOVB $1, (AX)(CX*1) 0x0038 00056 (append.go:9) MOVB $2, 1(AX)(CX*1) 0x003d 00061 (append.go:9) MOVB $3, 2(AX)(CX*1) 0x0042 00066 (append.go:10) MOVQ AX, "".~r1+104(FP) 0x0047 00071 (append.go:10) MOVQ DX, "".~r1+112(FP) 0x004c 00076 (append.go:10) MOVQ BX, "".~r1+120(FP) 0x0051 00081 (append.go:10) ADDQ $72, SP 0x0055 00085 (append.go:10) RET 0x0056 00086 (append.go:9) LEAQ type.[]uint8(SB), AX 0x005d 00093 (append.go:9) MOVQ AX, (SP) 0x0061 00097 (append.go:9) MOVQ "".b+80(FP), BP 0x0066 00102 (append.go:9) MOVQ BP, 8(SP) 0x006b 00107 (append.go:9) MOVQ CX, 16(SP) 0x0070 00112 (append.go:9) MOVQ BX, 24(SP) 0x0075 00117 (append.go:9) MOVQ DX, 32(SP) 0x007a 00122 (append.go:9) PCDATA $0, $0 0x007a 00122 (append.go:9) CALL runtime.growslice(SB) 0x007f 00127 (append.go:9) MOVQ 40(SP), AX 0x0084 00132 (append.go:9) MOVQ 56(SP), BX 0x0089 00137 (append.go:8) MOVQ "".b+88(FP), CX 0x008e 00142 (append.go:9) MOVQ "".autotmp_0+64(SP), DX 0x0093 00147 (append.go:9) JMP 52 0x0095 00149 (append.go:9) NOP 0x0095 00149 (append.go:8) CALL runtime.morestack_noctxt(SB) 0x009a 00154 (append.go:8) JMP 0 After: "".s t=1 size=176 args=0x30 locals=0x40 0x0000 00000 (append.go:8) TEXT "".s(SB), $64-48 0x0000 00000 (append.go:8) MOVQ (TLS), CX 0x0009 00009 (append.go:8) CMPQ SP, 16(CX) 0x000d 00013 (append.go:8) JLS 151 0x0013 00019 (append.go:8) SUBQ $64, SP 0x0017 00023 (append.go:8) FUNCDATA $0, gclocals·6432f8c6a0d23fa7bee6c5d96f21a92a(SB) 0x0017 00023 (append.go:8) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:9) MOVQ "".b+80(FP), CX 0x001c 00028 (append.go:9) LEAQ 3(CX), DX 0x0020 00032 (append.go:9) MOVQ "".b+88(FP), BX 0x0025 00037 (append.go:9) CMPQ DX, BX 0x0028 00040 (append.go:9) JGT $0, 81 0x002a 00042 (append.go:8) MOVQ "".b+72(FP), AX 0x002f 00047 (append.go:9) MOVB $1, (AX)(CX*1) 0x0033 00051 (append.go:9) MOVB $2, 1(AX)(CX*1) 0x0038 00056 (append.go:9) MOVB $3, 2(AX)(CX*1) 0x003d 00061 (append.go:10) MOVQ AX, "".~r1+96(FP) 0x0042 00066 (append.go:10) MOVQ DX, "".~r1+104(FP) 0x0047 00071 (append.go:10) MOVQ BX, "".~r1+112(FP) 0x004c 00076 (append.go:10) ADDQ $64, SP 0x0050 00080 (append.go:10) RET 0x0051 00081 (append.go:9) LEAQ type.[]uint8(SB), AX 0x0058 00088 (append.go:9) MOVQ AX, (SP) 0x005c 00092 (append.go:9) MOVQ "".b+72(FP), BP 0x0061 00097 (append.go:9) MOVQ BP, 8(SP) 0x0066 00102 (append.go:9) MOVQ CX, 16(SP) 0x006b 00107 (append.go:9) MOVQ BX, 24(SP) 0x0070 00112 (append.go:9) MOVQ DX, 32(SP) 0x0075 00117 (append.go:9) PCDATA $0, $0 0x0075 00117 (append.go:9) CALL runtime.growslice(SB) 0x007a 00122 (append.go:9) MOVQ 40(SP), AX 0x007f 00127 (append.go:9) MOVQ 48(SP), CX 0x0084 00132 (append.go:9) MOVQ 56(SP), BX 0x0089 00137 (append.go:9) ADDQ $3, CX 0x008d 00141 (append.go:9) MOVQ CX, DX 0x0090 00144 (append.go:8) MOVQ "".b+80(FP), CX 0x0095 00149 (append.go:9) JMP 47 0x0097 00151 (append.go:9) NOP 0x0097 00151 (append.go:8) CALL runtime.morestack_noctxt(SB) 0x009c 00156 (append.go:8) JMP 0 Observe that in the following sequence, we should use DX directly instead of using CX as a temporary register, which would make the new code a strict improvement on the old: 0x007f 00127 (append.go:9) MOVQ 48(SP), CX 0x0084 00132 (append.go:9) MOVQ 56(SP), BX 0x0089 00137 (append.go:9) ADDQ $3, CX 0x008d 00141 (append.go:9) MOVQ CX, DX 0x0090 00144 (append.go:8) MOVQ "".b+80(FP), CX Change-Id: I4ee50b18fa53865901d2d7f86c2cbb54c6fa6924 Reviewed-on: https://go-review.googlesource.com/21812 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:08:00 -07:00
delete(s.vars, &newlenVar)
delete(s.vars, &capVar)
cmd/compile: avoid write barrier in append fast path When we are writing the result of an append back to the same slice, we don’t need a write barrier on the fast path. This re-implements an optimization that was present in the old backend. Updates #14921 Fixes #14969 Sample code: var x []byte func p() { x = append(x, 1, 2, 3) } Before: "".p t=1 size=224 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 199 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x(SB), DX 0x0025 00037 (append.go:19) MOVQ "".x+8(SB), BX 0x002c 00044 (append.go:19) MOVQ BX, "".autotmp_0+64(SP) 0x0031 00049 (append.go:22) LEAQ 3(BX), BP 0x0035 00053 (append.go:22) CMPQ BP, CX 0x0038 00056 (append.go:22) JGT $0, 131 0x003a 00058 (append.go:22) MOVB $1, (DX)(BX*1) 0x003e 00062 (append.go:22) MOVB $2, 1(DX)(BX*1) 0x0043 00067 (append.go:22) MOVB $3, 2(DX)(BX*1) 0x0048 00072 (append.go:22) MOVQ BP, "".x+8(SB) 0x004f 00079 (append.go:22) MOVQ CX, "".x+16(SB) 0x0056 00086 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x005c 00092 (append.go:22) TESTB AL, AL 0x005e 00094 (append.go:22) JNE $0, 108 0x0060 00096 (append.go:22) MOVQ DX, "".x(SB) 0x0067 00103 (append.go:23) ADDQ $72, SP 0x006b 00107 (append.go:23) RET 0x006c 00108 (append.go:22) LEAQ "".x(SB), CX 0x0073 00115 (append.go:22) MOVQ CX, (SP) 0x0077 00119 (append.go:22) MOVQ DX, 8(SP) 0x007c 00124 (append.go:22) PCDATA $0, $0 0x007c 00124 (append.go:22) CALL runtime.writebarrierptr(SB) 0x0081 00129 (append.go:23) JMP 103 0x0083 00131 (append.go:22) LEAQ type.[]uint8(SB), AX 0x008a 00138 (append.go:22) MOVQ AX, (SP) 0x008e 00142 (append.go:22) MOVQ DX, 8(SP) 0x0093 00147 (append.go:22) MOVQ BX, 16(SP) 0x0098 00152 (append.go:22) MOVQ CX, 24(SP) 0x009d 00157 (append.go:22) MOVQ BP, 32(SP) 0x00a2 00162 (append.go:22) PCDATA $0, $0 0x00a2 00162 (append.go:22) CALL runtime.growslice(SB) 0x00a7 00167 (append.go:22) MOVQ 40(SP), DX 0x00ac 00172 (append.go:22) MOVQ 48(SP), AX 0x00b1 00177 (append.go:22) MOVQ 56(SP), CX 0x00b6 00182 (append.go:22) ADDQ $3, AX 0x00ba 00186 (append.go:19) MOVQ "".autotmp_0+64(SP), BX 0x00bf 00191 (append.go:22) MOVQ AX, BP 0x00c2 00194 (append.go:22) JMP 58 0x00c7 00199 (append.go:22) NOP 0x00c7 00199 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00cc 00204 (append.go:21) JMP 0 After: "".p t=1 size=208 args=0x0 locals=0x48 0x0000 00000 (append.go:21) TEXT "".p(SB), $72-0 0x0000 00000 (append.go:21) MOVQ (TLS), CX 0x0009 00009 (append.go:21) CMPQ SP, 16(CX) 0x000d 00013 (append.go:21) JLS 191 0x0013 00019 (append.go:21) SUBQ $72, SP 0x0017 00023 (append.go:21) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:21) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0017 00023 (append.go:19) MOVQ "".x+16(SB), CX 0x001e 00030 (append.go:19) MOVQ "".x+8(SB), DX 0x0025 00037 (append.go:19) MOVQ DX, "".autotmp_0+64(SP) 0x002a 00042 (append.go:19) MOVQ "".x(SB), BX 0x0031 00049 (append.go:22) LEAQ 3(DX), BP 0x0035 00053 (append.go:22) MOVQ BP, "".x+8(SB) 0x003c 00060 (append.go:22) CMPQ BP, CX 0x003f 00063 (append.go:22) JGT $0, 84 0x0041 00065 (append.go:22) MOVB $1, (BX)(DX*1) 0x0045 00069 (append.go:22) MOVB $2, 1(BX)(DX*1) 0x004a 00074 (append.go:22) MOVB $3, 2(BX)(DX*1) 0x004f 00079 (append.go:23) ADDQ $72, SP 0x0053 00083 (append.go:23) RET 0x0054 00084 (append.go:22) LEAQ type.[]uint8(SB), AX 0x005b 00091 (append.go:22) MOVQ AX, (SP) 0x005f 00095 (append.go:22) MOVQ BX, 8(SP) 0x0064 00100 (append.go:22) MOVQ DX, 16(SP) 0x0069 00105 (append.go:22) MOVQ CX, 24(SP) 0x006e 00110 (append.go:22) MOVQ BP, 32(SP) 0x0073 00115 (append.go:22) PCDATA $0, $0 0x0073 00115 (append.go:22) CALL runtime.growslice(SB) 0x0078 00120 (append.go:22) MOVQ 40(SP), CX 0x007d 00125 (append.go:22) MOVQ 56(SP), AX 0x0082 00130 (append.go:22) MOVQ AX, "".x+16(SB) 0x0089 00137 (append.go:22) MOVL runtime.writeBarrier(SB), AX 0x008f 00143 (append.go:22) TESTB AL, AL 0x0091 00145 (append.go:22) JNE $0, 168 0x0093 00147 (append.go:22) MOVQ CX, "".x(SB) 0x009a 00154 (append.go:22) MOVQ "".x(SB), BX 0x00a1 00161 (append.go:19) MOVQ "".autotmp_0+64(SP), DX 0x00a6 00166 (append.go:22) JMP 65 0x00a8 00168 (append.go:22) LEAQ "".x(SB), DX 0x00af 00175 (append.go:22) MOVQ DX, (SP) 0x00b3 00179 (append.go:22) MOVQ CX, 8(SP) 0x00b8 00184 (append.go:22) PCDATA $0, $0 0x00b8 00184 (append.go:22) CALL runtime.writebarrierptr(SB) 0x00bd 00189 (append.go:22) JMP 154 0x00bf 00191 (append.go:22) NOP 0x00bf 00191 (append.go:21) CALL runtime.morestack_noctxt(SB) 0x00c4 00196 (append.go:21) JMP 0 Change-Id: I77a41ad3a22557a4bb4654de7d6d24a029efe34a Reviewed-on: https://go-review.googlesource.com/21813 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-10 09:44:17 -07:00
// make result
return s.newValue3(ssa.OpSliceMake, n.Type, p, nl, c)
}
// condBranch evaluates the boolean expression cond and branches to yes
// if cond is true and no if cond is false.
// This function is intended to handle && and || better than just calling
// s.expr(cond) and branching on the result.
func (s *state) condBranch(cond *Node, yes, no *ssa.Block, likely int8) {
switch cond.Op {
case OANDAND:
mid := s.f.NewBlock(ssa.BlockPlain)
s.stmtList(cond.Ninit)
s.condBranch(cond.Left, mid, no, max8(likely, 0))
s.startBlock(mid)
s.condBranch(cond.Right, yes, no, likely)
return
// Note: if likely==1, then both recursive calls pass 1.
// If likely==-1, then we don't have enough information to decide
// whether the first branch is likely or not. So we pass 0 for
// the likeliness of the first branch.
// TODO: have the frontend give us branch prediction hints for
// OANDAND and OOROR nodes (if it ever has such info).
case OOROR:
mid := s.f.NewBlock(ssa.BlockPlain)
s.stmtList(cond.Ninit)
s.condBranch(cond.Left, yes, mid, min8(likely, 0))
s.startBlock(mid)
s.condBranch(cond.Right, yes, no, likely)
return
// Note: if likely==-1, then both recursive calls pass -1.
// If likely==1, then we don't have enough info to decide
// the likelihood of the first branch.
case ONOT:
s.stmtList(cond.Ninit)
s.condBranch(cond.Left, no, yes, -likely)
return
}
c := s.expr(cond)
b := s.endBlock()
b.Kind = ssa.BlockIf
b.SetControl(c)
b.Likely = ssa.BranchPrediction(likely) // gc and ssa both use -1/0/+1 for likeliness
b.AddEdgeTo(yes)
b.AddEdgeTo(no)
}
type skipMask uint8
const (
skipPtr skipMask = 1 << iota
skipLen
skipCap
)
// assign does left = right.
// Right has already been evaluated to ssa, left has not.
// If deref is true, then we do left = *right instead (and right has already been nil-checked).
// If deref is true and right == nil, just do left = 0.
// skip indicates assignments (at the top level) that can be avoided.
func (s *state) assign(left *Node, right *ssa.Value, deref bool, skip skipMask) {
if left.Op == ONAME && left.isBlank() {
return
}
t := left.Type
dowidth(t)
if s.canSSA(left) {
if deref {
s.Fatalf("can SSA LHS %v but not RHS %s", left, right)
}
if left.Op == ODOT {
// We're assigning to a field of an ssa-able value.
// We need to build a new structure with the new value for the
// field we're assigning and the old values for the other fields.
// For instance:
// type T struct {a, b, c int}
// var T x
// x.b = 5
// For the x.b = 5 assignment we want to generate x = T{x.a, 5, x.c}
// Grab information about the structure type.
t := left.Left.Type
nf := t.NumFields()
idx := fieldIdx(left)
// Grab old value of structure.
old := s.expr(left.Left)
// Make new structure.
new := s.newValue0(ssa.StructMakeOp(t.NumFields()), t)
// Add fields as args.
for i := 0; i < nf; i++ {
if i == idx {
new.AddArg(right)
} else {
new.AddArg(s.newValue1I(ssa.OpStructSelect, t.FieldType(i), int64(i), old))
}
}
// Recursively assign the new value we've made to the base of the dot op.
s.assign(left.Left, new, false, 0)
// TODO: do we need to update named values here?
return
}
if left.Op == OINDEX && left.Left.Type.IsArray() {
// We're assigning to an element of an ssa-able array.
// a[i] = v
t := left.Left.Type
n := t.NumElem()
i := s.expr(left.Right) // index
if n == 0 {
// The bounds check must fail. Might as well
// ignore the actual index and just use zeros.
z := s.constInt(types.Types[TINT], 0)
s.boundsCheck(z, z, ssa.BoundsIndex, false)
return
}
if n != 1 {
s.Fatalf("assigning to non-1-length array")
}
// Rewrite to a = [1]{v}
len := s.constInt(types.Types[TINT], 1)
i = s.boundsCheck(i, len, ssa.BoundsIndex, false)
v := s.newValue1(ssa.OpArrayMake1, t, right)
s.assign(left.Left, v, false, 0)
return
}
// Update variable assignment.
s.vars[left] = right
s.addNamedValue(left, right)
return
}
// Left is not ssa-able. Compute its address.
if left.Op == ONAME && left.Class() != PEXTERN && skip == 0 {
s.vars[&memVar] = s.newValue1Apos(ssa.OpVarDef, types.TypeMem, left, s.mem(), !left.IsAutoTmp())
}
addr := s.addr(left, false)
if isReflectHeaderDataField(left) {
// Package unsafe's documentation says storing pointers into
// reflect.SliceHeader and reflect.StringHeader's Data fields
// is valid, even though they have type uintptr (#19168).
// Mark it pointer type to signal the writebarrier pass to
// insert a write barrier.
t = types.Types[TUNSAFEPTR]
}
if deref {
// Treat as a mem->mem move.
if right == nil {
s.zero(t, addr)
} else {
s.move(t, addr, right)
}
return
}
// Treat as a store.
s.storeType(t, addr, right, skip, !left.IsAutoTmp())
}
// zeroVal returns the zero value for type t.
func (s *state) zeroVal(t *types.Type) *ssa.Value {
switch {
case t.IsInteger():
switch t.Size() {
case 1:
return s.constInt8(t, 0)
case 2:
return s.constInt16(t, 0)
case 4:
return s.constInt32(t, 0)
case 8:
return s.constInt64(t, 0)
default:
s.Fatalf("bad sized integer type %v", t)
}
case t.IsFloat():
switch t.Size() {
case 4:
return s.constFloat32(t, 0)
case 8:
return s.constFloat64(t, 0)
default:
s.Fatalf("bad sized float type %v", t)
}
case t.IsComplex():
switch t.Size() {
case 8:
z := s.constFloat32(types.Types[TFLOAT32], 0)
return s.entryNewValue2(ssa.OpComplexMake, t, z, z)
case 16:
z := s.constFloat64(types.Types[TFLOAT64], 0)
return s.entryNewValue2(ssa.OpComplexMake, t, z, z)
default:
s.Fatalf("bad sized complex type %v", t)
}
case t.IsString():
return s.constEmptyString(t)
case t.IsPtrShaped():
return s.constNil(t)
case t.IsBoolean():
return s.constBool(false)
case t.IsInterface():
return s.constInterface(t)
case t.IsSlice():
return s.constSlice(t)
case t.IsStruct():
n := t.NumFields()
v := s.entryNewValue0(ssa.StructMakeOp(t.NumFields()), t)
for i := 0; i < n; i++ {
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
v.AddArg(s.zeroVal(t.FieldType(i)))
}
return v
case t.IsArray():
switch t.NumElem() {
case 0:
return s.entryNewValue0(ssa.OpArrayMake0, t)
case 1:
return s.entryNewValue1(ssa.OpArrayMake1, t, s.zeroVal(t.Elem()))
}
}
s.Fatalf("zero for type %v not implemented", t)
return nil
}
type callKind int8
const (
callNormal callKind = iota
callDefer
callGo
)
type sfRtCallDef struct {
rtfn *obj.LSym
rtype types.EType
}
var softFloatOps map[ssa.Op]sfRtCallDef
func softfloatInit() {
// Some of these operations get transformed by sfcall.
softFloatOps = map[ssa.Op]sfRtCallDef{
ssa.OpAdd32F: sfRtCallDef{sysfunc("fadd32"), TFLOAT32},
ssa.OpAdd64F: sfRtCallDef{sysfunc("fadd64"), TFLOAT64},
ssa.OpSub32F: sfRtCallDef{sysfunc("fadd32"), TFLOAT32},
ssa.OpSub64F: sfRtCallDef{sysfunc("fadd64"), TFLOAT64},
ssa.OpMul32F: sfRtCallDef{sysfunc("fmul32"), TFLOAT32},
ssa.OpMul64F: sfRtCallDef{sysfunc("fmul64"), TFLOAT64},
ssa.OpDiv32F: sfRtCallDef{sysfunc("fdiv32"), TFLOAT32},
ssa.OpDiv64F: sfRtCallDef{sysfunc("fdiv64"), TFLOAT64},
ssa.OpEq64F: sfRtCallDef{sysfunc("feq64"), TBOOL},
ssa.OpEq32F: sfRtCallDef{sysfunc("feq32"), TBOOL},
ssa.OpNeq64F: sfRtCallDef{sysfunc("feq64"), TBOOL},
ssa.OpNeq32F: sfRtCallDef{sysfunc("feq32"), TBOOL},
ssa.OpLess64F: sfRtCallDef{sysfunc("fgt64"), TBOOL},
ssa.OpLess32F: sfRtCallDef{sysfunc("fgt32"), TBOOL},
ssa.OpGreater64F: sfRtCallDef{sysfunc("fgt64"), TBOOL},
ssa.OpGreater32F: sfRtCallDef{sysfunc("fgt32"), TBOOL},
ssa.OpLeq64F: sfRtCallDef{sysfunc("fge64"), TBOOL},
ssa.OpLeq32F: sfRtCallDef{sysfunc("fge32"), TBOOL},
ssa.OpGeq64F: sfRtCallDef{sysfunc("fge64"), TBOOL},
ssa.OpGeq32F: sfRtCallDef{sysfunc("fge32"), TBOOL},
ssa.OpCvt32to32F: sfRtCallDef{sysfunc("fint32to32"), TFLOAT32},
ssa.OpCvt32Fto32: sfRtCallDef{sysfunc("f32toint32"), TINT32},
ssa.OpCvt64to32F: sfRtCallDef{sysfunc("fint64to32"), TFLOAT32},
ssa.OpCvt32Fto64: sfRtCallDef{sysfunc("f32toint64"), TINT64},
ssa.OpCvt64Uto32F: sfRtCallDef{sysfunc("fuint64to32"), TFLOAT32},
ssa.OpCvt32Fto64U: sfRtCallDef{sysfunc("f32touint64"), TUINT64},
ssa.OpCvt32to64F: sfRtCallDef{sysfunc("fint32to64"), TFLOAT64},
ssa.OpCvt64Fto32: sfRtCallDef{sysfunc("f64toint32"), TINT32},
ssa.OpCvt64to64F: sfRtCallDef{sysfunc("fint64to64"), TFLOAT64},
ssa.OpCvt64Fto64: sfRtCallDef{sysfunc("f64toint64"), TINT64},
ssa.OpCvt64Uto64F: sfRtCallDef{sysfunc("fuint64to64"), TFLOAT64},
ssa.OpCvt64Fto64U: sfRtCallDef{sysfunc("f64touint64"), TUINT64},
ssa.OpCvt32Fto64F: sfRtCallDef{sysfunc("f32to64"), TFLOAT64},
ssa.OpCvt64Fto32F: sfRtCallDef{sysfunc("f64to32"), TFLOAT32},
}
}
// TODO: do not emit sfcall if operation can be optimized to constant in later
// opt phase
func (s *state) sfcall(op ssa.Op, args ...*ssa.Value) (*ssa.Value, bool) {
if callDef, ok := softFloatOps[op]; ok {
switch op {
case ssa.OpLess32F,
ssa.OpLess64F,
ssa.OpLeq32F,
ssa.OpLeq64F:
args[0], args[1] = args[1], args[0]
case ssa.OpSub32F,
ssa.OpSub64F:
cmd/compile: bulk rename This change does a bulk rename of several identifiers in the compiler. See #27167 and https://docs.google.com/document/d/19_ExiylD9MRfeAjKIfEsMU1_RGhuxB9sA0b5Zv7byVI/ for context and for discussion of these particular renames. Commands run to generate this change: gorename -from '"cmd/compile/internal/gc".OPROC' -to OGO gorename -from '"cmd/compile/internal/gc".OCOM' -to OBITNOT gorename -from '"cmd/compile/internal/gc".OMINUS' -to ONEG gorename -from '"cmd/compile/internal/gc".OIND' -to ODEREF gorename -from '"cmd/compile/internal/gc".OARRAYBYTESTR' -to OBYTES2STR gorename -from '"cmd/compile/internal/gc".OARRAYBYTESTRTMP' -to OBYTES2STRTMP gorename -from '"cmd/compile/internal/gc".OARRAYRUNESTR' -to ORUNES2STR gorename -from '"cmd/compile/internal/gc".OSTRARRAYBYTE' -to OSTR2BYTES gorename -from '"cmd/compile/internal/gc".OSTRARRAYBYTETMP' -to OSTR2BYTESTMP gorename -from '"cmd/compile/internal/gc".OSTRARRAYRUNE' -to OSTR2RUNES gorename -from '"cmd/compile/internal/gc".Etop' -to ctxStmt gorename -from '"cmd/compile/internal/gc".Erv' -to ctxExpr gorename -from '"cmd/compile/internal/gc".Ecall' -to ctxCallee gorename -from '"cmd/compile/internal/gc".Efnstruct' -to ctxMultiOK gorename -from '"cmd/compile/internal/gc".Easgn' -to ctxAssign gorename -from '"cmd/compile/internal/gc".Ecomplit' -to ctxCompLit Not altered: parameters and local variables (mostly in typecheck.go) named top, which should probably now be called ctx (and which should probably have a named type). Also not altered: Field called Top in gc.Func. gorename -from '"cmd/compile/internal/gc".Node.Isddd' -to IsDDD gorename -from '"cmd/compile/internal/gc".Node.SetIsddd' -to SetIsDDD gorename -from '"cmd/compile/internal/gc".nodeIsddd' -to nodeIsDDD gorename -from '"cmd/compile/internal/types".Field.Isddd' -to IsDDD gorename -from '"cmd/compile/internal/types".Field.SetIsddd' -to SetIsDDD gorename -from '"cmd/compile/internal/types".fieldIsddd' -to fieldIsDDD Not altered: function gc.hasddd, params and local variables called isddd Also not altered: fmt.go prints nodes using "isddd(%v)". cd cmd/compile/internal/gc; go generate I then manually found impacted comments using exact string match and fixed them up by hand. The comment changes were trivial. Passes toolstash-check. Fixes #27167. If this experiment is deemed a success, we will open a new tracking issue for renames to do at the end of the 1.13 cycles. Change-Id: I2dc541533d2ab0d06cb3d31d65df205ecfb151e8 Reviewed-on: https://go-review.googlesource.com/c/150140 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-11-18 08:34:38 -08:00
args[1] = s.newValue1(s.ssaOp(ONEG, types.Types[callDef.rtype]), args[1].Type, args[1])
}
result := s.rtcall(callDef.rtfn, true, []*types.Type{types.Types[callDef.rtype]}, args...)[0]
if op == ssa.OpNeq32F || op == ssa.OpNeq64F {
result = s.newValue1(ssa.OpNot, result.Type, result)
}
return result, true
}
return nil, false
}
var intrinsics map[intrinsicKey]intrinsicBuilder
// An intrinsicBuilder converts a call node n into an ssa value that
// implements that call as an intrinsic. args is a list of arguments to the func.
type intrinsicBuilder func(s *state, n *Node, args []*ssa.Value) *ssa.Value
type intrinsicKey struct {
arch *sys.Arch
pkg string
fn string
}
func init() {
intrinsics = map[intrinsicKey]intrinsicBuilder{}
var all []*sys.Arch
var p4 []*sys.Arch
var p8 []*sys.Arch
var lwatomics []*sys.Arch
for _, a := range sys.Archs {
all = append(all, a)
if a.PtrSize == 4 {
p4 = append(p4, a)
} else {
p8 = append(p8, a)
}
if a.Family != sys.PPC64 {
lwatomics = append(lwatomics, a)
}
}
// add adds the intrinsic b for pkg.fn for the given list of architectures.
add := func(pkg, fn string, b intrinsicBuilder, archs ...*sys.Arch) {
for _, a := range archs {
intrinsics[intrinsicKey{a, pkg, fn}] = b
}
}
// addF does the same as add but operates on architecture families.
addF := func(pkg, fn string, b intrinsicBuilder, archFamilies ...sys.ArchFamily) {
m := 0
for _, f := range archFamilies {
if f >= 32 {
panic("too many architecture families")
}
m |= 1 << uint(f)
}
for _, a := range all {
if m>>uint(a.Family)&1 != 0 {
intrinsics[intrinsicKey{a, pkg, fn}] = b
}
}
}
// alias defines pkg.fn = pkg2.fn2 for all architectures in archs for which pkg2.fn2 exists.
alias := func(pkg, fn, pkg2, fn2 string, archs ...*sys.Arch) {
for _, a := range archs {
if b, ok := intrinsics[intrinsicKey{a, pkg2, fn2}]; ok {
intrinsics[intrinsicKey{a, pkg, fn}] = b
}
}
}
/******** runtime ********/
if !instrumenting {
add("runtime", "slicebytetostringtmp",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
cmd/compile: bulk rename This change does a bulk rename of several identifiers in the compiler. See #27167 and https://docs.google.com/document/d/19_ExiylD9MRfeAjKIfEsMU1_RGhuxB9sA0b5Zv7byVI/ for context and for discussion of these particular renames. Commands run to generate this change: gorename -from '"cmd/compile/internal/gc".OPROC' -to OGO gorename -from '"cmd/compile/internal/gc".OCOM' -to OBITNOT gorename -from '"cmd/compile/internal/gc".OMINUS' -to ONEG gorename -from '"cmd/compile/internal/gc".OIND' -to ODEREF gorename -from '"cmd/compile/internal/gc".OARRAYBYTESTR' -to OBYTES2STR gorename -from '"cmd/compile/internal/gc".OARRAYBYTESTRTMP' -to OBYTES2STRTMP gorename -from '"cmd/compile/internal/gc".OARRAYRUNESTR' -to ORUNES2STR gorename -from '"cmd/compile/internal/gc".OSTRARRAYBYTE' -to OSTR2BYTES gorename -from '"cmd/compile/internal/gc".OSTRARRAYBYTETMP' -to OSTR2BYTESTMP gorename -from '"cmd/compile/internal/gc".OSTRARRAYRUNE' -to OSTR2RUNES gorename -from '"cmd/compile/internal/gc".Etop' -to ctxStmt gorename -from '"cmd/compile/internal/gc".Erv' -to ctxExpr gorename -from '"cmd/compile/internal/gc".Ecall' -to ctxCallee gorename -from '"cmd/compile/internal/gc".Efnstruct' -to ctxMultiOK gorename -from '"cmd/compile/internal/gc".Easgn' -to ctxAssign gorename -from '"cmd/compile/internal/gc".Ecomplit' -to ctxCompLit Not altered: parameters and local variables (mostly in typecheck.go) named top, which should probably now be called ctx (and which should probably have a named type). Also not altered: Field called Top in gc.Func. gorename -from '"cmd/compile/internal/gc".Node.Isddd' -to IsDDD gorename -from '"cmd/compile/internal/gc".Node.SetIsddd' -to SetIsDDD gorename -from '"cmd/compile/internal/gc".nodeIsddd' -to nodeIsDDD gorename -from '"cmd/compile/internal/types".Field.Isddd' -to IsDDD gorename -from '"cmd/compile/internal/types".Field.SetIsddd' -to SetIsDDD gorename -from '"cmd/compile/internal/types".fieldIsddd' -to fieldIsDDD Not altered: function gc.hasddd, params and local variables called isddd Also not altered: fmt.go prints nodes using "isddd(%v)". cd cmd/compile/internal/gc; go generate I then manually found impacted comments using exact string match and fixed them up by hand. The comment changes were trivial. Passes toolstash-check. Fixes #27167. If this experiment is deemed a success, we will open a new tracking issue for renames to do at the end of the 1.13 cycles. Change-Id: I2dc541533d2ab0d06cb3d31d65df205ecfb151e8 Reviewed-on: https://go-review.googlesource.com/c/150140 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-11-18 08:34:38 -08:00
// Compiler frontend optimizations emit OBYTES2STRTMP nodes
// for the backend instead of slicebytetostringtmp calls
// when not instrumenting.
slice := args[0]
ptr := s.newValue1(ssa.OpSlicePtr, s.f.Config.Types.BytePtr, slice)
len := s.newValue1(ssa.OpSliceLen, types.Types[TINT], slice)
return s.newValue2(ssa.OpStringMake, n.Type, ptr, len)
},
all...)
}
addF("runtime/internal/math", "MulUintptr",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
if s.config.PtrSize == 4 {
return s.newValue2(ssa.OpMul32uover, types.NewTuple(types.Types[TUINT], types.Types[TUINT]), args[0], args[1])
}
return s.newValue2(ssa.OpMul64uover, types.NewTuple(types.Types[TUINT], types.Types[TUINT]), args[0], args[1])
},
sys.AMD64, sys.I386)
add("runtime", "KeepAlive",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
data := s.newValue1(ssa.OpIData, s.f.Config.Types.BytePtr, args[0])
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
s.vars[&memVar] = s.newValue2(ssa.OpKeepAlive, types.TypeMem, data, s.mem())
return nil
},
all...)
add("runtime", "getclosureptr",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return s.newValue0(ssa.OpGetClosurePtr, s.f.Config.Types.Uintptr)
},
all...)
add("runtime", "getcallerpc",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return s.newValue0(ssa.OpGetCallerPC, s.f.Config.Types.Uintptr)
},
all...)
add("runtime", "getcallersp",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return s.newValue0(ssa.OpGetCallerSP, s.f.Config.Types.Uintptr)
},
all...)
/******** runtime/internal/sys ********/
addF("runtime/internal/sys", "Ctz32",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return s.newValue1(ssa.OpCtz32, types.Types[TINT], args[0])
},
sys.AMD64, sys.ARM64, sys.ARM, sys.S390X, sys.MIPS, sys.PPC64)
addF("runtime/internal/sys", "Ctz64",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return s.newValue1(ssa.OpCtz64, types.Types[TINT], args[0])
},
sys.AMD64, sys.ARM64, sys.ARM, sys.S390X, sys.MIPS, sys.PPC64)
addF("runtime/internal/sys", "Bswap32",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return s.newValue1(ssa.OpBswap32, types.Types[TUINT32], args[0])
},
sys.AMD64, sys.ARM64, sys.ARM, sys.S390X)
addF("runtime/internal/sys", "Bswap64",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return s.newValue1(ssa.OpBswap64, types.Types[TUINT64], args[0])
},
sys.AMD64, sys.ARM64, sys.ARM, sys.S390X)
/******** runtime/internal/atomic ********/
addF("runtime/internal/atomic", "Load",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
v := s.newValue2(ssa.OpAtomicLoad32, types.NewTuple(types.Types[TUINT32], types.TypeMem), args[0], s.mem())
s.vars[&memVar] = s.newValue1(ssa.OpSelect1, types.TypeMem, v)
return s.newValue1(ssa.OpSelect0, types.Types[TUINT32], v)
},
sys.AMD64, sys.ARM64, sys.S390X, sys.MIPS, sys.MIPS64, sys.PPC64)
addF("runtime/internal/atomic", "Load64",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
v := s.newValue2(ssa.OpAtomicLoad64, types.NewTuple(types.Types[TUINT64], types.TypeMem), args[0], s.mem())
s.vars[&memVar] = s.newValue1(ssa.OpSelect1, types.TypeMem, v)
return s.newValue1(ssa.OpSelect0, types.Types[TUINT64], v)
},
sys.AMD64, sys.ARM64, sys.S390X, sys.MIPS64, sys.PPC64)
addF("runtime/internal/atomic", "LoadAcq",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
v := s.newValue2(ssa.OpAtomicLoadAcq32, types.NewTuple(types.Types[TUINT32], types.TypeMem), args[0], s.mem())
s.vars[&memVar] = s.newValue1(ssa.OpSelect1, types.TypeMem, v)
return s.newValue1(ssa.OpSelect0, types.Types[TUINT32], v)
},
sys.PPC64)
addF("runtime/internal/atomic", "Loadp",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
v := s.newValue2(ssa.OpAtomicLoadPtr, types.NewTuple(s.f.Config.Types.BytePtr, types.TypeMem), args[0], s.mem())
s.vars[&memVar] = s.newValue1(ssa.OpSelect1, types.TypeMem, v)
return s.newValue1(ssa.OpSelect0, s.f.Config.Types.BytePtr, v)
},
sys.AMD64, sys.ARM64, sys.S390X, sys.MIPS, sys.MIPS64, sys.PPC64)
addF("runtime/internal/atomic", "Store",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
s.vars[&memVar] = s.newValue3(ssa.OpAtomicStore32, types.TypeMem, args[0], args[1], s.mem())
return nil
},
sys.AMD64, sys.ARM64, sys.S390X, sys.MIPS, sys.MIPS64, sys.PPC64)
addF("runtime/internal/atomic", "Store64",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
s.vars[&memVar] = s.newValue3(ssa.OpAtomicStore64, types.TypeMem, args[0], args[1], s.mem())
return nil
},
sys.AMD64, sys.ARM64, sys.S390X, sys.MIPS64, sys.PPC64)
addF("runtime/internal/atomic", "StorepNoWB",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
s.vars[&memVar] = s.newValue3(ssa.OpAtomicStorePtrNoWB, types.TypeMem, args[0], args[1], s.mem())
return nil
},
sys.AMD64, sys.ARM64, sys.S390X, sys.MIPS, sys.MIPS64)
addF("runtime/internal/atomic", "StoreRel",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
s.vars[&memVar] = s.newValue3(ssa.OpAtomicStoreRel32, types.TypeMem, args[0], args[1], s.mem())
return nil
},
sys.PPC64)
addF("runtime/internal/atomic", "Xchg",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
v := s.newValue3(ssa.OpAtomicExchange32, types.NewTuple(types.Types[TUINT32], types.TypeMem), args[0], args[1], s.mem())
s.vars[&memVar] = s.newValue1(ssa.OpSelect1, types.TypeMem, v)
return s.newValue1(ssa.OpSelect0, types.Types[TUINT32], v)
},
sys.AMD64, sys.ARM64, sys.S390X, sys.MIPS, sys.MIPS64, sys.PPC64)
addF("runtime/internal/atomic", "Xchg64",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
v := s.newValue3(ssa.OpAtomicExchange64, types.NewTuple(types.Types[TUINT64], types.TypeMem), args[0], args[1], s.mem())
s.vars[&memVar] = s.newValue1(ssa.OpSelect1, types.TypeMem, v)
return s.newValue1(ssa.OpSelect0, types.Types[TUINT64], v)
},
sys.AMD64, sys.ARM64, sys.S390X, sys.MIPS64, sys.PPC64)
addF("runtime/internal/atomic", "Xadd",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
v := s.newValue3(ssa.OpAtomicAdd32, types.NewTuple(types.Types[TUINT32], types.TypeMem), args[0], args[1], s.mem())
s.vars[&memVar] = s.newValue1(ssa.OpSelect1, types.TypeMem, v)
return s.newValue1(ssa.OpSelect0, types.Types[TUINT32], v)
},
sys.AMD64, sys.S390X, sys.MIPS, sys.MIPS64, sys.PPC64)
addF("runtime/internal/atomic", "Xadd64",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
v := s.newValue3(ssa.OpAtomicAdd64, types.NewTuple(types.Types[TUINT64], types.TypeMem), args[0], args[1], s.mem())
s.vars[&memVar] = s.newValue1(ssa.OpSelect1, types.TypeMem, v)
return s.newValue1(ssa.OpSelect0, types.Types[TUINT64], v)
},
sys.AMD64, sys.S390X, sys.MIPS64, sys.PPC64)
makeXaddARM64 := func(op0 ssa.Op, op1 ssa.Op, ty types.EType) func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
// Target Atomic feature is identified by dynamic detection
addr := s.entryNewValue1A(ssa.OpAddr, types.Types[TBOOL].PtrTo(), arm64HasATOMICS, s.sb)
v := s.load(types.Types[TBOOL], addr)
b := s.endBlock()
b.Kind = ssa.BlockIf
b.SetControl(v)
bTrue := s.f.NewBlock(ssa.BlockPlain)
bFalse := s.f.NewBlock(ssa.BlockPlain)
bEnd := s.f.NewBlock(ssa.BlockPlain)
b.AddEdgeTo(bTrue)
b.AddEdgeTo(bFalse)
b.Likely = ssa.BranchUnlikely // most machines don't have Atomics nowadays
// We have atomic instructions - use it directly.
s.startBlock(bTrue)
v0 := s.newValue3(op1, types.NewTuple(types.Types[ty], types.TypeMem), args[0], args[1], s.mem())
s.vars[&memVar] = s.newValue1(ssa.OpSelect1, types.TypeMem, v0)
s.vars[n] = s.newValue1(ssa.OpSelect0, types.Types[ty], v0)
s.endBlock().AddEdgeTo(bEnd)
// Use original instruction sequence.
s.startBlock(bFalse)
v1 := s.newValue3(op0, types.NewTuple(types.Types[ty], types.TypeMem), args[0], args[1], s.mem())
s.vars[&memVar] = s.newValue1(ssa.OpSelect1, types.TypeMem, v1)
s.vars[n] = s.newValue1(ssa.OpSelect0, types.Types[ty], v1)
s.endBlock().AddEdgeTo(bEnd)
// Merge results.
s.startBlock(bEnd)
return s.variable(n, types.Types[ty])
}
}
addF("runtime/internal/atomic", "Xadd",
makeXaddARM64(ssa.OpAtomicAdd32, ssa.OpAtomicAdd32Variant, TUINT32),
sys.ARM64)
addF("runtime/internal/atomic", "Xadd64",
makeXaddARM64(ssa.OpAtomicAdd64, ssa.OpAtomicAdd64Variant, TUINT64),
sys.ARM64)
addF("runtime/internal/atomic", "Cas",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
v := s.newValue4(ssa.OpAtomicCompareAndSwap32, types.NewTuple(types.Types[TBOOL], types.TypeMem), args[0], args[1], args[2], s.mem())
s.vars[&memVar] = s.newValue1(ssa.OpSelect1, types.TypeMem, v)
return s.newValue1(ssa.OpSelect0, types.Types[TBOOL], v)
},
sys.AMD64, sys.ARM64, sys.S390X, sys.MIPS, sys.MIPS64, sys.PPC64)
addF("runtime/internal/atomic", "Cas64",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
v := s.newValue4(ssa.OpAtomicCompareAndSwap64, types.NewTuple(types.Types[TBOOL], types.TypeMem), args[0], args[1], args[2], s.mem())
s.vars[&memVar] = s.newValue1(ssa.OpSelect1, types.TypeMem, v)
return s.newValue1(ssa.OpSelect0, types.Types[TBOOL], v)
},
sys.AMD64, sys.ARM64, sys.S390X, sys.MIPS64, sys.PPC64)
addF("runtime/internal/atomic", "CasRel",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
v := s.newValue4(ssa.OpAtomicCompareAndSwap32, types.NewTuple(types.Types[TBOOL], types.TypeMem), args[0], args[1], args[2], s.mem())
s.vars[&memVar] = s.newValue1(ssa.OpSelect1, types.TypeMem, v)
return s.newValue1(ssa.OpSelect0, types.Types[TBOOL], v)
},
sys.PPC64)
addF("runtime/internal/atomic", "And8",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
s.vars[&memVar] = s.newValue3(ssa.OpAtomicAnd8, types.TypeMem, args[0], args[1], s.mem())
return nil
},
sys.AMD64, sys.ARM64, sys.MIPS, sys.PPC64)
addF("runtime/internal/atomic", "Or8",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
s.vars[&memVar] = s.newValue3(ssa.OpAtomicOr8, types.TypeMem, args[0], args[1], s.mem())
return nil
},
sys.AMD64, sys.ARM64, sys.MIPS, sys.PPC64)
alias("runtime/internal/atomic", "Loadint64", "runtime/internal/atomic", "Load64", all...)
alias("runtime/internal/atomic", "Xaddint64", "runtime/internal/atomic", "Xadd64", all...)
alias("runtime/internal/atomic", "Loaduint", "runtime/internal/atomic", "Load", p4...)
alias("runtime/internal/atomic", "Loaduint", "runtime/internal/atomic", "Load64", p8...)
alias("runtime/internal/atomic", "Loaduintptr", "runtime/internal/atomic", "Load", p4...)
alias("runtime/internal/atomic", "Loaduintptr", "runtime/internal/atomic", "Load64", p8...)
alias("runtime/internal/atomic", "LoadAcq", "runtime/internal/atomic", "Load", lwatomics...)
alias("runtime/internal/atomic", "Storeuintptr", "runtime/internal/atomic", "Store", p4...)
alias("runtime/internal/atomic", "Storeuintptr", "runtime/internal/atomic", "Store64", p8...)
alias("runtime/internal/atomic", "StoreRel", "runtime/internal/atomic", "Store", lwatomics...)
alias("runtime/internal/atomic", "Xchguintptr", "runtime/internal/atomic", "Xchg", p4...)
alias("runtime/internal/atomic", "Xchguintptr", "runtime/internal/atomic", "Xchg64", p8...)
alias("runtime/internal/atomic", "Xadduintptr", "runtime/internal/atomic", "Xadd", p4...)
alias("runtime/internal/atomic", "Xadduintptr", "runtime/internal/atomic", "Xadd64", p8...)
alias("runtime/internal/atomic", "Casuintptr", "runtime/internal/atomic", "Cas", p4...)
alias("runtime/internal/atomic", "Casuintptr", "runtime/internal/atomic", "Cas64", p8...)
alias("runtime/internal/atomic", "Casp1", "runtime/internal/atomic", "Cas", p4...)
alias("runtime/internal/atomic", "Casp1", "runtime/internal/atomic", "Cas64", p8...)
alias("runtime/internal/atomic", "CasRel", "runtime/internal/atomic", "Cas", lwatomics...)
alias("runtime/internal/sys", "Ctz8", "math/bits", "TrailingZeros8", all...)
/******** math ********/
addF("math", "Sqrt",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return s.newValue1(ssa.OpSqrt, types.Types[TFLOAT64], args[0])
},
sys.I386, sys.AMD64, sys.ARM, sys.ARM64, sys.MIPS, sys.MIPS64, sys.PPC64, sys.S390X, sys.Wasm)
addF("math", "Trunc",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return s.newValue1(ssa.OpTrunc, types.Types[TFLOAT64], args[0])
},
sys.ARM64, sys.PPC64, sys.S390X, sys.Wasm)
addF("math", "Ceil",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return s.newValue1(ssa.OpCeil, types.Types[TFLOAT64], args[0])
},
sys.ARM64, sys.PPC64, sys.S390X, sys.Wasm)
addF("math", "Floor",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return s.newValue1(ssa.OpFloor, types.Types[TFLOAT64], args[0])
},
sys.ARM64, sys.PPC64, sys.S390X, sys.Wasm)
addF("math", "Round",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return s.newValue1(ssa.OpRound, types.Types[TFLOAT64], args[0])
},
sys.ARM64, sys.PPC64, sys.S390X)
addF("math", "RoundToEven",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return s.newValue1(ssa.OpRoundToEven, types.Types[TFLOAT64], args[0])
},
sys.ARM64, sys.S390X, sys.Wasm)
addF("math", "Abs",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return s.newValue1(ssa.OpAbs, types.Types[TFLOAT64], args[0])
},
sys.ARM64, sys.PPC64, sys.Wasm)
addF("math", "Copysign",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return s.newValue2(ssa.OpCopysign, types.Types[TFLOAT64], args[0], args[1])
},
sys.PPC64, sys.Wasm)
makeRoundAMD64 := func(op ssa.Op) func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
addr := s.entryNewValue1A(ssa.OpAddr, types.Types[TBOOL].PtrTo(), x86HasSSE41, s.sb)
v := s.load(types.Types[TBOOL], addr)
b := s.endBlock()
b.Kind = ssa.BlockIf
b.SetControl(v)
bTrue := s.f.NewBlock(ssa.BlockPlain)
bFalse := s.f.NewBlock(ssa.BlockPlain)
bEnd := s.f.NewBlock(ssa.BlockPlain)
b.AddEdgeTo(bTrue)
b.AddEdgeTo(bFalse)
b.Likely = ssa.BranchLikely // most machines have sse4.1 nowadays
// We have the intrinsic - use it directly.
s.startBlock(bTrue)
s.vars[n] = s.newValue1(op, types.Types[TFLOAT64], args[0])
s.endBlock().AddEdgeTo(bEnd)
// Call the pure Go version.
s.startBlock(bFalse)
a := s.call(n, callNormal)
s.vars[n] = s.load(types.Types[TFLOAT64], a)
s.endBlock().AddEdgeTo(bEnd)
// Merge results.
s.startBlock(bEnd)
return s.variable(n, types.Types[TFLOAT64])
}
}
addF("math", "RoundToEven",
makeRoundAMD64(ssa.OpRoundToEven),
sys.AMD64)
addF("math", "Floor",
makeRoundAMD64(ssa.OpFloor),
sys.AMD64)
addF("math", "Ceil",
makeRoundAMD64(ssa.OpCeil),
sys.AMD64)
addF("math", "Trunc",
makeRoundAMD64(ssa.OpTrunc),
sys.AMD64)
/******** math/bits ********/
addF("math/bits", "TrailingZeros64",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return s.newValue1(ssa.OpCtz64, types.Types[TINT], args[0])
},
sys.AMD64, sys.ARM64, sys.ARM, sys.S390X, sys.MIPS, sys.PPC64, sys.Wasm)
addF("math/bits", "TrailingZeros32",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return s.newValue1(ssa.OpCtz32, types.Types[TINT], args[0])
},
sys.AMD64, sys.ARM64, sys.ARM, sys.S390X, sys.MIPS, sys.PPC64, sys.Wasm)
addF("math/bits", "TrailingZeros16",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
x := s.newValue1(ssa.OpZeroExt16to32, types.Types[TUINT32], args[0])
c := s.constInt32(types.Types[TUINT32], 1<<16)
y := s.newValue2(ssa.OpOr32, types.Types[TUINT32], x, c)
return s.newValue1(ssa.OpCtz32, types.Types[TINT], y)
},
sys.MIPS)
cmd/compile: optimize TrailingZeros(8|16) on amd64 Introduce Ctz8 and Ctz16 ops and provide optimized lowerings for them. amd64 only for this CL, although it wouldn't surprise me if other architectures also admit of optimized lowerings. name old time/op new time/op delta TrailingZeros8-8 1.33ns ± 6% 0.84ns ± 3% -36.90% (p=0.000 n=20+20) TrailingZeros16-8 1.26ns ± 5% 0.84ns ± 5% -33.50% (p=0.000 n=20+18) Code: func f8(x uint8) { z = bits.TrailingZeros8(x) } func f16(x uint16) { z = bits.TrailingZeros16(x) } Before: "".f8 STEXT nosplit size=34 args=0x8 locals=0x0 0x0000 00000 (x.go:7) TEXT "".f8(SB), NOSPLIT, $0-8 0x0000 00000 (x.go:7) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB) 0x0000 00000 (x.go:7) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (x.go:7) MOVBLZX "".x+8(SP), AX 0x0005 00005 (x.go:7) MOVBLZX AL, AX 0x0008 00008 (x.go:7) BTSQ $8, AX 0x000d 00013 (x.go:7) BSFQ AX, AX 0x0011 00017 (x.go:7) MOVL $64, CX 0x0016 00022 (x.go:7) CMOVQEQ CX, AX 0x001a 00026 (x.go:7) MOVQ AX, "".z(SB) 0x0021 00033 (x.go:7) RET "".f16 STEXT nosplit size=34 args=0x8 locals=0x0 0x0000 00000 (x.go:8) TEXT "".f16(SB), NOSPLIT, $0-8 0x0000 00000 (x.go:8) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB) 0x0000 00000 (x.go:8) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (x.go:8) MOVWLZX "".x+8(SP), AX 0x0005 00005 (x.go:8) MOVWLZX AX, AX 0x0008 00008 (x.go:8) BTSQ $16, AX 0x000d 00013 (x.go:8) BSFQ AX, AX 0x0011 00017 (x.go:8) MOVL $64, CX 0x0016 00022 (x.go:8) CMOVQEQ CX, AX 0x001a 00026 (x.go:8) MOVQ AX, "".z(SB) 0x0021 00033 (x.go:8) RET After: "".f8 STEXT nosplit size=20 args=0x8 locals=0x0 0x0000 00000 (x.go:7) TEXT "".f8(SB), NOSPLIT, $0-8 0x0000 00000 (x.go:7) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB) 0x0000 00000 (x.go:7) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (x.go:7) MOVBLZX "".x+8(SP), AX 0x0005 00005 (x.go:7) BTSL $8, AX 0x0009 00009 (x.go:7) BSFL AX, AX 0x000c 00012 (x.go:7) MOVQ AX, "".z(SB) 0x0013 00019 (x.go:7) RET "".f16 STEXT nosplit size=20 args=0x8 locals=0x0 0x0000 00000 (x.go:8) TEXT "".f16(SB), NOSPLIT, $0-8 0x0000 00000 (x.go:8) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB) 0x0000 00000 (x.go:8) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (x.go:8) MOVWLZX "".x+8(SP), AX 0x0005 00005 (x.go:8) BTSL $16, AX 0x0009 00009 (x.go:8) BSFL AX, AX 0x000c 00012 (x.go:8) MOVQ AX, "".z(SB) 0x0013 00019 (x.go:8) RET Change-Id: I0551e357348de2b724737d569afd6ac9f5c3aa11 Reviewed-on: https://go-review.googlesource.com/108940 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Giovanni Bajo <rasky@develer.com> Reviewed-by: Keith Randall <khr@golang.org>
2018-04-23 14:46:41 -07:00
addF("math/bits", "TrailingZeros16",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return s.newValue1(ssa.OpCtz16, types.Types[TINT], args[0])
},
sys.AMD64, sys.ARM, sys.ARM64, sys.Wasm)
addF("math/bits", "TrailingZeros16",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
x := s.newValue1(ssa.OpZeroExt16to64, types.Types[TUINT64], args[0])
c := s.constInt64(types.Types[TUINT64], 1<<16)
y := s.newValue2(ssa.OpOr64, types.Types[TUINT64], x, c)
return s.newValue1(ssa.OpCtz64, types.Types[TINT], y)
},
cmd/compile: eliminate unnecessary type conversions in TrailingZeros(16|8) for arm64 This CL eliminates unnecessary type conversion operations: OpZeroExt16to64 and OpZeroExt8to64. If the input argrument is a nonzero value, then ORconst operation can also be eliminated. Benchmarks: name old time/op new time/op delta TrailingZeros-8 2.75ns ± 0% 2.75ns ± 0% ~ (all equal) TrailingZeros8-8 3.49ns ± 1% 2.93ns ± 0% -16.00% (p=0.000 n=10+10) TrailingZeros16-8 3.49ns ± 1% 2.93ns ± 0% -16.05% (p=0.000 n=9+10) TrailingZeros32-8 2.67ns ± 1% 2.68ns ± 1% ~ (p=0.468 n=10+10) TrailingZeros64-8 2.67ns ± 1% 2.65ns ± 0% -0.62% (p=0.022 n=10+9) code: func f16(x uint) { z = bits.TrailingZeros16(uint16(x)) } Before: "".f16 STEXT size=48 args=0x8 locals=0x0 leaf 0x0000 00000 (test.go:7) TEXT "".f16(SB), LEAF|NOFRAME|ABIInternal, $0-8 0x0000 00000 (test.go:7) FUNCDATA ZR, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (test.go:7) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (test.go:7) FUNCDATA $3, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (test.go:7) PCDATA $2, ZR 0x0000 00000 (test.go:7) PCDATA ZR, ZR 0x0000 00000 (test.go:7) MOVD "".x(FP), R0 0x0004 00004 (test.go:7) MOVHU R0, R0 0x0008 00008 (test.go:7) ORR $65536, R0, R0 0x000c 00012 (test.go:7) RBIT R0, R0 0x0010 00016 (test.go:7) CLZ R0, R0 0x0014 00020 (test.go:7) MOVD R0, "".z(SB) 0x0020 00032 (test.go:7) RET (R30) This line of code is unnecessary: 0x0004 00004 (test.go:7) MOVHU R0, R0 After: "".f16 STEXT size=32 args=0x8 locals=0x0 leaf 0x0000 00000 (test.go:7) TEXT "".f16(SB), LEAF|NOFRAME|ABIInternal, $0-8 0x0000 00000 (test.go:7) FUNCDATA ZR, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (test.go:7) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (test.go:7) FUNCDATA $3, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (test.go:7) PCDATA $2, ZR 0x0000 00000 (test.go:7) PCDATA ZR, ZR 0x0000 00000 (test.go:7) MOVD "".x(FP), R0 0x0004 00004 (test.go:7) ORR $65536, R0, R0 0x0008 00008 (test.go:7) RBITW R0, R0 0x000c 00012 (test.go:7) CLZW R0, R0 0x0010 00016 (test.go:7) MOVD R0, "".z(SB) 0x001c 00028 (test.go:7) RET (R30) The situation of TrailingZeros8 is similar to TrailingZeros16. Change-Id: I473bdca06be8460a0be87abbae6fe640017e4c9d Reviewed-on: https://go-review.googlesource.com/c/go/+/156999 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-01-03 09:25:06 +00:00
sys.S390X, sys.PPC64)
addF("math/bits", "TrailingZeros8",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
x := s.newValue1(ssa.OpZeroExt8to32, types.Types[TUINT32], args[0])
c := s.constInt32(types.Types[TUINT32], 1<<8)
y := s.newValue2(ssa.OpOr32, types.Types[TUINT32], x, c)
return s.newValue1(ssa.OpCtz32, types.Types[TINT], y)
},
sys.MIPS)
cmd/compile: optimize TrailingZeros(8|16) on amd64 Introduce Ctz8 and Ctz16 ops and provide optimized lowerings for them. amd64 only for this CL, although it wouldn't surprise me if other architectures also admit of optimized lowerings. name old time/op new time/op delta TrailingZeros8-8 1.33ns ± 6% 0.84ns ± 3% -36.90% (p=0.000 n=20+20) TrailingZeros16-8 1.26ns ± 5% 0.84ns ± 5% -33.50% (p=0.000 n=20+18) Code: func f8(x uint8) { z = bits.TrailingZeros8(x) } func f16(x uint16) { z = bits.TrailingZeros16(x) } Before: "".f8 STEXT nosplit size=34 args=0x8 locals=0x0 0x0000 00000 (x.go:7) TEXT "".f8(SB), NOSPLIT, $0-8 0x0000 00000 (x.go:7) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB) 0x0000 00000 (x.go:7) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (x.go:7) MOVBLZX "".x+8(SP), AX 0x0005 00005 (x.go:7) MOVBLZX AL, AX 0x0008 00008 (x.go:7) BTSQ $8, AX 0x000d 00013 (x.go:7) BSFQ AX, AX 0x0011 00017 (x.go:7) MOVL $64, CX 0x0016 00022 (x.go:7) CMOVQEQ CX, AX 0x001a 00026 (x.go:7) MOVQ AX, "".z(SB) 0x0021 00033 (x.go:7) RET "".f16 STEXT nosplit size=34 args=0x8 locals=0x0 0x0000 00000 (x.go:8) TEXT "".f16(SB), NOSPLIT, $0-8 0x0000 00000 (x.go:8) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB) 0x0000 00000 (x.go:8) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (x.go:8) MOVWLZX "".x+8(SP), AX 0x0005 00005 (x.go:8) MOVWLZX AX, AX 0x0008 00008 (x.go:8) BTSQ $16, AX 0x000d 00013 (x.go:8) BSFQ AX, AX 0x0011 00017 (x.go:8) MOVL $64, CX 0x0016 00022 (x.go:8) CMOVQEQ CX, AX 0x001a 00026 (x.go:8) MOVQ AX, "".z(SB) 0x0021 00033 (x.go:8) RET After: "".f8 STEXT nosplit size=20 args=0x8 locals=0x0 0x0000 00000 (x.go:7) TEXT "".f8(SB), NOSPLIT, $0-8 0x0000 00000 (x.go:7) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB) 0x0000 00000 (x.go:7) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (x.go:7) MOVBLZX "".x+8(SP), AX 0x0005 00005 (x.go:7) BTSL $8, AX 0x0009 00009 (x.go:7) BSFL AX, AX 0x000c 00012 (x.go:7) MOVQ AX, "".z(SB) 0x0013 00019 (x.go:7) RET "".f16 STEXT nosplit size=20 args=0x8 locals=0x0 0x0000 00000 (x.go:8) TEXT "".f16(SB), NOSPLIT, $0-8 0x0000 00000 (x.go:8) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB) 0x0000 00000 (x.go:8) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (x.go:8) MOVWLZX "".x+8(SP), AX 0x0005 00005 (x.go:8) BTSL $16, AX 0x0009 00009 (x.go:8) BSFL AX, AX 0x000c 00012 (x.go:8) MOVQ AX, "".z(SB) 0x0013 00019 (x.go:8) RET Change-Id: I0551e357348de2b724737d569afd6ac9f5c3aa11 Reviewed-on: https://go-review.googlesource.com/108940 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Giovanni Bajo <rasky@develer.com> Reviewed-by: Keith Randall <khr@golang.org>
2018-04-23 14:46:41 -07:00
addF("math/bits", "TrailingZeros8",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return s.newValue1(ssa.OpCtz8, types.Types[TINT], args[0])
},
sys.AMD64, sys.ARM, sys.ARM64, sys.Wasm)
addF("math/bits", "TrailingZeros8",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
x := s.newValue1(ssa.OpZeroExt8to64, types.Types[TUINT64], args[0])
c := s.constInt64(types.Types[TUINT64], 1<<8)
y := s.newValue2(ssa.OpOr64, types.Types[TUINT64], x, c)
return s.newValue1(ssa.OpCtz64, types.Types[TINT], y)
},
cmd/compile: eliminate unnecessary type conversions in TrailingZeros(16|8) for arm64 This CL eliminates unnecessary type conversion operations: OpZeroExt16to64 and OpZeroExt8to64. If the input argrument is a nonzero value, then ORconst operation can also be eliminated. Benchmarks: name old time/op new time/op delta TrailingZeros-8 2.75ns ± 0% 2.75ns ± 0% ~ (all equal) TrailingZeros8-8 3.49ns ± 1% 2.93ns ± 0% -16.00% (p=0.000 n=10+10) TrailingZeros16-8 3.49ns ± 1% 2.93ns ± 0% -16.05% (p=0.000 n=9+10) TrailingZeros32-8 2.67ns ± 1% 2.68ns ± 1% ~ (p=0.468 n=10+10) TrailingZeros64-8 2.67ns ± 1% 2.65ns ± 0% -0.62% (p=0.022 n=10+9) code: func f16(x uint) { z = bits.TrailingZeros16(uint16(x)) } Before: "".f16 STEXT size=48 args=0x8 locals=0x0 leaf 0x0000 00000 (test.go:7) TEXT "".f16(SB), LEAF|NOFRAME|ABIInternal, $0-8 0x0000 00000 (test.go:7) FUNCDATA ZR, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (test.go:7) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (test.go:7) FUNCDATA $3, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (test.go:7) PCDATA $2, ZR 0x0000 00000 (test.go:7) PCDATA ZR, ZR 0x0000 00000 (test.go:7) MOVD "".x(FP), R0 0x0004 00004 (test.go:7) MOVHU R0, R0 0x0008 00008 (test.go:7) ORR $65536, R0, R0 0x000c 00012 (test.go:7) RBIT R0, R0 0x0010 00016 (test.go:7) CLZ R0, R0 0x0014 00020 (test.go:7) MOVD R0, "".z(SB) 0x0020 00032 (test.go:7) RET (R30) This line of code is unnecessary: 0x0004 00004 (test.go:7) MOVHU R0, R0 After: "".f16 STEXT size=32 args=0x8 locals=0x0 leaf 0x0000 00000 (test.go:7) TEXT "".f16(SB), LEAF|NOFRAME|ABIInternal, $0-8 0x0000 00000 (test.go:7) FUNCDATA ZR, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (test.go:7) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (test.go:7) FUNCDATA $3, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (test.go:7) PCDATA $2, ZR 0x0000 00000 (test.go:7) PCDATA ZR, ZR 0x0000 00000 (test.go:7) MOVD "".x(FP), R0 0x0004 00004 (test.go:7) ORR $65536, R0, R0 0x0008 00008 (test.go:7) RBITW R0, R0 0x000c 00012 (test.go:7) CLZW R0, R0 0x0010 00016 (test.go:7) MOVD R0, "".z(SB) 0x001c 00028 (test.go:7) RET (R30) The situation of TrailingZeros8 is similar to TrailingZeros16. Change-Id: I473bdca06be8460a0be87abbae6fe640017e4c9d Reviewed-on: https://go-review.googlesource.com/c/go/+/156999 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-01-03 09:25:06 +00:00
sys.S390X)
alias("math/bits", "ReverseBytes64", "runtime/internal/sys", "Bswap64", all...)
alias("math/bits", "ReverseBytes32", "runtime/internal/sys", "Bswap32", all...)
// ReverseBytes inlines correctly, no need to intrinsify it.
// ReverseBytes16 lowers to a rotate, no need for anything special here.
addF("math/bits", "Len64",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return s.newValue1(ssa.OpBitLen64, types.Types[TINT], args[0])
},
sys.AMD64, sys.ARM64, sys.ARM, sys.S390X, sys.MIPS, sys.PPC64, sys.Wasm)
cmd/compile: optimize LeadingZeros(16|32) on amd64 Introduce Len8 and Len16 ops and provide optimized lowerings for them. amd64 only for this CL, although it wouldn't surprise me if other architectures also admit of optimized lowerings. Also use and optimize the Len32 lowering, along the same lines. Leave Len8 unused for the moment; a subsequent CL will enable it. For 16 and 32 bits, this leads to a speed-up. name old time/op new time/op delta LeadingZeros16-8 1.42ns ± 5% 1.23ns ± 5% -13.42% (p=0.000 n=20+20) LeadingZeros32-8 1.25ns ± 5% 1.03ns ± 5% -17.63% (p=0.000 n=20+16) Code: func f16(x uint16) { z = bits.LeadingZeros16(x) } func f32(x uint32) { z = bits.LeadingZeros32(x) } Before: "".f16 STEXT nosplit size=38 args=0x8 locals=0x0 0x0000 00000 (x.go:8) TEXT "".f16(SB), NOSPLIT, $0-8 0x0000 00000 (x.go:8) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB) 0x0000 00000 (x.go:8) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (x.go:8) MOVWLZX "".x+8(SP), AX 0x0005 00005 (x.go:8) MOVWLZX AX, AX 0x0008 00008 (x.go:8) BSRQ AX, AX 0x000c 00012 (x.go:8) MOVQ $-1, CX 0x0013 00019 (x.go:8) CMOVQEQ CX, AX 0x0017 00023 (x.go:8) ADDQ $-15, AX 0x001b 00027 (x.go:8) NEGQ AX 0x001e 00030 (x.go:8) MOVQ AX, "".z(SB) 0x0025 00037 (x.go:8) RET "".f32 STEXT nosplit size=34 args=0x8 locals=0x0 0x0000 00000 (x.go:9) TEXT "".f32(SB), NOSPLIT, $0-8 0x0000 00000 (x.go:9) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB) 0x0000 00000 (x.go:9) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (x.go:9) MOVL "".x+8(SP), AX 0x0004 00004 (x.go:9) BSRQ AX, AX 0x0008 00008 (x.go:9) MOVQ $-1, CX 0x000f 00015 (x.go:9) CMOVQEQ CX, AX 0x0013 00019 (x.go:9) ADDQ $-31, AX 0x0017 00023 (x.go:9) NEGQ AX 0x001a 00026 (x.go:9) MOVQ AX, "".z(SB) 0x0021 00033 (x.go:9) RET After: "".f16 STEXT nosplit size=30 args=0x8 locals=0x0 0x0000 00000 (x.go:8) TEXT "".f16(SB), NOSPLIT, $0-8 0x0000 00000 (x.go:8) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB) 0x0000 00000 (x.go:8) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (x.go:8) MOVWLZX "".x+8(SP), AX 0x0005 00005 (x.go:8) MOVWLZX AX, AX 0x0008 00008 (x.go:8) LEAL 1(AX)(AX*1), AX 0x000c 00012 (x.go:8) BSRL AX, AX 0x000f 00015 (x.go:8) ADDQ $-16, AX 0x0013 00019 (x.go:8) NEGQ AX 0x0016 00022 (x.go:8) MOVQ AX, "".z(SB) 0x001d 00029 (x.go:8) RET "".f32 STEXT nosplit size=28 args=0x8 locals=0x0 0x0000 00000 (x.go:9) TEXT "".f32(SB), NOSPLIT, $0-8 0x0000 00000 (x.go:9) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB) 0x0000 00000 (x.go:9) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (x.go:9) MOVL "".x+8(SP), AX 0x0004 00004 (x.go:9) LEAQ 1(AX)(AX*1), AX 0x0009 00009 (x.go:9) BSRQ AX, AX 0x000d 00013 (x.go:9) ADDQ $-32, AX 0x0011 00017 (x.go:9) NEGQ AX 0x0014 00020 (x.go:9) MOVQ AX, "".z(SB) 0x001b 00027 (x.go:9) RET Change-Id: I6c93c173752a7bfdeab8be30777ae05a736e1f4b Reviewed-on: https://go-review.googlesource.com/108941 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Giovanni Bajo <rasky@develer.com> Reviewed-by: Keith Randall <khr@golang.org>
2018-04-23 14:54:45 -07:00
addF("math/bits", "Len32",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return s.newValue1(ssa.OpBitLen32, types.Types[TINT], args[0])
},
cmd/compile: optimize math/bits Len32 intrinsic on arm64 Arm64 has a 32-bit CLZ instruction CLZW, which can be used for intrinsic Len32. Function LeadingZeros32 calls Len32, with this change, the assembly code of LeadingZeros32 becomes more concise. Go code: func f32(x uint32) { z = bits.LeadingZeros32(x) } Before: "".f32 STEXT size=32 args=0x8 locals=0x0 leaf 0x0000 00000 (test.go:7) TEXT "".f32(SB), LEAF|NOFRAME|ABIInternal, $0-8 0x0004 00004 (test.go:7) MOVWU "".x(FP), R0 0x0008 00008 ($GOROOT/src/math/bits/bits.go:30) CLZ R0, R0 0x000c 00012 ($GOROOT/src/math/bits/bits.go:30) SUB $32, R0, R0 0x0010 00016 (test.go:7) MOVD R0, "".z(SB) 0x001c 00028 (test.go:7) RET (R30) After: "".f32 STEXT size=32 args=0x8 locals=0x0 leaf 0x0000 00000 (test.go:7) TEXT "".f32(SB), LEAF|NOFRAME|ABIInternal, $0-8 0x0004 00004 (test.go:7) MOVWU "".x(FP), R0 0x0008 00008 ($GOROOT/src/math/bits/bits.go:30) CLZW R0, R0 0x000c 00012 (test.go:7) MOVD R0, "".z(SB) 0x0018 00024 (test.go:7) RET (R30) Benchmarks: name old time/op new time/op delta LeadingZeros-8 2.53ns ± 0% 2.55ns ± 0% +0.67% (p=0.000 n=10+10) LeadingZeros8-8 3.56ns ± 0% 3.56ns ± 0% ~ (all equal) LeadingZeros16-8 3.55ns ± 0% 3.56ns ± 0% ~ (p=0.465 n=10+10) LeadingZeros32-8 3.55ns ± 0% 2.96ns ± 0% -16.71% (p=0.000 n=10+7) LeadingZeros64-8 2.53ns ± 0% 2.54ns ± 0% ~ (p=0.059 n=8+10) Change-Id: Ie5666bb82909e341060e02ffd4e86c0e5d67e90a Reviewed-on: https://go-review.googlesource.com/c/157000 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-01-02 09:14:26 +00:00
sys.AMD64, sys.ARM64)
addF("math/bits", "Len32",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
if s.config.PtrSize == 4 {
return s.newValue1(ssa.OpBitLen32, types.Types[TINT], args[0])
}
x := s.newValue1(ssa.OpZeroExt32to64, types.Types[TUINT64], args[0])
return s.newValue1(ssa.OpBitLen64, types.Types[TINT], x)
},
sys.ARM, sys.S390X, sys.MIPS, sys.PPC64, sys.Wasm)
addF("math/bits", "Len16",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
if s.config.PtrSize == 4 {
x := s.newValue1(ssa.OpZeroExt16to32, types.Types[TUINT32], args[0])
return s.newValue1(ssa.OpBitLen32, types.Types[TINT], x)
}
x := s.newValue1(ssa.OpZeroExt16to64, types.Types[TUINT64], args[0])
return s.newValue1(ssa.OpBitLen64, types.Types[TINT], x)
},
sys.ARM64, sys.ARM, sys.S390X, sys.MIPS, sys.PPC64, sys.Wasm)
cmd/compile: optimize LeadingZeros(16|32) on amd64 Introduce Len8 and Len16 ops and provide optimized lowerings for them. amd64 only for this CL, although it wouldn't surprise me if other architectures also admit of optimized lowerings. Also use and optimize the Len32 lowering, along the same lines. Leave Len8 unused for the moment; a subsequent CL will enable it. For 16 and 32 bits, this leads to a speed-up. name old time/op new time/op delta LeadingZeros16-8 1.42ns ± 5% 1.23ns ± 5% -13.42% (p=0.000 n=20+20) LeadingZeros32-8 1.25ns ± 5% 1.03ns ± 5% -17.63% (p=0.000 n=20+16) Code: func f16(x uint16) { z = bits.LeadingZeros16(x) } func f32(x uint32) { z = bits.LeadingZeros32(x) } Before: "".f16 STEXT nosplit size=38 args=0x8 locals=0x0 0x0000 00000 (x.go:8) TEXT "".f16(SB), NOSPLIT, $0-8 0x0000 00000 (x.go:8) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB) 0x0000 00000 (x.go:8) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (x.go:8) MOVWLZX "".x+8(SP), AX 0x0005 00005 (x.go:8) MOVWLZX AX, AX 0x0008 00008 (x.go:8) BSRQ AX, AX 0x000c 00012 (x.go:8) MOVQ $-1, CX 0x0013 00019 (x.go:8) CMOVQEQ CX, AX 0x0017 00023 (x.go:8) ADDQ $-15, AX 0x001b 00027 (x.go:8) NEGQ AX 0x001e 00030 (x.go:8) MOVQ AX, "".z(SB) 0x0025 00037 (x.go:8) RET "".f32 STEXT nosplit size=34 args=0x8 locals=0x0 0x0000 00000 (x.go:9) TEXT "".f32(SB), NOSPLIT, $0-8 0x0000 00000 (x.go:9) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB) 0x0000 00000 (x.go:9) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (x.go:9) MOVL "".x+8(SP), AX 0x0004 00004 (x.go:9) BSRQ AX, AX 0x0008 00008 (x.go:9) MOVQ $-1, CX 0x000f 00015 (x.go:9) CMOVQEQ CX, AX 0x0013 00019 (x.go:9) ADDQ $-31, AX 0x0017 00023 (x.go:9) NEGQ AX 0x001a 00026 (x.go:9) MOVQ AX, "".z(SB) 0x0021 00033 (x.go:9) RET After: "".f16 STEXT nosplit size=30 args=0x8 locals=0x0 0x0000 00000 (x.go:8) TEXT "".f16(SB), NOSPLIT, $0-8 0x0000 00000 (x.go:8) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB) 0x0000 00000 (x.go:8) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (x.go:8) MOVWLZX "".x+8(SP), AX 0x0005 00005 (x.go:8) MOVWLZX AX, AX 0x0008 00008 (x.go:8) LEAL 1(AX)(AX*1), AX 0x000c 00012 (x.go:8) BSRL AX, AX 0x000f 00015 (x.go:8) ADDQ $-16, AX 0x0013 00019 (x.go:8) NEGQ AX 0x0016 00022 (x.go:8) MOVQ AX, "".z(SB) 0x001d 00029 (x.go:8) RET "".f32 STEXT nosplit size=28 args=0x8 locals=0x0 0x0000 00000 (x.go:9) TEXT "".f32(SB), NOSPLIT, $0-8 0x0000 00000 (x.go:9) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB) 0x0000 00000 (x.go:9) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (x.go:9) MOVL "".x+8(SP), AX 0x0004 00004 (x.go:9) LEAQ 1(AX)(AX*1), AX 0x0009 00009 (x.go:9) BSRQ AX, AX 0x000d 00013 (x.go:9) ADDQ $-32, AX 0x0011 00017 (x.go:9) NEGQ AX 0x0014 00020 (x.go:9) MOVQ AX, "".z(SB) 0x001b 00027 (x.go:9) RET Change-Id: I6c93c173752a7bfdeab8be30777ae05a736e1f4b Reviewed-on: https://go-review.googlesource.com/108941 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Giovanni Bajo <rasky@develer.com> Reviewed-by: Keith Randall <khr@golang.org>
2018-04-23 14:54:45 -07:00
addF("math/bits", "Len16",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return s.newValue1(ssa.OpBitLen16, types.Types[TINT], args[0])
},
sys.AMD64)
addF("math/bits", "Len8",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
if s.config.PtrSize == 4 {
x := s.newValue1(ssa.OpZeroExt8to32, types.Types[TUINT32], args[0])
return s.newValue1(ssa.OpBitLen32, types.Types[TINT], x)
}
x := s.newValue1(ssa.OpZeroExt8to64, types.Types[TUINT64], args[0])
return s.newValue1(ssa.OpBitLen64, types.Types[TINT], x)
},
sys.ARM64, sys.ARM, sys.S390X, sys.MIPS, sys.PPC64, sys.Wasm)
cmd/compile: use intrinsic for LeadingZeros8 on amd64 The previous change sped up the pure computation form of LeadingZeros8. This places it somewhat close to the table lookup form. Depending on something that varies from toolchain to toolchain (alignment, perhaps?), the slowdown from ditching the table lookup is either 20% or 5%. This benchmark is the best case scenario for the table lookup: It is in the L1 cache already. I think we're close enough that we can switch to the computational version, and trust that the memory effects and binary size savings will be worth it. Code: func f8(x uint8) { z = bits.LeadingZeros8(x) } Before: "".f8 STEXT nosplit size=34 args=0x8 locals=0x0 0x0000 00000 (x.go:7) TEXT "".f8(SB), NOSPLIT, $0-8 0x0000 00000 (x.go:7) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB) 0x0000 00000 (x.go:7) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (x.go:7) MOVBLZX "".x+8(SP), AX 0x0005 00005 (x.go:7) MOVBLZX AL, AX 0x0008 00008 (x.go:7) LEAQ math/bits.len8tab(SB), CX 0x000f 00015 (x.go:7) MOVBLZX (CX)(AX*1), AX 0x0013 00019 (x.go:7) ADDQ $-8, AX 0x0017 00023 (x.go:7) NEGQ AX 0x001a 00026 (x.go:7) MOVQ AX, "".z(SB) 0x0021 00033 (x.go:7) RET After: "".f8 STEXT nosplit size=30 args=0x8 locals=0x0 0x0000 00000 (x.go:7) TEXT "".f8(SB), NOSPLIT, $0-8 0x0000 00000 (x.go:7) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB) 0x0000 00000 (x.go:7) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (x.go:7) MOVBLZX "".x+8(SP), AX 0x0005 00005 (x.go:7) MOVBLZX AL, AX 0x0008 00008 (x.go:7) LEAL 1(AX)(AX*1), AX 0x000c 00012 (x.go:7) BSRL AX, AX 0x000f 00015 (x.go:7) ADDQ $-8, AX 0x0013 00019 (x.go:7) NEGQ AX 0x0016 00022 (x.go:7) MOVQ AX, "".z(SB) 0x001d 00029 (x.go:7) RET Change-Id: Icc7db50a7820fb9a3da8a816d6b6940d7f8e193e Reviewed-on: https://go-review.googlesource.com/108942 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2018-04-23 15:38:50 -07:00
addF("math/bits", "Len8",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return s.newValue1(ssa.OpBitLen8, types.Types[TINT], args[0])
},
sys.AMD64)
addF("math/bits", "Len",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
if s.config.PtrSize == 4 {
return s.newValue1(ssa.OpBitLen32, types.Types[TINT], args[0])
}
return s.newValue1(ssa.OpBitLen64, types.Types[TINT], args[0])
},
sys.AMD64, sys.ARM64, sys.ARM, sys.S390X, sys.MIPS, sys.PPC64, sys.Wasm)
// LeadingZeros is handled because it trivially calls Len.
addF("math/bits", "Reverse64",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return s.newValue1(ssa.OpBitRev64, types.Types[TINT], args[0])
},
sys.ARM64)
addF("math/bits", "Reverse32",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return s.newValue1(ssa.OpBitRev32, types.Types[TINT], args[0])
},
sys.ARM64)
addF("math/bits", "Reverse16",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return s.newValue1(ssa.OpBitRev16, types.Types[TINT], args[0])
},
sys.ARM64)
addF("math/bits", "Reverse8",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return s.newValue1(ssa.OpBitRev8, types.Types[TINT], args[0])
},
sys.ARM64)
addF("math/bits", "Reverse",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
if s.config.PtrSize == 4 {
return s.newValue1(ssa.OpBitRev32, types.Types[TINT], args[0])
}
return s.newValue1(ssa.OpBitRev64, types.Types[TINT], args[0])
},
sys.ARM64)
addF("math/bits", "RotateLeft8",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return s.newValue2(ssa.OpRotateLeft8, types.Types[TUINT8], args[0], args[1])
},
sys.AMD64)
addF("math/bits", "RotateLeft16",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return s.newValue2(ssa.OpRotateLeft16, types.Types[TUINT16], args[0], args[1])
},
sys.AMD64)
addF("math/bits", "RotateLeft32",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return s.newValue2(ssa.OpRotateLeft32, types.Types[TUINT32], args[0], args[1])
},
sys.AMD64, sys.ARM64, sys.S390X, sys.PPC64)
addF("math/bits", "RotateLeft64",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return s.newValue2(ssa.OpRotateLeft64, types.Types[TUINT64], args[0], args[1])
},
sys.AMD64, sys.ARM64, sys.S390X, sys.PPC64, sys.Wasm)
alias("math/bits", "RotateLeft", "math/bits", "RotateLeft64", p8...)
makeOnesCountAMD64 := func(op64 ssa.Op, op32 ssa.Op) func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
addr := s.entryNewValue1A(ssa.OpAddr, types.Types[TBOOL].PtrTo(), x86HasPOPCNT, s.sb)
v := s.load(types.Types[TBOOL], addr)
b := s.endBlock()
b.Kind = ssa.BlockIf
b.SetControl(v)
bTrue := s.f.NewBlock(ssa.BlockPlain)
bFalse := s.f.NewBlock(ssa.BlockPlain)
bEnd := s.f.NewBlock(ssa.BlockPlain)
b.AddEdgeTo(bTrue)
b.AddEdgeTo(bFalse)
b.Likely = ssa.BranchLikely // most machines have popcnt nowadays
// We have the intrinsic - use it directly.
s.startBlock(bTrue)
op := op64
if s.config.PtrSize == 4 {
op = op32
}
s.vars[n] = s.newValue1(op, types.Types[TINT], args[0])
s.endBlock().AddEdgeTo(bEnd)
// Call the pure Go version.
s.startBlock(bFalse)
a := s.call(n, callNormal)
s.vars[n] = s.load(types.Types[TINT], a)
s.endBlock().AddEdgeTo(bEnd)
// Merge results.
s.startBlock(bEnd)
return s.variable(n, types.Types[TINT])
}
}
addF("math/bits", "OnesCount64",
makeOnesCountAMD64(ssa.OpPopCount64, ssa.OpPopCount64),
sys.AMD64)
addF("math/bits", "OnesCount64",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return s.newValue1(ssa.OpPopCount64, types.Types[TINT], args[0])
},
sys.PPC64, sys.ARM64, sys.S390X, sys.Wasm)
addF("math/bits", "OnesCount32",
makeOnesCountAMD64(ssa.OpPopCount32, ssa.OpPopCount32),
sys.AMD64)
addF("math/bits", "OnesCount32",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return s.newValue1(ssa.OpPopCount32, types.Types[TINT], args[0])
},
sys.PPC64, sys.ARM64, sys.S390X, sys.Wasm)
addF("math/bits", "OnesCount16",
makeOnesCountAMD64(ssa.OpPopCount16, ssa.OpPopCount16),
sys.AMD64)
addF("math/bits", "OnesCount16",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return s.newValue1(ssa.OpPopCount16, types.Types[TINT], args[0])
},
sys.ARM64, sys.S390X, sys.PPC64, sys.Wasm)
cmd/compile: implement OnesCount{8,16,32,64} intrinsics on s390x This CL implements the math/bits.OnesCount{8,16,32,64} functions as intrinsics on s390x using the 'population count' (popcnt) instruction. This instruction was released as the 'population-count' facility which uses the same facility bit (45) as the 'distinct-operands' facility which is a pre-requisite for Go on s390x. We can therefore use it without a feature check. The s390x popcnt instruction treats a 64 bit register as a vector of 8 bytes, summing the number of ones in each byte individually. It then writes the results to the corresponding bytes in the output register. Therefore to implement OnesCount{16,32,64} we need to sum the individual byte counts using some extra instructions. To do this efficiently I've added some additional pseudo operations to the s390x SSA backend. Unlike other architectures the new instruction sequence is faster for OnesCount8, so that is implemented using the intrinsic. name old time/op new time/op delta OnesCount 3.21ns ± 1% 1.35ns ± 0% -58.00% (p=0.000 n=20+20) OnesCount8 0.91ns ± 1% 0.81ns ± 0% -11.43% (p=0.000 n=20+20) OnesCount16 1.51ns ± 3% 1.21ns ± 0% -19.71% (p=0.000 n=20+17) OnesCount32 1.91ns ± 0% 1.12ns ± 1% -41.60% (p=0.000 n=19+20) OnesCount64 3.18ns ± 4% 1.35ns ± 0% -57.52% (p=0.000 n=20+20) Change-Id: Id54f0bd28b6db9a887ad12c0d72fcc168ef9c4e0 Reviewed-on: https://go-review.googlesource.com/114675 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-05-25 17:54:58 +01:00
addF("math/bits", "OnesCount8",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return s.newValue1(ssa.OpPopCount8, types.Types[TINT], args[0])
},
sys.S390X, sys.PPC64, sys.Wasm)
addF("math/bits", "OnesCount",
makeOnesCountAMD64(ssa.OpPopCount64, ssa.OpPopCount32),
sys.AMD64)
cmd/compile: intrinsify math/bits.Mul Add SSA rules to intrinsify Mul/Mul64 (AMD64 and ARM64). SSA rules for other functions and architectures are left as a future optimization. Benchmark results on AMD64/ARM64 before and after SSA implementation are below. amd64 name old time/op new time/op delta Add-4 1.78ns ± 0% 1.85ns ±12% ~ (p=0.397 n=4+5) Add32-4 1.71ns ± 1% 1.70ns ± 0% ~ (p=0.683 n=5+5) Add64-4 1.80ns ± 2% 1.77ns ± 0% -1.22% (p=0.048 n=5+5) Sub-4 1.78ns ± 0% 1.78ns ± 0% ~ (all equal) Sub32-4 1.78ns ± 1% 1.78ns ± 0% ~ (p=1.000 n=5+5) Sub64-4 1.78ns ± 1% 1.78ns ± 0% ~ (p=0.968 n=5+4) Mul-4 11.5ns ± 1% 1.8ns ± 2% -84.39% (p=0.008 n=5+5) Mul32-4 1.39ns ± 0% 1.38ns ± 3% ~ (p=0.175 n=5+5) Mul64-4 6.85ns ± 1% 1.78ns ± 1% -73.97% (p=0.008 n=5+5) Div-4 57.1ns ± 1% 56.7ns ± 0% ~ (p=0.087 n=5+5) Div32-4 18.0ns ± 0% 18.0ns ± 0% ~ (all equal) Div64-4 56.4ns ±10% 53.6ns ± 1% ~ (p=0.071 n=5+5) arm64 name old time/op new time/op delta Add-96 5.51ns ± 0% 5.51ns ± 0% ~ (all equal) Add32-96 5.51ns ± 0% 5.51ns ± 0% ~ (all equal) Add64-96 5.52ns ± 0% 5.51ns ± 0% ~ (p=0.444 n=5+5) Sub-96 5.51ns ± 0% 5.51ns ± 0% ~ (all equal) Sub32-96 5.51ns ± 0% 5.51ns ± 0% ~ (all equal) Sub64-96 5.51ns ± 0% 5.51ns ± 0% ~ (all equal) Mul-96 34.6ns ± 0% 5.0ns ± 0% -85.52% (p=0.008 n=5+5) Mul32-96 4.51ns ± 0% 4.51ns ± 0% ~ (all equal) Mul64-96 21.1ns ± 0% 5.0ns ± 0% -76.26% (p=0.008 n=5+5) Div-96 64.7ns ± 0% 64.7ns ± 0% ~ (all equal) Div32-96 17.0ns ± 0% 17.0ns ± 0% ~ (all equal) Div64-96 53.1ns ± 0% 53.1ns ± 0% ~ (all equal) Updates #24813 Change-Id: I9bda6d2102f65cae3d436a2087b47ed8bafeb068 Reviewed-on: https://go-review.googlesource.com/129415 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2018-08-14 16:41:22 -06:00
addF("math/bits", "Mul64",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return s.newValue2(ssa.OpMul64uhilo, types.NewTuple(types.Types[TUINT64], types.Types[TUINT64]), args[0], args[1])
},
sys.AMD64, sys.ARM64, sys.PPC64)
alias("math/bits", "Mul", "math/bits", "Mul64", sys.ArchAMD64, sys.ArchARM64, sys.ArchPPC64)
addF("math/bits", "Add64",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return s.newValue3(ssa.OpAdd64carry, types.NewTuple(types.Types[TUINT64], types.Types[TUINT64]), args[0], args[1], args[2])
},
sys.AMD64, sys.ARM64)
alias("math/bits", "Add", "math/bits", "Add64", sys.ArchAMD64, sys.ArchARM64)
addF("math/bits", "Sub64",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
return s.newValue3(ssa.OpSub64borrow, types.NewTuple(types.Types[TUINT64], types.Types[TUINT64]), args[0], args[1], args[2])
},
sys.AMD64)
alias("math/bits", "Sub", "math/bits", "Sub64", sys.ArchAMD64)
addF("math/bits", "Div64",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
// check for divide-by-zero/overflow and panic with appropriate message
cmpZero := s.newValue2(s.ssaOp(ONE, types.Types[TUINT64]), types.Types[TBOOL], args[2], s.zeroVal(types.Types[TUINT64]))
s.check(cmpZero, panicdivide)
cmpOverflow := s.newValue2(s.ssaOp(OLT, types.Types[TUINT64]), types.Types[TBOOL], args[0], args[2])
s.check(cmpOverflow, panicoverflow)
return s.newValue3(ssa.OpDiv128u, types.NewTuple(types.Types[TUINT64], types.Types[TUINT64]), args[0], args[1], args[2])
},
sys.AMD64)
alias("math/bits", "Div", "math/bits", "Div64", sys.ArchAMD64)
/******** sync/atomic ********/
// Note: these are disabled by flag_race in findIntrinsic below.
alias("sync/atomic", "LoadInt32", "runtime/internal/atomic", "Load", all...)
alias("sync/atomic", "LoadInt64", "runtime/internal/atomic", "Load64", all...)
alias("sync/atomic", "LoadPointer", "runtime/internal/atomic", "Loadp", all...)
alias("sync/atomic", "LoadUint32", "runtime/internal/atomic", "Load", all...)
alias("sync/atomic", "LoadUint64", "runtime/internal/atomic", "Load64", all...)
alias("sync/atomic", "LoadUintptr", "runtime/internal/atomic", "Load", p4...)
alias("sync/atomic", "LoadUintptr", "runtime/internal/atomic", "Load64", p8...)
alias("sync/atomic", "StoreInt32", "runtime/internal/atomic", "Store", all...)
alias("sync/atomic", "StoreInt64", "runtime/internal/atomic", "Store64", all...)
// Note: not StorePointer, that needs a write barrier. Same below for {CompareAnd}Swap.
alias("sync/atomic", "StoreUint32", "runtime/internal/atomic", "Store", all...)
alias("sync/atomic", "StoreUint64", "runtime/internal/atomic", "Store64", all...)
alias("sync/atomic", "StoreUintptr", "runtime/internal/atomic", "Store", p4...)
alias("sync/atomic", "StoreUintptr", "runtime/internal/atomic", "Store64", p8...)
alias("sync/atomic", "SwapInt32", "runtime/internal/atomic", "Xchg", all...)
alias("sync/atomic", "SwapInt64", "runtime/internal/atomic", "Xchg64", all...)
alias("sync/atomic", "SwapUint32", "runtime/internal/atomic", "Xchg", all...)
alias("sync/atomic", "SwapUint64", "runtime/internal/atomic", "Xchg64", all...)
alias("sync/atomic", "SwapUintptr", "runtime/internal/atomic", "Xchg", p4...)
alias("sync/atomic", "SwapUintptr", "runtime/internal/atomic", "Xchg64", p8...)
alias("sync/atomic", "CompareAndSwapInt32", "runtime/internal/atomic", "Cas", all...)
alias("sync/atomic", "CompareAndSwapInt64", "runtime/internal/atomic", "Cas64", all...)
alias("sync/atomic", "CompareAndSwapUint32", "runtime/internal/atomic", "Cas", all...)
alias("sync/atomic", "CompareAndSwapUint64", "runtime/internal/atomic", "Cas64", all...)
alias("sync/atomic", "CompareAndSwapUintptr", "runtime/internal/atomic", "Cas", p4...)
alias("sync/atomic", "CompareAndSwapUintptr", "runtime/internal/atomic", "Cas64", p8...)
alias("sync/atomic", "AddInt32", "runtime/internal/atomic", "Xadd", all...)
alias("sync/atomic", "AddInt64", "runtime/internal/atomic", "Xadd64", all...)
alias("sync/atomic", "AddUint32", "runtime/internal/atomic", "Xadd", all...)
alias("sync/atomic", "AddUint64", "runtime/internal/atomic", "Xadd64", all...)
alias("sync/atomic", "AddUintptr", "runtime/internal/atomic", "Xadd", p4...)
alias("sync/atomic", "AddUintptr", "runtime/internal/atomic", "Xadd64", p8...)
/******** math/big ********/
add("math/big", "mulWW",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
return s.newValue2(ssa.OpMul64uhilo, types.NewTuple(types.Types[TUINT64], types.Types[TUINT64]), args[0], args[1])
},
cmd/compile: intrinsify math/big.mulWW on ppc64x This change implements mulWW as an intrinsic for ppc64x. Performance numbers below: name old time/op new time/op delta QuoRem 4.54µs ±45% 3.22µs ± 0% -29.22% (p=0.029 n=4+4) ModSqrt225_Tonelli 765µs ± 3% 757µs ± 0% -1.02% (p=0.029 n=4+4) ModSqrt225_3Mod4 231µs ± 0% 231µs ± 0% -0.10% (p=0.029 n=4+4) ModSqrt231_Tonelli 789µs ± 0% 788µs ± 0% -0.14% (p=0.029 n=4+4) ModSqrt231_5Mod8 267µs ± 0% 267µs ± 0% -0.13% (p=0.029 n=4+4) Sqrt 49.5µs ±17% 45.3µs ± 0% -8.48% (p=0.029 n=4+4) IntSqr/1 32.2ns ±22% 24.2ns ± 0% -24.79% (p=0.029 n=4+4) IntSqr/2 60.6ns ± 0% 60.9ns ± 0% +0.50% (p=0.029 n=4+4) IntSqr/3 82.8ns ± 0% 83.3ns ± 0% +0.51% (p=0.029 n=4+4) IntSqr/5 122ns ± 0% 121ns ± 0% -1.22% (p=0.029 n=4+4) IntSqr/8 227ns ± 0% 226ns ± 0% -0.44% (p=0.029 n=4+4) IntSqr/10 300ns ± 0% 298ns ± 0% -0.67% (p=0.029 n=4+4) IntSqr/20 1.02µs ± 0% 0.89µs ± 0% -13.08% (p=0.029 n=4+4) IntSqr/30 1.73µs ± 0% 1.51µs ± 0% -12.73% (p=0.029 n=4+4) IntSqr/50 3.69µs ± 1% 3.29µs ± 0% -10.70% (p=0.029 n=4+4) IntSqr/80 7.64µs ± 0% 7.04µs ± 0% -7.91% (p=0.029 n=4+4) IntSqr/100 11.1µs ± 0% 10.3µs ± 0% -7.04% (p=0.029 n=4+4) IntSqr/200 37.9µs ± 0% 36.4µs ± 0% -4.13% (p=0.029 n=4+4) IntSqr/300 69.4µs ± 0% 66.0µs ± 0% -4.94% (p=0.029 n=4+4) IntSqr/500 174µs ± 0% 168µs ± 0% -3.10% (p=0.029 n=4+4) IntSqr/800 347µs ± 0% 333µs ± 0% -4.06% (p=0.029 n=4+4) IntSqr/1000 524µs ± 0% 507µs ± 0% -3.21% (p=0.029 n=4+4) Change-Id: If067452f5b6579ad3a2e9daa76a7ffe6fceae1bb Reviewed-on: https://go-review.googlesource.com/c/143217 Run-TryBot: Giovanni Bajo <rasky@develer.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-10-15 22:29:05 -03:00
sys.ArchAMD64, sys.ArchARM64, sys.ArchPPC64LE, sys.ArchPPC64)
add("math/big", "divWW",
func(s *state, n *Node, args []*ssa.Value) *ssa.Value {
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
return s.newValue3(ssa.OpDiv128u, types.NewTuple(types.Types[TUINT64], types.Types[TUINT64]), args[0], args[1], args[2])
},
sys.ArchAMD64)
}
// findIntrinsic returns a function which builds the SSA equivalent of the
// function identified by the symbol sym. If sym is not an intrinsic call, returns nil.
func findIntrinsic(sym *types.Sym) intrinsicBuilder {
if ssa.IntrinsicsDisable {
return nil
}
if sym == nil || sym.Pkg == nil {
return nil
}
pkg := sym.Pkg.Path
if sym.Pkg == localpkg {
pkg = myimportpath
}
if flag_race && pkg == "sync/atomic" {
// The race detector needs to be able to intercept these calls.
// We can't intrinsify them.
return nil
}
// Skip intrinsifying math functions (which may contain hard-float
// instructions) when soft-float
if thearch.SoftFloat && pkg == "math" {
return nil
}
fn := sym.Name
return intrinsics[intrinsicKey{thearch.LinkArch.Arch, pkg, fn}]
}
func isIntrinsicCall(n *Node) bool {
if n == nil || n.Left == nil {
return false
}
return findIntrinsic(n.Left.Sym) != nil
}
// intrinsicCall converts a call to a recognized intrinsic function into the intrinsic SSA operation.
func (s *state) intrinsicCall(n *Node) *ssa.Value {
v := findIntrinsic(n.Left.Sym)(s, n, s.intrinsicArgs(n))
if ssa.IntrinsicsDebug > 0 {
x := v
if x == nil {
x = s.mem()
}
if x.Op == ssa.OpSelect0 || x.Op == ssa.OpSelect1 {
x = x.Args[0]
}
Warnl(n.Pos, "intrinsic substitution for %v with %s", n.Left.Sym.Name, x.LongString())
}
return v
}
// intrinsicArgs extracts args from n, evaluates them to SSA values, and returns them.
func (s *state) intrinsicArgs(n *Node) []*ssa.Value {
cmd/compile: move argument stack construction to SSA generation The goal of this change is to move work from walk to SSA, and simplify things along the way. This is hard to accomplish cleanly with small incremental changes, so this large commit message aims to provide a roadmap to the diff. High level description: Prior to this change, walk was responsible for constructing (most of) the stack for function calls. ascompatte gathered variadic arguments into a slice. It also rewrote n.List from a list of arguments to a list of assignments to stack slots. ascompatte was called multiple times to handle the receiver in a method call. reorder1 then introduced temporaries into n.List as needed to avoid smashing the stack. adjustargs then made extra stack space for go/defer args as needed. Node to SSA construction evaluated all the statements in n.List, and issued the function call, assuming that the stack was correctly constructed. Intrinsic calls had to dig around inside n.List to extract the arguments, since intrinsics don't use the stack to make function calls. This change moves stack construction to the SSA construction phase. ascompatte, now called walkParams, does all the work that ascompatte and reorder1 did. It handles variadic arguments, inserts the method receiver if needed, and allocates temporaries. It does not, however, make any assignments to stack slots. Instead, it moves the function arguments to n.Rlist, leaving assignments to temporaries in n.List. (It would be better to use Ninit instead of List; future work.) During SSA construction, after doing all the temporary assignments in n.List, the function arguments are assigned to stack slots by constructing the appropriate SSA Value, using (*state).storeArg. SSA construction also now handles adjustments for go/defer args. This change also simplifies intrinsic calls, since we no longer need to undo walk's work. Along the way, we simplify nodarg by pushing the fp==1 case to its callers, where it fits nicely. Generated code differences: There were a few optimizations applied along the way, the old way. f(g()) was rewritten to do a block copy of function results to function arguments. And reorder1 avoided introducing the final "save the stack" temporary in n.List. The f(g()) block copy optimization never actually triggered; the order pass rewrote away g(), so that has been removed. SSA optimizations mostly obviated the need for reorder1's optimization of avoiding the final temporary. The exception was when the temporary's type was not SSA-able; in that case, we got a Move into an autotmp and then an immediate Move onto the stack, with the autotmp never read or used again. This change introduces a new rewrite rule to detect such pointless double Moves and collapse them into a single Move. This is actually more powerful than the original optimization, since the original optimization relied on the imprecise Node.HasCall calculation. The other significant difference in the generated code is that the stack is now constructed completely in SP-offset order. Prior to this change, the stack was constructed somewhat haphazardly: first the final argument that Node.HasCall deemed to require a temporary, then other arguments, then the method receiver, then the defer/go args. SP-offset is probably a good default order. See future work. There are a few minor object file size changes as a result of this change. I investigated some regressions in early versions of this change. One regression (in archive/tar) was the addition of a single CMPQ instruction, which would be eliminated were this TODO from flagalloc to be done: // TODO: Remove original instructions if they are never used. One regression (in text/template) was an ADDQconstmodify that is now a regular MOVQLoad+ADDQconst+MOVQStore, due to an unlucky change in the order in which arguments are written. The argument change order can also now be luckier, so this appears to be a wash. All in all, though there will be minor winners and losers, this change appears to be performance neutral. Future work: Move loading the result of function calls to SSA construction; eliminate OINDREGSP. Consider pushing stack construction deeper into SSA world, perhaps in an arch-specific pass. Among other benefits, this would make it easier to transition to a new calling convention. This would require rethinking the handling of stack conflicts and is non-trivial. Figure out some clean way to indicate that stack construction Stores/Moves do not alias each other, so that subsequent passes may do things like CSE+tighten shared stack setup, do DSE using non-first Stores, etc. This would allow us to eliminate the minor text/template regression. Possibly make assignments to stack slots not treated as statements by DWARF. Compiler benchmarks: name old time/op new time/op delta Template 182ms ± 2% 179ms ± 2% -1.69% (p=0.000 n=47+48) Unicode 86.3ms ± 5% 85.1ms ± 4% -1.36% (p=0.001 n=50+50) GoTypes 646ms ± 1% 642ms ± 1% -0.63% (p=0.000 n=49+48) Compiler 2.89s ± 1% 2.86s ± 2% -1.36% (p=0.000 n=48+50) SSA 8.47s ± 1% 8.37s ± 2% -1.22% (p=0.000 n=47+50) Flate 122ms ± 2% 121ms ± 2% -0.66% (p=0.000 n=47+45) GoParser 147ms ± 2% 146ms ± 2% -0.53% (p=0.006 n=46+49) Reflect 406ms ± 2% 403ms ± 2% -0.76% (p=0.000 n=48+43) Tar 162ms ± 3% 162ms ± 4% ~ (p=0.191 n=46+50) XML 223ms ± 2% 222ms ± 2% -0.37% (p=0.031 n=45+49) [Geo mean] 382ms 378ms -0.89% name old user-time/op new user-time/op delta Template 219ms ± 3% 216ms ± 3% -1.56% (p=0.000 n=50+48) Unicode 109ms ± 6% 109ms ± 5% ~ (p=0.190 n=50+49) GoTypes 836ms ± 2% 828ms ± 2% -0.96% (p=0.000 n=49+48) Compiler 3.87s ± 2% 3.80s ± 1% -1.81% (p=0.000 n=49+46) SSA 12.0s ± 1% 11.8s ± 1% -2.01% (p=0.000 n=48+50) Flate 142ms ± 3% 141ms ± 3% -0.85% (p=0.003 n=50+48) GoParser 178ms ± 4% 175ms ± 4% -1.66% (p=0.000 n=48+46) Reflect 520ms ± 2% 512ms ± 2% -1.44% (p=0.000 n=45+48) Tar 200ms ± 3% 198ms ± 4% -0.61% (p=0.037 n=47+50) XML 277ms ± 3% 275ms ± 3% -0.85% (p=0.000 n=49+48) [Geo mean] 482ms 476ms -1.23% name old alloc/op new alloc/op delta Template 36.1MB ± 0% 35.3MB ± 0% -2.18% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 29.3MB ± 0% -1.58% (p=0.008 n=5+5) GoTypes 125MB ± 0% 123MB ± 0% -2.13% (p=0.008 n=5+5) Compiler 531MB ± 0% 513MB ± 0% -3.40% (p=0.008 n=5+5) SSA 2.00GB ± 0% 1.93GB ± 0% -3.34% (p=0.008 n=5+5) Flate 24.5MB ± 0% 24.3MB ± 0% -1.18% (p=0.008 n=5+5) GoParser 29.4MB ± 0% 28.7MB ± 0% -2.34% (p=0.008 n=5+5) Reflect 87.1MB ± 0% 86.0MB ± 0% -1.33% (p=0.008 n=5+5) Tar 35.3MB ± 0% 34.8MB ± 0% -1.44% (p=0.008 n=5+5) XML 47.9MB ± 0% 47.1MB ± 0% -1.86% (p=0.008 n=5+5) [Geo mean] 82.8MB 81.1MB -2.08% name old allocs/op new allocs/op delta Template 352k ± 0% 347k ± 0% -1.32% (p=0.008 n=5+5) Unicode 342k ± 0% 339k ± 0% -0.66% (p=0.008 n=5+5) GoTypes 1.29M ± 0% 1.27M ± 0% -1.30% (p=0.008 n=5+5) Compiler 4.98M ± 0% 4.87M ± 0% -2.14% (p=0.008 n=5+5) SSA 15.7M ± 0% 15.2M ± 0% -2.86% (p=0.008 n=5+5) Flate 233k ± 0% 231k ± 0% -0.83% (p=0.008 n=5+5) GoParser 296k ± 0% 291k ± 0% -1.54% (p=0.016 n=5+4) Reflect 1.05M ± 0% 1.04M ± 0% -0.65% (p=0.008 n=5+5) Tar 343k ± 0% 339k ± 0% -0.97% (p=0.008 n=5+5) XML 432k ± 0% 426k ± 0% -1.19% (p=0.008 n=5+5) [Geo mean] 815k 804k -1.35% name old object-bytes new object-bytes delta Template 505kB ± 0% 505kB ± 0% -0.01% (p=0.008 n=5+5) Unicode 224kB ± 0% 224kB ± 0% ~ (all equal) GoTypes 1.82MB ± 0% 1.83MB ± 0% +0.06% (p=0.008 n=5+5) Flate 324kB ± 0% 324kB ± 0% +0.00% (p=0.008 n=5+5) GoParser 402kB ± 0% 402kB ± 0% +0.04% (p=0.008 n=5+5) Reflect 1.39MB ± 0% 1.39MB ± 0% -0.01% (p=0.008 n=5+5) Tar 449kB ± 0% 449kB ± 0% -0.02% (p=0.008 n=5+5) XML 598kB ± 0% 597kB ± 0% -0.05% (p=0.008 n=5+5) Change-Id: Ifc9d5c1bd01f90171414b8fb18ffe2290d271143 Reviewed-on: https://go-review.googlesource.com/c/114797 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-05-06 12:58:53 -07:00
// Construct map of temps; see comments in s.call about the structure of n.
temps := map[*Node]*ssa.Value{}
for _, a := range n.List.Slice() {
if a.Op != OAS {
cmd/compile: move argument stack construction to SSA generation The goal of this change is to move work from walk to SSA, and simplify things along the way. This is hard to accomplish cleanly with small incremental changes, so this large commit message aims to provide a roadmap to the diff. High level description: Prior to this change, walk was responsible for constructing (most of) the stack for function calls. ascompatte gathered variadic arguments into a slice. It also rewrote n.List from a list of arguments to a list of assignments to stack slots. ascompatte was called multiple times to handle the receiver in a method call. reorder1 then introduced temporaries into n.List as needed to avoid smashing the stack. adjustargs then made extra stack space for go/defer args as needed. Node to SSA construction evaluated all the statements in n.List, and issued the function call, assuming that the stack was correctly constructed. Intrinsic calls had to dig around inside n.List to extract the arguments, since intrinsics don't use the stack to make function calls. This change moves stack construction to the SSA construction phase. ascompatte, now called walkParams, does all the work that ascompatte and reorder1 did. It handles variadic arguments, inserts the method receiver if needed, and allocates temporaries. It does not, however, make any assignments to stack slots. Instead, it moves the function arguments to n.Rlist, leaving assignments to temporaries in n.List. (It would be better to use Ninit instead of List; future work.) During SSA construction, after doing all the temporary assignments in n.List, the function arguments are assigned to stack slots by constructing the appropriate SSA Value, using (*state).storeArg. SSA construction also now handles adjustments for go/defer args. This change also simplifies intrinsic calls, since we no longer need to undo walk's work. Along the way, we simplify nodarg by pushing the fp==1 case to its callers, where it fits nicely. Generated code differences: There were a few optimizations applied along the way, the old way. f(g()) was rewritten to do a block copy of function results to function arguments. And reorder1 avoided introducing the final "save the stack" temporary in n.List. The f(g()) block copy optimization never actually triggered; the order pass rewrote away g(), so that has been removed. SSA optimizations mostly obviated the need for reorder1's optimization of avoiding the final temporary. The exception was when the temporary's type was not SSA-able; in that case, we got a Move into an autotmp and then an immediate Move onto the stack, with the autotmp never read or used again. This change introduces a new rewrite rule to detect such pointless double Moves and collapse them into a single Move. This is actually more powerful than the original optimization, since the original optimization relied on the imprecise Node.HasCall calculation. The other significant difference in the generated code is that the stack is now constructed completely in SP-offset order. Prior to this change, the stack was constructed somewhat haphazardly: first the final argument that Node.HasCall deemed to require a temporary, then other arguments, then the method receiver, then the defer/go args. SP-offset is probably a good default order. See future work. There are a few minor object file size changes as a result of this change. I investigated some regressions in early versions of this change. One regression (in archive/tar) was the addition of a single CMPQ instruction, which would be eliminated were this TODO from flagalloc to be done: // TODO: Remove original instructions if they are never used. One regression (in text/template) was an ADDQconstmodify that is now a regular MOVQLoad+ADDQconst+MOVQStore, due to an unlucky change in the order in which arguments are written. The argument change order can also now be luckier, so this appears to be a wash. All in all, though there will be minor winners and losers, this change appears to be performance neutral. Future work: Move loading the result of function calls to SSA construction; eliminate OINDREGSP. Consider pushing stack construction deeper into SSA world, perhaps in an arch-specific pass. Among other benefits, this would make it easier to transition to a new calling convention. This would require rethinking the handling of stack conflicts and is non-trivial. Figure out some clean way to indicate that stack construction Stores/Moves do not alias each other, so that subsequent passes may do things like CSE+tighten shared stack setup, do DSE using non-first Stores, etc. This would allow us to eliminate the minor text/template regression. Possibly make assignments to stack slots not treated as statements by DWARF. Compiler benchmarks: name old time/op new time/op delta Template 182ms ± 2% 179ms ± 2% -1.69% (p=0.000 n=47+48) Unicode 86.3ms ± 5% 85.1ms ± 4% -1.36% (p=0.001 n=50+50) GoTypes 646ms ± 1% 642ms ± 1% -0.63% (p=0.000 n=49+48) Compiler 2.89s ± 1% 2.86s ± 2% -1.36% (p=0.000 n=48+50) SSA 8.47s ± 1% 8.37s ± 2% -1.22% (p=0.000 n=47+50) Flate 122ms ± 2% 121ms ± 2% -0.66% (p=0.000 n=47+45) GoParser 147ms ± 2% 146ms ± 2% -0.53% (p=0.006 n=46+49) Reflect 406ms ± 2% 403ms ± 2% -0.76% (p=0.000 n=48+43) Tar 162ms ± 3% 162ms ± 4% ~ (p=0.191 n=46+50) XML 223ms ± 2% 222ms ± 2% -0.37% (p=0.031 n=45+49) [Geo mean] 382ms 378ms -0.89% name old user-time/op new user-time/op delta Template 219ms ± 3% 216ms ± 3% -1.56% (p=0.000 n=50+48) Unicode 109ms ± 6% 109ms ± 5% ~ (p=0.190 n=50+49) GoTypes 836ms ± 2% 828ms ± 2% -0.96% (p=0.000 n=49+48) Compiler 3.87s ± 2% 3.80s ± 1% -1.81% (p=0.000 n=49+46) SSA 12.0s ± 1% 11.8s ± 1% -2.01% (p=0.000 n=48+50) Flate 142ms ± 3% 141ms ± 3% -0.85% (p=0.003 n=50+48) GoParser 178ms ± 4% 175ms ± 4% -1.66% (p=0.000 n=48+46) Reflect 520ms ± 2% 512ms ± 2% -1.44% (p=0.000 n=45+48) Tar 200ms ± 3% 198ms ± 4% -0.61% (p=0.037 n=47+50) XML 277ms ± 3% 275ms ± 3% -0.85% (p=0.000 n=49+48) [Geo mean] 482ms 476ms -1.23% name old alloc/op new alloc/op delta Template 36.1MB ± 0% 35.3MB ± 0% -2.18% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 29.3MB ± 0% -1.58% (p=0.008 n=5+5) GoTypes 125MB ± 0% 123MB ± 0% -2.13% (p=0.008 n=5+5) Compiler 531MB ± 0% 513MB ± 0% -3.40% (p=0.008 n=5+5) SSA 2.00GB ± 0% 1.93GB ± 0% -3.34% (p=0.008 n=5+5) Flate 24.5MB ± 0% 24.3MB ± 0% -1.18% (p=0.008 n=5+5) GoParser 29.4MB ± 0% 28.7MB ± 0% -2.34% (p=0.008 n=5+5) Reflect 87.1MB ± 0% 86.0MB ± 0% -1.33% (p=0.008 n=5+5) Tar 35.3MB ± 0% 34.8MB ± 0% -1.44% (p=0.008 n=5+5) XML 47.9MB ± 0% 47.1MB ± 0% -1.86% (p=0.008 n=5+5) [Geo mean] 82.8MB 81.1MB -2.08% name old allocs/op new allocs/op delta Template 352k ± 0% 347k ± 0% -1.32% (p=0.008 n=5+5) Unicode 342k ± 0% 339k ± 0% -0.66% (p=0.008 n=5+5) GoTypes 1.29M ± 0% 1.27M ± 0% -1.30% (p=0.008 n=5+5) Compiler 4.98M ± 0% 4.87M ± 0% -2.14% (p=0.008 n=5+5) SSA 15.7M ± 0% 15.2M ± 0% -2.86% (p=0.008 n=5+5) Flate 233k ± 0% 231k ± 0% -0.83% (p=0.008 n=5+5) GoParser 296k ± 0% 291k ± 0% -1.54% (p=0.016 n=5+4) Reflect 1.05M ± 0% 1.04M ± 0% -0.65% (p=0.008 n=5+5) Tar 343k ± 0% 339k ± 0% -0.97% (p=0.008 n=5+5) XML 432k ± 0% 426k ± 0% -1.19% (p=0.008 n=5+5) [Geo mean] 815k 804k -1.35% name old object-bytes new object-bytes delta Template 505kB ± 0% 505kB ± 0% -0.01% (p=0.008 n=5+5) Unicode 224kB ± 0% 224kB ± 0% ~ (all equal) GoTypes 1.82MB ± 0% 1.83MB ± 0% +0.06% (p=0.008 n=5+5) Flate 324kB ± 0% 324kB ± 0% +0.00% (p=0.008 n=5+5) GoParser 402kB ± 0% 402kB ± 0% +0.04% (p=0.008 n=5+5) Reflect 1.39MB ± 0% 1.39MB ± 0% -0.01% (p=0.008 n=5+5) Tar 449kB ± 0% 449kB ± 0% -0.02% (p=0.008 n=5+5) XML 598kB ± 0% 597kB ± 0% -0.05% (p=0.008 n=5+5) Change-Id: Ifc9d5c1bd01f90171414b8fb18ffe2290d271143 Reviewed-on: https://go-review.googlesource.com/c/114797 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-05-06 12:58:53 -07:00
s.Fatalf("non-assignment as a temp function argument %v", a.Op)
}
l, r := a.Left, a.Right
cmd/compile: move argument stack construction to SSA generation The goal of this change is to move work from walk to SSA, and simplify things along the way. This is hard to accomplish cleanly with small incremental changes, so this large commit message aims to provide a roadmap to the diff. High level description: Prior to this change, walk was responsible for constructing (most of) the stack for function calls. ascompatte gathered variadic arguments into a slice. It also rewrote n.List from a list of arguments to a list of assignments to stack slots. ascompatte was called multiple times to handle the receiver in a method call. reorder1 then introduced temporaries into n.List as needed to avoid smashing the stack. adjustargs then made extra stack space for go/defer args as needed. Node to SSA construction evaluated all the statements in n.List, and issued the function call, assuming that the stack was correctly constructed. Intrinsic calls had to dig around inside n.List to extract the arguments, since intrinsics don't use the stack to make function calls. This change moves stack construction to the SSA construction phase. ascompatte, now called walkParams, does all the work that ascompatte and reorder1 did. It handles variadic arguments, inserts the method receiver if needed, and allocates temporaries. It does not, however, make any assignments to stack slots. Instead, it moves the function arguments to n.Rlist, leaving assignments to temporaries in n.List. (It would be better to use Ninit instead of List; future work.) During SSA construction, after doing all the temporary assignments in n.List, the function arguments are assigned to stack slots by constructing the appropriate SSA Value, using (*state).storeArg. SSA construction also now handles adjustments for go/defer args. This change also simplifies intrinsic calls, since we no longer need to undo walk's work. Along the way, we simplify nodarg by pushing the fp==1 case to its callers, where it fits nicely. Generated code differences: There were a few optimizations applied along the way, the old way. f(g()) was rewritten to do a block copy of function results to function arguments. And reorder1 avoided introducing the final "save the stack" temporary in n.List. The f(g()) block copy optimization never actually triggered; the order pass rewrote away g(), so that has been removed. SSA optimizations mostly obviated the need for reorder1's optimization of avoiding the final temporary. The exception was when the temporary's type was not SSA-able; in that case, we got a Move into an autotmp and then an immediate Move onto the stack, with the autotmp never read or used again. This change introduces a new rewrite rule to detect such pointless double Moves and collapse them into a single Move. This is actually more powerful than the original optimization, since the original optimization relied on the imprecise Node.HasCall calculation. The other significant difference in the generated code is that the stack is now constructed completely in SP-offset order. Prior to this change, the stack was constructed somewhat haphazardly: first the final argument that Node.HasCall deemed to require a temporary, then other arguments, then the method receiver, then the defer/go args. SP-offset is probably a good default order. See future work. There are a few minor object file size changes as a result of this change. I investigated some regressions in early versions of this change. One regression (in archive/tar) was the addition of a single CMPQ instruction, which would be eliminated were this TODO from flagalloc to be done: // TODO: Remove original instructions if they are never used. One regression (in text/template) was an ADDQconstmodify that is now a regular MOVQLoad+ADDQconst+MOVQStore, due to an unlucky change in the order in which arguments are written. The argument change order can also now be luckier, so this appears to be a wash. All in all, though there will be minor winners and losers, this change appears to be performance neutral. Future work: Move loading the result of function calls to SSA construction; eliminate OINDREGSP. Consider pushing stack construction deeper into SSA world, perhaps in an arch-specific pass. Among other benefits, this would make it easier to transition to a new calling convention. This would require rethinking the handling of stack conflicts and is non-trivial. Figure out some clean way to indicate that stack construction Stores/Moves do not alias each other, so that subsequent passes may do things like CSE+tighten shared stack setup, do DSE using non-first Stores, etc. This would allow us to eliminate the minor text/template regression. Possibly make assignments to stack slots not treated as statements by DWARF. Compiler benchmarks: name old time/op new time/op delta Template 182ms ± 2% 179ms ± 2% -1.69% (p=0.000 n=47+48) Unicode 86.3ms ± 5% 85.1ms ± 4% -1.36% (p=0.001 n=50+50) GoTypes 646ms ± 1% 642ms ± 1% -0.63% (p=0.000 n=49+48) Compiler 2.89s ± 1% 2.86s ± 2% -1.36% (p=0.000 n=48+50) SSA 8.47s ± 1% 8.37s ± 2% -1.22% (p=0.000 n=47+50) Flate 122ms ± 2% 121ms ± 2% -0.66% (p=0.000 n=47+45) GoParser 147ms ± 2% 146ms ± 2% -0.53% (p=0.006 n=46+49) Reflect 406ms ± 2% 403ms ± 2% -0.76% (p=0.000 n=48+43) Tar 162ms ± 3% 162ms ± 4% ~ (p=0.191 n=46+50) XML 223ms ± 2% 222ms ± 2% -0.37% (p=0.031 n=45+49) [Geo mean] 382ms 378ms -0.89% name old user-time/op new user-time/op delta Template 219ms ± 3% 216ms ± 3% -1.56% (p=0.000 n=50+48) Unicode 109ms ± 6% 109ms ± 5% ~ (p=0.190 n=50+49) GoTypes 836ms ± 2% 828ms ± 2% -0.96% (p=0.000 n=49+48) Compiler 3.87s ± 2% 3.80s ± 1% -1.81% (p=0.000 n=49+46) SSA 12.0s ± 1% 11.8s ± 1% -2.01% (p=0.000 n=48+50) Flate 142ms ± 3% 141ms ± 3% -0.85% (p=0.003 n=50+48) GoParser 178ms ± 4% 175ms ± 4% -1.66% (p=0.000 n=48+46) Reflect 520ms ± 2% 512ms ± 2% -1.44% (p=0.000 n=45+48) Tar 200ms ± 3% 198ms ± 4% -0.61% (p=0.037 n=47+50) XML 277ms ± 3% 275ms ± 3% -0.85% (p=0.000 n=49+48) [Geo mean] 482ms 476ms -1.23% name old alloc/op new alloc/op delta Template 36.1MB ± 0% 35.3MB ± 0% -2.18% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 29.3MB ± 0% -1.58% (p=0.008 n=5+5) GoTypes 125MB ± 0% 123MB ± 0% -2.13% (p=0.008 n=5+5) Compiler 531MB ± 0% 513MB ± 0% -3.40% (p=0.008 n=5+5) SSA 2.00GB ± 0% 1.93GB ± 0% -3.34% (p=0.008 n=5+5) Flate 24.5MB ± 0% 24.3MB ± 0% -1.18% (p=0.008 n=5+5) GoParser 29.4MB ± 0% 28.7MB ± 0% -2.34% (p=0.008 n=5+5) Reflect 87.1MB ± 0% 86.0MB ± 0% -1.33% (p=0.008 n=5+5) Tar 35.3MB ± 0% 34.8MB ± 0% -1.44% (p=0.008 n=5+5) XML 47.9MB ± 0% 47.1MB ± 0% -1.86% (p=0.008 n=5+5) [Geo mean] 82.8MB 81.1MB -2.08% name old allocs/op new allocs/op delta Template 352k ± 0% 347k ± 0% -1.32% (p=0.008 n=5+5) Unicode 342k ± 0% 339k ± 0% -0.66% (p=0.008 n=5+5) GoTypes 1.29M ± 0% 1.27M ± 0% -1.30% (p=0.008 n=5+5) Compiler 4.98M ± 0% 4.87M ± 0% -2.14% (p=0.008 n=5+5) SSA 15.7M ± 0% 15.2M ± 0% -2.86% (p=0.008 n=5+5) Flate 233k ± 0% 231k ± 0% -0.83% (p=0.008 n=5+5) GoParser 296k ± 0% 291k ± 0% -1.54% (p=0.016 n=5+4) Reflect 1.05M ± 0% 1.04M ± 0% -0.65% (p=0.008 n=5+5) Tar 343k ± 0% 339k ± 0% -0.97% (p=0.008 n=5+5) XML 432k ± 0% 426k ± 0% -1.19% (p=0.008 n=5+5) [Geo mean] 815k 804k -1.35% name old object-bytes new object-bytes delta Template 505kB ± 0% 505kB ± 0% -0.01% (p=0.008 n=5+5) Unicode 224kB ± 0% 224kB ± 0% ~ (all equal) GoTypes 1.82MB ± 0% 1.83MB ± 0% +0.06% (p=0.008 n=5+5) Flate 324kB ± 0% 324kB ± 0% +0.00% (p=0.008 n=5+5) GoParser 402kB ± 0% 402kB ± 0% +0.04% (p=0.008 n=5+5) Reflect 1.39MB ± 0% 1.39MB ± 0% -0.01% (p=0.008 n=5+5) Tar 449kB ± 0% 449kB ± 0% -0.02% (p=0.008 n=5+5) XML 598kB ± 0% 597kB ± 0% -0.05% (p=0.008 n=5+5) Change-Id: Ifc9d5c1bd01f90171414b8fb18ffe2290d271143 Reviewed-on: https://go-review.googlesource.com/c/114797 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-05-06 12:58:53 -07:00
if l.Op != ONAME {
s.Fatalf("non-ONAME temp function argument %v", a.Op)
}
cmd/compile: move argument stack construction to SSA generation The goal of this change is to move work from walk to SSA, and simplify things along the way. This is hard to accomplish cleanly with small incremental changes, so this large commit message aims to provide a roadmap to the diff. High level description: Prior to this change, walk was responsible for constructing (most of) the stack for function calls. ascompatte gathered variadic arguments into a slice. It also rewrote n.List from a list of arguments to a list of assignments to stack slots. ascompatte was called multiple times to handle the receiver in a method call. reorder1 then introduced temporaries into n.List as needed to avoid smashing the stack. adjustargs then made extra stack space for go/defer args as needed. Node to SSA construction evaluated all the statements in n.List, and issued the function call, assuming that the stack was correctly constructed. Intrinsic calls had to dig around inside n.List to extract the arguments, since intrinsics don't use the stack to make function calls. This change moves stack construction to the SSA construction phase. ascompatte, now called walkParams, does all the work that ascompatte and reorder1 did. It handles variadic arguments, inserts the method receiver if needed, and allocates temporaries. It does not, however, make any assignments to stack slots. Instead, it moves the function arguments to n.Rlist, leaving assignments to temporaries in n.List. (It would be better to use Ninit instead of List; future work.) During SSA construction, after doing all the temporary assignments in n.List, the function arguments are assigned to stack slots by constructing the appropriate SSA Value, using (*state).storeArg. SSA construction also now handles adjustments for go/defer args. This change also simplifies intrinsic calls, since we no longer need to undo walk's work. Along the way, we simplify nodarg by pushing the fp==1 case to its callers, where it fits nicely. Generated code differences: There were a few optimizations applied along the way, the old way. f(g()) was rewritten to do a block copy of function results to function arguments. And reorder1 avoided introducing the final "save the stack" temporary in n.List. The f(g()) block copy optimization never actually triggered; the order pass rewrote away g(), so that has been removed. SSA optimizations mostly obviated the need for reorder1's optimization of avoiding the final temporary. The exception was when the temporary's type was not SSA-able; in that case, we got a Move into an autotmp and then an immediate Move onto the stack, with the autotmp never read or used again. This change introduces a new rewrite rule to detect such pointless double Moves and collapse them into a single Move. This is actually more powerful than the original optimization, since the original optimization relied on the imprecise Node.HasCall calculation. The other significant difference in the generated code is that the stack is now constructed completely in SP-offset order. Prior to this change, the stack was constructed somewhat haphazardly: first the final argument that Node.HasCall deemed to require a temporary, then other arguments, then the method receiver, then the defer/go args. SP-offset is probably a good default order. See future work. There are a few minor object file size changes as a result of this change. I investigated some regressions in early versions of this change. One regression (in archive/tar) was the addition of a single CMPQ instruction, which would be eliminated were this TODO from flagalloc to be done: // TODO: Remove original instructions if they are never used. One regression (in text/template) was an ADDQconstmodify that is now a regular MOVQLoad+ADDQconst+MOVQStore, due to an unlucky change in the order in which arguments are written. The argument change order can also now be luckier, so this appears to be a wash. All in all, though there will be minor winners and losers, this change appears to be performance neutral. Future work: Move loading the result of function calls to SSA construction; eliminate OINDREGSP. Consider pushing stack construction deeper into SSA world, perhaps in an arch-specific pass. Among other benefits, this would make it easier to transition to a new calling convention. This would require rethinking the handling of stack conflicts and is non-trivial. Figure out some clean way to indicate that stack construction Stores/Moves do not alias each other, so that subsequent passes may do things like CSE+tighten shared stack setup, do DSE using non-first Stores, etc. This would allow us to eliminate the minor text/template regression. Possibly make assignments to stack slots not treated as statements by DWARF. Compiler benchmarks: name old time/op new time/op delta Template 182ms ± 2% 179ms ± 2% -1.69% (p=0.000 n=47+48) Unicode 86.3ms ± 5% 85.1ms ± 4% -1.36% (p=0.001 n=50+50) GoTypes 646ms ± 1% 642ms ± 1% -0.63% (p=0.000 n=49+48) Compiler 2.89s ± 1% 2.86s ± 2% -1.36% (p=0.000 n=48+50) SSA 8.47s ± 1% 8.37s ± 2% -1.22% (p=0.000 n=47+50) Flate 122ms ± 2% 121ms ± 2% -0.66% (p=0.000 n=47+45) GoParser 147ms ± 2% 146ms ± 2% -0.53% (p=0.006 n=46+49) Reflect 406ms ± 2% 403ms ± 2% -0.76% (p=0.000 n=48+43) Tar 162ms ± 3% 162ms ± 4% ~ (p=0.191 n=46+50) XML 223ms ± 2% 222ms ± 2% -0.37% (p=0.031 n=45+49) [Geo mean] 382ms 378ms -0.89% name old user-time/op new user-time/op delta Template 219ms ± 3% 216ms ± 3% -1.56% (p=0.000 n=50+48) Unicode 109ms ± 6% 109ms ± 5% ~ (p=0.190 n=50+49) GoTypes 836ms ± 2% 828ms ± 2% -0.96% (p=0.000 n=49+48) Compiler 3.87s ± 2% 3.80s ± 1% -1.81% (p=0.000 n=49+46) SSA 12.0s ± 1% 11.8s ± 1% -2.01% (p=0.000 n=48+50) Flate 142ms ± 3% 141ms ± 3% -0.85% (p=0.003 n=50+48) GoParser 178ms ± 4% 175ms ± 4% -1.66% (p=0.000 n=48+46) Reflect 520ms ± 2% 512ms ± 2% -1.44% (p=0.000 n=45+48) Tar 200ms ± 3% 198ms ± 4% -0.61% (p=0.037 n=47+50) XML 277ms ± 3% 275ms ± 3% -0.85% (p=0.000 n=49+48) [Geo mean] 482ms 476ms -1.23% name old alloc/op new alloc/op delta Template 36.1MB ± 0% 35.3MB ± 0% -2.18% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 29.3MB ± 0% -1.58% (p=0.008 n=5+5) GoTypes 125MB ± 0% 123MB ± 0% -2.13% (p=0.008 n=5+5) Compiler 531MB ± 0% 513MB ± 0% -3.40% (p=0.008 n=5+5) SSA 2.00GB ± 0% 1.93GB ± 0% -3.34% (p=0.008 n=5+5) Flate 24.5MB ± 0% 24.3MB ± 0% -1.18% (p=0.008 n=5+5) GoParser 29.4MB ± 0% 28.7MB ± 0% -2.34% (p=0.008 n=5+5) Reflect 87.1MB ± 0% 86.0MB ± 0% -1.33% (p=0.008 n=5+5) Tar 35.3MB ± 0% 34.8MB ± 0% -1.44% (p=0.008 n=5+5) XML 47.9MB ± 0% 47.1MB ± 0% -1.86% (p=0.008 n=5+5) [Geo mean] 82.8MB 81.1MB -2.08% name old allocs/op new allocs/op delta Template 352k ± 0% 347k ± 0% -1.32% (p=0.008 n=5+5) Unicode 342k ± 0% 339k ± 0% -0.66% (p=0.008 n=5+5) GoTypes 1.29M ± 0% 1.27M ± 0% -1.30% (p=0.008 n=5+5) Compiler 4.98M ± 0% 4.87M ± 0% -2.14% (p=0.008 n=5+5) SSA 15.7M ± 0% 15.2M ± 0% -2.86% (p=0.008 n=5+5) Flate 233k ± 0% 231k ± 0% -0.83% (p=0.008 n=5+5) GoParser 296k ± 0% 291k ± 0% -1.54% (p=0.016 n=5+4) Reflect 1.05M ± 0% 1.04M ± 0% -0.65% (p=0.008 n=5+5) Tar 343k ± 0% 339k ± 0% -0.97% (p=0.008 n=5+5) XML 432k ± 0% 426k ± 0% -1.19% (p=0.008 n=5+5) [Geo mean] 815k 804k -1.35% name old object-bytes new object-bytes delta Template 505kB ± 0% 505kB ± 0% -0.01% (p=0.008 n=5+5) Unicode 224kB ± 0% 224kB ± 0% ~ (all equal) GoTypes 1.82MB ± 0% 1.83MB ± 0% +0.06% (p=0.008 n=5+5) Flate 324kB ± 0% 324kB ± 0% +0.00% (p=0.008 n=5+5) GoParser 402kB ± 0% 402kB ± 0% +0.04% (p=0.008 n=5+5) Reflect 1.39MB ± 0% 1.39MB ± 0% -0.01% (p=0.008 n=5+5) Tar 449kB ± 0% 449kB ± 0% -0.02% (p=0.008 n=5+5) XML 598kB ± 0% 597kB ± 0% -0.05% (p=0.008 n=5+5) Change-Id: Ifc9d5c1bd01f90171414b8fb18ffe2290d271143 Reviewed-on: https://go-review.googlesource.com/c/114797 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-05-06 12:58:53 -07:00
// Evaluate and store to "temporary".
// Walk ensures these temporaries are dead outside of n.
temps[l] = s.expr(r)
}
cmd/compile: move argument stack construction to SSA generation The goal of this change is to move work from walk to SSA, and simplify things along the way. This is hard to accomplish cleanly with small incremental changes, so this large commit message aims to provide a roadmap to the diff. High level description: Prior to this change, walk was responsible for constructing (most of) the stack for function calls. ascompatte gathered variadic arguments into a slice. It also rewrote n.List from a list of arguments to a list of assignments to stack slots. ascompatte was called multiple times to handle the receiver in a method call. reorder1 then introduced temporaries into n.List as needed to avoid smashing the stack. adjustargs then made extra stack space for go/defer args as needed. Node to SSA construction evaluated all the statements in n.List, and issued the function call, assuming that the stack was correctly constructed. Intrinsic calls had to dig around inside n.List to extract the arguments, since intrinsics don't use the stack to make function calls. This change moves stack construction to the SSA construction phase. ascompatte, now called walkParams, does all the work that ascompatte and reorder1 did. It handles variadic arguments, inserts the method receiver if needed, and allocates temporaries. It does not, however, make any assignments to stack slots. Instead, it moves the function arguments to n.Rlist, leaving assignments to temporaries in n.List. (It would be better to use Ninit instead of List; future work.) During SSA construction, after doing all the temporary assignments in n.List, the function arguments are assigned to stack slots by constructing the appropriate SSA Value, using (*state).storeArg. SSA construction also now handles adjustments for go/defer args. This change also simplifies intrinsic calls, since we no longer need to undo walk's work. Along the way, we simplify nodarg by pushing the fp==1 case to its callers, where it fits nicely. Generated code differences: There were a few optimizations applied along the way, the old way. f(g()) was rewritten to do a block copy of function results to function arguments. And reorder1 avoided introducing the final "save the stack" temporary in n.List. The f(g()) block copy optimization never actually triggered; the order pass rewrote away g(), so that has been removed. SSA optimizations mostly obviated the need for reorder1's optimization of avoiding the final temporary. The exception was when the temporary's type was not SSA-able; in that case, we got a Move into an autotmp and then an immediate Move onto the stack, with the autotmp never read or used again. This change introduces a new rewrite rule to detect such pointless double Moves and collapse them into a single Move. This is actually more powerful than the original optimization, since the original optimization relied on the imprecise Node.HasCall calculation. The other significant difference in the generated code is that the stack is now constructed completely in SP-offset order. Prior to this change, the stack was constructed somewhat haphazardly: first the final argument that Node.HasCall deemed to require a temporary, then other arguments, then the method receiver, then the defer/go args. SP-offset is probably a good default order. See future work. There are a few minor object file size changes as a result of this change. I investigated some regressions in early versions of this change. One regression (in archive/tar) was the addition of a single CMPQ instruction, which would be eliminated were this TODO from flagalloc to be done: // TODO: Remove original instructions if they are never used. One regression (in text/template) was an ADDQconstmodify that is now a regular MOVQLoad+ADDQconst+MOVQStore, due to an unlucky change in the order in which arguments are written. The argument change order can also now be luckier, so this appears to be a wash. All in all, though there will be minor winners and losers, this change appears to be performance neutral. Future work: Move loading the result of function calls to SSA construction; eliminate OINDREGSP. Consider pushing stack construction deeper into SSA world, perhaps in an arch-specific pass. Among other benefits, this would make it easier to transition to a new calling convention. This would require rethinking the handling of stack conflicts and is non-trivial. Figure out some clean way to indicate that stack construction Stores/Moves do not alias each other, so that subsequent passes may do things like CSE+tighten shared stack setup, do DSE using non-first Stores, etc. This would allow us to eliminate the minor text/template regression. Possibly make assignments to stack slots not treated as statements by DWARF. Compiler benchmarks: name old time/op new time/op delta Template 182ms ± 2% 179ms ± 2% -1.69% (p=0.000 n=47+48) Unicode 86.3ms ± 5% 85.1ms ± 4% -1.36% (p=0.001 n=50+50) GoTypes 646ms ± 1% 642ms ± 1% -0.63% (p=0.000 n=49+48) Compiler 2.89s ± 1% 2.86s ± 2% -1.36% (p=0.000 n=48+50) SSA 8.47s ± 1% 8.37s ± 2% -1.22% (p=0.000 n=47+50) Flate 122ms ± 2% 121ms ± 2% -0.66% (p=0.000 n=47+45) GoParser 147ms ± 2% 146ms ± 2% -0.53% (p=0.006 n=46+49) Reflect 406ms ± 2% 403ms ± 2% -0.76% (p=0.000 n=48+43) Tar 162ms ± 3% 162ms ± 4% ~ (p=0.191 n=46+50) XML 223ms ± 2% 222ms ± 2% -0.37% (p=0.031 n=45+49) [Geo mean] 382ms 378ms -0.89% name old user-time/op new user-time/op delta Template 219ms ± 3% 216ms ± 3% -1.56% (p=0.000 n=50+48) Unicode 109ms ± 6% 109ms ± 5% ~ (p=0.190 n=50+49) GoTypes 836ms ± 2% 828ms ± 2% -0.96% (p=0.000 n=49+48) Compiler 3.87s ± 2% 3.80s ± 1% -1.81% (p=0.000 n=49+46) SSA 12.0s ± 1% 11.8s ± 1% -2.01% (p=0.000 n=48+50) Flate 142ms ± 3% 141ms ± 3% -0.85% (p=0.003 n=50+48) GoParser 178ms ± 4% 175ms ± 4% -1.66% (p=0.000 n=48+46) Reflect 520ms ± 2% 512ms ± 2% -1.44% (p=0.000 n=45+48) Tar 200ms ± 3% 198ms ± 4% -0.61% (p=0.037 n=47+50) XML 277ms ± 3% 275ms ± 3% -0.85% (p=0.000 n=49+48) [Geo mean] 482ms 476ms -1.23% name old alloc/op new alloc/op delta Template 36.1MB ± 0% 35.3MB ± 0% -2.18% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 29.3MB ± 0% -1.58% (p=0.008 n=5+5) GoTypes 125MB ± 0% 123MB ± 0% -2.13% (p=0.008 n=5+5) Compiler 531MB ± 0% 513MB ± 0% -3.40% (p=0.008 n=5+5) SSA 2.00GB ± 0% 1.93GB ± 0% -3.34% (p=0.008 n=5+5) Flate 24.5MB ± 0% 24.3MB ± 0% -1.18% (p=0.008 n=5+5) GoParser 29.4MB ± 0% 28.7MB ± 0% -2.34% (p=0.008 n=5+5) Reflect 87.1MB ± 0% 86.0MB ± 0% -1.33% (p=0.008 n=5+5) Tar 35.3MB ± 0% 34.8MB ± 0% -1.44% (p=0.008 n=5+5) XML 47.9MB ± 0% 47.1MB ± 0% -1.86% (p=0.008 n=5+5) [Geo mean] 82.8MB 81.1MB -2.08% name old allocs/op new allocs/op delta Template 352k ± 0% 347k ± 0% -1.32% (p=0.008 n=5+5) Unicode 342k ± 0% 339k ± 0% -0.66% (p=0.008 n=5+5) GoTypes 1.29M ± 0% 1.27M ± 0% -1.30% (p=0.008 n=5+5) Compiler 4.98M ± 0% 4.87M ± 0% -2.14% (p=0.008 n=5+5) SSA 15.7M ± 0% 15.2M ± 0% -2.86% (p=0.008 n=5+5) Flate 233k ± 0% 231k ± 0% -0.83% (p=0.008 n=5+5) GoParser 296k ± 0% 291k ± 0% -1.54% (p=0.016 n=5+4) Reflect 1.05M ± 0% 1.04M ± 0% -0.65% (p=0.008 n=5+5) Tar 343k ± 0% 339k ± 0% -0.97% (p=0.008 n=5+5) XML 432k ± 0% 426k ± 0% -1.19% (p=0.008 n=5+5) [Geo mean] 815k 804k -1.35% name old object-bytes new object-bytes delta Template 505kB ± 0% 505kB ± 0% -0.01% (p=0.008 n=5+5) Unicode 224kB ± 0% 224kB ± 0% ~ (all equal) GoTypes 1.82MB ± 0% 1.83MB ± 0% +0.06% (p=0.008 n=5+5) Flate 324kB ± 0% 324kB ± 0% +0.00% (p=0.008 n=5+5) GoParser 402kB ± 0% 402kB ± 0% +0.04% (p=0.008 n=5+5) Reflect 1.39MB ± 0% 1.39MB ± 0% -0.01% (p=0.008 n=5+5) Tar 449kB ± 0% 449kB ± 0% -0.02% (p=0.008 n=5+5) XML 598kB ± 0% 597kB ± 0% -0.05% (p=0.008 n=5+5) Change-Id: Ifc9d5c1bd01f90171414b8fb18ffe2290d271143 Reviewed-on: https://go-review.googlesource.com/c/114797 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-05-06 12:58:53 -07:00
args := make([]*ssa.Value, n.Rlist.Len())
for i, n := range n.Rlist.Slice() {
// Store a value to an argument slot.
if x, ok := temps[n]; ok {
// This is a previously computed temporary.
args[i] = x
continue
}
// This is an explicit value; evaluate it.
args[i] = s.expr(n)
}
cmd/compile: move argument stack construction to SSA generation The goal of this change is to move work from walk to SSA, and simplify things along the way. This is hard to accomplish cleanly with small incremental changes, so this large commit message aims to provide a roadmap to the diff. High level description: Prior to this change, walk was responsible for constructing (most of) the stack for function calls. ascompatte gathered variadic arguments into a slice. It also rewrote n.List from a list of arguments to a list of assignments to stack slots. ascompatte was called multiple times to handle the receiver in a method call. reorder1 then introduced temporaries into n.List as needed to avoid smashing the stack. adjustargs then made extra stack space for go/defer args as needed. Node to SSA construction evaluated all the statements in n.List, and issued the function call, assuming that the stack was correctly constructed. Intrinsic calls had to dig around inside n.List to extract the arguments, since intrinsics don't use the stack to make function calls. This change moves stack construction to the SSA construction phase. ascompatte, now called walkParams, does all the work that ascompatte and reorder1 did. It handles variadic arguments, inserts the method receiver if needed, and allocates temporaries. It does not, however, make any assignments to stack slots. Instead, it moves the function arguments to n.Rlist, leaving assignments to temporaries in n.List. (It would be better to use Ninit instead of List; future work.) During SSA construction, after doing all the temporary assignments in n.List, the function arguments are assigned to stack slots by constructing the appropriate SSA Value, using (*state).storeArg. SSA construction also now handles adjustments for go/defer args. This change also simplifies intrinsic calls, since we no longer need to undo walk's work. Along the way, we simplify nodarg by pushing the fp==1 case to its callers, where it fits nicely. Generated code differences: There were a few optimizations applied along the way, the old way. f(g()) was rewritten to do a block copy of function results to function arguments. And reorder1 avoided introducing the final "save the stack" temporary in n.List. The f(g()) block copy optimization never actually triggered; the order pass rewrote away g(), so that has been removed. SSA optimizations mostly obviated the need for reorder1's optimization of avoiding the final temporary. The exception was when the temporary's type was not SSA-able; in that case, we got a Move into an autotmp and then an immediate Move onto the stack, with the autotmp never read or used again. This change introduces a new rewrite rule to detect such pointless double Moves and collapse them into a single Move. This is actually more powerful than the original optimization, since the original optimization relied on the imprecise Node.HasCall calculation. The other significant difference in the generated code is that the stack is now constructed completely in SP-offset order. Prior to this change, the stack was constructed somewhat haphazardly: first the final argument that Node.HasCall deemed to require a temporary, then other arguments, then the method receiver, then the defer/go args. SP-offset is probably a good default order. See future work. There are a few minor object file size changes as a result of this change. I investigated some regressions in early versions of this change. One regression (in archive/tar) was the addition of a single CMPQ instruction, which would be eliminated were this TODO from flagalloc to be done: // TODO: Remove original instructions if they are never used. One regression (in text/template) was an ADDQconstmodify that is now a regular MOVQLoad+ADDQconst+MOVQStore, due to an unlucky change in the order in which arguments are written. The argument change order can also now be luckier, so this appears to be a wash. All in all, though there will be minor winners and losers, this change appears to be performance neutral. Future work: Move loading the result of function calls to SSA construction; eliminate OINDREGSP. Consider pushing stack construction deeper into SSA world, perhaps in an arch-specific pass. Among other benefits, this would make it easier to transition to a new calling convention. This would require rethinking the handling of stack conflicts and is non-trivial. Figure out some clean way to indicate that stack construction Stores/Moves do not alias each other, so that subsequent passes may do things like CSE+tighten shared stack setup, do DSE using non-first Stores, etc. This would allow us to eliminate the minor text/template regression. Possibly make assignments to stack slots not treated as statements by DWARF. Compiler benchmarks: name old time/op new time/op delta Template 182ms ± 2% 179ms ± 2% -1.69% (p=0.000 n=47+48) Unicode 86.3ms ± 5% 85.1ms ± 4% -1.36% (p=0.001 n=50+50) GoTypes 646ms ± 1% 642ms ± 1% -0.63% (p=0.000 n=49+48) Compiler 2.89s ± 1% 2.86s ± 2% -1.36% (p=0.000 n=48+50) SSA 8.47s ± 1% 8.37s ± 2% -1.22% (p=0.000 n=47+50) Flate 122ms ± 2% 121ms ± 2% -0.66% (p=0.000 n=47+45) GoParser 147ms ± 2% 146ms ± 2% -0.53% (p=0.006 n=46+49) Reflect 406ms ± 2% 403ms ± 2% -0.76% (p=0.000 n=48+43) Tar 162ms ± 3% 162ms ± 4% ~ (p=0.191 n=46+50) XML 223ms ± 2% 222ms ± 2% -0.37% (p=0.031 n=45+49) [Geo mean] 382ms 378ms -0.89% name old user-time/op new user-time/op delta Template 219ms ± 3% 216ms ± 3% -1.56% (p=0.000 n=50+48) Unicode 109ms ± 6% 109ms ± 5% ~ (p=0.190 n=50+49) GoTypes 836ms ± 2% 828ms ± 2% -0.96% (p=0.000 n=49+48) Compiler 3.87s ± 2% 3.80s ± 1% -1.81% (p=0.000 n=49+46) SSA 12.0s ± 1% 11.8s ± 1% -2.01% (p=0.000 n=48+50) Flate 142ms ± 3% 141ms ± 3% -0.85% (p=0.003 n=50+48) GoParser 178ms ± 4% 175ms ± 4% -1.66% (p=0.000 n=48+46) Reflect 520ms ± 2% 512ms ± 2% -1.44% (p=0.000 n=45+48) Tar 200ms ± 3% 198ms ± 4% -0.61% (p=0.037 n=47+50) XML 277ms ± 3% 275ms ± 3% -0.85% (p=0.000 n=49+48) [Geo mean] 482ms 476ms -1.23% name old alloc/op new alloc/op delta Template 36.1MB ± 0% 35.3MB ± 0% -2.18% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 29.3MB ± 0% -1.58% (p=0.008 n=5+5) GoTypes 125MB ± 0% 123MB ± 0% -2.13% (p=0.008 n=5+5) Compiler 531MB ± 0% 513MB ± 0% -3.40% (p=0.008 n=5+5) SSA 2.00GB ± 0% 1.93GB ± 0% -3.34% (p=0.008 n=5+5) Flate 24.5MB ± 0% 24.3MB ± 0% -1.18% (p=0.008 n=5+5) GoParser 29.4MB ± 0% 28.7MB ± 0% -2.34% (p=0.008 n=5+5) Reflect 87.1MB ± 0% 86.0MB ± 0% -1.33% (p=0.008 n=5+5) Tar 35.3MB ± 0% 34.8MB ± 0% -1.44% (p=0.008 n=5+5) XML 47.9MB ± 0% 47.1MB ± 0% -1.86% (p=0.008 n=5+5) [Geo mean] 82.8MB 81.1MB -2.08% name old allocs/op new allocs/op delta Template 352k ± 0% 347k ± 0% -1.32% (p=0.008 n=5+5) Unicode 342k ± 0% 339k ± 0% -0.66% (p=0.008 n=5+5) GoTypes 1.29M ± 0% 1.27M ± 0% -1.30% (p=0.008 n=5+5) Compiler 4.98M ± 0% 4.87M ± 0% -2.14% (p=0.008 n=5+5) SSA 15.7M ± 0% 15.2M ± 0% -2.86% (p=0.008 n=5+5) Flate 233k ± 0% 231k ± 0% -0.83% (p=0.008 n=5+5) GoParser 296k ± 0% 291k ± 0% -1.54% (p=0.016 n=5+4) Reflect 1.05M ± 0% 1.04M ± 0% -0.65% (p=0.008 n=5+5) Tar 343k ± 0% 339k ± 0% -0.97% (p=0.008 n=5+5) XML 432k ± 0% 426k ± 0% -1.19% (p=0.008 n=5+5) [Geo mean] 815k 804k -1.35% name old object-bytes new object-bytes delta Template 505kB ± 0% 505kB ± 0% -0.01% (p=0.008 n=5+5) Unicode 224kB ± 0% 224kB ± 0% ~ (all equal) GoTypes 1.82MB ± 0% 1.83MB ± 0% +0.06% (p=0.008 n=5+5) Flate 324kB ± 0% 324kB ± 0% +0.00% (p=0.008 n=5+5) GoParser 402kB ± 0% 402kB ± 0% +0.04% (p=0.008 n=5+5) Reflect 1.39MB ± 0% 1.39MB ± 0% -0.01% (p=0.008 n=5+5) Tar 449kB ± 0% 449kB ± 0% -0.02% (p=0.008 n=5+5) XML 598kB ± 0% 597kB ± 0% -0.05% (p=0.008 n=5+5) Change-Id: Ifc9d5c1bd01f90171414b8fb18ffe2290d271143 Reviewed-on: https://go-review.googlesource.com/c/114797 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-05-06 12:58:53 -07:00
return args
}
// Calls the function n using the specified call type.
// Returns the address of the return value (or nil if none).
func (s *state) call(n *Node, k callKind) *ssa.Value {
var sym *types.Sym // target symbol (if static)
var closure *ssa.Value // ptr to closure to run (if dynamic)
var codeptr *ssa.Value // ptr to target code (if dynamic)
var rcvr *ssa.Value // receiver to set
fn := n.Left
switch n.Op {
case OCALLFUNC:
cmd/compile: move Node.Class to flags Put it at position zero, since it is fairly hot. This shrinks gc.Node into a smaller size class on 64 bit systems. name old time/op new time/op delta Template 193ms ± 5% 192ms ± 3% ~ (p=0.353 n=94+93) Unicode 86.1ms ± 5% 85.0ms ± 4% -1.23% (p=0.000 n=95+98) GoTypes 546ms ± 3% 544ms ± 4% -0.40% (p=0.007 n=94+97) Compiler 2.56s ± 3% 2.54s ± 3% -0.67% (p=0.000 n=99+97) SSA 5.13s ± 2% 5.10s ± 3% -0.55% (p=0.000 n=94+98) Flate 122ms ± 6% 121ms ± 4% -0.75% (p=0.002 n=97+95) GoParser 144ms ± 5% 144ms ± 4% ~ (p=0.298 n=98+97) Reflect 348ms ± 4% 349ms ± 4% ~ (p=0.350 n=98+97) Tar 105ms ± 5% 104ms ± 5% ~ (p=0.154 n=96+98) XML 200ms ± 5% 198ms ± 4% -0.71% (p=0.015 n=97+98) [Geo mean] 330ms 328ms -0.52% name old user-time/op new user-time/op delta Template 229ms ±11% 224ms ± 7% -2.16% (p=0.001 n=100+87) Unicode 109ms ± 5% 109ms ± 6% ~ (p=0.897 n=96+91) GoTypes 712ms ± 4% 709ms ± 4% ~ (p=0.085 n=96+98) Compiler 3.41s ± 3% 3.36s ± 3% -1.43% (p=0.000 n=98+98) SSA 7.46s ± 3% 7.31s ± 3% -2.02% (p=0.000 n=100+99) Flate 145ms ± 6% 143ms ± 6% -1.11% (p=0.001 n=99+97) GoParser 177ms ± 5% 176ms ± 5% -0.78% (p=0.018 n=95+95) Reflect 432ms ± 7% 435ms ± 9% ~ (p=0.296 n=100+100) Tar 121ms ± 7% 121ms ± 5% ~ (p=0.072 n=100+95) XML 241ms ± 4% 239ms ± 5% ~ (p=0.085 n=97+99) [Geo mean] 413ms 410ms -0.73% name old alloc/op new alloc/op delta Template 38.4MB ± 0% 37.7MB ± 0% -1.85% (p=0.008 n=5+5) Unicode 30.1MB ± 0% 28.8MB ± 0% -4.09% (p=0.008 n=5+5) GoTypes 112MB ± 0% 110MB ± 0% -1.69% (p=0.008 n=5+5) Compiler 470MB ± 0% 461MB ± 0% -1.91% (p=0.008 n=5+5) SSA 1.13GB ± 0% 1.11GB ± 0% -1.70% (p=0.008 n=5+5) Flate 25.0MB ± 0% 24.6MB ± 0% -1.67% (p=0.008 n=5+5) GoParser 31.6MB ± 0% 31.1MB ± 0% -1.66% (p=0.008 n=5+5) Reflect 77.1MB ± 0% 75.8MB ± 0% -1.69% (p=0.008 n=5+5) Tar 26.3MB ± 0% 25.7MB ± 0% -2.06% (p=0.008 n=5+5) XML 41.9MB ± 0% 41.1MB ± 0% -1.93% (p=0.008 n=5+5) [Geo mean] 73.5MB 72.0MB -2.03% name old allocs/op new allocs/op delta Template 383k ± 0% 383k ± 0% ~ (p=0.690 n=5+5) Unicode 343k ± 0% 343k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.16M ± 0% 1.16M ± 0% ~ (p=0.310 n=5+5) Compiler 4.43M ± 0% 4.42M ± 0% -0.17% (p=0.008 n=5+5) SSA 9.85M ± 0% 9.85M ± 0% ~ (p=0.310 n=5+5) Flate 236k ± 0% 236k ± 1% ~ (p=0.841 n=5+5) GoParser 320k ± 0% 320k ± 0% ~ (p=0.421 n=5+5) Reflect 988k ± 0% 987k ± 0% ~ (p=0.690 n=5+5) Tar 252k ± 0% 251k ± 0% ~ (p=0.095 n=5+5) XML 399k ± 0% 399k ± 0% ~ (p=1.000 n=5+5) [Geo mean] 741k 740k -0.07% Change-Id: I9e952b58a98e30a12494304db9ce50d0a85e459c Reviewed-on: https://go-review.googlesource.com/41797 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
2017-04-25 18:14:12 -07:00
if k == callNormal && fn.Op == ONAME && fn.Class() == PFUNC {
sym = fn.Sym
break
}
closure = s.expr(fn)
if thearch.LinkArch.Family == sys.Wasm || objabi.GOOS == "aix" && k != callGo {
// On AIX, the closure needs to be verified as fn can be nil, except if it's a call go. This needs to be handled by the runtime to have the "go of nil func value" error.
// TODO(neelance): On other architectures this should be eliminated by the optimization steps
s.nilCheck(closure)
}
case OCALLMETH:
if fn.Op != ODOTMETH {
Fatalf("OCALLMETH: n.Left not an ODOTMETH: %v", fn)
}
if k == callNormal {
sym = fn.Sym
break
}
// Make a name n2 for the function.
// fn.Sym might be sync.(*Mutex).Unlock.
// Make a PFUNC node out of that, then evaluate it.
// We get back an SSA value representing &sync.(*Mutex).Unlock·f.
// We can then pass that to defer or go.
n2 := newnamel(fn.Pos, fn.Sym)
n2.Name.Curfn = s.curfn
cmd/compile: move Node.Class to flags Put it at position zero, since it is fairly hot. This shrinks gc.Node into a smaller size class on 64 bit systems. name old time/op new time/op delta Template 193ms ± 5% 192ms ± 3% ~ (p=0.353 n=94+93) Unicode 86.1ms ± 5% 85.0ms ± 4% -1.23% (p=0.000 n=95+98) GoTypes 546ms ± 3% 544ms ± 4% -0.40% (p=0.007 n=94+97) Compiler 2.56s ± 3% 2.54s ± 3% -0.67% (p=0.000 n=99+97) SSA 5.13s ± 2% 5.10s ± 3% -0.55% (p=0.000 n=94+98) Flate 122ms ± 6% 121ms ± 4% -0.75% (p=0.002 n=97+95) GoParser 144ms ± 5% 144ms ± 4% ~ (p=0.298 n=98+97) Reflect 348ms ± 4% 349ms ± 4% ~ (p=0.350 n=98+97) Tar 105ms ± 5% 104ms ± 5% ~ (p=0.154 n=96+98) XML 200ms ± 5% 198ms ± 4% -0.71% (p=0.015 n=97+98) [Geo mean] 330ms 328ms -0.52% name old user-time/op new user-time/op delta Template 229ms ±11% 224ms ± 7% -2.16% (p=0.001 n=100+87) Unicode 109ms ± 5% 109ms ± 6% ~ (p=0.897 n=96+91) GoTypes 712ms ± 4% 709ms ± 4% ~ (p=0.085 n=96+98) Compiler 3.41s ± 3% 3.36s ± 3% -1.43% (p=0.000 n=98+98) SSA 7.46s ± 3% 7.31s ± 3% -2.02% (p=0.000 n=100+99) Flate 145ms ± 6% 143ms ± 6% -1.11% (p=0.001 n=99+97) GoParser 177ms ± 5% 176ms ± 5% -0.78% (p=0.018 n=95+95) Reflect 432ms ± 7% 435ms ± 9% ~ (p=0.296 n=100+100) Tar 121ms ± 7% 121ms ± 5% ~ (p=0.072 n=100+95) XML 241ms ± 4% 239ms ± 5% ~ (p=0.085 n=97+99) [Geo mean] 413ms 410ms -0.73% name old alloc/op new alloc/op delta Template 38.4MB ± 0% 37.7MB ± 0% -1.85% (p=0.008 n=5+5) Unicode 30.1MB ± 0% 28.8MB ± 0% -4.09% (p=0.008 n=5+5) GoTypes 112MB ± 0% 110MB ± 0% -1.69% (p=0.008 n=5+5) Compiler 470MB ± 0% 461MB ± 0% -1.91% (p=0.008 n=5+5) SSA 1.13GB ± 0% 1.11GB ± 0% -1.70% (p=0.008 n=5+5) Flate 25.0MB ± 0% 24.6MB ± 0% -1.67% (p=0.008 n=5+5) GoParser 31.6MB ± 0% 31.1MB ± 0% -1.66% (p=0.008 n=5+5) Reflect 77.1MB ± 0% 75.8MB ± 0% -1.69% (p=0.008 n=5+5) Tar 26.3MB ± 0% 25.7MB ± 0% -2.06% (p=0.008 n=5+5) XML 41.9MB ± 0% 41.1MB ± 0% -1.93% (p=0.008 n=5+5) [Geo mean] 73.5MB 72.0MB -2.03% name old allocs/op new allocs/op delta Template 383k ± 0% 383k ± 0% ~ (p=0.690 n=5+5) Unicode 343k ± 0% 343k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.16M ± 0% 1.16M ± 0% ~ (p=0.310 n=5+5) Compiler 4.43M ± 0% 4.42M ± 0% -0.17% (p=0.008 n=5+5) SSA 9.85M ± 0% 9.85M ± 0% ~ (p=0.310 n=5+5) Flate 236k ± 0% 236k ± 1% ~ (p=0.841 n=5+5) GoParser 320k ± 0% 320k ± 0% ~ (p=0.421 n=5+5) Reflect 988k ± 0% 987k ± 0% ~ (p=0.690 n=5+5) Tar 252k ± 0% 251k ± 0% ~ (p=0.095 n=5+5) XML 399k ± 0% 399k ± 0% ~ (p=1.000 n=5+5) [Geo mean] 741k 740k -0.07% Change-Id: I9e952b58a98e30a12494304db9ce50d0a85e459c Reviewed-on: https://go-review.googlesource.com/41797 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
2017-04-25 18:14:12 -07:00
n2.SetClass(PFUNC)
// n2.Sym already existed, so it's already marked as a function.
n2.Pos = fn.Pos
n2.Type = types.Types[TUINT8] // dummy type for a static closure. Could use runtime.funcval if we had it.
closure = s.expr(n2)
cmd/compile: move argument stack construction to SSA generation The goal of this change is to move work from walk to SSA, and simplify things along the way. This is hard to accomplish cleanly with small incremental changes, so this large commit message aims to provide a roadmap to the diff. High level description: Prior to this change, walk was responsible for constructing (most of) the stack for function calls. ascompatte gathered variadic arguments into a slice. It also rewrote n.List from a list of arguments to a list of assignments to stack slots. ascompatte was called multiple times to handle the receiver in a method call. reorder1 then introduced temporaries into n.List as needed to avoid smashing the stack. adjustargs then made extra stack space for go/defer args as needed. Node to SSA construction evaluated all the statements in n.List, and issued the function call, assuming that the stack was correctly constructed. Intrinsic calls had to dig around inside n.List to extract the arguments, since intrinsics don't use the stack to make function calls. This change moves stack construction to the SSA construction phase. ascompatte, now called walkParams, does all the work that ascompatte and reorder1 did. It handles variadic arguments, inserts the method receiver if needed, and allocates temporaries. It does not, however, make any assignments to stack slots. Instead, it moves the function arguments to n.Rlist, leaving assignments to temporaries in n.List. (It would be better to use Ninit instead of List; future work.) During SSA construction, after doing all the temporary assignments in n.List, the function arguments are assigned to stack slots by constructing the appropriate SSA Value, using (*state).storeArg. SSA construction also now handles adjustments for go/defer args. This change also simplifies intrinsic calls, since we no longer need to undo walk's work. Along the way, we simplify nodarg by pushing the fp==1 case to its callers, where it fits nicely. Generated code differences: There were a few optimizations applied along the way, the old way. f(g()) was rewritten to do a block copy of function results to function arguments. And reorder1 avoided introducing the final "save the stack" temporary in n.List. The f(g()) block copy optimization never actually triggered; the order pass rewrote away g(), so that has been removed. SSA optimizations mostly obviated the need for reorder1's optimization of avoiding the final temporary. The exception was when the temporary's type was not SSA-able; in that case, we got a Move into an autotmp and then an immediate Move onto the stack, with the autotmp never read or used again. This change introduces a new rewrite rule to detect such pointless double Moves and collapse them into a single Move. This is actually more powerful than the original optimization, since the original optimization relied on the imprecise Node.HasCall calculation. The other significant difference in the generated code is that the stack is now constructed completely in SP-offset order. Prior to this change, the stack was constructed somewhat haphazardly: first the final argument that Node.HasCall deemed to require a temporary, then other arguments, then the method receiver, then the defer/go args. SP-offset is probably a good default order. See future work. There are a few minor object file size changes as a result of this change. I investigated some regressions in early versions of this change. One regression (in archive/tar) was the addition of a single CMPQ instruction, which would be eliminated were this TODO from flagalloc to be done: // TODO: Remove original instructions if they are never used. One regression (in text/template) was an ADDQconstmodify that is now a regular MOVQLoad+ADDQconst+MOVQStore, due to an unlucky change in the order in which arguments are written. The argument change order can also now be luckier, so this appears to be a wash. All in all, though there will be minor winners and losers, this change appears to be performance neutral. Future work: Move loading the result of function calls to SSA construction; eliminate OINDREGSP. Consider pushing stack construction deeper into SSA world, perhaps in an arch-specific pass. Among other benefits, this would make it easier to transition to a new calling convention. This would require rethinking the handling of stack conflicts and is non-trivial. Figure out some clean way to indicate that stack construction Stores/Moves do not alias each other, so that subsequent passes may do things like CSE+tighten shared stack setup, do DSE using non-first Stores, etc. This would allow us to eliminate the minor text/template regression. Possibly make assignments to stack slots not treated as statements by DWARF. Compiler benchmarks: name old time/op new time/op delta Template 182ms ± 2% 179ms ± 2% -1.69% (p=0.000 n=47+48) Unicode 86.3ms ± 5% 85.1ms ± 4% -1.36% (p=0.001 n=50+50) GoTypes 646ms ± 1% 642ms ± 1% -0.63% (p=0.000 n=49+48) Compiler 2.89s ± 1% 2.86s ± 2% -1.36% (p=0.000 n=48+50) SSA 8.47s ± 1% 8.37s ± 2% -1.22% (p=0.000 n=47+50) Flate 122ms ± 2% 121ms ± 2% -0.66% (p=0.000 n=47+45) GoParser 147ms ± 2% 146ms ± 2% -0.53% (p=0.006 n=46+49) Reflect 406ms ± 2% 403ms ± 2% -0.76% (p=0.000 n=48+43) Tar 162ms ± 3% 162ms ± 4% ~ (p=0.191 n=46+50) XML 223ms ± 2% 222ms ± 2% -0.37% (p=0.031 n=45+49) [Geo mean] 382ms 378ms -0.89% name old user-time/op new user-time/op delta Template 219ms ± 3% 216ms ± 3% -1.56% (p=0.000 n=50+48) Unicode 109ms ± 6% 109ms ± 5% ~ (p=0.190 n=50+49) GoTypes 836ms ± 2% 828ms ± 2% -0.96% (p=0.000 n=49+48) Compiler 3.87s ± 2% 3.80s ± 1% -1.81% (p=0.000 n=49+46) SSA 12.0s ± 1% 11.8s ± 1% -2.01% (p=0.000 n=48+50) Flate 142ms ± 3% 141ms ± 3% -0.85% (p=0.003 n=50+48) GoParser 178ms ± 4% 175ms ± 4% -1.66% (p=0.000 n=48+46) Reflect 520ms ± 2% 512ms ± 2% -1.44% (p=0.000 n=45+48) Tar 200ms ± 3% 198ms ± 4% -0.61% (p=0.037 n=47+50) XML 277ms ± 3% 275ms ± 3% -0.85% (p=0.000 n=49+48) [Geo mean] 482ms 476ms -1.23% name old alloc/op new alloc/op delta Template 36.1MB ± 0% 35.3MB ± 0% -2.18% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 29.3MB ± 0% -1.58% (p=0.008 n=5+5) GoTypes 125MB ± 0% 123MB ± 0% -2.13% (p=0.008 n=5+5) Compiler 531MB ± 0% 513MB ± 0% -3.40% (p=0.008 n=5+5) SSA 2.00GB ± 0% 1.93GB ± 0% -3.34% (p=0.008 n=5+5) Flate 24.5MB ± 0% 24.3MB ± 0% -1.18% (p=0.008 n=5+5) GoParser 29.4MB ± 0% 28.7MB ± 0% -2.34% (p=0.008 n=5+5) Reflect 87.1MB ± 0% 86.0MB ± 0% -1.33% (p=0.008 n=5+5) Tar 35.3MB ± 0% 34.8MB ± 0% -1.44% (p=0.008 n=5+5) XML 47.9MB ± 0% 47.1MB ± 0% -1.86% (p=0.008 n=5+5) [Geo mean] 82.8MB 81.1MB -2.08% name old allocs/op new allocs/op delta Template 352k ± 0% 347k ± 0% -1.32% (p=0.008 n=5+5) Unicode 342k ± 0% 339k ± 0% -0.66% (p=0.008 n=5+5) GoTypes 1.29M ± 0% 1.27M ± 0% -1.30% (p=0.008 n=5+5) Compiler 4.98M ± 0% 4.87M ± 0% -2.14% (p=0.008 n=5+5) SSA 15.7M ± 0% 15.2M ± 0% -2.86% (p=0.008 n=5+5) Flate 233k ± 0% 231k ± 0% -0.83% (p=0.008 n=5+5) GoParser 296k ± 0% 291k ± 0% -1.54% (p=0.016 n=5+4) Reflect 1.05M ± 0% 1.04M ± 0% -0.65% (p=0.008 n=5+5) Tar 343k ± 0% 339k ± 0% -0.97% (p=0.008 n=5+5) XML 432k ± 0% 426k ± 0% -1.19% (p=0.008 n=5+5) [Geo mean] 815k 804k -1.35% name old object-bytes new object-bytes delta Template 505kB ± 0% 505kB ± 0% -0.01% (p=0.008 n=5+5) Unicode 224kB ± 0% 224kB ± 0% ~ (all equal) GoTypes 1.82MB ± 0% 1.83MB ± 0% +0.06% (p=0.008 n=5+5) Flate 324kB ± 0% 324kB ± 0% +0.00% (p=0.008 n=5+5) GoParser 402kB ± 0% 402kB ± 0% +0.04% (p=0.008 n=5+5) Reflect 1.39MB ± 0% 1.39MB ± 0% -0.01% (p=0.008 n=5+5) Tar 449kB ± 0% 449kB ± 0% -0.02% (p=0.008 n=5+5) XML 598kB ± 0% 597kB ± 0% -0.05% (p=0.008 n=5+5) Change-Id: Ifc9d5c1bd01f90171414b8fb18ffe2290d271143 Reviewed-on: https://go-review.googlesource.com/c/114797 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-05-06 12:58:53 -07:00
// Note: receiver is already present in n.Rlist, so we don't
// want to set it here.
case OCALLINTER:
if fn.Op != ODOTINTER {
Fatalf("OCALLINTER: n.Left not an ODOTINTER: %v", fn.Op)
}
i := s.expr(fn.Left)
itab := s.newValue1(ssa.OpITab, types.Types[TUINTPTR], i)
s.nilCheck(itab)
itabidx := fn.Xoffset + 2*int64(Widthptr) + 8 // offset of fun field in runtime.itab
itab = s.newValue1I(ssa.OpOffPtr, s.f.Config.Types.UintptrPtr, itabidx, itab)
if k == callNormal {
codeptr = s.load(types.Types[TUINTPTR], itab)
} else {
closure = itab
}
rcvr = s.newValue1(ssa.OpIData, types.Types[TUINTPTR], i)
}
dowidth(fn.Type)
stksize := fn.Type.ArgWidth() // includes receiver
cmd/compile: move argument stack construction to SSA generation The goal of this change is to move work from walk to SSA, and simplify things along the way. This is hard to accomplish cleanly with small incremental changes, so this large commit message aims to provide a roadmap to the diff. High level description: Prior to this change, walk was responsible for constructing (most of) the stack for function calls. ascompatte gathered variadic arguments into a slice. It also rewrote n.List from a list of arguments to a list of assignments to stack slots. ascompatte was called multiple times to handle the receiver in a method call. reorder1 then introduced temporaries into n.List as needed to avoid smashing the stack. adjustargs then made extra stack space for go/defer args as needed. Node to SSA construction evaluated all the statements in n.List, and issued the function call, assuming that the stack was correctly constructed. Intrinsic calls had to dig around inside n.List to extract the arguments, since intrinsics don't use the stack to make function calls. This change moves stack construction to the SSA construction phase. ascompatte, now called walkParams, does all the work that ascompatte and reorder1 did. It handles variadic arguments, inserts the method receiver if needed, and allocates temporaries. It does not, however, make any assignments to stack slots. Instead, it moves the function arguments to n.Rlist, leaving assignments to temporaries in n.List. (It would be better to use Ninit instead of List; future work.) During SSA construction, after doing all the temporary assignments in n.List, the function arguments are assigned to stack slots by constructing the appropriate SSA Value, using (*state).storeArg. SSA construction also now handles adjustments for go/defer args. This change also simplifies intrinsic calls, since we no longer need to undo walk's work. Along the way, we simplify nodarg by pushing the fp==1 case to its callers, where it fits nicely. Generated code differences: There were a few optimizations applied along the way, the old way. f(g()) was rewritten to do a block copy of function results to function arguments. And reorder1 avoided introducing the final "save the stack" temporary in n.List. The f(g()) block copy optimization never actually triggered; the order pass rewrote away g(), so that has been removed. SSA optimizations mostly obviated the need for reorder1's optimization of avoiding the final temporary. The exception was when the temporary's type was not SSA-able; in that case, we got a Move into an autotmp and then an immediate Move onto the stack, with the autotmp never read or used again. This change introduces a new rewrite rule to detect such pointless double Moves and collapse them into a single Move. This is actually more powerful than the original optimization, since the original optimization relied on the imprecise Node.HasCall calculation. The other significant difference in the generated code is that the stack is now constructed completely in SP-offset order. Prior to this change, the stack was constructed somewhat haphazardly: first the final argument that Node.HasCall deemed to require a temporary, then other arguments, then the method receiver, then the defer/go args. SP-offset is probably a good default order. See future work. There are a few minor object file size changes as a result of this change. I investigated some regressions in early versions of this change. One regression (in archive/tar) was the addition of a single CMPQ instruction, which would be eliminated were this TODO from flagalloc to be done: // TODO: Remove original instructions if they are never used. One regression (in text/template) was an ADDQconstmodify that is now a regular MOVQLoad+ADDQconst+MOVQStore, due to an unlucky change in the order in which arguments are written. The argument change order can also now be luckier, so this appears to be a wash. All in all, though there will be minor winners and losers, this change appears to be performance neutral. Future work: Move loading the result of function calls to SSA construction; eliminate OINDREGSP. Consider pushing stack construction deeper into SSA world, perhaps in an arch-specific pass. Among other benefits, this would make it easier to transition to a new calling convention. This would require rethinking the handling of stack conflicts and is non-trivial. Figure out some clean way to indicate that stack construction Stores/Moves do not alias each other, so that subsequent passes may do things like CSE+tighten shared stack setup, do DSE using non-first Stores, etc. This would allow us to eliminate the minor text/template regression. Possibly make assignments to stack slots not treated as statements by DWARF. Compiler benchmarks: name old time/op new time/op delta Template 182ms ± 2% 179ms ± 2% -1.69% (p=0.000 n=47+48) Unicode 86.3ms ± 5% 85.1ms ± 4% -1.36% (p=0.001 n=50+50) GoTypes 646ms ± 1% 642ms ± 1% -0.63% (p=0.000 n=49+48) Compiler 2.89s ± 1% 2.86s ± 2% -1.36% (p=0.000 n=48+50) SSA 8.47s ± 1% 8.37s ± 2% -1.22% (p=0.000 n=47+50) Flate 122ms ± 2% 121ms ± 2% -0.66% (p=0.000 n=47+45) GoParser 147ms ± 2% 146ms ± 2% -0.53% (p=0.006 n=46+49) Reflect 406ms ± 2% 403ms ± 2% -0.76% (p=0.000 n=48+43) Tar 162ms ± 3% 162ms ± 4% ~ (p=0.191 n=46+50) XML 223ms ± 2% 222ms ± 2% -0.37% (p=0.031 n=45+49) [Geo mean] 382ms 378ms -0.89% name old user-time/op new user-time/op delta Template 219ms ± 3% 216ms ± 3% -1.56% (p=0.000 n=50+48) Unicode 109ms ± 6% 109ms ± 5% ~ (p=0.190 n=50+49) GoTypes 836ms ± 2% 828ms ± 2% -0.96% (p=0.000 n=49+48) Compiler 3.87s ± 2% 3.80s ± 1% -1.81% (p=0.000 n=49+46) SSA 12.0s ± 1% 11.8s ± 1% -2.01% (p=0.000 n=48+50) Flate 142ms ± 3% 141ms ± 3% -0.85% (p=0.003 n=50+48) GoParser 178ms ± 4% 175ms ± 4% -1.66% (p=0.000 n=48+46) Reflect 520ms ± 2% 512ms ± 2% -1.44% (p=0.000 n=45+48) Tar 200ms ± 3% 198ms ± 4% -0.61% (p=0.037 n=47+50) XML 277ms ± 3% 275ms ± 3% -0.85% (p=0.000 n=49+48) [Geo mean] 482ms 476ms -1.23% name old alloc/op new alloc/op delta Template 36.1MB ± 0% 35.3MB ± 0% -2.18% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 29.3MB ± 0% -1.58% (p=0.008 n=5+5) GoTypes 125MB ± 0% 123MB ± 0% -2.13% (p=0.008 n=5+5) Compiler 531MB ± 0% 513MB ± 0% -3.40% (p=0.008 n=5+5) SSA 2.00GB ± 0% 1.93GB ± 0% -3.34% (p=0.008 n=5+5) Flate 24.5MB ± 0% 24.3MB ± 0% -1.18% (p=0.008 n=5+5) GoParser 29.4MB ± 0% 28.7MB ± 0% -2.34% (p=0.008 n=5+5) Reflect 87.1MB ± 0% 86.0MB ± 0% -1.33% (p=0.008 n=5+5) Tar 35.3MB ± 0% 34.8MB ± 0% -1.44% (p=0.008 n=5+5) XML 47.9MB ± 0% 47.1MB ± 0% -1.86% (p=0.008 n=5+5) [Geo mean] 82.8MB 81.1MB -2.08% name old allocs/op new allocs/op delta Template 352k ± 0% 347k ± 0% -1.32% (p=0.008 n=5+5) Unicode 342k ± 0% 339k ± 0% -0.66% (p=0.008 n=5+5) GoTypes 1.29M ± 0% 1.27M ± 0% -1.30% (p=0.008 n=5+5) Compiler 4.98M ± 0% 4.87M ± 0% -2.14% (p=0.008 n=5+5) SSA 15.7M ± 0% 15.2M ± 0% -2.86% (p=0.008 n=5+5) Flate 233k ± 0% 231k ± 0% -0.83% (p=0.008 n=5+5) GoParser 296k ± 0% 291k ± 0% -1.54% (p=0.016 n=5+4) Reflect 1.05M ± 0% 1.04M ± 0% -0.65% (p=0.008 n=5+5) Tar 343k ± 0% 339k ± 0% -0.97% (p=0.008 n=5+5) XML 432k ± 0% 426k ± 0% -1.19% (p=0.008 n=5+5) [Geo mean] 815k 804k -1.35% name old object-bytes new object-bytes delta Template 505kB ± 0% 505kB ± 0% -0.01% (p=0.008 n=5+5) Unicode 224kB ± 0% 224kB ± 0% ~ (all equal) GoTypes 1.82MB ± 0% 1.83MB ± 0% +0.06% (p=0.008 n=5+5) Flate 324kB ± 0% 324kB ± 0% +0.00% (p=0.008 n=5+5) GoParser 402kB ± 0% 402kB ± 0% +0.04% (p=0.008 n=5+5) Reflect 1.39MB ± 0% 1.39MB ± 0% -0.01% (p=0.008 n=5+5) Tar 449kB ± 0% 449kB ± 0% -0.02% (p=0.008 n=5+5) XML 598kB ± 0% 597kB ± 0% -0.05% (p=0.008 n=5+5) Change-Id: Ifc9d5c1bd01f90171414b8fb18ffe2290d271143 Reviewed-on: https://go-review.googlesource.com/c/114797 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-05-06 12:58:53 -07:00
// Run all assignments of temps.
// The temps are introduced to avoid overwriting argument
// slots when arguments themselves require function calls.
s.stmtList(n.List)
cmd/compile: move argument stack construction to SSA generation The goal of this change is to move work from walk to SSA, and simplify things along the way. This is hard to accomplish cleanly with small incremental changes, so this large commit message aims to provide a roadmap to the diff. High level description: Prior to this change, walk was responsible for constructing (most of) the stack for function calls. ascompatte gathered variadic arguments into a slice. It also rewrote n.List from a list of arguments to a list of assignments to stack slots. ascompatte was called multiple times to handle the receiver in a method call. reorder1 then introduced temporaries into n.List as needed to avoid smashing the stack. adjustargs then made extra stack space for go/defer args as needed. Node to SSA construction evaluated all the statements in n.List, and issued the function call, assuming that the stack was correctly constructed. Intrinsic calls had to dig around inside n.List to extract the arguments, since intrinsics don't use the stack to make function calls. This change moves stack construction to the SSA construction phase. ascompatte, now called walkParams, does all the work that ascompatte and reorder1 did. It handles variadic arguments, inserts the method receiver if needed, and allocates temporaries. It does not, however, make any assignments to stack slots. Instead, it moves the function arguments to n.Rlist, leaving assignments to temporaries in n.List. (It would be better to use Ninit instead of List; future work.) During SSA construction, after doing all the temporary assignments in n.List, the function arguments are assigned to stack slots by constructing the appropriate SSA Value, using (*state).storeArg. SSA construction also now handles adjustments for go/defer args. This change also simplifies intrinsic calls, since we no longer need to undo walk's work. Along the way, we simplify nodarg by pushing the fp==1 case to its callers, where it fits nicely. Generated code differences: There were a few optimizations applied along the way, the old way. f(g()) was rewritten to do a block copy of function results to function arguments. And reorder1 avoided introducing the final "save the stack" temporary in n.List. The f(g()) block copy optimization never actually triggered; the order pass rewrote away g(), so that has been removed. SSA optimizations mostly obviated the need for reorder1's optimization of avoiding the final temporary. The exception was when the temporary's type was not SSA-able; in that case, we got a Move into an autotmp and then an immediate Move onto the stack, with the autotmp never read or used again. This change introduces a new rewrite rule to detect such pointless double Moves and collapse them into a single Move. This is actually more powerful than the original optimization, since the original optimization relied on the imprecise Node.HasCall calculation. The other significant difference in the generated code is that the stack is now constructed completely in SP-offset order. Prior to this change, the stack was constructed somewhat haphazardly: first the final argument that Node.HasCall deemed to require a temporary, then other arguments, then the method receiver, then the defer/go args. SP-offset is probably a good default order. See future work. There are a few minor object file size changes as a result of this change. I investigated some regressions in early versions of this change. One regression (in archive/tar) was the addition of a single CMPQ instruction, which would be eliminated were this TODO from flagalloc to be done: // TODO: Remove original instructions if they are never used. One regression (in text/template) was an ADDQconstmodify that is now a regular MOVQLoad+ADDQconst+MOVQStore, due to an unlucky change in the order in which arguments are written. The argument change order can also now be luckier, so this appears to be a wash. All in all, though there will be minor winners and losers, this change appears to be performance neutral. Future work: Move loading the result of function calls to SSA construction; eliminate OINDREGSP. Consider pushing stack construction deeper into SSA world, perhaps in an arch-specific pass. Among other benefits, this would make it easier to transition to a new calling convention. This would require rethinking the handling of stack conflicts and is non-trivial. Figure out some clean way to indicate that stack construction Stores/Moves do not alias each other, so that subsequent passes may do things like CSE+tighten shared stack setup, do DSE using non-first Stores, etc. This would allow us to eliminate the minor text/template regression. Possibly make assignments to stack slots not treated as statements by DWARF. Compiler benchmarks: name old time/op new time/op delta Template 182ms ± 2% 179ms ± 2% -1.69% (p=0.000 n=47+48) Unicode 86.3ms ± 5% 85.1ms ± 4% -1.36% (p=0.001 n=50+50) GoTypes 646ms ± 1% 642ms ± 1% -0.63% (p=0.000 n=49+48) Compiler 2.89s ± 1% 2.86s ± 2% -1.36% (p=0.000 n=48+50) SSA 8.47s ± 1% 8.37s ± 2% -1.22% (p=0.000 n=47+50) Flate 122ms ± 2% 121ms ± 2% -0.66% (p=0.000 n=47+45) GoParser 147ms ± 2% 146ms ± 2% -0.53% (p=0.006 n=46+49) Reflect 406ms ± 2% 403ms ± 2% -0.76% (p=0.000 n=48+43) Tar 162ms ± 3% 162ms ± 4% ~ (p=0.191 n=46+50) XML 223ms ± 2% 222ms ± 2% -0.37% (p=0.031 n=45+49) [Geo mean] 382ms 378ms -0.89% name old user-time/op new user-time/op delta Template 219ms ± 3% 216ms ± 3% -1.56% (p=0.000 n=50+48) Unicode 109ms ± 6% 109ms ± 5% ~ (p=0.190 n=50+49) GoTypes 836ms ± 2% 828ms ± 2% -0.96% (p=0.000 n=49+48) Compiler 3.87s ± 2% 3.80s ± 1% -1.81% (p=0.000 n=49+46) SSA 12.0s ± 1% 11.8s ± 1% -2.01% (p=0.000 n=48+50) Flate 142ms ± 3% 141ms ± 3% -0.85% (p=0.003 n=50+48) GoParser 178ms ± 4% 175ms ± 4% -1.66% (p=0.000 n=48+46) Reflect 520ms ± 2% 512ms ± 2% -1.44% (p=0.000 n=45+48) Tar 200ms ± 3% 198ms ± 4% -0.61% (p=0.037 n=47+50) XML 277ms ± 3% 275ms ± 3% -0.85% (p=0.000 n=49+48) [Geo mean] 482ms 476ms -1.23% name old alloc/op new alloc/op delta Template 36.1MB ± 0% 35.3MB ± 0% -2.18% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 29.3MB ± 0% -1.58% (p=0.008 n=5+5) GoTypes 125MB ± 0% 123MB ± 0% -2.13% (p=0.008 n=5+5) Compiler 531MB ± 0% 513MB ± 0% -3.40% (p=0.008 n=5+5) SSA 2.00GB ± 0% 1.93GB ± 0% -3.34% (p=0.008 n=5+5) Flate 24.5MB ± 0% 24.3MB ± 0% -1.18% (p=0.008 n=5+5) GoParser 29.4MB ± 0% 28.7MB ± 0% -2.34% (p=0.008 n=5+5) Reflect 87.1MB ± 0% 86.0MB ± 0% -1.33% (p=0.008 n=5+5) Tar 35.3MB ± 0% 34.8MB ± 0% -1.44% (p=0.008 n=5+5) XML 47.9MB ± 0% 47.1MB ± 0% -1.86% (p=0.008 n=5+5) [Geo mean] 82.8MB 81.1MB -2.08% name old allocs/op new allocs/op delta Template 352k ± 0% 347k ± 0% -1.32% (p=0.008 n=5+5) Unicode 342k ± 0% 339k ± 0% -0.66% (p=0.008 n=5+5) GoTypes 1.29M ± 0% 1.27M ± 0% -1.30% (p=0.008 n=5+5) Compiler 4.98M ± 0% 4.87M ± 0% -2.14% (p=0.008 n=5+5) SSA 15.7M ± 0% 15.2M ± 0% -2.86% (p=0.008 n=5+5) Flate 233k ± 0% 231k ± 0% -0.83% (p=0.008 n=5+5) GoParser 296k ± 0% 291k ± 0% -1.54% (p=0.016 n=5+4) Reflect 1.05M ± 0% 1.04M ± 0% -0.65% (p=0.008 n=5+5) Tar 343k ± 0% 339k ± 0% -0.97% (p=0.008 n=5+5) XML 432k ± 0% 426k ± 0% -1.19% (p=0.008 n=5+5) [Geo mean] 815k 804k -1.35% name old object-bytes new object-bytes delta Template 505kB ± 0% 505kB ± 0% -0.01% (p=0.008 n=5+5) Unicode 224kB ± 0% 224kB ± 0% ~ (all equal) GoTypes 1.82MB ± 0% 1.83MB ± 0% +0.06% (p=0.008 n=5+5) Flate 324kB ± 0% 324kB ± 0% +0.00% (p=0.008 n=5+5) GoParser 402kB ± 0% 402kB ± 0% +0.04% (p=0.008 n=5+5) Reflect 1.39MB ± 0% 1.39MB ± 0% -0.01% (p=0.008 n=5+5) Tar 449kB ± 0% 449kB ± 0% -0.02% (p=0.008 n=5+5) XML 598kB ± 0% 597kB ± 0% -0.05% (p=0.008 n=5+5) Change-Id: Ifc9d5c1bd01f90171414b8fb18ffe2290d271143 Reviewed-on: https://go-review.googlesource.com/c/114797 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-05-06 12:58:53 -07:00
// Store arguments to stack, including defer/go arguments and receiver for method calls.
// These are written in SP-offset order.
argStart := Ctxt.FixedFrameSize()
// Defer/go args.
if k != callNormal {
// Write argsize and closure (args to newproc/deferproc).
argsize := s.constInt32(types.Types[TUINT32], int32(stksize))
addr := s.constOffPtrSP(s.f.Config.Types.UInt32Ptr, argStart)
s.store(types.Types[TUINT32], addr, argsize)
addr = s.constOffPtrSP(s.f.Config.Types.UintptrPtr, argStart+int64(Widthptr))
s.store(types.Types[TUINTPTR], addr, closure)
stksize += 2 * int64(Widthptr)
cmd/compile: move argument stack construction to SSA generation The goal of this change is to move work from walk to SSA, and simplify things along the way. This is hard to accomplish cleanly with small incremental changes, so this large commit message aims to provide a roadmap to the diff. High level description: Prior to this change, walk was responsible for constructing (most of) the stack for function calls. ascompatte gathered variadic arguments into a slice. It also rewrote n.List from a list of arguments to a list of assignments to stack slots. ascompatte was called multiple times to handle the receiver in a method call. reorder1 then introduced temporaries into n.List as needed to avoid smashing the stack. adjustargs then made extra stack space for go/defer args as needed. Node to SSA construction evaluated all the statements in n.List, and issued the function call, assuming that the stack was correctly constructed. Intrinsic calls had to dig around inside n.List to extract the arguments, since intrinsics don't use the stack to make function calls. This change moves stack construction to the SSA construction phase. ascompatte, now called walkParams, does all the work that ascompatte and reorder1 did. It handles variadic arguments, inserts the method receiver if needed, and allocates temporaries. It does not, however, make any assignments to stack slots. Instead, it moves the function arguments to n.Rlist, leaving assignments to temporaries in n.List. (It would be better to use Ninit instead of List; future work.) During SSA construction, after doing all the temporary assignments in n.List, the function arguments are assigned to stack slots by constructing the appropriate SSA Value, using (*state).storeArg. SSA construction also now handles adjustments for go/defer args. This change also simplifies intrinsic calls, since we no longer need to undo walk's work. Along the way, we simplify nodarg by pushing the fp==1 case to its callers, where it fits nicely. Generated code differences: There were a few optimizations applied along the way, the old way. f(g()) was rewritten to do a block copy of function results to function arguments. And reorder1 avoided introducing the final "save the stack" temporary in n.List. The f(g()) block copy optimization never actually triggered; the order pass rewrote away g(), so that has been removed. SSA optimizations mostly obviated the need for reorder1's optimization of avoiding the final temporary. The exception was when the temporary's type was not SSA-able; in that case, we got a Move into an autotmp and then an immediate Move onto the stack, with the autotmp never read or used again. This change introduces a new rewrite rule to detect such pointless double Moves and collapse them into a single Move. This is actually more powerful than the original optimization, since the original optimization relied on the imprecise Node.HasCall calculation. The other significant difference in the generated code is that the stack is now constructed completely in SP-offset order. Prior to this change, the stack was constructed somewhat haphazardly: first the final argument that Node.HasCall deemed to require a temporary, then other arguments, then the method receiver, then the defer/go args. SP-offset is probably a good default order. See future work. There are a few minor object file size changes as a result of this change. I investigated some regressions in early versions of this change. One regression (in archive/tar) was the addition of a single CMPQ instruction, which would be eliminated were this TODO from flagalloc to be done: // TODO: Remove original instructions if they are never used. One regression (in text/template) was an ADDQconstmodify that is now a regular MOVQLoad+ADDQconst+MOVQStore, due to an unlucky change in the order in which arguments are written. The argument change order can also now be luckier, so this appears to be a wash. All in all, though there will be minor winners and losers, this change appears to be performance neutral. Future work: Move loading the result of function calls to SSA construction; eliminate OINDREGSP. Consider pushing stack construction deeper into SSA world, perhaps in an arch-specific pass. Among other benefits, this would make it easier to transition to a new calling convention. This would require rethinking the handling of stack conflicts and is non-trivial. Figure out some clean way to indicate that stack construction Stores/Moves do not alias each other, so that subsequent passes may do things like CSE+tighten shared stack setup, do DSE using non-first Stores, etc. This would allow us to eliminate the minor text/template regression. Possibly make assignments to stack slots not treated as statements by DWARF. Compiler benchmarks: name old time/op new time/op delta Template 182ms ± 2% 179ms ± 2% -1.69% (p=0.000 n=47+48) Unicode 86.3ms ± 5% 85.1ms ± 4% -1.36% (p=0.001 n=50+50) GoTypes 646ms ± 1% 642ms ± 1% -0.63% (p=0.000 n=49+48) Compiler 2.89s ± 1% 2.86s ± 2% -1.36% (p=0.000 n=48+50) SSA 8.47s ± 1% 8.37s ± 2% -1.22% (p=0.000 n=47+50) Flate 122ms ± 2% 121ms ± 2% -0.66% (p=0.000 n=47+45) GoParser 147ms ± 2% 146ms ± 2% -0.53% (p=0.006 n=46+49) Reflect 406ms ± 2% 403ms ± 2% -0.76% (p=0.000 n=48+43) Tar 162ms ± 3% 162ms ± 4% ~ (p=0.191 n=46+50) XML 223ms ± 2% 222ms ± 2% -0.37% (p=0.031 n=45+49) [Geo mean] 382ms 378ms -0.89% name old user-time/op new user-time/op delta Template 219ms ± 3% 216ms ± 3% -1.56% (p=0.000 n=50+48) Unicode 109ms ± 6% 109ms ± 5% ~ (p=0.190 n=50+49) GoTypes 836ms ± 2% 828ms ± 2% -0.96% (p=0.000 n=49+48) Compiler 3.87s ± 2% 3.80s ± 1% -1.81% (p=0.000 n=49+46) SSA 12.0s ± 1% 11.8s ± 1% -2.01% (p=0.000 n=48+50) Flate 142ms ± 3% 141ms ± 3% -0.85% (p=0.003 n=50+48) GoParser 178ms ± 4% 175ms ± 4% -1.66% (p=0.000 n=48+46) Reflect 520ms ± 2% 512ms ± 2% -1.44% (p=0.000 n=45+48) Tar 200ms ± 3% 198ms ± 4% -0.61% (p=0.037 n=47+50) XML 277ms ± 3% 275ms ± 3% -0.85% (p=0.000 n=49+48) [Geo mean] 482ms 476ms -1.23% name old alloc/op new alloc/op delta Template 36.1MB ± 0% 35.3MB ± 0% -2.18% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 29.3MB ± 0% -1.58% (p=0.008 n=5+5) GoTypes 125MB ± 0% 123MB ± 0% -2.13% (p=0.008 n=5+5) Compiler 531MB ± 0% 513MB ± 0% -3.40% (p=0.008 n=5+5) SSA 2.00GB ± 0% 1.93GB ± 0% -3.34% (p=0.008 n=5+5) Flate 24.5MB ± 0% 24.3MB ± 0% -1.18% (p=0.008 n=5+5) GoParser 29.4MB ± 0% 28.7MB ± 0% -2.34% (p=0.008 n=5+5) Reflect 87.1MB ± 0% 86.0MB ± 0% -1.33% (p=0.008 n=5+5) Tar 35.3MB ± 0% 34.8MB ± 0% -1.44% (p=0.008 n=5+5) XML 47.9MB ± 0% 47.1MB ± 0% -1.86% (p=0.008 n=5+5) [Geo mean] 82.8MB 81.1MB -2.08% name old allocs/op new allocs/op delta Template 352k ± 0% 347k ± 0% -1.32% (p=0.008 n=5+5) Unicode 342k ± 0% 339k ± 0% -0.66% (p=0.008 n=5+5) GoTypes 1.29M ± 0% 1.27M ± 0% -1.30% (p=0.008 n=5+5) Compiler 4.98M ± 0% 4.87M ± 0% -2.14% (p=0.008 n=5+5) SSA 15.7M ± 0% 15.2M ± 0% -2.86% (p=0.008 n=5+5) Flate 233k ± 0% 231k ± 0% -0.83% (p=0.008 n=5+5) GoParser 296k ± 0% 291k ± 0% -1.54% (p=0.016 n=5+4) Reflect 1.05M ± 0% 1.04M ± 0% -0.65% (p=0.008 n=5+5) Tar 343k ± 0% 339k ± 0% -0.97% (p=0.008 n=5+5) XML 432k ± 0% 426k ± 0% -1.19% (p=0.008 n=5+5) [Geo mean] 815k 804k -1.35% name old object-bytes new object-bytes delta Template 505kB ± 0% 505kB ± 0% -0.01% (p=0.008 n=5+5) Unicode 224kB ± 0% 224kB ± 0% ~ (all equal) GoTypes 1.82MB ± 0% 1.83MB ± 0% +0.06% (p=0.008 n=5+5) Flate 324kB ± 0% 324kB ± 0% +0.00% (p=0.008 n=5+5) GoParser 402kB ± 0% 402kB ± 0% +0.04% (p=0.008 n=5+5) Reflect 1.39MB ± 0% 1.39MB ± 0% -0.01% (p=0.008 n=5+5) Tar 449kB ± 0% 449kB ± 0% -0.02% (p=0.008 n=5+5) XML 598kB ± 0% 597kB ± 0% -0.05% (p=0.008 n=5+5) Change-Id: Ifc9d5c1bd01f90171414b8fb18ffe2290d271143 Reviewed-on: https://go-review.googlesource.com/c/114797 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-05-06 12:58:53 -07:00
argStart += 2 * int64(Widthptr)
}
// Set receiver (for interface calls).
if rcvr != nil {
addr := s.constOffPtrSP(s.f.Config.Types.UintptrPtr, argStart)
s.store(types.Types[TUINTPTR], addr, rcvr)
}
// Write args.
t := n.Left.Type
args := n.Rlist.Slice()
if n.Op == OCALLMETH {
f := t.Recv()
s.storeArg(args[0], f.Type, argStart+f.Offset)
args = args[1:]
}
for i, n := range args {
f := t.Params().Field(i)
s.storeArg(n, f.Type, argStart+f.Offset)
}
// call target
var call *ssa.Value
switch {
case k == callDefer:
call = s.newValue1A(ssa.OpStaticCall, types.TypeMem, deferproc, s.mem())
case k == callGo:
call = s.newValue1A(ssa.OpStaticCall, types.TypeMem, newproc, s.mem())
case closure != nil:
// rawLoad because loading the code pointer from a
// closure is always safe, but IsSanitizerSafeAddr
// can't always figure that out currently, and it's
// critical that we not clobber any arguments already
// stored onto the stack.
codeptr = s.rawLoad(types.Types[TUINTPTR], closure)
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
call = s.newValue3(ssa.OpClosureCall, types.TypeMem, codeptr, closure, s.mem())
case codeptr != nil:
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
call = s.newValue2(ssa.OpInterCall, types.TypeMem, codeptr, s.mem())
case sym != nil:
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
call = s.newValue1A(ssa.OpStaticCall, types.TypeMem, sym.Linksym(), s.mem())
default:
Fatalf("bad call type %v %v", n.Op, n)
}
call.AuxInt = stksize // Call operations carry the argsize of the callee along with them
s.vars[&memVar] = call
// Finish block for defers
if k == callDefer {
b := s.endBlock()
b.Kind = ssa.BlockDefer
b.SetControl(call)
bNext := s.f.NewBlock(ssa.BlockPlain)
b.AddEdgeTo(bNext)
// Add recover edge to exit code.
r := s.f.NewBlock(ssa.BlockPlain)
s.startBlock(r)
s.exit()
b.AddEdgeTo(r)
b.Likely = ssa.BranchLikely
s.startBlock(bNext)
}
res := n.Left.Type.Results()
if res.NumFields() == 0 || k != callNormal {
// call has no return value. Continue with the next statement.
return nil
}
fp := res.Field(0)
return s.constOffPtrSP(types.NewPtr(fp.Type), fp.Offset+Ctxt.FixedFrameSize())
}
// etypesign returns the signed-ness of e, for integer/pointer etypes.
// -1 means signed, +1 means unsigned, 0 means non-integer/non-pointer.
func etypesign(e types.EType) int8 {
switch e {
case TINT8, TINT16, TINT32, TINT64, TINT:
return -1
case TUINT8, TUINT16, TUINT32, TUINT64, TUINT, TUINTPTR, TUNSAFEPTR:
return +1
}
return 0
}
// addr converts the address of the expression n to SSA, adds it to s and returns the SSA result.
// The value that the returned Value represents is guaranteed to be non-nil.
// If bounded is true then this address does not require a nil check for its operand
// even if that would otherwise be implied.
func (s *state) addr(n *Node, bounded bool) *ssa.Value {
t := types.NewPtr(n.Type)
switch n.Op {
case ONAME:
cmd/compile: move Node.Class to flags Put it at position zero, since it is fairly hot. This shrinks gc.Node into a smaller size class on 64 bit systems. name old time/op new time/op delta Template 193ms ± 5% 192ms ± 3% ~ (p=0.353 n=94+93) Unicode 86.1ms ± 5% 85.0ms ± 4% -1.23% (p=0.000 n=95+98) GoTypes 546ms ± 3% 544ms ± 4% -0.40% (p=0.007 n=94+97) Compiler 2.56s ± 3% 2.54s ± 3% -0.67% (p=0.000 n=99+97) SSA 5.13s ± 2% 5.10s ± 3% -0.55% (p=0.000 n=94+98) Flate 122ms ± 6% 121ms ± 4% -0.75% (p=0.002 n=97+95) GoParser 144ms ± 5% 144ms ± 4% ~ (p=0.298 n=98+97) Reflect 348ms ± 4% 349ms ± 4% ~ (p=0.350 n=98+97) Tar 105ms ± 5% 104ms ± 5% ~ (p=0.154 n=96+98) XML 200ms ± 5% 198ms ± 4% -0.71% (p=0.015 n=97+98) [Geo mean] 330ms 328ms -0.52% name old user-time/op new user-time/op delta Template 229ms ±11% 224ms ± 7% -2.16% (p=0.001 n=100+87) Unicode 109ms ± 5% 109ms ± 6% ~ (p=0.897 n=96+91) GoTypes 712ms ± 4% 709ms ± 4% ~ (p=0.085 n=96+98) Compiler 3.41s ± 3% 3.36s ± 3% -1.43% (p=0.000 n=98+98) SSA 7.46s ± 3% 7.31s ± 3% -2.02% (p=0.000 n=100+99) Flate 145ms ± 6% 143ms ± 6% -1.11% (p=0.001 n=99+97) GoParser 177ms ± 5% 176ms ± 5% -0.78% (p=0.018 n=95+95) Reflect 432ms ± 7% 435ms ± 9% ~ (p=0.296 n=100+100) Tar 121ms ± 7% 121ms ± 5% ~ (p=0.072 n=100+95) XML 241ms ± 4% 239ms ± 5% ~ (p=0.085 n=97+99) [Geo mean] 413ms 410ms -0.73% name old alloc/op new alloc/op delta Template 38.4MB ± 0% 37.7MB ± 0% -1.85% (p=0.008 n=5+5) Unicode 30.1MB ± 0% 28.8MB ± 0% -4.09% (p=0.008 n=5+5) GoTypes 112MB ± 0% 110MB ± 0% -1.69% (p=0.008 n=5+5) Compiler 470MB ± 0% 461MB ± 0% -1.91% (p=0.008 n=5+5) SSA 1.13GB ± 0% 1.11GB ± 0% -1.70% (p=0.008 n=5+5) Flate 25.0MB ± 0% 24.6MB ± 0% -1.67% (p=0.008 n=5+5) GoParser 31.6MB ± 0% 31.1MB ± 0% -1.66% (p=0.008 n=5+5) Reflect 77.1MB ± 0% 75.8MB ± 0% -1.69% (p=0.008 n=5+5) Tar 26.3MB ± 0% 25.7MB ± 0% -2.06% (p=0.008 n=5+5) XML 41.9MB ± 0% 41.1MB ± 0% -1.93% (p=0.008 n=5+5) [Geo mean] 73.5MB 72.0MB -2.03% name old allocs/op new allocs/op delta Template 383k ± 0% 383k ± 0% ~ (p=0.690 n=5+5) Unicode 343k ± 0% 343k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.16M ± 0% 1.16M ± 0% ~ (p=0.310 n=5+5) Compiler 4.43M ± 0% 4.42M ± 0% -0.17% (p=0.008 n=5+5) SSA 9.85M ± 0% 9.85M ± 0% ~ (p=0.310 n=5+5) Flate 236k ± 0% 236k ± 1% ~ (p=0.841 n=5+5) GoParser 320k ± 0% 320k ± 0% ~ (p=0.421 n=5+5) Reflect 988k ± 0% 987k ± 0% ~ (p=0.690 n=5+5) Tar 252k ± 0% 251k ± 0% ~ (p=0.095 n=5+5) XML 399k ± 0% 399k ± 0% ~ (p=1.000 n=5+5) [Geo mean] 741k 740k -0.07% Change-Id: I9e952b58a98e30a12494304db9ce50d0a85e459c Reviewed-on: https://go-review.googlesource.com/41797 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
2017-04-25 18:14:12 -07:00
switch n.Class() {
case PEXTERN:
// global variable
v := s.entryNewValue1A(ssa.OpAddr, t, n.Sym.Linksym(), s.sb)
// TODO: Make OpAddr use AuxInt as well as Aux.
if n.Xoffset != 0 {
v = s.entryNewValue1I(ssa.OpOffPtr, v.Type, n.Xoffset, v)
}
return v
case PPARAM:
// parameter slot
v := s.decladdrs[n]
if v != nil {
return v
}
if n == nodfp {
// Special arg that points to the frame pointer (Used by ORECOVER).
return s.entryNewValue2A(ssa.OpLocalAddr, t, n, s.sp, s.startmem)
}
s.Fatalf("addr of undeclared ONAME %v. declared: %v", n, s.decladdrs)
return nil
case PAUTO:
return s.newValue2Apos(ssa.OpLocalAddr, t, n, s.sp, s.mem(), !n.IsAutoTmp())
case PPARAMOUT: // Same as PAUTO -- cannot generate LEA early.
// ensure that we reuse symbols for out parameters so
// that cse works on their addresses
return s.newValue2Apos(ssa.OpLocalAddr, t, n, s.sp, s.mem(), true)
default:
s.Fatalf("variable address class %v not implemented", n.Class())
return nil
}
case OINDREGSP:
// indirect off REGSP
// used for storing/loading arguments/returns to/from callees
cmd/compile: add OpOffPtr [c] SP to constant cache They accounted for almost 30% of all CSE'd values. By never creating the duplicates in the first place, we reduce the high water mark of Value IDs, which in turn makes all SSA phases cheaper, particularly regalloc. name old time/op new time/op delta Template 200ms ± 3% 198ms ± 4% -0.87% (p=0.016 n=50+49) Unicode 86.9ms ± 2% 85.5ms ± 3% -1.56% (p=0.000 n=49+50) GoTypes 553ms ± 4% 551ms ± 4% ~ (p=0.183 n=50+49) SSA 3.97s ± 3% 3.93s ± 2% -1.06% (p=0.000 n=48+48) Flate 124ms ± 4% 124ms ± 3% ~ (p=0.545 n=48+50) GoParser 146ms ± 4% 146ms ± 4% ~ (p=0.810 n=49+49) Reflect 357ms ± 3% 355ms ± 3% -0.59% (p=0.049 n=50+48) Tar 106ms ± 4% 107ms ± 5% ~ (p=0.454 n=49+50) XML 203ms ± 4% 203ms ± 4% ~ (p=0.726 n=48+50) name old user-ns/op new user-ns/op delta Template 237M ± 3% 235M ± 4% ~ (p=0.208 n=47+48) Unicode 111M ± 4% 108M ± 9% -2.50% (p=0.000 n=47+50) GoTypes 736M ± 5% 729M ± 4% -0.95% (p=0.017 n=50+46) SSA 5.73G ± 4% 5.74G ± 4% ~ (p=0.765 n=50+50) Flate 150M ± 5% 148M ± 6% -0.89% (p=0.045 n=48+47) GoParser 180M ± 5% 178M ± 7% -1.34% (p=0.012 n=50+50) Reflect 450M ± 4% 444M ± 4% -1.40% (p=0.000 n=50+49) Tar 124M ± 7% 123M ± 7% ~ (p=0.092 n=50+50) XML 248M ± 6% 245M ± 5% ~ (p=0.057 n=50+50) name old alloc/op new alloc/op delta Template 39.4MB ± 0% 39.3MB ± 0% -0.37% (p=0.000 n=50+50) Unicode 30.9MB ± 0% 30.9MB ± 0% -0.27% (p=0.000 n=48+50) GoTypes 114MB ± 0% 113MB ± 0% -1.03% (p=0.000 n=50+49) SSA 882MB ± 0% 865MB ± 0% -1.95% (p=0.000 n=49+49) Flate 25.8MB ± 0% 25.7MB ± 0% -0.21% (p=0.000 n=50+50) GoParser 31.7MB ± 0% 31.6MB ± 0% -0.33% (p=0.000 n=50+50) Reflect 79.7MB ± 0% 79.3MB ± 0% -0.49% (p=0.000 n=44+49) Tar 27.2MB ± 0% 27.1MB ± 0% -0.31% (p=0.000 n=50+50) XML 42.7MB ± 0% 42.3MB ± 0% -1.05% (p=0.000 n=48+49) name old allocs/op new allocs/op delta Template 379k ± 1% 380k ± 1% +0.26% (p=0.000 n=50+50) Unicode 324k ± 1% 324k ± 1% ~ (p=0.964 n=49+50) GoTypes 1.14M ± 0% 1.15M ± 0% +0.14% (p=0.000 n=50+49) SSA 7.89M ± 0% 7.89M ± 0% -0.05% (p=0.000 n=49+49) Flate 240k ± 1% 241k ± 1% +0.27% (p=0.001 n=50+50) GoParser 310k ± 1% 311k ± 1% +0.48% (p=0.000 n=50+49) Reflect 1.00M ± 0% 1.00M ± 0% +0.17% (p=0.000 n=48+50) Tar 254k ± 1% 255k ± 1% +0.23% (p=0.005 n=50+50) XML 395k ± 1% 395k ± 1% +0.19% (p=0.002 n=49+47) Change-Id: Iaa8f5f37e23bd81983409f7359f9dcd4dfe2961f Reviewed-on: https://go-review.googlesource.com/38003 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-03-08 12:50:00 -08:00
return s.constOffPtrSP(t, n.Xoffset)
case OINDEX:
if n.Left.Type.IsSlice() {
a := s.expr(n.Left)
i := s.expr(n.Right)
len := s.newValue1(ssa.OpSliceLen, types.Types[TINT], a)
i = s.boundsCheck(i, len, ssa.BoundsIndex, n.Bounded())
p := s.newValue1(ssa.OpSlicePtr, t, a)
return s.newValue2(ssa.OpPtrIndex, t, p, i)
} else { // array
a := s.addr(n.Left, bounded)
i := s.expr(n.Right)
len := s.constInt(types.Types[TINT], n.Left.Type.NumElem())
i = s.boundsCheck(i, len, ssa.BoundsIndex, n.Bounded())
return s.newValue2(ssa.OpPtrIndex, types.NewPtr(n.Left.Type.Elem()), a, i)
}
cmd/compile: bulk rename This change does a bulk rename of several identifiers in the compiler. See #27167 and https://docs.google.com/document/d/19_ExiylD9MRfeAjKIfEsMU1_RGhuxB9sA0b5Zv7byVI/ for context and for discussion of these particular renames. Commands run to generate this change: gorename -from '"cmd/compile/internal/gc".OPROC' -to OGO gorename -from '"cmd/compile/internal/gc".OCOM' -to OBITNOT gorename -from '"cmd/compile/internal/gc".OMINUS' -to ONEG gorename -from '"cmd/compile/internal/gc".OIND' -to ODEREF gorename -from '"cmd/compile/internal/gc".OARRAYBYTESTR' -to OBYTES2STR gorename -from '"cmd/compile/internal/gc".OARRAYBYTESTRTMP' -to OBYTES2STRTMP gorename -from '"cmd/compile/internal/gc".OARRAYRUNESTR' -to ORUNES2STR gorename -from '"cmd/compile/internal/gc".OSTRARRAYBYTE' -to OSTR2BYTES gorename -from '"cmd/compile/internal/gc".OSTRARRAYBYTETMP' -to OSTR2BYTESTMP gorename -from '"cmd/compile/internal/gc".OSTRARRAYRUNE' -to OSTR2RUNES gorename -from '"cmd/compile/internal/gc".Etop' -to ctxStmt gorename -from '"cmd/compile/internal/gc".Erv' -to ctxExpr gorename -from '"cmd/compile/internal/gc".Ecall' -to ctxCallee gorename -from '"cmd/compile/internal/gc".Efnstruct' -to ctxMultiOK gorename -from '"cmd/compile/internal/gc".Easgn' -to ctxAssign gorename -from '"cmd/compile/internal/gc".Ecomplit' -to ctxCompLit Not altered: parameters and local variables (mostly in typecheck.go) named top, which should probably now be called ctx (and which should probably have a named type). Also not altered: Field called Top in gc.Func. gorename -from '"cmd/compile/internal/gc".Node.Isddd' -to IsDDD gorename -from '"cmd/compile/internal/gc".Node.SetIsddd' -to SetIsDDD gorename -from '"cmd/compile/internal/gc".nodeIsddd' -to nodeIsDDD gorename -from '"cmd/compile/internal/types".Field.Isddd' -to IsDDD gorename -from '"cmd/compile/internal/types".Field.SetIsddd' -to SetIsDDD gorename -from '"cmd/compile/internal/types".fieldIsddd' -to fieldIsDDD Not altered: function gc.hasddd, params and local variables called isddd Also not altered: fmt.go prints nodes using "isddd(%v)". cd cmd/compile/internal/gc; go generate I then manually found impacted comments using exact string match and fixed them up by hand. The comment changes were trivial. Passes toolstash-check. Fixes #27167. If this experiment is deemed a success, we will open a new tracking issue for renames to do at the end of the 1.13 cycles. Change-Id: I2dc541533d2ab0d06cb3d31d65df205ecfb151e8 Reviewed-on: https://go-review.googlesource.com/c/150140 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-11-18 08:34:38 -08:00
case ODEREF:
return s.exprPtr(n.Left, bounded, n.Pos)
case ODOT:
p := s.addr(n.Left, bounded)
return s.newValue1I(ssa.OpOffPtr, t, n.Xoffset, p)
case ODOTPTR:
p := s.exprPtr(n.Left, bounded, n.Pos)
return s.newValue1I(ssa.OpOffPtr, t, n.Xoffset, p)
case OCLOSUREVAR:
return s.newValue1I(ssa.OpOffPtr, t, n.Xoffset,
s.entryNewValue0(ssa.OpGetClosurePtr, s.f.Config.Types.BytePtr))
case OCONVNOP:
addr := s.addr(n.Left, bounded)
return s.newValue1(ssa.OpCopy, t, addr) // ensure that addr has the right type
case OCALLFUNC, OCALLINTER, OCALLMETH:
return s.call(n, callNormal)
case ODOTTYPE:
v, _ := s.dottype(n, false)
if v.Op != ssa.OpLoad {
s.Fatalf("dottype of non-load")
}
if v.Args[1] != s.mem() {
s.Fatalf("memory no longer live from dottype load")
}
return v.Args[0]
default:
s.Fatalf("unhandled addr %v", n.Op)
return nil
}
}
// canSSA reports whether n is SSA-able.
// n must be an ONAME (or an ODOT sequence with an ONAME base).
func (s *state) canSSA(n *Node) bool {
if Debug['N'] != 0 {
return false
}
for n.Op == ODOT || (n.Op == OINDEX && n.Left.Type.IsArray()) {
n = n.Left
}
if n.Op != ONAME {
return false
}
cmd/compile: pack bool fields in Node, Name, Func and Type structs to bitsets This reduces compiler memory usage by up to 4% - see compilebench results below. name old time/op new time/op delta Template 245ms ± 4% 241ms ± 2% -1.88% (p=0.029 n=10+10) Unicode 126ms ± 3% 124ms ± 3% ~ (p=0.105 n=10+10) GoTypes 805ms ± 2% 813ms ± 3% ~ (p=0.515 n=8+10) Compiler 3.95s ± 2% 3.83s ± 1% -2.96% (p=0.000 n=9+10) MakeBash 47.4s ± 4% 46.6s ± 1% -1.59% (p=0.028 n=9+10) name old user-ns/op new user-ns/op delta Template 324M ± 5% 326M ± 3% ~ (p=0.935 n=10+10) Unicode 186M ± 5% 178M ±10% ~ (p=0.067 n=9+10) GoTypes 1.08G ± 7% 1.09G ± 4% ~ (p=0.956 n=10+10) Compiler 5.34G ± 4% 5.31G ± 1% ~ (p=0.501 n=10+8) name old alloc/op new alloc/op delta Template 41.0MB ± 0% 39.8MB ± 0% -3.03% (p=0.000 n=10+10) Unicode 32.3MB ± 0% 31.0MB ± 0% -4.13% (p=0.000 n=10+10) GoTypes 119MB ± 0% 116MB ± 0% -2.39% (p=0.000 n=10+10) Compiler 499MB ± 0% 487MB ± 0% -2.48% (p=0.000 n=10+10) name old allocs/op new allocs/op delta Template 380k ± 1% 379k ± 1% ~ (p=0.436 n=10+10) Unicode 324k ± 1% 324k ± 0% ~ (p=0.853 n=10+10) GoTypes 1.15M ± 0% 1.15M ± 0% ~ (p=0.481 n=10+10) Compiler 4.41M ± 0% 4.41M ± 0% -0.12% (p=0.007 n=10+10) name old text-bytes new text-bytes delta HelloSize 623k ± 0% 623k ± 0% ~ (all equal) CmdGoSize 6.64M ± 0% 6.64M ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 5.81k ± 0% 5.81k ± 0% ~ (all equal) CmdGoSize 238k ± 0% 238k ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 134k ± 0% 134k ± 0% ~ (all equal) CmdGoSize 152k ± 0% 152k ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 967k ± 0% 967k ± 0% ~ (all equal) CmdGoSize 10.2M ± 0% 10.2M ± 0% ~ (all equal) Change-Id: I1f40af738254892bd6c8ba2eb43390b175753d52 Reviewed-on: https://go-review.googlesource.com/37445 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-27 19:56:38 +02:00
if n.Addrtaken() {
return false
}
cmd/compile: fix liveness computation for heap-escaped parameters The liveness computation of parameters generally was never correct, but forcing all parameters to be live throughout the function covered up that problem. The new SSA back end is too clever: even though it currently keeps the parameter values live throughout the function, it may find optimizations that mean the current values are not written back to the original parameter stack slots immediately or ever (for example if a parameter is set to nil, SSA constant propagation may replace all later uses of the parameter with a constant nil, eliminating the need to write the nil value back to the stack slot), so the liveness code must now track the actual operations on the stack slots, exposing these problems. One small problem in the handling of arguments is that nodarg can return ONAME PPARAM nodes with adjusted offsets, so that there are actually multiple *Node pointers for the same parameter in the instruction stream. This might be possible to correct, but not in this CL. For now, we fix this by using n.Orig instead of n when considering PPARAM and PPARAMOUT nodes. The major problem in the handling of arguments is general confusion in the liveness code about the meaning of PPARAM|PHEAP and PPARAMOUT|PHEAP nodes, especially as contrasted with PAUTO|PHEAP. The difference between these two is that when a local variable "moves" to the heap, it's really just allocated there to start with; in contrast, when an argument moves to the heap, the actual data has to be copied there from the stack at the beginning of the function, and when a result "moves" to the heap the value in the heap has to be copied back to the stack when the function returns This general confusion is also present in the SSA back end. The PHEAP bit worked decently when I first introduced it 7 years ago (!) in 391425ae. The back end did nothing sophisticated, and in particular there was no analysis at all: no escape analysis, no liveness analysis, and certainly no SSA back end. But the complications caused in the various downstream consumers suggest that this should be a detail kept mainly in the front end. This CL therefore eliminates both the PHEAP bit and even the idea of "heap variables" from the back ends. First, it replaces the PPARAM|PHEAP, PPARAMOUT|PHEAP, and PAUTO|PHEAP variable classes with the single PAUTOHEAP, a pseudo-class indicating a variable maintained on the heap and available by indirecting a local variable kept on the stack (a plain PAUTO). Second, walkexpr replaces all references to PAUTOHEAP variables with indirections of the corresponding PAUTO variable. The back ends and the liveness code now just see plain indirected variables. This may actually produce better code, but the real goal here is to eliminate these little-used and somewhat suspect code paths in the back end analyses. The OPARAM node type goes away too. A followup CL will do the same to PPARAMREF. I'm not sure that the back ends (SSA in particular) are handling those right either, and with the framework established in this CL that change is trivial and the result clearly more correct. Fixes #15747. Change-Id: I2770b1ce3cbc93981bfc7166be66a9da12013d74 Reviewed-on: https://go-review.googlesource.com/23393 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-05-25 01:33:24 -04:00
if n.isParamHeapCopy() {
return false
}
cmd/compile: move Node.Class to flags Put it at position zero, since it is fairly hot. This shrinks gc.Node into a smaller size class on 64 bit systems. name old time/op new time/op delta Template 193ms ± 5% 192ms ± 3% ~ (p=0.353 n=94+93) Unicode 86.1ms ± 5% 85.0ms ± 4% -1.23% (p=0.000 n=95+98) GoTypes 546ms ± 3% 544ms ± 4% -0.40% (p=0.007 n=94+97) Compiler 2.56s ± 3% 2.54s ± 3% -0.67% (p=0.000 n=99+97) SSA 5.13s ± 2% 5.10s ± 3% -0.55% (p=0.000 n=94+98) Flate 122ms ± 6% 121ms ± 4% -0.75% (p=0.002 n=97+95) GoParser 144ms ± 5% 144ms ± 4% ~ (p=0.298 n=98+97) Reflect 348ms ± 4% 349ms ± 4% ~ (p=0.350 n=98+97) Tar 105ms ± 5% 104ms ± 5% ~ (p=0.154 n=96+98) XML 200ms ± 5% 198ms ± 4% -0.71% (p=0.015 n=97+98) [Geo mean] 330ms 328ms -0.52% name old user-time/op new user-time/op delta Template 229ms ±11% 224ms ± 7% -2.16% (p=0.001 n=100+87) Unicode 109ms ± 5% 109ms ± 6% ~ (p=0.897 n=96+91) GoTypes 712ms ± 4% 709ms ± 4% ~ (p=0.085 n=96+98) Compiler 3.41s ± 3% 3.36s ± 3% -1.43% (p=0.000 n=98+98) SSA 7.46s ± 3% 7.31s ± 3% -2.02% (p=0.000 n=100+99) Flate 145ms ± 6% 143ms ± 6% -1.11% (p=0.001 n=99+97) GoParser 177ms ± 5% 176ms ± 5% -0.78% (p=0.018 n=95+95) Reflect 432ms ± 7% 435ms ± 9% ~ (p=0.296 n=100+100) Tar 121ms ± 7% 121ms ± 5% ~ (p=0.072 n=100+95) XML 241ms ± 4% 239ms ± 5% ~ (p=0.085 n=97+99) [Geo mean] 413ms 410ms -0.73% name old alloc/op new alloc/op delta Template 38.4MB ± 0% 37.7MB ± 0% -1.85% (p=0.008 n=5+5) Unicode 30.1MB ± 0% 28.8MB ± 0% -4.09% (p=0.008 n=5+5) GoTypes 112MB ± 0% 110MB ± 0% -1.69% (p=0.008 n=5+5) Compiler 470MB ± 0% 461MB ± 0% -1.91% (p=0.008 n=5+5) SSA 1.13GB ± 0% 1.11GB ± 0% -1.70% (p=0.008 n=5+5) Flate 25.0MB ± 0% 24.6MB ± 0% -1.67% (p=0.008 n=5+5) GoParser 31.6MB ± 0% 31.1MB ± 0% -1.66% (p=0.008 n=5+5) Reflect 77.1MB ± 0% 75.8MB ± 0% -1.69% (p=0.008 n=5+5) Tar 26.3MB ± 0% 25.7MB ± 0% -2.06% (p=0.008 n=5+5) XML 41.9MB ± 0% 41.1MB ± 0% -1.93% (p=0.008 n=5+5) [Geo mean] 73.5MB 72.0MB -2.03% name old allocs/op new allocs/op delta Template 383k ± 0% 383k ± 0% ~ (p=0.690 n=5+5) Unicode 343k ± 0% 343k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.16M ± 0% 1.16M ± 0% ~ (p=0.310 n=5+5) Compiler 4.43M ± 0% 4.42M ± 0% -0.17% (p=0.008 n=5+5) SSA 9.85M ± 0% 9.85M ± 0% ~ (p=0.310 n=5+5) Flate 236k ± 0% 236k ± 1% ~ (p=0.841 n=5+5) GoParser 320k ± 0% 320k ± 0% ~ (p=0.421 n=5+5) Reflect 988k ± 0% 987k ± 0% ~ (p=0.690 n=5+5) Tar 252k ± 0% 251k ± 0% ~ (p=0.095 n=5+5) XML 399k ± 0% 399k ± 0% ~ (p=1.000 n=5+5) [Geo mean] 741k 740k -0.07% Change-Id: I9e952b58a98e30a12494304db9ce50d0a85e459c Reviewed-on: https://go-review.googlesource.com/41797 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
2017-04-25 18:14:12 -07:00
if n.Class() == PAUTOHEAP {
cmd/compile: fix liveness computation for heap-escaped parameters The liveness computation of parameters generally was never correct, but forcing all parameters to be live throughout the function covered up that problem. The new SSA back end is too clever: even though it currently keeps the parameter values live throughout the function, it may find optimizations that mean the current values are not written back to the original parameter stack slots immediately or ever (for example if a parameter is set to nil, SSA constant propagation may replace all later uses of the parameter with a constant nil, eliminating the need to write the nil value back to the stack slot), so the liveness code must now track the actual operations on the stack slots, exposing these problems. One small problem in the handling of arguments is that nodarg can return ONAME PPARAM nodes with adjusted offsets, so that there are actually multiple *Node pointers for the same parameter in the instruction stream. This might be possible to correct, but not in this CL. For now, we fix this by using n.Orig instead of n when considering PPARAM and PPARAMOUT nodes. The major problem in the handling of arguments is general confusion in the liveness code about the meaning of PPARAM|PHEAP and PPARAMOUT|PHEAP nodes, especially as contrasted with PAUTO|PHEAP. The difference between these two is that when a local variable "moves" to the heap, it's really just allocated there to start with; in contrast, when an argument moves to the heap, the actual data has to be copied there from the stack at the beginning of the function, and when a result "moves" to the heap the value in the heap has to be copied back to the stack when the function returns This general confusion is also present in the SSA back end. The PHEAP bit worked decently when I first introduced it 7 years ago (!) in 391425ae. The back end did nothing sophisticated, and in particular there was no analysis at all: no escape analysis, no liveness analysis, and certainly no SSA back end. But the complications caused in the various downstream consumers suggest that this should be a detail kept mainly in the front end. This CL therefore eliminates both the PHEAP bit and even the idea of "heap variables" from the back ends. First, it replaces the PPARAM|PHEAP, PPARAMOUT|PHEAP, and PAUTO|PHEAP variable classes with the single PAUTOHEAP, a pseudo-class indicating a variable maintained on the heap and available by indirecting a local variable kept on the stack (a plain PAUTO). Second, walkexpr replaces all references to PAUTOHEAP variables with indirections of the corresponding PAUTO variable. The back ends and the liveness code now just see plain indirected variables. This may actually produce better code, but the real goal here is to eliminate these little-used and somewhat suspect code paths in the back end analyses. The OPARAM node type goes away too. A followup CL will do the same to PPARAMREF. I'm not sure that the back ends (SSA in particular) are handling those right either, and with the framework established in this CL that change is trivial and the result clearly more correct. Fixes #15747. Change-Id: I2770b1ce3cbc93981bfc7166be66a9da12013d74 Reviewed-on: https://go-review.googlesource.com/23393 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-05-25 01:33:24 -04:00
Fatalf("canSSA of PAUTOHEAP %v", n)
}
cmd/compile: move Node.Class to flags Put it at position zero, since it is fairly hot. This shrinks gc.Node into a smaller size class on 64 bit systems. name old time/op new time/op delta Template 193ms ± 5% 192ms ± 3% ~ (p=0.353 n=94+93) Unicode 86.1ms ± 5% 85.0ms ± 4% -1.23% (p=0.000 n=95+98) GoTypes 546ms ± 3% 544ms ± 4% -0.40% (p=0.007 n=94+97) Compiler 2.56s ± 3% 2.54s ± 3% -0.67% (p=0.000 n=99+97) SSA 5.13s ± 2% 5.10s ± 3% -0.55% (p=0.000 n=94+98) Flate 122ms ± 6% 121ms ± 4% -0.75% (p=0.002 n=97+95) GoParser 144ms ± 5% 144ms ± 4% ~ (p=0.298 n=98+97) Reflect 348ms ± 4% 349ms ± 4% ~ (p=0.350 n=98+97) Tar 105ms ± 5% 104ms ± 5% ~ (p=0.154 n=96+98) XML 200ms ± 5% 198ms ± 4% -0.71% (p=0.015 n=97+98) [Geo mean] 330ms 328ms -0.52% name old user-time/op new user-time/op delta Template 229ms ±11% 224ms ± 7% -2.16% (p=0.001 n=100+87) Unicode 109ms ± 5% 109ms ± 6% ~ (p=0.897 n=96+91) GoTypes 712ms ± 4% 709ms ± 4% ~ (p=0.085 n=96+98) Compiler 3.41s ± 3% 3.36s ± 3% -1.43% (p=0.000 n=98+98) SSA 7.46s ± 3% 7.31s ± 3% -2.02% (p=0.000 n=100+99) Flate 145ms ± 6% 143ms ± 6% -1.11% (p=0.001 n=99+97) GoParser 177ms ± 5% 176ms ± 5% -0.78% (p=0.018 n=95+95) Reflect 432ms ± 7% 435ms ± 9% ~ (p=0.296 n=100+100) Tar 121ms ± 7% 121ms ± 5% ~ (p=0.072 n=100+95) XML 241ms ± 4% 239ms ± 5% ~ (p=0.085 n=97+99) [Geo mean] 413ms 410ms -0.73% name old alloc/op new alloc/op delta Template 38.4MB ± 0% 37.7MB ± 0% -1.85% (p=0.008 n=5+5) Unicode 30.1MB ± 0% 28.8MB ± 0% -4.09% (p=0.008 n=5+5) GoTypes 112MB ± 0% 110MB ± 0% -1.69% (p=0.008 n=5+5) Compiler 470MB ± 0% 461MB ± 0% -1.91% (p=0.008 n=5+5) SSA 1.13GB ± 0% 1.11GB ± 0% -1.70% (p=0.008 n=5+5) Flate 25.0MB ± 0% 24.6MB ± 0% -1.67% (p=0.008 n=5+5) GoParser 31.6MB ± 0% 31.1MB ± 0% -1.66% (p=0.008 n=5+5) Reflect 77.1MB ± 0% 75.8MB ± 0% -1.69% (p=0.008 n=5+5) Tar 26.3MB ± 0% 25.7MB ± 0% -2.06% (p=0.008 n=5+5) XML 41.9MB ± 0% 41.1MB ± 0% -1.93% (p=0.008 n=5+5) [Geo mean] 73.5MB 72.0MB -2.03% name old allocs/op new allocs/op delta Template 383k ± 0% 383k ± 0% ~ (p=0.690 n=5+5) Unicode 343k ± 0% 343k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.16M ± 0% 1.16M ± 0% ~ (p=0.310 n=5+5) Compiler 4.43M ± 0% 4.42M ± 0% -0.17% (p=0.008 n=5+5) SSA 9.85M ± 0% 9.85M ± 0% ~ (p=0.310 n=5+5) Flate 236k ± 0% 236k ± 1% ~ (p=0.841 n=5+5) GoParser 320k ± 0% 320k ± 0% ~ (p=0.421 n=5+5) Reflect 988k ± 0% 987k ± 0% ~ (p=0.690 n=5+5) Tar 252k ± 0% 251k ± 0% ~ (p=0.095 n=5+5) XML 399k ± 0% 399k ± 0% ~ (p=1.000 n=5+5) [Geo mean] 741k 740k -0.07% Change-Id: I9e952b58a98e30a12494304db9ce50d0a85e459c Reviewed-on: https://go-review.googlesource.com/41797 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
2017-04-25 18:14:12 -07:00
switch n.Class() {
case PEXTERN:
return false
case PPARAMOUT:
if s.hasdefer {
// TODO: handle this case? Named return values must be
// in memory so that the deferred function can see them.
// Maybe do: if !strings.HasPrefix(n.String(), "~") { return false }
// Or maybe not, see issue 18860. Even unnamed return values
// must be written back so if a defer recovers, the caller can see them.
return false
}
if s.cgoUnsafeArgs {
// Cgo effectively takes the address of all result args,
// but the compiler can't see that.
return false
}
}
cmd/compile: move Node.Class to flags Put it at position zero, since it is fairly hot. This shrinks gc.Node into a smaller size class on 64 bit systems. name old time/op new time/op delta Template 193ms ± 5% 192ms ± 3% ~ (p=0.353 n=94+93) Unicode 86.1ms ± 5% 85.0ms ± 4% -1.23% (p=0.000 n=95+98) GoTypes 546ms ± 3% 544ms ± 4% -0.40% (p=0.007 n=94+97) Compiler 2.56s ± 3% 2.54s ± 3% -0.67% (p=0.000 n=99+97) SSA 5.13s ± 2% 5.10s ± 3% -0.55% (p=0.000 n=94+98) Flate 122ms ± 6% 121ms ± 4% -0.75% (p=0.002 n=97+95) GoParser 144ms ± 5% 144ms ± 4% ~ (p=0.298 n=98+97) Reflect 348ms ± 4% 349ms ± 4% ~ (p=0.350 n=98+97) Tar 105ms ± 5% 104ms ± 5% ~ (p=0.154 n=96+98) XML 200ms ± 5% 198ms ± 4% -0.71% (p=0.015 n=97+98) [Geo mean] 330ms 328ms -0.52% name old user-time/op new user-time/op delta Template 229ms ±11% 224ms ± 7% -2.16% (p=0.001 n=100+87) Unicode 109ms ± 5% 109ms ± 6% ~ (p=0.897 n=96+91) GoTypes 712ms ± 4% 709ms ± 4% ~ (p=0.085 n=96+98) Compiler 3.41s ± 3% 3.36s ± 3% -1.43% (p=0.000 n=98+98) SSA 7.46s ± 3% 7.31s ± 3% -2.02% (p=0.000 n=100+99) Flate 145ms ± 6% 143ms ± 6% -1.11% (p=0.001 n=99+97) GoParser 177ms ± 5% 176ms ± 5% -0.78% (p=0.018 n=95+95) Reflect 432ms ± 7% 435ms ± 9% ~ (p=0.296 n=100+100) Tar 121ms ± 7% 121ms ± 5% ~ (p=0.072 n=100+95) XML 241ms ± 4% 239ms ± 5% ~ (p=0.085 n=97+99) [Geo mean] 413ms 410ms -0.73% name old alloc/op new alloc/op delta Template 38.4MB ± 0% 37.7MB ± 0% -1.85% (p=0.008 n=5+5) Unicode 30.1MB ± 0% 28.8MB ± 0% -4.09% (p=0.008 n=5+5) GoTypes 112MB ± 0% 110MB ± 0% -1.69% (p=0.008 n=5+5) Compiler 470MB ± 0% 461MB ± 0% -1.91% (p=0.008 n=5+5) SSA 1.13GB ± 0% 1.11GB ± 0% -1.70% (p=0.008 n=5+5) Flate 25.0MB ± 0% 24.6MB ± 0% -1.67% (p=0.008 n=5+5) GoParser 31.6MB ± 0% 31.1MB ± 0% -1.66% (p=0.008 n=5+5) Reflect 77.1MB ± 0% 75.8MB ± 0% -1.69% (p=0.008 n=5+5) Tar 26.3MB ± 0% 25.7MB ± 0% -2.06% (p=0.008 n=5+5) XML 41.9MB ± 0% 41.1MB ± 0% -1.93% (p=0.008 n=5+5) [Geo mean] 73.5MB 72.0MB -2.03% name old allocs/op new allocs/op delta Template 383k ± 0% 383k ± 0% ~ (p=0.690 n=5+5) Unicode 343k ± 0% 343k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.16M ± 0% 1.16M ± 0% ~ (p=0.310 n=5+5) Compiler 4.43M ± 0% 4.42M ± 0% -0.17% (p=0.008 n=5+5) SSA 9.85M ± 0% 9.85M ± 0% ~ (p=0.310 n=5+5) Flate 236k ± 0% 236k ± 1% ~ (p=0.841 n=5+5) GoParser 320k ± 0% 320k ± 0% ~ (p=0.421 n=5+5) Reflect 988k ± 0% 987k ± 0% ~ (p=0.690 n=5+5) Tar 252k ± 0% 251k ± 0% ~ (p=0.095 n=5+5) XML 399k ± 0% 399k ± 0% ~ (p=1.000 n=5+5) [Geo mean] 741k 740k -0.07% Change-Id: I9e952b58a98e30a12494304db9ce50d0a85e459c Reviewed-on: https://go-review.googlesource.com/41797 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
2017-04-25 18:14:12 -07:00
if n.Class() == PPARAM && n.Sym != nil && n.Sym.Name == ".this" {
// wrappers generated by genwrapper need to update
// the .this pointer in place.
// TODO: treat as a PPARMOUT?
return false
}
return canSSAType(n.Type)
// TODO: try to make more variables SSAable?
}
// canSSA reports whether variables of type t are SSA-able.
func canSSAType(t *types.Type) bool {
dowidth(t)
if t.Width > int64(4*Widthptr) {
// 4*Widthptr is an arbitrary constant. We want it
// to be at least 3*Widthptr so slices can be registerized.
// Too big and we'll introduce too much register pressure.
return false
}
switch t.Etype {
case TARRAY:
// We can't do larger arrays because dynamic indexing is
// not supported on SSA variables.
// TODO: allow if all indexes are constant.
if t.NumElem() <= 1 {
return canSSAType(t.Elem())
}
return false
case TSTRUCT:
if t.NumFields() > ssa.MaxStruct {
return false
}
for _, t1 := range t.Fields().Slice() {
if !canSSAType(t1.Type) {
return false
}
}
return true
default:
return true
}
}
// exprPtr evaluates n to a pointer and nil-checks it.
2016-12-15 17:17:01 -08:00
func (s *state) exprPtr(n *Node, bounded bool, lineno src.XPos) *ssa.Value {
p := s.expr(n)
cmd/compile: pack bool fields in Node, Name, Func and Type structs to bitsets This reduces compiler memory usage by up to 4% - see compilebench results below. name old time/op new time/op delta Template 245ms ± 4% 241ms ± 2% -1.88% (p=0.029 n=10+10) Unicode 126ms ± 3% 124ms ± 3% ~ (p=0.105 n=10+10) GoTypes 805ms ± 2% 813ms ± 3% ~ (p=0.515 n=8+10) Compiler 3.95s ± 2% 3.83s ± 1% -2.96% (p=0.000 n=9+10) MakeBash 47.4s ± 4% 46.6s ± 1% -1.59% (p=0.028 n=9+10) name old user-ns/op new user-ns/op delta Template 324M ± 5% 326M ± 3% ~ (p=0.935 n=10+10) Unicode 186M ± 5% 178M ±10% ~ (p=0.067 n=9+10) GoTypes 1.08G ± 7% 1.09G ± 4% ~ (p=0.956 n=10+10) Compiler 5.34G ± 4% 5.31G ± 1% ~ (p=0.501 n=10+8) name old alloc/op new alloc/op delta Template 41.0MB ± 0% 39.8MB ± 0% -3.03% (p=0.000 n=10+10) Unicode 32.3MB ± 0% 31.0MB ± 0% -4.13% (p=0.000 n=10+10) GoTypes 119MB ± 0% 116MB ± 0% -2.39% (p=0.000 n=10+10) Compiler 499MB ± 0% 487MB ± 0% -2.48% (p=0.000 n=10+10) name old allocs/op new allocs/op delta Template 380k ± 1% 379k ± 1% ~ (p=0.436 n=10+10) Unicode 324k ± 1% 324k ± 0% ~ (p=0.853 n=10+10) GoTypes 1.15M ± 0% 1.15M ± 0% ~ (p=0.481 n=10+10) Compiler 4.41M ± 0% 4.41M ± 0% -0.12% (p=0.007 n=10+10) name old text-bytes new text-bytes delta HelloSize 623k ± 0% 623k ± 0% ~ (all equal) CmdGoSize 6.64M ± 0% 6.64M ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 5.81k ± 0% 5.81k ± 0% ~ (all equal) CmdGoSize 238k ± 0% 238k ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 134k ± 0% 134k ± 0% ~ (all equal) CmdGoSize 152k ± 0% 152k ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 967k ± 0% 967k ± 0% ~ (all equal) CmdGoSize 10.2M ± 0% 10.2M ± 0% ~ (all equal) Change-Id: I1f40af738254892bd6c8ba2eb43390b175753d52 Reviewed-on: https://go-review.googlesource.com/37445 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-27 19:56:38 +02:00
if bounded || n.NonNil() {
if s.f.Frontend().Debug_checknil() && lineno.Line() > 1 {
s.f.Warnl(lineno, "removed nil check")
}
return p
}
s.nilCheck(p)
return p
}
// nilCheck generates nil pointer checking code.
// Used only for automatically inserted nil checks,
// not for user code like 'x != nil'.
func (s *state) nilCheck(ptr *ssa.Value) {
if disable_checknil != 0 || s.curfn.Func.NilCheckDisabled() {
return
}
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
s.newValue2(ssa.OpNilCheck, types.TypeVoid, ptr, s.mem())
}
// boundsCheck generates bounds checking code. Checks if 0 <= idx <[=] len, branches to exit if not.
// Starts a new block on return.
// On input, len must be converted to full int width and be nonnegative.
// Returns idx converted to full int width.
// If bounded is true then caller guarantees the index is not out of bounds
// (but boundsCheck will still extend the index to full int width).
func (s *state) boundsCheck(idx, len *ssa.Value, kind ssa.BoundsKind, bounded bool) *ssa.Value {
idx = s.extendIndex(idx, len, kind, bounded)
if bounded || Debug['B'] != 0 {
// If bounded or bounds checking is flag-disabled, then no check necessary,
// just return the extended index.
return idx
}
bNext := s.f.NewBlock(ssa.BlockPlain)
bPanic := s.f.NewBlock(ssa.BlockExit)
if !idx.Type.IsSigned() {
switch kind {
case ssa.BoundsIndex:
kind = ssa.BoundsIndexU
case ssa.BoundsSliceAlen:
kind = ssa.BoundsSliceAlenU
case ssa.BoundsSliceAcap:
kind = ssa.BoundsSliceAcapU
case ssa.BoundsSliceB:
kind = ssa.BoundsSliceBU
case ssa.BoundsSlice3Alen:
kind = ssa.BoundsSlice3AlenU
case ssa.BoundsSlice3Acap:
kind = ssa.BoundsSlice3AcapU
case ssa.BoundsSlice3B:
kind = ssa.BoundsSlice3BU
case ssa.BoundsSlice3C:
kind = ssa.BoundsSlice3CU
}
}
var cmp *ssa.Value
if kind == ssa.BoundsIndex || kind == ssa.BoundsIndexU {
cmp = s.newValue2(ssa.OpIsInBounds, types.Types[TBOOL], idx, len)
} else {
cmp = s.newValue2(ssa.OpIsSliceInBounds, types.Types[TBOOL], idx, len)
}
b := s.endBlock()
b.Kind = ssa.BlockIf
b.SetControl(cmp)
b.Likely = ssa.BranchLikely
b.AddEdgeTo(bNext)
b.AddEdgeTo(bPanic)
s.startBlock(bPanic)
if thearch.LinkArch.Family == sys.Wasm {
// TODO(khr): figure out how to do "register" based calling convention for bounds checks.
// Should be similar to gcWriteBarrier, but I can't make it work.
s.rtcall(BoundsCheckFunc[kind], false, nil, idx, len)
} else {
mem := s.newValue3I(ssa.OpPanicBounds, types.TypeMem, int64(kind), idx, len, s.mem())
s.endBlock().SetControl(mem)
}
s.startBlock(bNext)
return idx
}
// If cmp (a bool) is false, panic using the given function.
func (s *state) check(cmp *ssa.Value, fn *obj.LSym) {
b := s.endBlock()
b.Kind = ssa.BlockIf
b.SetControl(cmp)
b.Likely = ssa.BranchLikely
bNext := s.f.NewBlock(ssa.BlockPlain)
line := s.peekPos()
cmd/compile: restore panic deduplication The switch to detailed position information broke the removal of duplicate panics on the same line. Restore it. Neutral compiler performance impact: name old alloc/op new alloc/op delta Template 38.8MB ± 0% 38.8MB ± 0% ~ (p=0.690 n=5+5) Unicode 28.7MB ± 0% 28.7MB ± 0% +0.13% (p=0.032 n=5+5) GoTypes 109MB ± 0% 109MB ± 0% ~ (p=1.000 n=5+5) Compiler 457MB ± 0% 457MB ± 0% ~ (p=0.151 n=5+5) SSA 1.09GB ± 0% 1.10GB ± 0% +0.17% (p=0.008 n=5+5) Flate 24.6MB ± 0% 24.5MB ± 0% -0.35% (p=0.008 n=5+5) GoParser 30.9MB ± 0% 31.0MB ± 0% ~ (p=0.421 n=5+5) Reflect 73.4MB ± 0% 73.4MB ± 0% ~ (p=0.056 n=5+5) Tar 25.6MB ± 0% 25.5MB ± 0% -0.61% (p=0.008 n=5+5) XML 40.9MB ± 0% 40.9MB ± 0% ~ (p=0.841 n=5+5) [Geo mean] 71.6MB 71.6MB -0.07% name old allocs/op new allocs/op delta Template 394k ± 0% 395k ± 1% ~ (p=0.151 n=5+5) Unicode 343k ± 0% 344k ± 0% +0.38% (p=0.032 n=5+5) GoTypes 1.16M ± 0% 1.16M ± 0% ~ (p=1.000 n=5+5) Compiler 4.41M ± 0% 4.42M ± 0% ~ (p=0.151 n=5+5) SSA 9.79M ± 0% 9.79M ± 0% ~ (p=0.690 n=5+5) Flate 238k ± 1% 238k ± 0% ~ (p=0.151 n=5+5) GoParser 321k ± 0% 321k ± 1% ~ (p=0.548 n=5+5) Reflect 958k ± 0% 957k ± 0% ~ (p=0.841 n=5+5) Tar 252k ± 0% 252k ± 1% ~ (p=0.151 n=5+5) XML 401k ± 0% 400k ± 0% ~ (p=1.000 n=5+5) [Geo mean] 741k 742k +0.08% Reduces object files a little bit: name old object-bytes new object-bytes delta Template 386k ± 0% 386k ± 0% -0.04% (p=0.008 n=5+5) Unicode 202k ± 0% 202k ± 0% ~ (all equal) GoTypes 1.16M ± 0% 1.16M ± 0% -0.04% (p=0.008 n=5+5) Compiler 3.91M ± 0% 3.91M ± 0% -0.08% (p=0.008 n=5+5) SSA 7.91M ± 0% 7.91M ± 0% -0.04% (p=0.008 n=5+5) Flate 228k ± 0% 227k ± 0% -0.28% (p=0.008 n=5+5) GoParser 283k ± 0% 283k ± 0% -0.01% (p=0.008 n=5+5) Reflect 952k ± 0% 951k ± 0% -0.03% (p=0.008 n=5+5) Tar 188k ± 0% 187k ± 0% -0.09% (p=0.008 n=5+5) XML 406k ± 0% 406k ± 0% -0.04% (p=0.008 n=5+5) [Geo mean] 648k 648k -0.06% This was discovered in the context for the Fannkuch benchmark. It shrinks the number of panicindex calls in that function from 13 back to 9, their 1.8.1 level. It shrinks the function text a bit, from 829 to 801 bytes. It slows down execution a little, presumably due to alignment (?). name old time/op new time/op delta Fannkuch11-8 2.68s ± 2% 2.74s ± 1% +2.09% (p=0.000 n=19+20) After this CL, 1.8.1 and tip are identical: name old time/op new time/op delta Fannkuch11-8 2.74s ± 2% 2.74s ± 1% ~ (p=0.301 n=20+20) Fixes #20332 Change-Id: I2aeacc3e8cf2ac1ff10f36c572a27856f4f8f7c9 Reviewed-on: https://go-review.googlesource.com/43291 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-05-11 11:41:02 -07:00
pos := Ctxt.PosTable.Pos(line)
fl := funcLine{f: fn, base: pos.Base(), line: pos.Line()}
cmd/compile: restore panic deduplication The switch to detailed position information broke the removal of duplicate panics on the same line. Restore it. Neutral compiler performance impact: name old alloc/op new alloc/op delta Template 38.8MB ± 0% 38.8MB ± 0% ~ (p=0.690 n=5+5) Unicode 28.7MB ± 0% 28.7MB ± 0% +0.13% (p=0.032 n=5+5) GoTypes 109MB ± 0% 109MB ± 0% ~ (p=1.000 n=5+5) Compiler 457MB ± 0% 457MB ± 0% ~ (p=0.151 n=5+5) SSA 1.09GB ± 0% 1.10GB ± 0% +0.17% (p=0.008 n=5+5) Flate 24.6MB ± 0% 24.5MB ± 0% -0.35% (p=0.008 n=5+5) GoParser 30.9MB ± 0% 31.0MB ± 0% ~ (p=0.421 n=5+5) Reflect 73.4MB ± 0% 73.4MB ± 0% ~ (p=0.056 n=5+5) Tar 25.6MB ± 0% 25.5MB ± 0% -0.61% (p=0.008 n=5+5) XML 40.9MB ± 0% 40.9MB ± 0% ~ (p=0.841 n=5+5) [Geo mean] 71.6MB 71.6MB -0.07% name old allocs/op new allocs/op delta Template 394k ± 0% 395k ± 1% ~ (p=0.151 n=5+5) Unicode 343k ± 0% 344k ± 0% +0.38% (p=0.032 n=5+5) GoTypes 1.16M ± 0% 1.16M ± 0% ~ (p=1.000 n=5+5) Compiler 4.41M ± 0% 4.42M ± 0% ~ (p=0.151 n=5+5) SSA 9.79M ± 0% 9.79M ± 0% ~ (p=0.690 n=5+5) Flate 238k ± 1% 238k ± 0% ~ (p=0.151 n=5+5) GoParser 321k ± 0% 321k ± 1% ~ (p=0.548 n=5+5) Reflect 958k ± 0% 957k ± 0% ~ (p=0.841 n=5+5) Tar 252k ± 0% 252k ± 1% ~ (p=0.151 n=5+5) XML 401k ± 0% 400k ± 0% ~ (p=1.000 n=5+5) [Geo mean] 741k 742k +0.08% Reduces object files a little bit: name old object-bytes new object-bytes delta Template 386k ± 0% 386k ± 0% -0.04% (p=0.008 n=5+5) Unicode 202k ± 0% 202k ± 0% ~ (all equal) GoTypes 1.16M ± 0% 1.16M ± 0% -0.04% (p=0.008 n=5+5) Compiler 3.91M ± 0% 3.91M ± 0% -0.08% (p=0.008 n=5+5) SSA 7.91M ± 0% 7.91M ± 0% -0.04% (p=0.008 n=5+5) Flate 228k ± 0% 227k ± 0% -0.28% (p=0.008 n=5+5) GoParser 283k ± 0% 283k ± 0% -0.01% (p=0.008 n=5+5) Reflect 952k ± 0% 951k ± 0% -0.03% (p=0.008 n=5+5) Tar 188k ± 0% 187k ± 0% -0.09% (p=0.008 n=5+5) XML 406k ± 0% 406k ± 0% -0.04% (p=0.008 n=5+5) [Geo mean] 648k 648k -0.06% This was discovered in the context for the Fannkuch benchmark. It shrinks the number of panicindex calls in that function from 13 back to 9, their 1.8.1 level. It shrinks the function text a bit, from 829 to 801 bytes. It slows down execution a little, presumably due to alignment (?). name old time/op new time/op delta Fannkuch11-8 2.68s ± 2% 2.74s ± 1% +2.09% (p=0.000 n=19+20) After this CL, 1.8.1 and tip are identical: name old time/op new time/op delta Fannkuch11-8 2.74s ± 2% 2.74s ± 1% ~ (p=0.301 n=20+20) Fixes #20332 Change-Id: I2aeacc3e8cf2ac1ff10f36c572a27856f4f8f7c9 Reviewed-on: https://go-review.googlesource.com/43291 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-05-11 11:41:02 -07:00
bPanic := s.panics[fl]
if bPanic == nil {
bPanic = s.f.NewBlock(ssa.BlockPlain)
cmd/compile: restore panic deduplication The switch to detailed position information broke the removal of duplicate panics on the same line. Restore it. Neutral compiler performance impact: name old alloc/op new alloc/op delta Template 38.8MB ± 0% 38.8MB ± 0% ~ (p=0.690 n=5+5) Unicode 28.7MB ± 0% 28.7MB ± 0% +0.13% (p=0.032 n=5+5) GoTypes 109MB ± 0% 109MB ± 0% ~ (p=1.000 n=5+5) Compiler 457MB ± 0% 457MB ± 0% ~ (p=0.151 n=5+5) SSA 1.09GB ± 0% 1.10GB ± 0% +0.17% (p=0.008 n=5+5) Flate 24.6MB ± 0% 24.5MB ± 0% -0.35% (p=0.008 n=5+5) GoParser 30.9MB ± 0% 31.0MB ± 0% ~ (p=0.421 n=5+5) Reflect 73.4MB ± 0% 73.4MB ± 0% ~ (p=0.056 n=5+5) Tar 25.6MB ± 0% 25.5MB ± 0% -0.61% (p=0.008 n=5+5) XML 40.9MB ± 0% 40.9MB ± 0% ~ (p=0.841 n=5+5) [Geo mean] 71.6MB 71.6MB -0.07% name old allocs/op new allocs/op delta Template 394k ± 0% 395k ± 1% ~ (p=0.151 n=5+5) Unicode 343k ± 0% 344k ± 0% +0.38% (p=0.032 n=5+5) GoTypes 1.16M ± 0% 1.16M ± 0% ~ (p=1.000 n=5+5) Compiler 4.41M ± 0% 4.42M ± 0% ~ (p=0.151 n=5+5) SSA 9.79M ± 0% 9.79M ± 0% ~ (p=0.690 n=5+5) Flate 238k ± 1% 238k ± 0% ~ (p=0.151 n=5+5) GoParser 321k ± 0% 321k ± 1% ~ (p=0.548 n=5+5) Reflect 958k ± 0% 957k ± 0% ~ (p=0.841 n=5+5) Tar 252k ± 0% 252k ± 1% ~ (p=0.151 n=5+5) XML 401k ± 0% 400k ± 0% ~ (p=1.000 n=5+5) [Geo mean] 741k 742k +0.08% Reduces object files a little bit: name old object-bytes new object-bytes delta Template 386k ± 0% 386k ± 0% -0.04% (p=0.008 n=5+5) Unicode 202k ± 0% 202k ± 0% ~ (all equal) GoTypes 1.16M ± 0% 1.16M ± 0% -0.04% (p=0.008 n=5+5) Compiler 3.91M ± 0% 3.91M ± 0% -0.08% (p=0.008 n=5+5) SSA 7.91M ± 0% 7.91M ± 0% -0.04% (p=0.008 n=5+5) Flate 228k ± 0% 227k ± 0% -0.28% (p=0.008 n=5+5) GoParser 283k ± 0% 283k ± 0% -0.01% (p=0.008 n=5+5) Reflect 952k ± 0% 951k ± 0% -0.03% (p=0.008 n=5+5) Tar 188k ± 0% 187k ± 0% -0.09% (p=0.008 n=5+5) XML 406k ± 0% 406k ± 0% -0.04% (p=0.008 n=5+5) [Geo mean] 648k 648k -0.06% This was discovered in the context for the Fannkuch benchmark. It shrinks the number of panicindex calls in that function from 13 back to 9, their 1.8.1 level. It shrinks the function text a bit, from 829 to 801 bytes. It slows down execution a little, presumably due to alignment (?). name old time/op new time/op delta Fannkuch11-8 2.68s ± 2% 2.74s ± 1% +2.09% (p=0.000 n=19+20) After this CL, 1.8.1 and tip are identical: name old time/op new time/op delta Fannkuch11-8 2.74s ± 2% 2.74s ± 1% ~ (p=0.301 n=20+20) Fixes #20332 Change-Id: I2aeacc3e8cf2ac1ff10f36c572a27856f4f8f7c9 Reviewed-on: https://go-review.googlesource.com/43291 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-05-11 11:41:02 -07:00
s.panics[fl] = bPanic
s.startBlock(bPanic)
// The panic call takes/returns memory to ensure that the right
// memory state is observed if the panic happens.
s.rtcall(fn, false, nil)
}
b.AddEdgeTo(bNext)
b.AddEdgeTo(bPanic)
s.startBlock(bNext)
}
func (s *state) intDivide(n *Node, a, b *ssa.Value) *ssa.Value {
needcheck := true
switch b.Op {
case ssa.OpConst8, ssa.OpConst16, ssa.OpConst32, ssa.OpConst64:
if b.AuxInt != 0 {
needcheck = false
}
}
if needcheck {
// do a size-appropriate check for zero
cmp := s.newValue2(s.ssaOp(ONE, n.Type), types.Types[TBOOL], b, s.zeroVal(n.Type))
s.check(cmp, panicdivide)
}
return s.newValue2(s.ssaOp(n.Op, n.Type), a.Type, a, b)
}
// rtcall issues a call to the given runtime function fn with the listed args.
// Returns a slice of results of the given result types.
// The call is added to the end of the current block.
// If returns is false, the block is marked as an exit block.
func (s *state) rtcall(fn *obj.LSym, returns bool, results []*types.Type, args ...*ssa.Value) []*ssa.Value {
// Write args to the stack
off := Ctxt.FixedFrameSize()
for _, arg := range args {
t := arg.Type
off = Rnd(off, t.Alignment())
cmd/compile: add OpOffPtr [c] SP to constant cache They accounted for almost 30% of all CSE'd values. By never creating the duplicates in the first place, we reduce the high water mark of Value IDs, which in turn makes all SSA phases cheaper, particularly regalloc. name old time/op new time/op delta Template 200ms ± 3% 198ms ± 4% -0.87% (p=0.016 n=50+49) Unicode 86.9ms ± 2% 85.5ms ± 3% -1.56% (p=0.000 n=49+50) GoTypes 553ms ± 4% 551ms ± 4% ~ (p=0.183 n=50+49) SSA 3.97s ± 3% 3.93s ± 2% -1.06% (p=0.000 n=48+48) Flate 124ms ± 4% 124ms ± 3% ~ (p=0.545 n=48+50) GoParser 146ms ± 4% 146ms ± 4% ~ (p=0.810 n=49+49) Reflect 357ms ± 3% 355ms ± 3% -0.59% (p=0.049 n=50+48) Tar 106ms ± 4% 107ms ± 5% ~ (p=0.454 n=49+50) XML 203ms ± 4% 203ms ± 4% ~ (p=0.726 n=48+50) name old user-ns/op new user-ns/op delta Template 237M ± 3% 235M ± 4% ~ (p=0.208 n=47+48) Unicode 111M ± 4% 108M ± 9% -2.50% (p=0.000 n=47+50) GoTypes 736M ± 5% 729M ± 4% -0.95% (p=0.017 n=50+46) SSA 5.73G ± 4% 5.74G ± 4% ~ (p=0.765 n=50+50) Flate 150M ± 5% 148M ± 6% -0.89% (p=0.045 n=48+47) GoParser 180M ± 5% 178M ± 7% -1.34% (p=0.012 n=50+50) Reflect 450M ± 4% 444M ± 4% -1.40% (p=0.000 n=50+49) Tar 124M ± 7% 123M ± 7% ~ (p=0.092 n=50+50) XML 248M ± 6% 245M ± 5% ~ (p=0.057 n=50+50) name old alloc/op new alloc/op delta Template 39.4MB ± 0% 39.3MB ± 0% -0.37% (p=0.000 n=50+50) Unicode 30.9MB ± 0% 30.9MB ± 0% -0.27% (p=0.000 n=48+50) GoTypes 114MB ± 0% 113MB ± 0% -1.03% (p=0.000 n=50+49) SSA 882MB ± 0% 865MB ± 0% -1.95% (p=0.000 n=49+49) Flate 25.8MB ± 0% 25.7MB ± 0% -0.21% (p=0.000 n=50+50) GoParser 31.7MB ± 0% 31.6MB ± 0% -0.33% (p=0.000 n=50+50) Reflect 79.7MB ± 0% 79.3MB ± 0% -0.49% (p=0.000 n=44+49) Tar 27.2MB ± 0% 27.1MB ± 0% -0.31% (p=0.000 n=50+50) XML 42.7MB ± 0% 42.3MB ± 0% -1.05% (p=0.000 n=48+49) name old allocs/op new allocs/op delta Template 379k ± 1% 380k ± 1% +0.26% (p=0.000 n=50+50) Unicode 324k ± 1% 324k ± 1% ~ (p=0.964 n=49+50) GoTypes 1.14M ± 0% 1.15M ± 0% +0.14% (p=0.000 n=50+49) SSA 7.89M ± 0% 7.89M ± 0% -0.05% (p=0.000 n=49+49) Flate 240k ± 1% 241k ± 1% +0.27% (p=0.001 n=50+50) GoParser 310k ± 1% 311k ± 1% +0.48% (p=0.000 n=50+49) Reflect 1.00M ± 0% 1.00M ± 0% +0.17% (p=0.000 n=48+50) Tar 254k ± 1% 255k ± 1% +0.23% (p=0.005 n=50+50) XML 395k ± 1% 395k ± 1% +0.19% (p=0.002 n=49+47) Change-Id: Iaa8f5f37e23bd81983409f7359f9dcd4dfe2961f Reviewed-on: https://go-review.googlesource.com/38003 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-03-08 12:50:00 -08:00
ptr := s.constOffPtrSP(t.PtrTo(), off)
size := t.Size()
s.store(t, ptr, arg)
off += size
}
off = Rnd(off, int64(Widthreg))
// Issue call
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
call := s.newValue1A(ssa.OpStaticCall, types.TypeMem, fn, s.mem())
s.vars[&memVar] = call
if !returns {
// Finish block
b := s.endBlock()
b.Kind = ssa.BlockExit
b.SetControl(call)
[dev.ssa] cmd/compile: fix argument size of runtime call in SSA for ARM The argument size for runtime call was incorrectly includes the size of LR (FixedFrameSize in general). This makes the stack frame sometimes unnecessarily 4 bytes larger on ARM. For example, func f(b []byte) byte { return b[0] } compiles to 0x0000 00000 (h.go:6) TEXT "".f(SB), $4-16 // <-- framesize = 4 0x0000 00000 (h.go:6) MOVW 8(g), R1 0x0004 00004 (h.go:6) CMP R1, R13 0x0008 00008 (h.go:6) BLS 52 0x000c 00012 (h.go:6) MOVW.W R14, -8(R13) 0x0010 00016 (h.go:6) FUNCDATA $0, gclocals·8355ad952265fec823c17fcf739bd009(SB) 0x0010 00016 (h.go:6) FUNCDATA $1, gclocals·69c1753bd5f81501d95132d08af04464(SB) 0x0010 00016 (h.go:6) MOVW "".b+4(FP), R0 0x0014 00020 (h.go:6) CMP $0, R0 0x0018 00024 (h.go:6) BLS 44 0x001c 00028 (h.go:6) MOVW "".b(FP), R0 0x0020 00032 (h.go:6) MOVBU (R0), R0 0x0024 00036 (h.go:6) MOVB R0, "".~r1+12(FP) 0x0028 00040 (h.go:6) MOVW.P 8(R13), R15 0x002c 00044 (h.go:6) PCDATA $0, $1 0x002c 00044 (h.go:6) CALL runtime.panicindex(SB) 0x0030 00048 (h.go:6) UNDEF 0x0034 00052 (h.go:6) NOP 0x0034 00052 (h.go:6) MOVW R14, R3 0x0038 00056 (h.go:6) CALL runtime.morestack_noctxt(SB) 0x003c 00060 (h.go:6) JMP 0 Note that the frame size is 4, but there is actually no local. It incorrectly thinks call to runtime.panicindex needs 4 bytes space for argument. This CL fixes it. Updates #15365. Change-Id: Ic65d55283a6aa8a7861d7a3fbc7b63c35785eeec Reviewed-on: https://go-review.googlesource.com/24909 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-07-13 09:22:48 -06:00
call.AuxInt = off - Ctxt.FixedFrameSize()
if len(results) > 0 {
Fatalf("panic call can't have results")
}
return nil
}
// Load results
res := make([]*ssa.Value, len(results))
for i, t := range results {
off = Rnd(off, t.Alignment())
ptr := s.constOffPtrSP(types.NewPtr(t), off)
res[i] = s.load(t, ptr)
off += t.Size()
}
off = Rnd(off, int64(Widthptr))
// Remember how much callee stack space we needed.
call.AuxInt = off
return res
}
// do *left = right for type t.
func (s *state) storeType(t *types.Type, left, right *ssa.Value, skip skipMask, leftIsStmt bool) {
s.instrument(t, left, true)
if skip == 0 && (!types.Haspointers(t) || ssa.IsStackAddr(left)) {
// Known to not have write barrier. Store the whole type.
s.vars[&memVar] = s.newValue3Apos(ssa.OpStore, types.TypeMem, t, left, right, s.mem(), leftIsStmt)
return
}
// store scalar fields first, so write barrier stores for
// pointer fields can be grouped together, and scalar values
// don't need to be live across the write barrier call.
// TODO: if the writebarrier pass knows how to reorder stores,
// we can do a single store here as long as skip==0.
s.storeTypeScalars(t, left, right, skip)
if skip&skipPtr == 0 && types.Haspointers(t) {
s.storeTypePtrs(t, left, right)
}
}
// do *left = right for all scalar (non-pointer) parts of t.
func (s *state) storeTypeScalars(t *types.Type, left, right *ssa.Value, skip skipMask) {
switch {
case t.IsBoolean() || t.IsInteger() || t.IsFloat() || t.IsComplex():
s.store(t, left, right)
case t.IsPtrShaped():
// no scalar fields.
case t.IsString():
if skip&skipLen != 0 {
return
}
len := s.newValue1(ssa.OpStringLen, types.Types[TINT], right)
lenAddr := s.newValue1I(ssa.OpOffPtr, s.f.Config.Types.IntPtr, s.config.PtrSize, left)
s.store(types.Types[TINT], lenAddr, len)
case t.IsSlice():
if skip&skipLen == 0 {
len := s.newValue1(ssa.OpSliceLen, types.Types[TINT], right)
lenAddr := s.newValue1I(ssa.OpOffPtr, s.f.Config.Types.IntPtr, s.config.PtrSize, left)
s.store(types.Types[TINT], lenAddr, len)
}
if skip&skipCap == 0 {
cap := s.newValue1(ssa.OpSliceCap, types.Types[TINT], right)
capAddr := s.newValue1I(ssa.OpOffPtr, s.f.Config.Types.IntPtr, 2*s.config.PtrSize, left)
s.store(types.Types[TINT], capAddr, cap)
}
case t.IsInterface():
// itab field doesn't need a write barrier (even though it is a pointer).
itab := s.newValue1(ssa.OpITab, s.f.Config.Types.BytePtr, right)
s.store(types.Types[TUINTPTR], left, itab)
case t.IsStruct():
n := t.NumFields()
for i := 0; i < n; i++ {
ft := t.FieldType(i)
addr := s.newValue1I(ssa.OpOffPtr, ft.PtrTo(), t.FieldOff(i), left)
val := s.newValue1I(ssa.OpStructSelect, ft, int64(i), right)
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
s.storeTypeScalars(ft, addr, val, 0)
}
case t.IsArray() && t.NumElem() == 0:
// nothing
case t.IsArray() && t.NumElem() == 1:
s.storeTypeScalars(t.Elem(), left, s.newValue1I(ssa.OpArraySelect, t.Elem(), 0, right), 0)
default:
s.Fatalf("bad write barrier type %v", t)
}
}
// do *left = right for all pointer parts of t.
func (s *state) storeTypePtrs(t *types.Type, left, right *ssa.Value) {
switch {
case t.IsPtrShaped():
s.store(t, left, right)
case t.IsString():
ptr := s.newValue1(ssa.OpStringPtr, s.f.Config.Types.BytePtr, right)
s.store(s.f.Config.Types.BytePtr, left, ptr)
case t.IsSlice():
elType := types.NewPtr(t.Elem())
ptr := s.newValue1(ssa.OpSlicePtr, elType, right)
s.store(elType, left, ptr)
case t.IsInterface():
// itab field is treated as a scalar.
idata := s.newValue1(ssa.OpIData, s.f.Config.Types.BytePtr, right)
idataAddr := s.newValue1I(ssa.OpOffPtr, s.f.Config.Types.BytePtrPtr, s.config.PtrSize, left)
s.store(s.f.Config.Types.BytePtr, idataAddr, idata)
case t.IsStruct():
n := t.NumFields()
for i := 0; i < n; i++ {
ft := t.FieldType(i)
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
if !types.Haspointers(ft) {
continue
}
addr := s.newValue1I(ssa.OpOffPtr, ft.PtrTo(), t.FieldOff(i), left)
val := s.newValue1I(ssa.OpStructSelect, ft, int64(i), right)
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
s.storeTypePtrs(ft, addr, val)
}
case t.IsArray() && t.NumElem() == 0:
// nothing
case t.IsArray() && t.NumElem() == 1:
s.storeTypePtrs(t.Elem(), left, s.newValue1I(ssa.OpArraySelect, t.Elem(), 0, right))
default:
s.Fatalf("bad write barrier type %v", t)
}
}
cmd/compile: move argument stack construction to SSA generation The goal of this change is to move work from walk to SSA, and simplify things along the way. This is hard to accomplish cleanly with small incremental changes, so this large commit message aims to provide a roadmap to the diff. High level description: Prior to this change, walk was responsible for constructing (most of) the stack for function calls. ascompatte gathered variadic arguments into a slice. It also rewrote n.List from a list of arguments to a list of assignments to stack slots. ascompatte was called multiple times to handle the receiver in a method call. reorder1 then introduced temporaries into n.List as needed to avoid smashing the stack. adjustargs then made extra stack space for go/defer args as needed. Node to SSA construction evaluated all the statements in n.List, and issued the function call, assuming that the stack was correctly constructed. Intrinsic calls had to dig around inside n.List to extract the arguments, since intrinsics don't use the stack to make function calls. This change moves stack construction to the SSA construction phase. ascompatte, now called walkParams, does all the work that ascompatte and reorder1 did. It handles variadic arguments, inserts the method receiver if needed, and allocates temporaries. It does not, however, make any assignments to stack slots. Instead, it moves the function arguments to n.Rlist, leaving assignments to temporaries in n.List. (It would be better to use Ninit instead of List; future work.) During SSA construction, after doing all the temporary assignments in n.List, the function arguments are assigned to stack slots by constructing the appropriate SSA Value, using (*state).storeArg. SSA construction also now handles adjustments for go/defer args. This change also simplifies intrinsic calls, since we no longer need to undo walk's work. Along the way, we simplify nodarg by pushing the fp==1 case to its callers, where it fits nicely. Generated code differences: There were a few optimizations applied along the way, the old way. f(g()) was rewritten to do a block copy of function results to function arguments. And reorder1 avoided introducing the final "save the stack" temporary in n.List. The f(g()) block copy optimization never actually triggered; the order pass rewrote away g(), so that has been removed. SSA optimizations mostly obviated the need for reorder1's optimization of avoiding the final temporary. The exception was when the temporary's type was not SSA-able; in that case, we got a Move into an autotmp and then an immediate Move onto the stack, with the autotmp never read or used again. This change introduces a new rewrite rule to detect such pointless double Moves and collapse them into a single Move. This is actually more powerful than the original optimization, since the original optimization relied on the imprecise Node.HasCall calculation. The other significant difference in the generated code is that the stack is now constructed completely in SP-offset order. Prior to this change, the stack was constructed somewhat haphazardly: first the final argument that Node.HasCall deemed to require a temporary, then other arguments, then the method receiver, then the defer/go args. SP-offset is probably a good default order. See future work. There are a few minor object file size changes as a result of this change. I investigated some regressions in early versions of this change. One regression (in archive/tar) was the addition of a single CMPQ instruction, which would be eliminated were this TODO from flagalloc to be done: // TODO: Remove original instructions if they are never used. One regression (in text/template) was an ADDQconstmodify that is now a regular MOVQLoad+ADDQconst+MOVQStore, due to an unlucky change in the order in which arguments are written. The argument change order can also now be luckier, so this appears to be a wash. All in all, though there will be minor winners and losers, this change appears to be performance neutral. Future work: Move loading the result of function calls to SSA construction; eliminate OINDREGSP. Consider pushing stack construction deeper into SSA world, perhaps in an arch-specific pass. Among other benefits, this would make it easier to transition to a new calling convention. This would require rethinking the handling of stack conflicts and is non-trivial. Figure out some clean way to indicate that stack construction Stores/Moves do not alias each other, so that subsequent passes may do things like CSE+tighten shared stack setup, do DSE using non-first Stores, etc. This would allow us to eliminate the minor text/template regression. Possibly make assignments to stack slots not treated as statements by DWARF. Compiler benchmarks: name old time/op new time/op delta Template 182ms ± 2% 179ms ± 2% -1.69% (p=0.000 n=47+48) Unicode 86.3ms ± 5% 85.1ms ± 4% -1.36% (p=0.001 n=50+50) GoTypes 646ms ± 1% 642ms ± 1% -0.63% (p=0.000 n=49+48) Compiler 2.89s ± 1% 2.86s ± 2% -1.36% (p=0.000 n=48+50) SSA 8.47s ± 1% 8.37s ± 2% -1.22% (p=0.000 n=47+50) Flate 122ms ± 2% 121ms ± 2% -0.66% (p=0.000 n=47+45) GoParser 147ms ± 2% 146ms ± 2% -0.53% (p=0.006 n=46+49) Reflect 406ms ± 2% 403ms ± 2% -0.76% (p=0.000 n=48+43) Tar 162ms ± 3% 162ms ± 4% ~ (p=0.191 n=46+50) XML 223ms ± 2% 222ms ± 2% -0.37% (p=0.031 n=45+49) [Geo mean] 382ms 378ms -0.89% name old user-time/op new user-time/op delta Template 219ms ± 3% 216ms ± 3% -1.56% (p=0.000 n=50+48) Unicode 109ms ± 6% 109ms ± 5% ~ (p=0.190 n=50+49) GoTypes 836ms ± 2% 828ms ± 2% -0.96% (p=0.000 n=49+48) Compiler 3.87s ± 2% 3.80s ± 1% -1.81% (p=0.000 n=49+46) SSA 12.0s ± 1% 11.8s ± 1% -2.01% (p=0.000 n=48+50) Flate 142ms ± 3% 141ms ± 3% -0.85% (p=0.003 n=50+48) GoParser 178ms ± 4% 175ms ± 4% -1.66% (p=0.000 n=48+46) Reflect 520ms ± 2% 512ms ± 2% -1.44% (p=0.000 n=45+48) Tar 200ms ± 3% 198ms ± 4% -0.61% (p=0.037 n=47+50) XML 277ms ± 3% 275ms ± 3% -0.85% (p=0.000 n=49+48) [Geo mean] 482ms 476ms -1.23% name old alloc/op new alloc/op delta Template 36.1MB ± 0% 35.3MB ± 0% -2.18% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 29.3MB ± 0% -1.58% (p=0.008 n=5+5) GoTypes 125MB ± 0% 123MB ± 0% -2.13% (p=0.008 n=5+5) Compiler 531MB ± 0% 513MB ± 0% -3.40% (p=0.008 n=5+5) SSA 2.00GB ± 0% 1.93GB ± 0% -3.34% (p=0.008 n=5+5) Flate 24.5MB ± 0% 24.3MB ± 0% -1.18% (p=0.008 n=5+5) GoParser 29.4MB ± 0% 28.7MB ± 0% -2.34% (p=0.008 n=5+5) Reflect 87.1MB ± 0% 86.0MB ± 0% -1.33% (p=0.008 n=5+5) Tar 35.3MB ± 0% 34.8MB ± 0% -1.44% (p=0.008 n=5+5) XML 47.9MB ± 0% 47.1MB ± 0% -1.86% (p=0.008 n=5+5) [Geo mean] 82.8MB 81.1MB -2.08% name old allocs/op new allocs/op delta Template 352k ± 0% 347k ± 0% -1.32% (p=0.008 n=5+5) Unicode 342k ± 0% 339k ± 0% -0.66% (p=0.008 n=5+5) GoTypes 1.29M ± 0% 1.27M ± 0% -1.30% (p=0.008 n=5+5) Compiler 4.98M ± 0% 4.87M ± 0% -2.14% (p=0.008 n=5+5) SSA 15.7M ± 0% 15.2M ± 0% -2.86% (p=0.008 n=5+5) Flate 233k ± 0% 231k ± 0% -0.83% (p=0.008 n=5+5) GoParser 296k ± 0% 291k ± 0% -1.54% (p=0.016 n=5+4) Reflect 1.05M ± 0% 1.04M ± 0% -0.65% (p=0.008 n=5+5) Tar 343k ± 0% 339k ± 0% -0.97% (p=0.008 n=5+5) XML 432k ± 0% 426k ± 0% -1.19% (p=0.008 n=5+5) [Geo mean] 815k 804k -1.35% name old object-bytes new object-bytes delta Template 505kB ± 0% 505kB ± 0% -0.01% (p=0.008 n=5+5) Unicode 224kB ± 0% 224kB ± 0% ~ (all equal) GoTypes 1.82MB ± 0% 1.83MB ± 0% +0.06% (p=0.008 n=5+5) Flate 324kB ± 0% 324kB ± 0% +0.00% (p=0.008 n=5+5) GoParser 402kB ± 0% 402kB ± 0% +0.04% (p=0.008 n=5+5) Reflect 1.39MB ± 0% 1.39MB ± 0% -0.01% (p=0.008 n=5+5) Tar 449kB ± 0% 449kB ± 0% -0.02% (p=0.008 n=5+5) XML 598kB ± 0% 597kB ± 0% -0.05% (p=0.008 n=5+5) Change-Id: Ifc9d5c1bd01f90171414b8fb18ffe2290d271143 Reviewed-on: https://go-review.googlesource.com/c/114797 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-05-06 12:58:53 -07:00
func (s *state) storeArg(n *Node, t *types.Type, off int64) {
pt := types.NewPtr(t)
sp := s.constOffPtrSP(pt, off)
if !canSSAType(t) {
a := s.addr(n, false)
s.move(t, sp, a)
return
}
a := s.expr(n)
s.storeType(t, sp, a, 0, false)
}
// slice computes the slice v[i:j:k] and returns ptr, len, and cap of result.
// i,j,k may be nil, in which case they are set to their default value.
// v may be a slice, string or pointer to an array.
func (s *state) slice(v, i, j, k *ssa.Value, bounded bool) (p, l, c *ssa.Value) {
t := v.Type
var ptr, len, cap *ssa.Value
switch {
case t.IsSlice():
ptr = s.newValue1(ssa.OpSlicePtr, types.NewPtr(t.Elem()), v)
len = s.newValue1(ssa.OpSliceLen, types.Types[TINT], v)
cap = s.newValue1(ssa.OpSliceCap, types.Types[TINT], v)
case t.IsString():
ptr = s.newValue1(ssa.OpStringPtr, types.NewPtr(types.Types[TUINT8]), v)
len = s.newValue1(ssa.OpStringLen, types.Types[TINT], v)
cap = len
case t.IsPtr():
if !t.Elem().IsArray() {
s.Fatalf("bad ptr to array in slice %v\n", t)
}
s.nilCheck(v)
ptr = s.newValue1(ssa.OpCopy, types.NewPtr(t.Elem().Elem()), v)
len = s.constInt(types.Types[TINT], t.Elem().NumElem())
cap = len
default:
s.Fatalf("bad type in slice %v\n", t)
}
// Set default values
if i == nil {
i = s.constInt(types.Types[TINT], 0)
}
if j == nil {
j = len
}
three := true
if k == nil {
three = false
k = cap
}
// Panic if slice indices are not in bounds.
// Make sure we check these in reverse order so that we're always
// comparing against a value known to be nonnegative. See issue 28797.
if three {
if k != cap {
kind := ssa.BoundsSlice3Alen
if t.IsSlice() {
kind = ssa.BoundsSlice3Acap
}
k = s.boundsCheck(k, cap, kind, bounded)
}
if j != k {
j = s.boundsCheck(j, k, ssa.BoundsSlice3B, bounded)
}
i = s.boundsCheck(i, j, ssa.BoundsSlice3C, bounded)
} else {
if j != k {
kind := ssa.BoundsSliceAlen
if t.IsSlice() {
kind = ssa.BoundsSliceAcap
}
j = s.boundsCheck(j, k, kind, bounded)
}
i = s.boundsCheck(i, j, ssa.BoundsSliceB, bounded)
}
// Word-sized integer operations.
subOp := s.ssaOp(OSUB, types.Types[TINT])
mulOp := s.ssaOp(OMUL, types.Types[TINT])
andOp := s.ssaOp(OAND, types.Types[TINT])
// Calculate the length (rlen) and capacity (rcap) of the new slice.
// For strings the capacity of the result is unimportant. However,
// we use rcap to test if we've generated a zero-length slice.
// Use length of strings for that.
rlen := s.newValue2(subOp, types.Types[TINT], j, i)
rcap := rlen
if j != k && !t.IsString() {
rcap = s.newValue2(subOp, types.Types[TINT], k, i)
}
if (i.Op == ssa.OpConst64 || i.Op == ssa.OpConst32) && i.AuxInt == 0 {
// No pointer arithmetic necessary.
return ptr, rlen, rcap
}
// Calculate the base pointer (rptr) for the new slice.
//
// Generate the following code assuming that indexes are in bounds.
// The masking is to make sure that we don't generate a slice
// that points to the next object in memory. We cannot just set
// the pointer to nil because then we would create a nil slice or
// string.
//
// rcap = k - i
// rlen = j - i
// rptr = ptr + (mask(rcap) & (i * stride))
//
// Where mask(x) is 0 if x==0 and -1 if x>0 and stride is the width
// of the element type.
stride := s.constInt(types.Types[TINT], ptr.Type.Elem().Width)
// The delta is the number of bytes to offset ptr by.
delta := s.newValue2(mulOp, types.Types[TINT], i, stride)
// If we're slicing to the point where the capacity is zero,
// zero out the delta.
mask := s.newValue1(ssa.OpSlicemask, types.Types[TINT], rcap)
delta = s.newValue2(andOp, types.Types[TINT], delta, mask)
// Compute rptr = ptr + delta.
rptr := s.newValue2(ssa.OpAddPtr, ptr.Type, ptr, delta)
return rptr, rlen, rcap
}
type u642fcvtTab struct {
geq, cvt2F, and, rsh, or, add ssa.Op
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
one func(*state, *types.Type, int64) *ssa.Value
}
var u64_f64 = u642fcvtTab{
geq: ssa.OpGeq64,
cvt2F: ssa.OpCvt64to64F,
and: ssa.OpAnd64,
rsh: ssa.OpRsh64Ux64,
or: ssa.OpOr64,
add: ssa.OpAdd64F,
one: (*state).constInt64,
}
var u64_f32 = u642fcvtTab{
geq: ssa.OpGeq64,
cvt2F: ssa.OpCvt64to32F,
and: ssa.OpAnd64,
rsh: ssa.OpRsh64Ux64,
or: ssa.OpOr64,
add: ssa.OpAdd32F,
one: (*state).constInt64,
}
func (s *state) uint64Tofloat64(n *Node, x *ssa.Value, ft, tt *types.Type) *ssa.Value {
return s.uint64Tofloat(&u64_f64, n, x, ft, tt)
}
func (s *state) uint64Tofloat32(n *Node, x *ssa.Value, ft, tt *types.Type) *ssa.Value {
return s.uint64Tofloat(&u64_f32, n, x, ft, tt)
}
func (s *state) uint64Tofloat(cvttab *u642fcvtTab, n *Node, x *ssa.Value, ft, tt *types.Type) *ssa.Value {
// if x >= 0 {
// result = (floatY) x
// } else {
// y = uintX(x) ; y = x & 1
// z = uintX(x) ; z = z >> 1
// z = z >> 1
// z = z | y
// result = floatY(z)
// result = result + result
// }
//
// Code borrowed from old code generator.
// What's going on: large 64-bit "unsigned" looks like
// negative number to hardware's integer-to-float
// conversion. However, because the mantissa is only
// 63 bits, we don't need the LSB, so instead we do an
// unsigned right shift (divide by two), convert, and
// double. However, before we do that, we need to be
// sure that we do not lose a "1" if that made the
// difference in the resulting rounding. Therefore, we
// preserve it, and OR (not ADD) it back in. The case
// that matters is when the eleven discarded bits are
// equal to 10000000001; that rounds up, and the 1 cannot
// be lost else it would round down if the LSB of the
// candidate mantissa is 0.
cmp := s.newValue2(cvttab.geq, types.Types[TBOOL], x, s.zeroVal(ft))
b := s.endBlock()
b.Kind = ssa.BlockIf
b.SetControl(cmp)
b.Likely = ssa.BranchLikely
bThen := s.f.NewBlock(ssa.BlockPlain)
bElse := s.f.NewBlock(ssa.BlockPlain)
bAfter := s.f.NewBlock(ssa.BlockPlain)
b.AddEdgeTo(bThen)
s.startBlock(bThen)
a0 := s.newValue1(cvttab.cvt2F, tt, x)
s.vars[n] = a0
s.endBlock()
bThen.AddEdgeTo(bAfter)
b.AddEdgeTo(bElse)
s.startBlock(bElse)
one := cvttab.one(s, ft, 1)
y := s.newValue2(cvttab.and, ft, x, one)
z := s.newValue2(cvttab.rsh, ft, x, one)
z = s.newValue2(cvttab.or, ft, z, y)
a := s.newValue1(cvttab.cvt2F, tt, z)
a1 := s.newValue2(cvttab.add, tt, a, a)
s.vars[n] = a1
s.endBlock()
bElse.AddEdgeTo(bAfter)
s.startBlock(bAfter)
return s.variable(n, n.Type)
}
type u322fcvtTab struct {
cvtI2F, cvtF2F ssa.Op
}
var u32_f64 = u322fcvtTab{
cvtI2F: ssa.OpCvt32to64F,
cvtF2F: ssa.OpCopy,
}
var u32_f32 = u322fcvtTab{
cvtI2F: ssa.OpCvt32to32F,
cvtF2F: ssa.OpCvt64Fto32F,
}
func (s *state) uint32Tofloat64(n *Node, x *ssa.Value, ft, tt *types.Type) *ssa.Value {
return s.uint32Tofloat(&u32_f64, n, x, ft, tt)
}
func (s *state) uint32Tofloat32(n *Node, x *ssa.Value, ft, tt *types.Type) *ssa.Value {
return s.uint32Tofloat(&u32_f32, n, x, ft, tt)
}
func (s *state) uint32Tofloat(cvttab *u322fcvtTab, n *Node, x *ssa.Value, ft, tt *types.Type) *ssa.Value {
// if x >= 0 {
// result = floatY(x)
// } else {
// result = floatY(float64(x) + (1<<32))
// }
cmp := s.newValue2(ssa.OpGeq32, types.Types[TBOOL], x, s.zeroVal(ft))
b := s.endBlock()
b.Kind = ssa.BlockIf
b.SetControl(cmp)
b.Likely = ssa.BranchLikely
bThen := s.f.NewBlock(ssa.BlockPlain)
bElse := s.f.NewBlock(ssa.BlockPlain)
bAfter := s.f.NewBlock(ssa.BlockPlain)
b.AddEdgeTo(bThen)
s.startBlock(bThen)
a0 := s.newValue1(cvttab.cvtI2F, tt, x)
s.vars[n] = a0
s.endBlock()
bThen.AddEdgeTo(bAfter)
b.AddEdgeTo(bElse)
s.startBlock(bElse)
a1 := s.newValue1(ssa.OpCvt32to64F, types.Types[TFLOAT64], x)
twoToThe32 := s.constFloat64(types.Types[TFLOAT64], float64(1<<32))
a2 := s.newValue2(ssa.OpAdd64F, types.Types[TFLOAT64], a1, twoToThe32)
a3 := s.newValue1(cvttab.cvtF2F, tt, a2)
s.vars[n] = a3
s.endBlock()
bElse.AddEdgeTo(bAfter)
s.startBlock(bAfter)
return s.variable(n, n.Type)
}
// referenceTypeBuiltin generates code for the len/cap builtins for maps and channels.
func (s *state) referenceTypeBuiltin(n *Node, x *ssa.Value) *ssa.Value {
if !n.Left.Type.IsMap() && !n.Left.Type.IsChan() {
s.Fatalf("node must be a map or a channel")
}
// if n == nil {
// return 0
// } else {
// // len
// return *((*int)n)
// // cap
// return *(((*int)n)+1)
// }
lenType := n.Type
nilValue := s.constNil(types.Types[TUINTPTR])
cmp := s.newValue2(ssa.OpEqPtr, types.Types[TBOOL], x, nilValue)
b := s.endBlock()
b.Kind = ssa.BlockIf
b.SetControl(cmp)
b.Likely = ssa.BranchUnlikely
bThen := s.f.NewBlock(ssa.BlockPlain)
bElse := s.f.NewBlock(ssa.BlockPlain)
bAfter := s.f.NewBlock(ssa.BlockPlain)
// length/capacity of a nil map/chan is zero
b.AddEdgeTo(bThen)
s.startBlock(bThen)
s.vars[n] = s.zeroVal(lenType)
s.endBlock()
bThen.AddEdgeTo(bAfter)
b.AddEdgeTo(bElse)
s.startBlock(bElse)
switch n.Op {
case OLEN:
// length is stored in the first word for map/chan
s.vars[n] = s.load(lenType, x)
case OCAP:
// capacity is stored in the second word for chan
sw := s.newValue1I(ssa.OpOffPtr, lenType.PtrTo(), lenType.Width, x)
s.vars[n] = s.load(lenType, sw)
default:
s.Fatalf("op must be OLEN or OCAP")
}
s.endBlock()
bElse.AddEdgeTo(bAfter)
s.startBlock(bAfter)
return s.variable(n, lenType)
}
type f2uCvtTab struct {
ltf, cvt2U, subf, or ssa.Op
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
floatValue func(*state, *types.Type, float64) *ssa.Value
intValue func(*state, *types.Type, int64) *ssa.Value
cutoff uint64
}
var f32_u64 = f2uCvtTab{
ltf: ssa.OpLess32F,
cvt2U: ssa.OpCvt32Fto64,
subf: ssa.OpSub32F,
or: ssa.OpOr64,
floatValue: (*state).constFloat32,
intValue: (*state).constInt64,
cutoff: 1 << 63,
}
var f64_u64 = f2uCvtTab{
ltf: ssa.OpLess64F,
cvt2U: ssa.OpCvt64Fto64,
subf: ssa.OpSub64F,
or: ssa.OpOr64,
floatValue: (*state).constFloat64,
intValue: (*state).constInt64,
cutoff: 1 << 63,
}
var f32_u32 = f2uCvtTab{
ltf: ssa.OpLess32F,
cvt2U: ssa.OpCvt32Fto32,
subf: ssa.OpSub32F,
or: ssa.OpOr32,
floatValue: (*state).constFloat32,
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
intValue: func(s *state, t *types.Type, v int64) *ssa.Value { return s.constInt32(t, int32(v)) },
cutoff: 1 << 31,
}
var f64_u32 = f2uCvtTab{
ltf: ssa.OpLess64F,
cvt2U: ssa.OpCvt64Fto32,
subf: ssa.OpSub64F,
or: ssa.OpOr32,
floatValue: (*state).constFloat64,
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
intValue: func(s *state, t *types.Type, v int64) *ssa.Value { return s.constInt32(t, int32(v)) },
cutoff: 1 << 31,
}
func (s *state) float32ToUint64(n *Node, x *ssa.Value, ft, tt *types.Type) *ssa.Value {
return s.floatToUint(&f32_u64, n, x, ft, tt)
}
func (s *state) float64ToUint64(n *Node, x *ssa.Value, ft, tt *types.Type) *ssa.Value {
return s.floatToUint(&f64_u64, n, x, ft, tt)
}
func (s *state) float32ToUint32(n *Node, x *ssa.Value, ft, tt *types.Type) *ssa.Value {
return s.floatToUint(&f32_u32, n, x, ft, tt)
}
func (s *state) float64ToUint32(n *Node, x *ssa.Value, ft, tt *types.Type) *ssa.Value {
return s.floatToUint(&f64_u32, n, x, ft, tt)
}
func (s *state) floatToUint(cvttab *f2uCvtTab, n *Node, x *ssa.Value, ft, tt *types.Type) *ssa.Value {
// cutoff:=1<<(intY_Size-1)
// if x < floatX(cutoff) {
// result = uintY(x)
// } else {
// y = x - floatX(cutoff)
// z = uintY(y)
// result = z | -(cutoff)
// }
cutoff := cvttab.floatValue(s, ft, float64(cvttab.cutoff))
cmp := s.newValue2(cvttab.ltf, types.Types[TBOOL], x, cutoff)
b := s.endBlock()
b.Kind = ssa.BlockIf
b.SetControl(cmp)
b.Likely = ssa.BranchLikely
bThen := s.f.NewBlock(ssa.BlockPlain)
bElse := s.f.NewBlock(ssa.BlockPlain)
bAfter := s.f.NewBlock(ssa.BlockPlain)
b.AddEdgeTo(bThen)
s.startBlock(bThen)
a0 := s.newValue1(cvttab.cvt2U, tt, x)
s.vars[n] = a0
s.endBlock()
bThen.AddEdgeTo(bAfter)
b.AddEdgeTo(bElse)
s.startBlock(bElse)
y := s.newValue2(cvttab.subf, ft, x, cutoff)
y = s.newValue1(cvttab.cvt2U, tt, y)
z := cvttab.intValue(s, tt, int64(-cvttab.cutoff))
a1 := s.newValue2(cvttab.or, tt, y, z)
s.vars[n] = a1
s.endBlock()
bElse.AddEdgeTo(bAfter)
s.startBlock(bAfter)
return s.variable(n, n.Type)
}
// dottype generates SSA for a type assertion node.
// commaok indicates whether to panic or return a bool.
// If commaok is false, resok will be nil.
func (s *state) dottype(n *Node, commaok bool) (res, resok *ssa.Value) {
iface := s.expr(n.Left) // input interface
target := s.expr(n.Right) // target type
byteptr := s.f.Config.Types.BytePtr
if n.Type.IsInterface() {
if n.Type.IsEmptyInterface() {
// Converting to an empty interface.
// Input could be an empty or nonempty interface.
if Debug_typeassert > 0 {
Warnl(n.Pos, "type assertion inlined")
}
// Get itab/type field from input.
itab := s.newValue1(ssa.OpITab, byteptr, iface)
// Conversion succeeds iff that field is not nil.
cond := s.newValue2(ssa.OpNeqPtr, types.Types[TBOOL], itab, s.constNil(byteptr))
if n.Left.Type.IsEmptyInterface() && commaok {
// Converting empty interface to empty interface with ,ok is just a nil check.
return iface, cond
}
// Branch on nilness.
b := s.endBlock()
b.Kind = ssa.BlockIf
b.SetControl(cond)
b.Likely = ssa.BranchLikely
bOk := s.f.NewBlock(ssa.BlockPlain)
bFail := s.f.NewBlock(ssa.BlockPlain)
b.AddEdgeTo(bOk)
b.AddEdgeTo(bFail)
if !commaok {
// On failure, panic by calling panicnildottype.
s.startBlock(bFail)
s.rtcall(panicnildottype, false, nil, target)
// On success, return (perhaps modified) input interface.
s.startBlock(bOk)
if n.Left.Type.IsEmptyInterface() {
res = iface // Use input interface unchanged.
return
}
// Load type out of itab, build interface with existing idata.
off := s.newValue1I(ssa.OpOffPtr, byteptr, int64(Widthptr), itab)
typ := s.load(byteptr, off)
idata := s.newValue1(ssa.OpIData, n.Type, iface)
res = s.newValue2(ssa.OpIMake, n.Type, typ, idata)
return
}
s.startBlock(bOk)
// nonempty -> empty
// Need to load type from itab
off := s.newValue1I(ssa.OpOffPtr, byteptr, int64(Widthptr), itab)
s.vars[&typVar] = s.load(byteptr, off)
s.endBlock()
// itab is nil, might as well use that as the nil result.
s.startBlock(bFail)
s.vars[&typVar] = itab
s.endBlock()
// Merge point.
bEnd := s.f.NewBlock(ssa.BlockPlain)
bOk.AddEdgeTo(bEnd)
bFail.AddEdgeTo(bEnd)
s.startBlock(bEnd)
idata := s.newValue1(ssa.OpIData, n.Type, iface)
res = s.newValue2(ssa.OpIMake, n.Type, s.variable(&typVar, byteptr), idata)
resok = cond
delete(s.vars, &typVar)
return
}
// converting to a nonempty interface needs a runtime call.
if Debug_typeassert > 0 {
Warnl(n.Pos, "type assertion not inlined")
}
if n.Left.Type.IsEmptyInterface() {
if commaok {
call := s.rtcall(assertE2I2, true, []*types.Type{n.Type, types.Types[TBOOL]}, target, iface)
return call[0], call[1]
}
return s.rtcall(assertE2I, true, []*types.Type{n.Type}, target, iface)[0], nil
}
if commaok {
call := s.rtcall(assertI2I2, true, []*types.Type{n.Type, types.Types[TBOOL]}, target, iface)
return call[0], call[1]
}
return s.rtcall(assertI2I, true, []*types.Type{n.Type}, target, iface)[0], nil
}
if Debug_typeassert > 0 {
Warnl(n.Pos, "type assertion inlined")
}
// Converting to a concrete type.
direct := isdirectiface(n.Type)
itab := s.newValue1(ssa.OpITab, byteptr, iface) // type word of interface
if Debug_typeassert > 0 {
Warnl(n.Pos, "type assertion inlined")
}
var targetITab *ssa.Value
if n.Left.Type.IsEmptyInterface() {
// Looking for pointer to target type.
targetITab = target
} else {
// Looking for pointer to itab for target type and source interface.
cmd/compile: evaluate itabname during walk instead of SSA For backend concurrency safety. Follow-up to CL 38721. This does introduce a Nodes where there wasn't one before, but these are so rare that the performance impact is negligible. Does not pass toolstash-check, but the only change is line numbers, and the new line numbers appear preferable. Updates #15756 name old alloc/op new alloc/op delta Template 39.9MB ± 0% 39.9MB ± 0% ~ (p=0.841 n=5+5) Unicode 29.8MB ± 0% 29.8MB ± 0% ~ (p=0.690 n=5+5) GoTypes 113MB ± 0% 113MB ± 0% +0.09% (p=0.008 n=5+5) SSA 854MB ± 0% 855MB ± 0% ~ (p=0.222 n=5+5) Flate 25.3MB ± 0% 25.3MB ± 0% ~ (p=0.690 n=5+5) GoParser 31.8MB ± 0% 31.9MB ± 0% ~ (p=0.421 n=5+5) Reflect 78.2MB ± 0% 78.3MB ± 0% ~ (p=0.548 n=5+5) Tar 26.7MB ± 0% 26.7MB ± 0% ~ (p=0.690 n=5+5) XML 42.3MB ± 0% 42.3MB ± 0% ~ (p=0.222 n=5+5) name old allocs/op new allocs/op delta Template 391k ± 1% 391k ± 0% ~ (p=0.841 n=5+5) Unicode 320k ± 0% 320k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.14M ± 0% 1.14M ± 0% +0.26% (p=0.008 n=5+5) SSA 7.60M ± 0% 7.60M ± 0% ~ (p=0.548 n=5+5) Flate 234k ± 0% 234k ± 1% ~ (p=1.000 n=5+5) GoParser 316k ± 1% 317k ± 0% ~ (p=0.841 n=5+5) Reflect 979k ± 0% 980k ± 0% ~ (p=0.690 n=5+5) Tar 251k ± 1% 251k ± 0% ~ (p=0.595 n=5+5) XML 394k ± 0% 393k ± 0% ~ (p=0.222 n=5+5) Change-Id: I237ae5502db4560f78ce021dc62f6d289797afd6 Reviewed-on: https://go-review.googlesource.com/39197 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>
2017-03-28 13:52:33 -07:00
targetITab = s.expr(n.List.First())
}
var tmp *Node // temporary for use with large types
var addr *ssa.Value // address of tmp
if commaok && !canSSAType(n.Type) {
// unSSAable type, use temporary.
// TODO: get rid of some of these temporaries.
tmp = tempAt(n.Pos, s.curfn, n.Type)
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
s.vars[&memVar] = s.newValue1A(ssa.OpVarDef, types.TypeMem, tmp, s.mem())
addr = s.addr(tmp, false)
}
cond := s.newValue2(ssa.OpEqPtr, types.Types[TBOOL], itab, targetITab)
b := s.endBlock()
b.Kind = ssa.BlockIf
b.SetControl(cond)
b.Likely = ssa.BranchLikely
bOk := s.f.NewBlock(ssa.BlockPlain)
bFail := s.f.NewBlock(ssa.BlockPlain)
b.AddEdgeTo(bOk)
b.AddEdgeTo(bFail)
if !commaok {
// on failure, panic by calling panicdottype
s.startBlock(bFail)
taddr := s.expr(n.Right.Right)
if n.Left.Type.IsEmptyInterface() {
s.rtcall(panicdottypeE, false, nil, itab, target, taddr)
} else {
s.rtcall(panicdottypeI, false, nil, itab, target, taddr)
}
// on success, return data from interface
s.startBlock(bOk)
if direct {
return s.newValue1(ssa.OpIData, n.Type, iface), nil
}
p := s.newValue1(ssa.OpIData, types.NewPtr(n.Type), iface)
return s.load(n.Type, p), nil
}
// commaok is the more complicated case because we have
// a control flow merge point.
bEnd := s.f.NewBlock(ssa.BlockPlain)
// Note that we need a new valVar each time (unlike okVar where we can
// reuse the variable) because it might have a different type every time.
cmd/compile: move Node.Class to flags Put it at position zero, since it is fairly hot. This shrinks gc.Node into a smaller size class on 64 bit systems. name old time/op new time/op delta Template 193ms ± 5% 192ms ± 3% ~ (p=0.353 n=94+93) Unicode 86.1ms ± 5% 85.0ms ± 4% -1.23% (p=0.000 n=95+98) GoTypes 546ms ± 3% 544ms ± 4% -0.40% (p=0.007 n=94+97) Compiler 2.56s ± 3% 2.54s ± 3% -0.67% (p=0.000 n=99+97) SSA 5.13s ± 2% 5.10s ± 3% -0.55% (p=0.000 n=94+98) Flate 122ms ± 6% 121ms ± 4% -0.75% (p=0.002 n=97+95) GoParser 144ms ± 5% 144ms ± 4% ~ (p=0.298 n=98+97) Reflect 348ms ± 4% 349ms ± 4% ~ (p=0.350 n=98+97) Tar 105ms ± 5% 104ms ± 5% ~ (p=0.154 n=96+98) XML 200ms ± 5% 198ms ± 4% -0.71% (p=0.015 n=97+98) [Geo mean] 330ms 328ms -0.52% name old user-time/op new user-time/op delta Template 229ms ±11% 224ms ± 7% -2.16% (p=0.001 n=100+87) Unicode 109ms ± 5% 109ms ± 6% ~ (p=0.897 n=96+91) GoTypes 712ms ± 4% 709ms ± 4% ~ (p=0.085 n=96+98) Compiler 3.41s ± 3% 3.36s ± 3% -1.43% (p=0.000 n=98+98) SSA 7.46s ± 3% 7.31s ± 3% -2.02% (p=0.000 n=100+99) Flate 145ms ± 6% 143ms ± 6% -1.11% (p=0.001 n=99+97) GoParser 177ms ± 5% 176ms ± 5% -0.78% (p=0.018 n=95+95) Reflect 432ms ± 7% 435ms ± 9% ~ (p=0.296 n=100+100) Tar 121ms ± 7% 121ms ± 5% ~ (p=0.072 n=100+95) XML 241ms ± 4% 239ms ± 5% ~ (p=0.085 n=97+99) [Geo mean] 413ms 410ms -0.73% name old alloc/op new alloc/op delta Template 38.4MB ± 0% 37.7MB ± 0% -1.85% (p=0.008 n=5+5) Unicode 30.1MB ± 0% 28.8MB ± 0% -4.09% (p=0.008 n=5+5) GoTypes 112MB ± 0% 110MB ± 0% -1.69% (p=0.008 n=5+5) Compiler 470MB ± 0% 461MB ± 0% -1.91% (p=0.008 n=5+5) SSA 1.13GB ± 0% 1.11GB ± 0% -1.70% (p=0.008 n=5+5) Flate 25.0MB ± 0% 24.6MB ± 0% -1.67% (p=0.008 n=5+5) GoParser 31.6MB ± 0% 31.1MB ± 0% -1.66% (p=0.008 n=5+5) Reflect 77.1MB ± 0% 75.8MB ± 0% -1.69% (p=0.008 n=5+5) Tar 26.3MB ± 0% 25.7MB ± 0% -2.06% (p=0.008 n=5+5) XML 41.9MB ± 0% 41.1MB ± 0% -1.93% (p=0.008 n=5+5) [Geo mean] 73.5MB 72.0MB -2.03% name old allocs/op new allocs/op delta Template 383k ± 0% 383k ± 0% ~ (p=0.690 n=5+5) Unicode 343k ± 0% 343k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.16M ± 0% 1.16M ± 0% ~ (p=0.310 n=5+5) Compiler 4.43M ± 0% 4.42M ± 0% -0.17% (p=0.008 n=5+5) SSA 9.85M ± 0% 9.85M ± 0% ~ (p=0.310 n=5+5) Flate 236k ± 0% 236k ± 1% ~ (p=0.841 n=5+5) GoParser 320k ± 0% 320k ± 0% ~ (p=0.421 n=5+5) Reflect 988k ± 0% 987k ± 0% ~ (p=0.690 n=5+5) Tar 252k ± 0% 251k ± 0% ~ (p=0.095 n=5+5) XML 399k ± 0% 399k ± 0% ~ (p=1.000 n=5+5) [Geo mean] 741k 740k -0.07% Change-Id: I9e952b58a98e30a12494304db9ce50d0a85e459c Reviewed-on: https://go-review.googlesource.com/41797 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
2017-04-25 18:14:12 -07:00
valVar := &Node{Op: ONAME, Sym: &types.Sym{Name: "val"}}
// type assertion succeeded
s.startBlock(bOk)
if tmp == nil {
if direct {
s.vars[valVar] = s.newValue1(ssa.OpIData, n.Type, iface)
} else {
p := s.newValue1(ssa.OpIData, types.NewPtr(n.Type), iface)
s.vars[valVar] = s.load(n.Type, p)
}
} else {
p := s.newValue1(ssa.OpIData, types.NewPtr(n.Type), iface)
s.move(n.Type, addr, p)
}
s.vars[&okVar] = s.constBool(true)
s.endBlock()
bOk.AddEdgeTo(bEnd)
// type assertion failed
s.startBlock(bFail)
if tmp == nil {
s.vars[valVar] = s.zeroVal(n.Type)
} else {
s.zero(n.Type, addr)
}
s.vars[&okVar] = s.constBool(false)
s.endBlock()
bFail.AddEdgeTo(bEnd)
// merge point
s.startBlock(bEnd)
if tmp == nil {
res = s.variable(valVar, n.Type)
delete(s.vars, valVar)
} else {
res = s.load(n.Type, addr)
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
s.vars[&memVar] = s.newValue1A(ssa.OpVarKill, types.TypeMem, tmp, s.mem())
}
resok = s.variable(&okVar, types.Types[TBOOL])
delete(s.vars, &okVar)
return res, resok
}
// variable returns the value of a variable at the current location.
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
func (s *state) variable(name *Node, t *types.Type) *ssa.Value {
v := s.vars[name]
if v != nil {
return v
}
v = s.fwdVars[name]
if v != nil {
return v
}
if s.curBlock == s.f.Entry {
// No variable should be live at entry.
s.Fatalf("Value live at entry. It shouldn't be. func %s, node %v, value %v", s.f.Name, name, v)
}
// Make a FwdRef, which records a value that's live on block input.
// We'll find the matching definition as part of insertPhis.
v = s.newValue0A(ssa.OpFwdRef, t, name)
s.fwdVars[name] = v
s.addNamedValue(name, v)
return v
}
func (s *state) mem() *ssa.Value {
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
return s.variable(&memVar, types.TypeMem)
}
func (s *state) addNamedValue(n *Node, v *ssa.Value) {
cmd/compile: move Node.Class to flags Put it at position zero, since it is fairly hot. This shrinks gc.Node into a smaller size class on 64 bit systems. name old time/op new time/op delta Template 193ms ± 5% 192ms ± 3% ~ (p=0.353 n=94+93) Unicode 86.1ms ± 5% 85.0ms ± 4% -1.23% (p=0.000 n=95+98) GoTypes 546ms ± 3% 544ms ± 4% -0.40% (p=0.007 n=94+97) Compiler 2.56s ± 3% 2.54s ± 3% -0.67% (p=0.000 n=99+97) SSA 5.13s ± 2% 5.10s ± 3% -0.55% (p=0.000 n=94+98) Flate 122ms ± 6% 121ms ± 4% -0.75% (p=0.002 n=97+95) GoParser 144ms ± 5% 144ms ± 4% ~ (p=0.298 n=98+97) Reflect 348ms ± 4% 349ms ± 4% ~ (p=0.350 n=98+97) Tar 105ms ± 5% 104ms ± 5% ~ (p=0.154 n=96+98) XML 200ms ± 5% 198ms ± 4% -0.71% (p=0.015 n=97+98) [Geo mean] 330ms 328ms -0.52% name old user-time/op new user-time/op delta Template 229ms ±11% 224ms ± 7% -2.16% (p=0.001 n=100+87) Unicode 109ms ± 5% 109ms ± 6% ~ (p=0.897 n=96+91) GoTypes 712ms ± 4% 709ms ± 4% ~ (p=0.085 n=96+98) Compiler 3.41s ± 3% 3.36s ± 3% -1.43% (p=0.000 n=98+98) SSA 7.46s ± 3% 7.31s ± 3% -2.02% (p=0.000 n=100+99) Flate 145ms ± 6% 143ms ± 6% -1.11% (p=0.001 n=99+97) GoParser 177ms ± 5% 176ms ± 5% -0.78% (p=0.018 n=95+95) Reflect 432ms ± 7% 435ms ± 9% ~ (p=0.296 n=100+100) Tar 121ms ± 7% 121ms ± 5% ~ (p=0.072 n=100+95) XML 241ms ± 4% 239ms ± 5% ~ (p=0.085 n=97+99) [Geo mean] 413ms 410ms -0.73% name old alloc/op new alloc/op delta Template 38.4MB ± 0% 37.7MB ± 0% -1.85% (p=0.008 n=5+5) Unicode 30.1MB ± 0% 28.8MB ± 0% -4.09% (p=0.008 n=5+5) GoTypes 112MB ± 0% 110MB ± 0% -1.69% (p=0.008 n=5+5) Compiler 470MB ± 0% 461MB ± 0% -1.91% (p=0.008 n=5+5) SSA 1.13GB ± 0% 1.11GB ± 0% -1.70% (p=0.008 n=5+5) Flate 25.0MB ± 0% 24.6MB ± 0% -1.67% (p=0.008 n=5+5) GoParser 31.6MB ± 0% 31.1MB ± 0% -1.66% (p=0.008 n=5+5) Reflect 77.1MB ± 0% 75.8MB ± 0% -1.69% (p=0.008 n=5+5) Tar 26.3MB ± 0% 25.7MB ± 0% -2.06% (p=0.008 n=5+5) XML 41.9MB ± 0% 41.1MB ± 0% -1.93% (p=0.008 n=5+5) [Geo mean] 73.5MB 72.0MB -2.03% name old allocs/op new allocs/op delta Template 383k ± 0% 383k ± 0% ~ (p=0.690 n=5+5) Unicode 343k ± 0% 343k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.16M ± 0% 1.16M ± 0% ~ (p=0.310 n=5+5) Compiler 4.43M ± 0% 4.42M ± 0% -0.17% (p=0.008 n=5+5) SSA 9.85M ± 0% 9.85M ± 0% ~ (p=0.310 n=5+5) Flate 236k ± 0% 236k ± 1% ~ (p=0.841 n=5+5) GoParser 320k ± 0% 320k ± 0% ~ (p=0.421 n=5+5) Reflect 988k ± 0% 987k ± 0% ~ (p=0.690 n=5+5) Tar 252k ± 0% 251k ± 0% ~ (p=0.095 n=5+5) XML 399k ± 0% 399k ± 0% ~ (p=1.000 n=5+5) [Geo mean] 741k 740k -0.07% Change-Id: I9e952b58a98e30a12494304db9ce50d0a85e459c Reviewed-on: https://go-review.googlesource.com/41797 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
2017-04-25 18:14:12 -07:00
if n.Class() == Pxxx {
// Don't track our dummy nodes (&memVar etc.).
return
}
if n.IsAutoTmp() {
// Don't track temporary variables.
return
}
cmd/compile: move Node.Class to flags Put it at position zero, since it is fairly hot. This shrinks gc.Node into a smaller size class on 64 bit systems. name old time/op new time/op delta Template 193ms ± 5% 192ms ± 3% ~ (p=0.353 n=94+93) Unicode 86.1ms ± 5% 85.0ms ± 4% -1.23% (p=0.000 n=95+98) GoTypes 546ms ± 3% 544ms ± 4% -0.40% (p=0.007 n=94+97) Compiler 2.56s ± 3% 2.54s ± 3% -0.67% (p=0.000 n=99+97) SSA 5.13s ± 2% 5.10s ± 3% -0.55% (p=0.000 n=94+98) Flate 122ms ± 6% 121ms ± 4% -0.75% (p=0.002 n=97+95) GoParser 144ms ± 5% 144ms ± 4% ~ (p=0.298 n=98+97) Reflect 348ms ± 4% 349ms ± 4% ~ (p=0.350 n=98+97) Tar 105ms ± 5% 104ms ± 5% ~ (p=0.154 n=96+98) XML 200ms ± 5% 198ms ± 4% -0.71% (p=0.015 n=97+98) [Geo mean] 330ms 328ms -0.52% name old user-time/op new user-time/op delta Template 229ms ±11% 224ms ± 7% -2.16% (p=0.001 n=100+87) Unicode 109ms ± 5% 109ms ± 6% ~ (p=0.897 n=96+91) GoTypes 712ms ± 4% 709ms ± 4% ~ (p=0.085 n=96+98) Compiler 3.41s ± 3% 3.36s ± 3% -1.43% (p=0.000 n=98+98) SSA 7.46s ± 3% 7.31s ± 3% -2.02% (p=0.000 n=100+99) Flate 145ms ± 6% 143ms ± 6% -1.11% (p=0.001 n=99+97) GoParser 177ms ± 5% 176ms ± 5% -0.78% (p=0.018 n=95+95) Reflect 432ms ± 7% 435ms ± 9% ~ (p=0.296 n=100+100) Tar 121ms ± 7% 121ms ± 5% ~ (p=0.072 n=100+95) XML 241ms ± 4% 239ms ± 5% ~ (p=0.085 n=97+99) [Geo mean] 413ms 410ms -0.73% name old alloc/op new alloc/op delta Template 38.4MB ± 0% 37.7MB ± 0% -1.85% (p=0.008 n=5+5) Unicode 30.1MB ± 0% 28.8MB ± 0% -4.09% (p=0.008 n=5+5) GoTypes 112MB ± 0% 110MB ± 0% -1.69% (p=0.008 n=5+5) Compiler 470MB ± 0% 461MB ± 0% -1.91% (p=0.008 n=5+5) SSA 1.13GB ± 0% 1.11GB ± 0% -1.70% (p=0.008 n=5+5) Flate 25.0MB ± 0% 24.6MB ± 0% -1.67% (p=0.008 n=5+5) GoParser 31.6MB ± 0% 31.1MB ± 0% -1.66% (p=0.008 n=5+5) Reflect 77.1MB ± 0% 75.8MB ± 0% -1.69% (p=0.008 n=5+5) Tar 26.3MB ± 0% 25.7MB ± 0% -2.06% (p=0.008 n=5+5) XML 41.9MB ± 0% 41.1MB ± 0% -1.93% (p=0.008 n=5+5) [Geo mean] 73.5MB 72.0MB -2.03% name old allocs/op new allocs/op delta Template 383k ± 0% 383k ± 0% ~ (p=0.690 n=5+5) Unicode 343k ± 0% 343k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.16M ± 0% 1.16M ± 0% ~ (p=0.310 n=5+5) Compiler 4.43M ± 0% 4.42M ± 0% -0.17% (p=0.008 n=5+5) SSA 9.85M ± 0% 9.85M ± 0% ~ (p=0.310 n=5+5) Flate 236k ± 0% 236k ± 1% ~ (p=0.841 n=5+5) GoParser 320k ± 0% 320k ± 0% ~ (p=0.421 n=5+5) Reflect 988k ± 0% 987k ± 0% ~ (p=0.690 n=5+5) Tar 252k ± 0% 251k ± 0% ~ (p=0.095 n=5+5) XML 399k ± 0% 399k ± 0% ~ (p=1.000 n=5+5) [Geo mean] 741k 740k -0.07% Change-Id: I9e952b58a98e30a12494304db9ce50d0a85e459c Reviewed-on: https://go-review.googlesource.com/41797 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
2017-04-25 18:14:12 -07:00
if n.Class() == PPARAMOUT {
// Don't track named output values. This prevents return values
// from being assigned too early. See #14591 and #14762. TODO: allow this.
return
}
cmd/compile: move Node.Class to flags Put it at position zero, since it is fairly hot. This shrinks gc.Node into a smaller size class on 64 bit systems. name old time/op new time/op delta Template 193ms ± 5% 192ms ± 3% ~ (p=0.353 n=94+93) Unicode 86.1ms ± 5% 85.0ms ± 4% -1.23% (p=0.000 n=95+98) GoTypes 546ms ± 3% 544ms ± 4% -0.40% (p=0.007 n=94+97) Compiler 2.56s ± 3% 2.54s ± 3% -0.67% (p=0.000 n=99+97) SSA 5.13s ± 2% 5.10s ± 3% -0.55% (p=0.000 n=94+98) Flate 122ms ± 6% 121ms ± 4% -0.75% (p=0.002 n=97+95) GoParser 144ms ± 5% 144ms ± 4% ~ (p=0.298 n=98+97) Reflect 348ms ± 4% 349ms ± 4% ~ (p=0.350 n=98+97) Tar 105ms ± 5% 104ms ± 5% ~ (p=0.154 n=96+98) XML 200ms ± 5% 198ms ± 4% -0.71% (p=0.015 n=97+98) [Geo mean] 330ms 328ms -0.52% name old user-time/op new user-time/op delta Template 229ms ±11% 224ms ± 7% -2.16% (p=0.001 n=100+87) Unicode 109ms ± 5% 109ms ± 6% ~ (p=0.897 n=96+91) GoTypes 712ms ± 4% 709ms ± 4% ~ (p=0.085 n=96+98) Compiler 3.41s ± 3% 3.36s ± 3% -1.43% (p=0.000 n=98+98) SSA 7.46s ± 3% 7.31s ± 3% -2.02% (p=0.000 n=100+99) Flate 145ms ± 6% 143ms ± 6% -1.11% (p=0.001 n=99+97) GoParser 177ms ± 5% 176ms ± 5% -0.78% (p=0.018 n=95+95) Reflect 432ms ± 7% 435ms ± 9% ~ (p=0.296 n=100+100) Tar 121ms ± 7% 121ms ± 5% ~ (p=0.072 n=100+95) XML 241ms ± 4% 239ms ± 5% ~ (p=0.085 n=97+99) [Geo mean] 413ms 410ms -0.73% name old alloc/op new alloc/op delta Template 38.4MB ± 0% 37.7MB ± 0% -1.85% (p=0.008 n=5+5) Unicode 30.1MB ± 0% 28.8MB ± 0% -4.09% (p=0.008 n=5+5) GoTypes 112MB ± 0% 110MB ± 0% -1.69% (p=0.008 n=5+5) Compiler 470MB ± 0% 461MB ± 0% -1.91% (p=0.008 n=5+5) SSA 1.13GB ± 0% 1.11GB ± 0% -1.70% (p=0.008 n=5+5) Flate 25.0MB ± 0% 24.6MB ± 0% -1.67% (p=0.008 n=5+5) GoParser 31.6MB ± 0% 31.1MB ± 0% -1.66% (p=0.008 n=5+5) Reflect 77.1MB ± 0% 75.8MB ± 0% -1.69% (p=0.008 n=5+5) Tar 26.3MB ± 0% 25.7MB ± 0% -2.06% (p=0.008 n=5+5) XML 41.9MB ± 0% 41.1MB ± 0% -1.93% (p=0.008 n=5+5) [Geo mean] 73.5MB 72.0MB -2.03% name old allocs/op new allocs/op delta Template 383k ± 0% 383k ± 0% ~ (p=0.690 n=5+5) Unicode 343k ± 0% 343k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.16M ± 0% 1.16M ± 0% ~ (p=0.310 n=5+5) Compiler 4.43M ± 0% 4.42M ± 0% -0.17% (p=0.008 n=5+5) SSA 9.85M ± 0% 9.85M ± 0% ~ (p=0.310 n=5+5) Flate 236k ± 0% 236k ± 1% ~ (p=0.841 n=5+5) GoParser 320k ± 0% 320k ± 0% ~ (p=0.421 n=5+5) Reflect 988k ± 0% 987k ± 0% ~ (p=0.690 n=5+5) Tar 252k ± 0% 251k ± 0% ~ (p=0.095 n=5+5) XML 399k ± 0% 399k ± 0% ~ (p=1.000 n=5+5) [Geo mean] 741k 740k -0.07% Change-Id: I9e952b58a98e30a12494304db9ce50d0a85e459c Reviewed-on: https://go-review.googlesource.com/41797 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
2017-04-25 18:14:12 -07:00
if n.Class() == PAUTO && n.Xoffset != 0 {
s.Fatalf("AUTO var with offset %v %d", n, n.Xoffset)
}
loc := ssa.LocalSlot{N: n, Type: n.Type, Off: 0}
values, ok := s.f.NamedValues[loc]
if !ok {
s.f.Names = append(s.f.Names, loc)
}
s.f.NamedValues[loc] = append(values, v)
}
// Branch is an unresolved branch.
type Branch struct {
P *obj.Prog // branch instruction
B *ssa.Block // target
}
// SSAGenState contains state needed during Prog generation.
type SSAGenState struct {
pp *Progs
// Branches remembers all the branch instructions we've seen
// and where they would like to go.
Branches []Branch
// bstart remembers where each block starts (indexed by block ID)
bstart []*obj.Prog
// 387 port: maps from SSE registers (REG_X?) to 387 registers (REG_F?)
SSEto387 map[int16]int16
// Some architectures require a 64-bit temporary for FP-related register shuffling. Examples include x86-387, PPC, and Sparc V8.
ScratchFpMem *Node
maxarg int64 // largest frame size for arguments to calls made by the function
// Map from GC safe points to liveness index, generated by
// liveness analysis.
livenessMap LivenessMap
cmd/compile: assign and preserve statement boundaries. A new pass run after ssa building (before any other optimization) identifies the "first" ssa node for each statement. Other "noise" nodes are tagged as being never appropriate for a statement boundary (e.g., VarKill, VarDef, Phi). Rewrite, deadcode, cse, and nilcheck are modified to move the statement boundaries forward whenever possible if a boundary-tagged ssa value is removed; never-boundary nodes are ignored in this search (some operations involving constants are also tagged as never-boundary and also ignored because they are likely to be moved or removed during optimization). Code generation treats all nodes except those explicitly marked as statement boundaries as "not statement" nodes, and floats statement boundaries to the beginning of each same-line run of instructions found within a basic block. Line number html conversion was modified to make statement boundary nodes a bit more obvious by prepending a "+". The code in fuse.go that glued together the value slices of two blocks produced a result that depended on the former capacities (not lengths) of the two slices. This causes differences in the 386 bootstrap, and also can sometimes put values into an order that does a worse job of preserving statement boundaries when values are removed. Portions of two delve tests that had caught problems were incorporated into ssa/debug_test.go. There are some opportunities to do better with optimized code, but the next-ing is not lying or overly jumpy. Over 4 CLs, compilebench geomean measured binary size increase of 3.5% and compile user time increase of 3.8% (this is after optimization to reuse a sparse map instead of creating multiple maps.) This CL worsens the optimized-debugging experience with Delve; we need to work with the delve team so that they can use the is_stmt marks that we're emitting now. The reference output changes from time to time depending on other changes in the compiler, sometimes better, sometimes worse. This CL now includes a test ensuring that 99+% of the lines in the Go command itself (a handy optimized binary) include is_stmt markers. Change-Id: I359c94e06843f1eb41f9da437bd614885aa9644a Reviewed-on: https://go-review.googlesource.com/102435 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>
2018-03-23 22:46:06 -04:00
// lineRunStart records the beginning of the current run of instructions
// within a single block sharing the same line number
// Used to move statement marks to the beginning of such runs.
lineRunStart *obj.Prog
cmd/compile: add wasm stack optimization Go's SSA instructions only operate on registers. For example, an add instruction would read two registers, do the addition and then write to a register. WebAssembly's instructions, on the other hand, operate on the stack. The add instruction first pops two values from the stack, does the addition, then pushes the result to the stack. To fulfill Go's semantics, one needs to map Go's single add instruction to 4 WebAssembly instructions: - Push the value of local variable A to the stack - Push the value of local variable B to the stack - Do addition - Write value from stack to local variable C Now consider that B was set to the constant 42 before the addition: - Push constant 42 to the stack - Write value from stack to local variable B This works, but is inefficient. Instead, the stack is used directly by inlining instructions if possible. With inlining it becomes: - Push the value of local variable A to the stack (add) - Push constant 42 to the stack (constant) - Do addition (add) - Write value from stack to local variable C (add) Note that the two SSA instructions can not be generated sequentially anymore, because their WebAssembly instructions are interleaved. Design doc: https://docs.google.com/document/d/131vjr4DH6JFnb-blm_uRdaC0_Nv3OUwjEY5qVCxCup4 Updates #18892 Change-Id: Ie35e1c0bebf4985fddda0d6330eb2066f9ad6dec Reviewed-on: https://go-review.googlesource.com/103535 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-09 00:14:58 +01:00
// wasm: The number of values on the WebAssembly stack. This is only used as a safeguard.
OnWasmStackSkipped int
}
// Prog appends a new Prog.
func (s *SSAGenState) Prog(as obj.As) *obj.Prog {
cmd/compile: assign and preserve statement boundaries. A new pass run after ssa building (before any other optimization) identifies the "first" ssa node for each statement. Other "noise" nodes are tagged as being never appropriate for a statement boundary (e.g., VarKill, VarDef, Phi). Rewrite, deadcode, cse, and nilcheck are modified to move the statement boundaries forward whenever possible if a boundary-tagged ssa value is removed; never-boundary nodes are ignored in this search (some operations involving constants are also tagged as never-boundary and also ignored because they are likely to be moved or removed during optimization). Code generation treats all nodes except those explicitly marked as statement boundaries as "not statement" nodes, and floats statement boundaries to the beginning of each same-line run of instructions found within a basic block. Line number html conversion was modified to make statement boundary nodes a bit more obvious by prepending a "+". The code in fuse.go that glued together the value slices of two blocks produced a result that depended on the former capacities (not lengths) of the two slices. This causes differences in the 386 bootstrap, and also can sometimes put values into an order that does a worse job of preserving statement boundaries when values are removed. Portions of two delve tests that had caught problems were incorporated into ssa/debug_test.go. There are some opportunities to do better with optimized code, but the next-ing is not lying or overly jumpy. Over 4 CLs, compilebench geomean measured binary size increase of 3.5% and compile user time increase of 3.8% (this is after optimization to reuse a sparse map instead of creating multiple maps.) This CL worsens the optimized-debugging experience with Delve; we need to work with the delve team so that they can use the is_stmt marks that we're emitting now. The reference output changes from time to time depending on other changes in the compiler, sometimes better, sometimes worse. This CL now includes a test ensuring that 99+% of the lines in the Go command itself (a handy optimized binary) include is_stmt markers. Change-Id: I359c94e06843f1eb41f9da437bd614885aa9644a Reviewed-on: https://go-review.googlesource.com/102435 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>
2018-03-23 22:46:06 -04:00
p := s.pp.Prog(as)
if ssa.LosesStmtMark(as) {
cmd/compile: assign and preserve statement boundaries. A new pass run after ssa building (before any other optimization) identifies the "first" ssa node for each statement. Other "noise" nodes are tagged as being never appropriate for a statement boundary (e.g., VarKill, VarDef, Phi). Rewrite, deadcode, cse, and nilcheck are modified to move the statement boundaries forward whenever possible if a boundary-tagged ssa value is removed; never-boundary nodes are ignored in this search (some operations involving constants are also tagged as never-boundary and also ignored because they are likely to be moved or removed during optimization). Code generation treats all nodes except those explicitly marked as statement boundaries as "not statement" nodes, and floats statement boundaries to the beginning of each same-line run of instructions found within a basic block. Line number html conversion was modified to make statement boundary nodes a bit more obvious by prepending a "+". The code in fuse.go that glued together the value slices of two blocks produced a result that depended on the former capacities (not lengths) of the two slices. This causes differences in the 386 bootstrap, and also can sometimes put values into an order that does a worse job of preserving statement boundaries when values are removed. Portions of two delve tests that had caught problems were incorporated into ssa/debug_test.go. There are some opportunities to do better with optimized code, but the next-ing is not lying or overly jumpy. Over 4 CLs, compilebench geomean measured binary size increase of 3.5% and compile user time increase of 3.8% (this is after optimization to reuse a sparse map instead of creating multiple maps.) This CL worsens the optimized-debugging experience with Delve; we need to work with the delve team so that they can use the is_stmt marks that we're emitting now. The reference output changes from time to time depending on other changes in the compiler, sometimes better, sometimes worse. This CL now includes a test ensuring that 99+% of the lines in the Go command itself (a handy optimized binary) include is_stmt markers. Change-Id: I359c94e06843f1eb41f9da437bd614885aa9644a Reviewed-on: https://go-review.googlesource.com/102435 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>
2018-03-23 22:46:06 -04:00
return p
}
// Float a statement start to the beginning of any same-line run.
// lineRunStart is reset at block boundaries, which appears to work well.
if s.lineRunStart == nil || s.lineRunStart.Pos.Line() != p.Pos.Line() {
s.lineRunStart = p
} else if p.Pos.IsStmt() == src.PosIsStmt {
s.lineRunStart.Pos = s.lineRunStart.Pos.WithIsStmt()
p.Pos = p.Pos.WithNotStmt()
}
return p
}
// Pc returns the current Prog.
func (s *SSAGenState) Pc() *obj.Prog {
return s.pp.next
}
// SetPos sets the current source position.
2016-12-15 17:17:01 -08:00
func (s *SSAGenState) SetPos(pos src.XPos) {
s.pp.pos = pos
}
// Br emits a single branch instruction and returns the instruction.
// Not all architectures need the returned instruction, but otherwise
// the boilerplate is common to all.
func (s *SSAGenState) Br(op obj.As, target *ssa.Block) *obj.Prog {
p := s.Prog(op)
p.To.Type = obj.TYPE_BRANCH
s.Branches = append(s.Branches, Branch{P: p, B: target})
return p
}
cmd/compile: assign and preserve statement boundaries. A new pass run after ssa building (before any other optimization) identifies the "first" ssa node for each statement. Other "noise" nodes are tagged as being never appropriate for a statement boundary (e.g., VarKill, VarDef, Phi). Rewrite, deadcode, cse, and nilcheck are modified to move the statement boundaries forward whenever possible if a boundary-tagged ssa value is removed; never-boundary nodes are ignored in this search (some operations involving constants are also tagged as never-boundary and also ignored because they are likely to be moved or removed during optimization). Code generation treats all nodes except those explicitly marked as statement boundaries as "not statement" nodes, and floats statement boundaries to the beginning of each same-line run of instructions found within a basic block. Line number html conversion was modified to make statement boundary nodes a bit more obvious by prepending a "+". The code in fuse.go that glued together the value slices of two blocks produced a result that depended on the former capacities (not lengths) of the two slices. This causes differences in the 386 bootstrap, and also can sometimes put values into an order that does a worse job of preserving statement boundaries when values are removed. Portions of two delve tests that had caught problems were incorporated into ssa/debug_test.go. There are some opportunities to do better with optimized code, but the next-ing is not lying or overly jumpy. Over 4 CLs, compilebench geomean measured binary size increase of 3.5% and compile user time increase of 3.8% (this is after optimization to reuse a sparse map instead of creating multiple maps.) This CL worsens the optimized-debugging experience with Delve; we need to work with the delve team so that they can use the is_stmt marks that we're emitting now. The reference output changes from time to time depending on other changes in the compiler, sometimes better, sometimes worse. This CL now includes a test ensuring that 99+% of the lines in the Go command itself (a handy optimized binary) include is_stmt markers. Change-Id: I359c94e06843f1eb41f9da437bd614885aa9644a Reviewed-on: https://go-review.googlesource.com/102435 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>
2018-03-23 22:46:06 -04:00
// DebugFriendlySetPos adjusts Pos.IsStmt subject to heuristics
// that reduce "jumpy" line number churn when debugging.
// Spill/fill/copy instructions from the register allocator,
// phi functions, and instructions with a no-pos position
// are examples of instructions that can cause churn.
func (s *SSAGenState) DebugFriendlySetPosFrom(v *ssa.Value) {
switch v.Op {
case ssa.OpPhi, ssa.OpCopy, ssa.OpLoadReg, ssa.OpStoreReg:
cmd/compile: assign and preserve statement boundaries. A new pass run after ssa building (before any other optimization) identifies the "first" ssa node for each statement. Other "noise" nodes are tagged as being never appropriate for a statement boundary (e.g., VarKill, VarDef, Phi). Rewrite, deadcode, cse, and nilcheck are modified to move the statement boundaries forward whenever possible if a boundary-tagged ssa value is removed; never-boundary nodes are ignored in this search (some operations involving constants are also tagged as never-boundary and also ignored because they are likely to be moved or removed during optimization). Code generation treats all nodes except those explicitly marked as statement boundaries as "not statement" nodes, and floats statement boundaries to the beginning of each same-line run of instructions found within a basic block. Line number html conversion was modified to make statement boundary nodes a bit more obvious by prepending a "+". The code in fuse.go that glued together the value slices of two blocks produced a result that depended on the former capacities (not lengths) of the two slices. This causes differences in the 386 bootstrap, and also can sometimes put values into an order that does a worse job of preserving statement boundaries when values are removed. Portions of two delve tests that had caught problems were incorporated into ssa/debug_test.go. There are some opportunities to do better with optimized code, but the next-ing is not lying or overly jumpy. Over 4 CLs, compilebench geomean measured binary size increase of 3.5% and compile user time increase of 3.8% (this is after optimization to reuse a sparse map instead of creating multiple maps.) This CL worsens the optimized-debugging experience with Delve; we need to work with the delve team so that they can use the is_stmt marks that we're emitting now. The reference output changes from time to time depending on other changes in the compiler, sometimes better, sometimes worse. This CL now includes a test ensuring that 99+% of the lines in the Go command itself (a handy optimized binary) include is_stmt markers. Change-Id: I359c94e06843f1eb41f9da437bd614885aa9644a Reviewed-on: https://go-review.googlesource.com/102435 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>
2018-03-23 22:46:06 -04:00
// These are not statements
s.SetPos(v.Pos.WithNotStmt())
default:
cmd/compile: assign and preserve statement boundaries. A new pass run after ssa building (before any other optimization) identifies the "first" ssa node for each statement. Other "noise" nodes are tagged as being never appropriate for a statement boundary (e.g., VarKill, VarDef, Phi). Rewrite, deadcode, cse, and nilcheck are modified to move the statement boundaries forward whenever possible if a boundary-tagged ssa value is removed; never-boundary nodes are ignored in this search (some operations involving constants are also tagged as never-boundary and also ignored because they are likely to be moved or removed during optimization). Code generation treats all nodes except those explicitly marked as statement boundaries as "not statement" nodes, and floats statement boundaries to the beginning of each same-line run of instructions found within a basic block. Line number html conversion was modified to make statement boundary nodes a bit more obvious by prepending a "+". The code in fuse.go that glued together the value slices of two blocks produced a result that depended on the former capacities (not lengths) of the two slices. This causes differences in the 386 bootstrap, and also can sometimes put values into an order that does a worse job of preserving statement boundaries when values are removed. Portions of two delve tests that had caught problems were incorporated into ssa/debug_test.go. There are some opportunities to do better with optimized code, but the next-ing is not lying or overly jumpy. Over 4 CLs, compilebench geomean measured binary size increase of 3.5% and compile user time increase of 3.8% (this is after optimization to reuse a sparse map instead of creating multiple maps.) This CL worsens the optimized-debugging experience with Delve; we need to work with the delve team so that they can use the is_stmt marks that we're emitting now. The reference output changes from time to time depending on other changes in the compiler, sometimes better, sometimes worse. This CL now includes a test ensuring that 99+% of the lines in the Go command itself (a handy optimized binary) include is_stmt markers. Change-Id: I359c94e06843f1eb41f9da437bd614885aa9644a Reviewed-on: https://go-review.googlesource.com/102435 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>
2018-03-23 22:46:06 -04:00
p := v.Pos
if p != src.NoXPos {
// If the position is defined, update the position.
// Also convert default IsStmt to NotStmt; only
// explicit statement boundaries should appear
// in the generated code.
if p.IsStmt() != src.PosIsStmt {
p = p.WithNotStmt()
}
s.SetPos(p)
}
}
}
// byXoffset implements sort.Interface for []*Node using Xoffset as the ordering.
type byXoffset []*Node
func (s byXoffset) Len() int { return len(s) }
func (s byXoffset) Less(i, j int) bool { return s[i].Xoffset < s[j].Xoffset }
func (s byXoffset) Swap(i, j int) { s[i], s[j] = s[j], s[i] }
func emitStackObjects(e *ssafn, pp *Progs) {
var vars []*Node
for _, n := range e.curfn.Func.Dcl {
if livenessShouldTrack(n) && n.Addrtaken() {
vars = append(vars, n)
}
}
if len(vars) == 0 {
return
}
// Sort variables from lowest to highest address.
sort.Sort(byXoffset(vars))
// Populate the stack object data.
// Format must match runtime/stack.go:stackObjectRecord.
x := e.curfn.Func.lsym.Func.StackObjects
off := 0
off = duintptr(x, off, uint64(len(vars)))
for _, v := range vars {
// Note: arguments and return values have non-negative Xoffset,
// in which case the offset is relative to argp.
// Locals have a negative Xoffset, in which case the offset is relative to varp.
off = duintptr(x, off, uint64(v.Xoffset))
if !typesym(v.Type).Siggen() {
Fatalf("stack object's type symbol not generated for type %s", v.Type)
}
off = dsymptr(x, off, dtypesym(v.Type), 0)
}
// Emit a funcdata pointing at the stack object data.
p := pp.Prog(obj.AFUNCDATA)
Addrconst(&p.From, objabi.FUNCDATA_StackObjects)
p.To.Type = obj.TYPE_MEM
p.To.Name = obj.NAME_EXTERN
p.To.Sym = x
if debuglive != 0 {
for _, v := range vars {
Warnl(v.Pos, "stack object %v %s", v, v.Type.String())
}
}
}
// genssa appends entries to pp for each instruction in f.
func genssa(f *ssa.Func, pp *Progs) {
var s SSAGenState
e := f.Frontend().(*ssafn)
[dev.ssa] cmd/compile: add GOSSAFUNC and GOSSAPKG These temporary environment variables make it possible to enable using SSA-generated code for a particular function or package without having to rebuild the compiler. This makes it possible to start bulk testing SSA generated code. First, bump up the default stack size (_StackMin in runtime/stack2.go) to something large like 32768, because without stackmaps we can't grow stacks. Then run something like: for pkg in `go list std` do GOGC=off GOSSAPKG=`basename $pkg` go test -a $pkg done When a test fails, you can re-run those tests, selectively enabling one function after another, until you find the one that is causing trouble. Doing this right now yields some interesting results: * There are several packages for which we generate some code and whose tests pass. Yay! * We can generate code for encoding/base64, but tests there fail, so there's a bug to fix. * Attempting to build the runtime yields a panic during codegen: panic: interface conversion: ssa.Location is nil, not *ssa.LocalSlot * The top unimplemented codegen items are (simplified): 59 genValue not implemented: REPMOVSB 18 genValue not implemented: REPSTOSQ 14 genValue not implemented: SUBQ 9 branch not implemented: If v -> b b. Control: XORQconst <bool> [1] 8 genValue not implemented: MOVQstoreidx8 4 branch not implemented: If v -> b b. Control: SETG <bool> 3 branch not implemented: If v -> b b. Control: SETLE <bool> 2 load flags not implemented: LoadReg8 <flags> 2 genValue not implemented: InvertFlags <flags> 1 store flags not implemented: StoreReg8 <flags> 1 branch not implemented: If v -> b b. Control: SETGE <bool> Change-Id: Ib64809ac0c917e25bcae27829ae634c70d290c7f Reviewed-on: https://go-review.googlesource.com/12547 Reviewed-by: Keith Randall <khr@golang.org>
2015-07-22 13:13:53 -07:00
s.livenessMap = liveness(e, f, pp)
emitStackObjects(e, pp)
// Remember where each block starts.
s.bstart = make([]*obj.Prog, f.NumBlocks())
s.pp = pp
var progToValue map[*obj.Prog]*ssa.Value
var progToBlock map[*obj.Prog]*ssa.Block
cmd/compile: adjust locationlist lifetimes A statement like foo = bar + qux might compile to AX := AX + BX resulting in a regkill for AX before this instruction. The buggy behavior is to kill AX "at" this instruction, before it has executed. (Code generation of no-instruction values like RegKills applies their effects at the next actual instruction emitted). However, bar is still associated with AX until after the instruction executes, so the effect of the regkill must occur at the boundary between this instruction and the next. Similarly, the new value bound to AX is not visible until this instruction executes (and in the case of values that require multiple instructions in code generation, until all of them have executed). The ranges are adjusted so that a value's start occurs at the next following instruction after its evaluation, and the end occurs after (execution of) the first instruction following the end of the lifetime as a value. (Notice the asymmetry; the entire value must be finished before it is visible, but execution of a single instruction invalidates. However, the value *is* visible before that next instruction executes). The test was adjusted to make it insensitive to the result numbering for variables printed by gdb, since that is not relevant to the test and makes the differences introduced by small changes larger than necessary/useful. The test was also improved to present variable probes more intuitively, and also to allow explicit indication of "this variable was optimized out" Change-Id: I39453eead8399e6bb05ebd957289b112d1100c0e Reviewed-on: https://go-review.googlesource.com/74090 Run-TryBot: David Chase <drchase@google.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-10-27 17:02:11 -04:00
var valueToProgAfter []*obj.Prog // The first Prog following computation of a value v; v is visible at this point.
if f.PrintOrHtmlSSA {
progToValue = make(map[*obj.Prog]*ssa.Value, f.NumValues())
progToBlock = make(map[*obj.Prog]*ssa.Block, f.NumBlocks())
f.Logf("genssa %s\n", f.Name)
progToBlock[s.pp.next] = f.Blocks[0]
}
if thearch.Use387 {
s.SSEto387 = map[int16]int16{}
}
s.ScratchFpMem = e.scratchFpMem
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
if Ctxt.Flag_locationlists {
if cap(f.Cache.ValueToProgAfter) < f.NumValues() {
f.Cache.ValueToProgAfter = make([]*obj.Prog, f.NumValues())
}
valueToProgAfter = f.Cache.ValueToProgAfter[:f.NumValues()]
for i := range valueToProgAfter {
valueToProgAfter[i] = nil
}
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
cmd/compile: adjust locationlist lifetimes A statement like foo = bar + qux might compile to AX := AX + BX resulting in a regkill for AX before this instruction. The buggy behavior is to kill AX "at" this instruction, before it has executed. (Code generation of no-instruction values like RegKills applies their effects at the next actual instruction emitted). However, bar is still associated with AX until after the instruction executes, so the effect of the regkill must occur at the boundary between this instruction and the next. Similarly, the new value bound to AX is not visible until this instruction executes (and in the case of values that require multiple instructions in code generation, until all of them have executed). The ranges are adjusted so that a value's start occurs at the next following instruction after its evaluation, and the end occurs after (execution of) the first instruction following the end of the lifetime as a value. (Notice the asymmetry; the entire value must be finished before it is visible, but execution of a single instruction invalidates. However, the value *is* visible before that next instruction executes). The test was adjusted to make it insensitive to the result numbering for variables printed by gdb, since that is not relevant to the test and makes the differences introduced by small changes larger than necessary/useful. The test was also improved to present variable probes more intuitively, and also to allow explicit indication of "this variable was optimized out" Change-Id: I39453eead8399e6bb05ebd957289b112d1100c0e Reviewed-on: https://go-review.googlesource.com/74090 Run-TryBot: David Chase <drchase@google.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-10-27 17:02:11 -04:00
// If the very first instruction is not tagged as a statement,
// debuggers may attribute it to previous function in program.
firstPos := src.NoXPos
for _, v := range f.Entry.Values {
cmd/compile: assign and preserve statement boundaries. A new pass run after ssa building (before any other optimization) identifies the "first" ssa node for each statement. Other "noise" nodes are tagged as being never appropriate for a statement boundary (e.g., VarKill, VarDef, Phi). Rewrite, deadcode, cse, and nilcheck are modified to move the statement boundaries forward whenever possible if a boundary-tagged ssa value is removed; never-boundary nodes are ignored in this search (some operations involving constants are also tagged as never-boundary and also ignored because they are likely to be moved or removed during optimization). Code generation treats all nodes except those explicitly marked as statement boundaries as "not statement" nodes, and floats statement boundaries to the beginning of each same-line run of instructions found within a basic block. Line number html conversion was modified to make statement boundary nodes a bit more obvious by prepending a "+". The code in fuse.go that glued together the value slices of two blocks produced a result that depended on the former capacities (not lengths) of the two slices. This causes differences in the 386 bootstrap, and also can sometimes put values into an order that does a worse job of preserving statement boundaries when values are removed. Portions of two delve tests that had caught problems were incorporated into ssa/debug_test.go. There are some opportunities to do better with optimized code, but the next-ing is not lying or overly jumpy. Over 4 CLs, compilebench geomean measured binary size increase of 3.5% and compile user time increase of 3.8% (this is after optimization to reuse a sparse map instead of creating multiple maps.) This CL worsens the optimized-debugging experience with Delve; we need to work with the delve team so that they can use the is_stmt marks that we're emitting now. The reference output changes from time to time depending on other changes in the compiler, sometimes better, sometimes worse. This CL now includes a test ensuring that 99+% of the lines in the Go command itself (a handy optimized binary) include is_stmt markers. Change-Id: I359c94e06843f1eb41f9da437bd614885aa9644a Reviewed-on: https://go-review.googlesource.com/102435 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>
2018-03-23 22:46:06 -04:00
if v.Pos.IsStmt() == src.PosIsStmt {
firstPos = v.Pos
v.Pos = firstPos.WithDefaultStmt()
break
}
}
// inlMarks has an entry for each Prog that implements an inline mark.
// It maps from that Prog to the global inlining id of the inlined body
// which should unwind to this Prog's location.
var inlMarks map[*obj.Prog]int32
var inlMarkList []*obj.Prog
// inlMarksByPos maps from a (column 1) source position to the set of
// Progs that are in the set above and have that source position.
var inlMarksByPos map[src.XPos][]*obj.Prog
// Emit basic blocks
for i, b := range f.Blocks {
s.bstart[b.ID] = s.pp.next
s.pp.nextLive = LivenessInvalid
cmd/compile: assign and preserve statement boundaries. A new pass run after ssa building (before any other optimization) identifies the "first" ssa node for each statement. Other "noise" nodes are tagged as being never appropriate for a statement boundary (e.g., VarKill, VarDef, Phi). Rewrite, deadcode, cse, and nilcheck are modified to move the statement boundaries forward whenever possible if a boundary-tagged ssa value is removed; never-boundary nodes are ignored in this search (some operations involving constants are also tagged as never-boundary and also ignored because they are likely to be moved or removed during optimization). Code generation treats all nodes except those explicitly marked as statement boundaries as "not statement" nodes, and floats statement boundaries to the beginning of each same-line run of instructions found within a basic block. Line number html conversion was modified to make statement boundary nodes a bit more obvious by prepending a "+". The code in fuse.go that glued together the value slices of two blocks produced a result that depended on the former capacities (not lengths) of the two slices. This causes differences in the 386 bootstrap, and also can sometimes put values into an order that does a worse job of preserving statement boundaries when values are removed. Portions of two delve tests that had caught problems were incorporated into ssa/debug_test.go. There are some opportunities to do better with optimized code, but the next-ing is not lying or overly jumpy. Over 4 CLs, compilebench geomean measured binary size increase of 3.5% and compile user time increase of 3.8% (this is after optimization to reuse a sparse map instead of creating multiple maps.) This CL worsens the optimized-debugging experience with Delve; we need to work with the delve team so that they can use the is_stmt marks that we're emitting now. The reference output changes from time to time depending on other changes in the compiler, sometimes better, sometimes worse. This CL now includes a test ensuring that 99+% of the lines in the Go command itself (a handy optimized binary) include is_stmt markers. Change-Id: I359c94e06843f1eb41f9da437bd614885aa9644a Reviewed-on: https://go-review.googlesource.com/102435 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>
2018-03-23 22:46:06 -04:00
s.lineRunStart = nil
// Emit values in block
thearch.SSAMarkMoves(&s, b)
for _, v := range b.Values {
x := s.pp.next
s.DebugFriendlySetPosFrom(v)
// Attach this safe point to the next
// instruction.
s.pp.nextLive = s.livenessMap.Get(v)
switch v.Op {
case ssa.OpInitMem:
// memory arg needs no code
case ssa.OpArg:
// input args need no code
case ssa.OpSP, ssa.OpSB:
// nothing to do
case ssa.OpSelect0, ssa.OpSelect1:
// nothing to do
case ssa.OpGetG:
// nothing to do when there's a g register,
// and checkLower complains if there's not
case ssa.OpVarDef, ssa.OpVarLive, ssa.OpKeepAlive, ssa.OpVarKill:
// nothing to do; already used by liveness
case ssa.OpPhi:
CheckLoweredPhi(v)
case ssa.OpConvert:
// nothing to do; no-op conversion for liveness
if v.Args[0].Reg() != v.Reg() {
v.Fatalf("OpConvert should be a no-op: %s; %s", v.Args[0].LongString(), v.LongString())
}
case ssa.OpInlMark:
p := thearch.Ginsnop(s.pp)
if inlMarks == nil {
inlMarks = map[*obj.Prog]int32{}
inlMarksByPos = map[src.XPos][]*obj.Prog{}
}
inlMarks[p] = v.AuxInt32()
inlMarkList = append(inlMarkList, p)
pos := v.Pos.AtColumn1()
inlMarksByPos[pos] = append(inlMarksByPos[pos], p)
default:
// let the backend handle it
// Special case for first line in function; move it to the start.
if firstPos != src.NoXPos {
s.SetPos(firstPos)
firstPos = src.NoXPos
}
thearch.SSAGenValue(&s, v)
}
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
if Ctxt.Flag_locationlists {
cmd/compile: adjust locationlist lifetimes A statement like foo = bar + qux might compile to AX := AX + BX resulting in a regkill for AX before this instruction. The buggy behavior is to kill AX "at" this instruction, before it has executed. (Code generation of no-instruction values like RegKills applies their effects at the next actual instruction emitted). However, bar is still associated with AX until after the instruction executes, so the effect of the regkill must occur at the boundary between this instruction and the next. Similarly, the new value bound to AX is not visible until this instruction executes (and in the case of values that require multiple instructions in code generation, until all of them have executed). The ranges are adjusted so that a value's start occurs at the next following instruction after its evaluation, and the end occurs after (execution of) the first instruction following the end of the lifetime as a value. (Notice the asymmetry; the entire value must be finished before it is visible, but execution of a single instruction invalidates. However, the value *is* visible before that next instruction executes). The test was adjusted to make it insensitive to the result numbering for variables printed by gdb, since that is not relevant to the test and makes the differences introduced by small changes larger than necessary/useful. The test was also improved to present variable probes more intuitively, and also to allow explicit indication of "this variable was optimized out" Change-Id: I39453eead8399e6bb05ebd957289b112d1100c0e Reviewed-on: https://go-review.googlesource.com/74090 Run-TryBot: David Chase <drchase@google.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-10-27 17:02:11 -04:00
valueToProgAfter[v.ID] = s.pp.next
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
if f.PrintOrHtmlSSA {
for ; x != s.pp.next; x = x.Link {
progToValue[x] = v
}
}
}
// If this is an empty infinite loop, stick a hardware NOP in there so that debuggers are less confused.
if s.bstart[b.ID] == s.pp.next && len(b.Succs) == 1 && b.Succs[0].Block() == b {
p := thearch.Ginsnop(s.pp)
p.Pos = p.Pos.WithIsStmt()
b.Pos = b.Pos.WithBogusLine() // Debuggers are not good about infinite loops, force a change in line number
}
// Emit control flow instructions for block
var next *ssa.Block
if i < len(f.Blocks)-1 && Debug['N'] == 0 {
// If -N, leave next==nil so every block with successors
// ends in a JMP (except call blocks - plive doesn't like
// select{send,recv} followed by a JMP call). Helps keep
// line numbers for otherwise empty blocks.
next = f.Blocks[i+1]
}
x := s.pp.next
s.SetPos(b.Pos)
thearch.SSAGenBlock(&s, b, next)
if f.PrintOrHtmlSSA {
for ; x != s.pp.next; x = x.Link {
progToBlock[x] = b
}
}
}
if inlMarks != nil {
// We have some inline marks. Try to find other instructions we're
// going to emit anyway, and use those instructions instead of the
// inline marks.
for p := pp.Text; p != nil; p = p.Link {
if p.As == obj.ANOP || p.As == obj.AFUNCDATA || p.As == obj.APCDATA || p.As == obj.ATEXT || p.As == obj.APCALIGN || thearch.LinkArch.Family == sys.Wasm {
// Don't use 0-sized instructions as inline marks, because we need
// to identify inline mark instructions by pc offset.
// (Some of these instructions are sometimes zero-sized, sometimes not.
// We must not use anything that even might be zero-sized.)
// TODO: are there others?
continue
}
if _, ok := inlMarks[p]; ok {
// Don't use inline marks themselves. We don't know
// whether they will be zero-sized or not yet.
continue
}
pos := p.Pos.AtColumn1()
s := inlMarksByPos[pos]
if len(s) == 0 {
continue
}
for _, m := range s {
// We found an instruction with the same source position as
// some of the inline marks.
// Use this instruction instead.
pp.curfn.Func.lsym.Func.AddInlMark(p, inlMarks[m])
// Make the inline mark a real nop, so it doesn't generate any code.
m.As = obj.ANOP
m.Pos = src.NoXPos
m.From = obj.Addr{}
m.To = obj.Addr{}
}
delete(inlMarksByPos, pos)
}
// Any unmatched inline marks now need to be added to the inlining tree (and will generate a nop instruction).
for _, p := range inlMarkList {
if p.As != obj.ANOP {
pp.curfn.Func.lsym.Func.AddInlMark(p, inlMarks[p])
}
}
}
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
if Ctxt.Flag_locationlists {
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
e.curfn.Func.DebugInfo = ssa.BuildFuncDebug(Ctxt, f, Debug_locationlist > 1, stackOffset)
bstart := s.bstart
// Note that at this moment, Prog.Pc is a sequence number; it's
// not a real PC until after assembly, so this mapping has to
// be done later.
e.curfn.Func.DebugInfo.GetPC = func(b, v ssa.ID) int64 {
switch v {
case ssa.BlockStart.ID:
if b == f.Entry.ID {
return 0 // Start at the very beginning, at the assembler-generated prologue.
// this should only happen for function args (ssa.OpArg)
}
return bstart[b].Pc
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
case ssa.BlockEnd.ID:
return e.curfn.Func.lsym.Size
cmd/compile: reimplement location list generation Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
default:
return valueToProgAfter[v].Pc
}
}
}
// Resolve branches, and relax DefaultStmt into NotStmt
for _, br := range s.Branches {
br.P.To.Val = s.bstart[br.B.ID]
cmd/compile: assign and preserve statement boundaries. A new pass run after ssa building (before any other optimization) identifies the "first" ssa node for each statement. Other "noise" nodes are tagged as being never appropriate for a statement boundary (e.g., VarKill, VarDef, Phi). Rewrite, deadcode, cse, and nilcheck are modified to move the statement boundaries forward whenever possible if a boundary-tagged ssa value is removed; never-boundary nodes are ignored in this search (some operations involving constants are also tagged as never-boundary and also ignored because they are likely to be moved or removed during optimization). Code generation treats all nodes except those explicitly marked as statement boundaries as "not statement" nodes, and floats statement boundaries to the beginning of each same-line run of instructions found within a basic block. Line number html conversion was modified to make statement boundary nodes a bit more obvious by prepending a "+". The code in fuse.go that glued together the value slices of two blocks produced a result that depended on the former capacities (not lengths) of the two slices. This causes differences in the 386 bootstrap, and also can sometimes put values into an order that does a worse job of preserving statement boundaries when values are removed. Portions of two delve tests that had caught problems were incorporated into ssa/debug_test.go. There are some opportunities to do better with optimized code, but the next-ing is not lying or overly jumpy. Over 4 CLs, compilebench geomean measured binary size increase of 3.5% and compile user time increase of 3.8% (this is after optimization to reuse a sparse map instead of creating multiple maps.) This CL worsens the optimized-debugging experience with Delve; we need to work with the delve team so that they can use the is_stmt marks that we're emitting now. The reference output changes from time to time depending on other changes in the compiler, sometimes better, sometimes worse. This CL now includes a test ensuring that 99+% of the lines in the Go command itself (a handy optimized binary) include is_stmt markers. Change-Id: I359c94e06843f1eb41f9da437bd614885aa9644a Reviewed-on: https://go-review.googlesource.com/102435 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>
2018-03-23 22:46:06 -04:00
if br.P.Pos.IsStmt() != src.PosIsStmt {
br.P.Pos = br.P.Pos.WithNotStmt()
} else if v0 := br.B.FirstPossibleStmtValue(); v0 != nil && v0.Pos.Line() == br.P.Pos.Line() && v0.Pos.IsStmt() == src.PosIsStmt {
br.P.Pos = br.P.Pos.WithNotStmt()
cmd/compile: assign and preserve statement boundaries. A new pass run after ssa building (before any other optimization) identifies the "first" ssa node for each statement. Other "noise" nodes are tagged as being never appropriate for a statement boundary (e.g., VarKill, VarDef, Phi). Rewrite, deadcode, cse, and nilcheck are modified to move the statement boundaries forward whenever possible if a boundary-tagged ssa value is removed; never-boundary nodes are ignored in this search (some operations involving constants are also tagged as never-boundary and also ignored because they are likely to be moved or removed during optimization). Code generation treats all nodes except those explicitly marked as statement boundaries as "not statement" nodes, and floats statement boundaries to the beginning of each same-line run of instructions found within a basic block. Line number html conversion was modified to make statement boundary nodes a bit more obvious by prepending a "+". The code in fuse.go that glued together the value slices of two blocks produced a result that depended on the former capacities (not lengths) of the two slices. This causes differences in the 386 bootstrap, and also can sometimes put values into an order that does a worse job of preserving statement boundaries when values are removed. Portions of two delve tests that had caught problems were incorporated into ssa/debug_test.go. There are some opportunities to do better with optimized code, but the next-ing is not lying or overly jumpy. Over 4 CLs, compilebench geomean measured binary size increase of 3.5% and compile user time increase of 3.8% (this is after optimization to reuse a sparse map instead of creating multiple maps.) This CL worsens the optimized-debugging experience with Delve; we need to work with the delve team so that they can use the is_stmt marks that we're emitting now. The reference output changes from time to time depending on other changes in the compiler, sometimes better, sometimes worse. This CL now includes a test ensuring that 99+% of the lines in the Go command itself (a handy optimized binary) include is_stmt markers. Change-Id: I359c94e06843f1eb41f9da437bd614885aa9644a Reviewed-on: https://go-review.googlesource.com/102435 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>
2018-03-23 22:46:06 -04:00
}
}
if e.log { // spew to stdout
filename := ""
for p := pp.Text; p != nil; p = p.Link {
if p.Pos.IsKnown() && p.InnermostFilename() != filename {
filename = p.InnermostFilename()
f.Logf("# %s\n", filename)
}
var s string
if v, ok := progToValue[p]; ok {
s = v.String()
} else if b, ok := progToBlock[p]; ok {
s = b.String()
} else {
s = " " // most value and branch strings are 2-3 characters long
}
f.Logf(" %-6s\t%.5d (%s)\t%s\n", s, p.Pc, p.InnermostLineNumber(), p.InstructionString())
}
}
if f.HTMLWriter != nil { // spew to ssa.html
var buf bytes.Buffer
buf.WriteString("<code>")
buf.WriteString("<dl class=\"ssa-gen\">")
filename := ""
for p := pp.Text; p != nil; p = p.Link {
// Don't spam every line with the file name, which is often huge.
// Only print changes, and "unknown" is not a change.
if p.Pos.IsKnown() && p.InnermostFilename() != filename {
filename = p.InnermostFilename()
buf.WriteString("<dt class=\"ssa-prog-src\"></dt><dd class=\"ssa-prog\">")
buf.WriteString(html.EscapeString("# " + filename))
buf.WriteString("</dd>")
}
buf.WriteString("<dt class=\"ssa-prog-src\">")
if v, ok := progToValue[p]; ok {
buf.WriteString(v.HTML())
} else if b, ok := progToBlock[p]; ok {
buf.WriteString("<b>" + b.HTML() + "</b>")
}
buf.WriteString("</dt>")
buf.WriteString("<dd class=\"ssa-prog\">")
buf.WriteString(fmt.Sprintf("%.5d <span class=\"l%v line-number\">(%s)</span> %s", p.Pc, p.InnermostLineNumber(), p.InnermostLineNumberHTML(), html.EscapeString(p.InstructionString())))
buf.WriteString("</dd>")
}
buf.WriteString("</dl>")
buf.WriteString("</code>")
f.HTMLWriter.WriteColumn("genssa", "genssa", "ssa-prog", buf.String())
}
defframe(&s, e)
f.HTMLWriter.Close()
f.HTMLWriter = nil
}
func defframe(s *SSAGenState, e *ssafn) {
pp := s.pp
frame := Rnd(s.maxarg+e.stksize, int64(Widthreg))
if thearch.PadFrame != nil {
frame = thearch.PadFrame(frame)
}
// Fill in argument and frame size.
pp.Text.To.Type = obj.TYPE_TEXTSIZE
pp.Text.To.Val = int32(Rnd(e.curfn.Type.ArgWidth(), int64(Widthreg)))
pp.Text.To.Offset = frame
// Insert code to zero ambiguously live variables so that the
// garbage collector only sees initialized values when it
// looks for pointers.
p := pp.Text
var lo, hi int64
// Opaque state for backend to use. Current backends use it to
// keep track of which helper registers have been zeroed.
var state uint32
// Iterate through declarations. They are sorted in decreasing Xoffset order.
for _, n := range e.curfn.Func.Dcl {
if !n.Name.Needzero() {
continue
}
cmd/compile: move Node.Class to flags Put it at position zero, since it is fairly hot. This shrinks gc.Node into a smaller size class on 64 bit systems. name old time/op new time/op delta Template 193ms ± 5% 192ms ± 3% ~ (p=0.353 n=94+93) Unicode 86.1ms ± 5% 85.0ms ± 4% -1.23% (p=0.000 n=95+98) GoTypes 546ms ± 3% 544ms ± 4% -0.40% (p=0.007 n=94+97) Compiler 2.56s ± 3% 2.54s ± 3% -0.67% (p=0.000 n=99+97) SSA 5.13s ± 2% 5.10s ± 3% -0.55% (p=0.000 n=94+98) Flate 122ms ± 6% 121ms ± 4% -0.75% (p=0.002 n=97+95) GoParser 144ms ± 5% 144ms ± 4% ~ (p=0.298 n=98+97) Reflect 348ms ± 4% 349ms ± 4% ~ (p=0.350 n=98+97) Tar 105ms ± 5% 104ms ± 5% ~ (p=0.154 n=96+98) XML 200ms ± 5% 198ms ± 4% -0.71% (p=0.015 n=97+98) [Geo mean] 330ms 328ms -0.52% name old user-time/op new user-time/op delta Template 229ms ±11% 224ms ± 7% -2.16% (p=0.001 n=100+87) Unicode 109ms ± 5% 109ms ± 6% ~ (p=0.897 n=96+91) GoTypes 712ms ± 4% 709ms ± 4% ~ (p=0.085 n=96+98) Compiler 3.41s ± 3% 3.36s ± 3% -1.43% (p=0.000 n=98+98) SSA 7.46s ± 3% 7.31s ± 3% -2.02% (p=0.000 n=100+99) Flate 145ms ± 6% 143ms ± 6% -1.11% (p=0.001 n=99+97) GoParser 177ms ± 5% 176ms ± 5% -0.78% (p=0.018 n=95+95) Reflect 432ms ± 7% 435ms ± 9% ~ (p=0.296 n=100+100) Tar 121ms ± 7% 121ms ± 5% ~ (p=0.072 n=100+95) XML 241ms ± 4% 239ms ± 5% ~ (p=0.085 n=97+99) [Geo mean] 413ms 410ms -0.73% name old alloc/op new alloc/op delta Template 38.4MB ± 0% 37.7MB ± 0% -1.85% (p=0.008 n=5+5) Unicode 30.1MB ± 0% 28.8MB ± 0% -4.09% (p=0.008 n=5+5) GoTypes 112MB ± 0% 110MB ± 0% -1.69% (p=0.008 n=5+5) Compiler 470MB ± 0% 461MB ± 0% -1.91% (p=0.008 n=5+5) SSA 1.13GB ± 0% 1.11GB ± 0% -1.70% (p=0.008 n=5+5) Flate 25.0MB ± 0% 24.6MB ± 0% -1.67% (p=0.008 n=5+5) GoParser 31.6MB ± 0% 31.1MB ± 0% -1.66% (p=0.008 n=5+5) Reflect 77.1MB ± 0% 75.8MB ± 0% -1.69% (p=0.008 n=5+5) Tar 26.3MB ± 0% 25.7MB ± 0% -2.06% (p=0.008 n=5+5) XML 41.9MB ± 0% 41.1MB ± 0% -1.93% (p=0.008 n=5+5) [Geo mean] 73.5MB 72.0MB -2.03% name old allocs/op new allocs/op delta Template 383k ± 0% 383k ± 0% ~ (p=0.690 n=5+5) Unicode 343k ± 0% 343k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.16M ± 0% 1.16M ± 0% ~ (p=0.310 n=5+5) Compiler 4.43M ± 0% 4.42M ± 0% -0.17% (p=0.008 n=5+5) SSA 9.85M ± 0% 9.85M ± 0% ~ (p=0.310 n=5+5) Flate 236k ± 0% 236k ± 1% ~ (p=0.841 n=5+5) GoParser 320k ± 0% 320k ± 0% ~ (p=0.421 n=5+5) Reflect 988k ± 0% 987k ± 0% ~ (p=0.690 n=5+5) Tar 252k ± 0% 251k ± 0% ~ (p=0.095 n=5+5) XML 399k ± 0% 399k ± 0% ~ (p=1.000 n=5+5) [Geo mean] 741k 740k -0.07% Change-Id: I9e952b58a98e30a12494304db9ce50d0a85e459c Reviewed-on: https://go-review.googlesource.com/41797 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
2017-04-25 18:14:12 -07:00
if n.Class() != PAUTO {
Fatalf("needzero class %d", n.Class())
}
if n.Type.Size()%int64(Widthptr) != 0 || n.Xoffset%int64(Widthptr) != 0 || n.Type.Size() == 0 {
Fatalf("var %L has size %d offset %d", n, n.Type.Size(), n.Xoffset)
}
if lo != hi && n.Xoffset+n.Type.Size() >= lo-int64(2*Widthreg) {
// Merge with range we already have.
lo = n.Xoffset
continue
}
// Zero old range
p = thearch.ZeroRange(pp, p, frame+lo, hi-lo, &state)
// Set new range.
lo = n.Xoffset
hi = lo + n.Type.Size()
}
// Zero final range.
thearch.ZeroRange(pp, p, frame+lo, hi-lo, &state)
}
type FloatingEQNEJump struct {
Jump obj.As
Index int
}
func (s *SSAGenState) oneFPJump(b *ssa.Block, jumps *FloatingEQNEJump) {
p := s.Prog(jumps.Jump)
p.To.Type = obj.TYPE_BRANCH
p.Pos = b.Pos
to := jumps.Index
s.Branches = append(s.Branches, Branch{p, b.Succs[to].Block()})
}
func (s *SSAGenState) FPJump(b, next *ssa.Block, jumps *[2][2]FloatingEQNEJump) {
switch next {
case b.Succs[0].Block():
s.oneFPJump(b, &jumps[0][0])
s.oneFPJump(b, &jumps[0][1])
case b.Succs[1].Block():
s.oneFPJump(b, &jumps[1][0])
s.oneFPJump(b, &jumps[1][1])
default:
s.oneFPJump(b, &jumps[1][0])
s.oneFPJump(b, &jumps[1][1])
q := s.Prog(obj.AJMP)
q.Pos = b.Pos
q.To.Type = obj.TYPE_BRANCH
s.Branches = append(s.Branches, Branch{q, b.Succs[1].Block()})
}
}
func AuxOffset(v *ssa.Value) (offset int64) {
if v.Aux == nil {
return 0
}
n, ok := v.Aux.(*Node)
if !ok {
v.Fatalf("bad aux type in %s\n", v.LongString())
}
if n.Class() == PAUTO {
return n.Xoffset
}
return 0
}
// AddAux adds the offset in the aux fields (AuxInt and Aux) of v to a.
func AddAux(a *obj.Addr, v *ssa.Value) {
AddAux2(a, v, v.AuxInt)
}
func AddAux2(a *obj.Addr, v *ssa.Value, offset int64) {
if a.Type != obj.TYPE_MEM && a.Type != obj.TYPE_ADDR {
v.Fatalf("bad AddAux addr %v", a)
}
// add integer offset
a.Offset += offset
// If no additional symbol offset, we're done.
if v.Aux == nil {
return
}
// Add symbol's offset from its base register.
switch n := v.Aux.(type) {
case *obj.LSym:
a.Name = obj.NAME_EXTERN
a.Sym = n
case *Node:
if n.Class() == PPARAM || n.Class() == PPARAMOUT {
a.Name = obj.NAME_PARAM
a.Sym = n.Orig.Sym.Linksym()
a.Offset += n.Xoffset
break
}
a.Name = obj.NAME_AUTO
a.Sym = n.Sym.Linksym()
a.Offset += n.Xoffset
default:
v.Fatalf("aux in %s not implemented %#v", v, v.Aux)
}
}
// extendIndex extends v to a full int width.
// panic with the given kind if v does not fit in an int (only on 32-bit archs).
func (s *state) extendIndex(idx, len *ssa.Value, kind ssa.BoundsKind, bounded bool) *ssa.Value {
size := idx.Type.Size()
if size == s.config.PtrSize {
return idx
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
}
if size > s.config.PtrSize {
// truncate 64-bit indexes on 32-bit pointer archs. Test the
// high word and branch to out-of-bounds failure if it is not 0.
var lo *ssa.Value
if idx.Type.IsSigned() {
lo = s.newValue1(ssa.OpInt64Lo, types.Types[TINT], idx)
} else {
lo = s.newValue1(ssa.OpInt64Lo, types.Types[TUINT], idx)
}
if bounded || Debug['B'] != 0 {
return lo
}
bNext := s.f.NewBlock(ssa.BlockPlain)
bPanic := s.f.NewBlock(ssa.BlockExit)
hi := s.newValue1(ssa.OpInt64Hi, types.Types[TUINT32], idx)
cmp := s.newValue2(ssa.OpEq32, types.Types[TBOOL], hi, s.constInt32(types.Types[TUINT32], 0))
if !idx.Type.IsSigned() {
switch kind {
case ssa.BoundsIndex:
kind = ssa.BoundsIndexU
case ssa.BoundsSliceAlen:
kind = ssa.BoundsSliceAlenU
case ssa.BoundsSliceAcap:
kind = ssa.BoundsSliceAcapU
case ssa.BoundsSliceB:
kind = ssa.BoundsSliceBU
case ssa.BoundsSlice3Alen:
kind = ssa.BoundsSlice3AlenU
case ssa.BoundsSlice3Acap:
kind = ssa.BoundsSlice3AcapU
case ssa.BoundsSlice3B:
kind = ssa.BoundsSlice3BU
case ssa.BoundsSlice3C:
kind = ssa.BoundsSlice3CU
}
}
b := s.endBlock()
b.Kind = ssa.BlockIf
b.SetControl(cmp)
b.Likely = ssa.BranchLikely
b.AddEdgeTo(bNext)
b.AddEdgeTo(bPanic)
s.startBlock(bPanic)
mem := s.newValue4I(ssa.OpPanicExtend, types.TypeMem, int64(kind), hi, lo, len, s.mem())
s.endBlock().SetControl(mem)
s.startBlock(bNext)
return lo
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
}
// Extend value to the required size
var op ssa.Op
if idx.Type.IsSigned() {
switch 10*size + s.config.PtrSize {
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
case 14:
op = ssa.OpSignExt8to32
case 18:
op = ssa.OpSignExt8to64
case 24:
op = ssa.OpSignExt16to32
case 28:
op = ssa.OpSignExt16to64
case 48:
op = ssa.OpSignExt32to64
default:
s.Fatalf("bad signed index extension %s", idx.Type)
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
}
} else {
switch 10*size + s.config.PtrSize {
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
case 14:
op = ssa.OpZeroExt8to32
case 18:
op = ssa.OpZeroExt8to64
case 24:
op = ssa.OpZeroExt16to32
case 28:
op = ssa.OpZeroExt16to64
case 48:
op = ssa.OpZeroExt32to64
default:
s.Fatalf("bad unsigned index extension %s", idx.Type)
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
}
}
return s.newValue1(op, types.Types[TINT], idx)
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled For integer types less than a machine register, we have to decide what the invariants are for the high bits of the register. We used to set the high bits to the correct extension (sign or zero, as determined by the type) of the low bits. This CL makes the compiler ignore the high bits of the register altogether (they are junk). On this plus side, this means ops that generate subword results don't have to worry about correctly extending them. On the minus side, ops that consume subword arguments have to deal with the input registers not being correctly extended. For x86, this tradeoff is probably worth it. Almost all opcodes have versions that use only the correct subword piece of their inputs. (The one big exception is array indexing.) Not many opcodes can correctly sign extend on output. For other architectures, the tradeoff is probably not so clear, as they don't have many subword-safe opcodes (e.g. 16-bit compare, ignoring the high 16/48 bits). Fortunately we can decide whether we do this per-architecture. For the machine-independent opcodes, we pretend that the "register" size is equal to the type width, so sign extension is immaterial. Opcodes that care about the signedness of the input (e.g. compare, right shift) have two different variants. Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d Reviewed-on: https://go-review.googlesource.com/12600 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
}
// CheckLoweredPhi checks that regalloc and stackalloc correctly handled phi values.
// Called during ssaGenValue.
func CheckLoweredPhi(v *ssa.Value) {
if v.Op != ssa.OpPhi {
v.Fatalf("CheckLoweredPhi called with non-phi value: %v", v.LongString())
}
if v.Type.IsMemory() {
return
}
f := v.Block.Func
loc := f.RegAlloc[v.ID]
for _, a := range v.Args {
if aloc := f.RegAlloc[a.ID]; aloc != loc { // TODO: .Equal() instead?
v.Fatalf("phi arg at different location than phi: %v @ %s, but arg %v @ %s\n%s\n", v, loc, a, aloc, v.Block.Func)
}
}
}
// CheckLoweredGetClosurePtr checks that v is the first instruction in the function's entry block.
// The output of LoweredGetClosurePtr is generally hardwired to the correct register.
// That register contains the closure pointer on closure entry.
func CheckLoweredGetClosurePtr(v *ssa.Value) {
entry := v.Block.Func.Entry
if entry != v.Block || entry.Values[0] != v {
Fatalf("in %s, badly placed LoweredGetClosurePtr: %v %v", v.Block.Func.Name, v.Block, v)
}
}
// AutoVar returns a *Node and int64 representing the auto variable and offset within it
// where v should be spilled.
func AutoVar(v *ssa.Value) (*Node, int64) {
loc := v.Block.Func.RegAlloc[v.ID].(ssa.LocalSlot)
if v.Type.Size() > loc.Type.Size() {
v.Fatalf("spill/restore type %s doesn't fit in slot type %s", v.Type, loc.Type)
}
return loc.N.(*Node), loc.Off
}
func AddrAuto(a *obj.Addr, v *ssa.Value) {
n, off := AutoVar(v)
a.Type = obj.TYPE_MEM
a.Sym = n.Sym.Linksym()
a.Reg = int16(thearch.REGSP)
a.Offset = n.Xoffset + off
cmd/compile: move Node.Class to flags Put it at position zero, since it is fairly hot. This shrinks gc.Node into a smaller size class on 64 bit systems. name old time/op new time/op delta Template 193ms ± 5% 192ms ± 3% ~ (p=0.353 n=94+93) Unicode 86.1ms ± 5% 85.0ms ± 4% -1.23% (p=0.000 n=95+98) GoTypes 546ms ± 3% 544ms ± 4% -0.40% (p=0.007 n=94+97) Compiler 2.56s ± 3% 2.54s ± 3% -0.67% (p=0.000 n=99+97) SSA 5.13s ± 2% 5.10s ± 3% -0.55% (p=0.000 n=94+98) Flate 122ms ± 6% 121ms ± 4% -0.75% (p=0.002 n=97+95) GoParser 144ms ± 5% 144ms ± 4% ~ (p=0.298 n=98+97) Reflect 348ms ± 4% 349ms ± 4% ~ (p=0.350 n=98+97) Tar 105ms ± 5% 104ms ± 5% ~ (p=0.154 n=96+98) XML 200ms ± 5% 198ms ± 4% -0.71% (p=0.015 n=97+98) [Geo mean] 330ms 328ms -0.52% name old user-time/op new user-time/op delta Template 229ms ±11% 224ms ± 7% -2.16% (p=0.001 n=100+87) Unicode 109ms ± 5% 109ms ± 6% ~ (p=0.897 n=96+91) GoTypes 712ms ± 4% 709ms ± 4% ~ (p=0.085 n=96+98) Compiler 3.41s ± 3% 3.36s ± 3% -1.43% (p=0.000 n=98+98) SSA 7.46s ± 3% 7.31s ± 3% -2.02% (p=0.000 n=100+99) Flate 145ms ± 6% 143ms ± 6% -1.11% (p=0.001 n=99+97) GoParser 177ms ± 5% 176ms ± 5% -0.78% (p=0.018 n=95+95) Reflect 432ms ± 7% 435ms ± 9% ~ (p=0.296 n=100+100) Tar 121ms ± 7% 121ms ± 5% ~ (p=0.072 n=100+95) XML 241ms ± 4% 239ms ± 5% ~ (p=0.085 n=97+99) [Geo mean] 413ms 410ms -0.73% name old alloc/op new alloc/op delta Template 38.4MB ± 0% 37.7MB ± 0% -1.85% (p=0.008 n=5+5) Unicode 30.1MB ± 0% 28.8MB ± 0% -4.09% (p=0.008 n=5+5) GoTypes 112MB ± 0% 110MB ± 0% -1.69% (p=0.008 n=5+5) Compiler 470MB ± 0% 461MB ± 0% -1.91% (p=0.008 n=5+5) SSA 1.13GB ± 0% 1.11GB ± 0% -1.70% (p=0.008 n=5+5) Flate 25.0MB ± 0% 24.6MB ± 0% -1.67% (p=0.008 n=5+5) GoParser 31.6MB ± 0% 31.1MB ± 0% -1.66% (p=0.008 n=5+5) Reflect 77.1MB ± 0% 75.8MB ± 0% -1.69% (p=0.008 n=5+5) Tar 26.3MB ± 0% 25.7MB ± 0% -2.06% (p=0.008 n=5+5) XML 41.9MB ± 0% 41.1MB ± 0% -1.93% (p=0.008 n=5+5) [Geo mean] 73.5MB 72.0MB -2.03% name old allocs/op new allocs/op delta Template 383k ± 0% 383k ± 0% ~ (p=0.690 n=5+5) Unicode 343k ± 0% 343k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.16M ± 0% 1.16M ± 0% ~ (p=0.310 n=5+5) Compiler 4.43M ± 0% 4.42M ± 0% -0.17% (p=0.008 n=5+5) SSA 9.85M ± 0% 9.85M ± 0% ~ (p=0.310 n=5+5) Flate 236k ± 0% 236k ± 1% ~ (p=0.841 n=5+5) GoParser 320k ± 0% 320k ± 0% ~ (p=0.421 n=5+5) Reflect 988k ± 0% 987k ± 0% ~ (p=0.690 n=5+5) Tar 252k ± 0% 251k ± 0% ~ (p=0.095 n=5+5) XML 399k ± 0% 399k ± 0% ~ (p=1.000 n=5+5) [Geo mean] 741k 740k -0.07% Change-Id: I9e952b58a98e30a12494304db9ce50d0a85e459c Reviewed-on: https://go-review.googlesource.com/41797 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
2017-04-25 18:14:12 -07:00
if n.Class() == PPARAM || n.Class() == PPARAMOUT {
a.Name = obj.NAME_PARAM
} else {
a.Name = obj.NAME_AUTO
}
}
func (s *SSAGenState) AddrScratch(a *obj.Addr) {
if s.ScratchFpMem == nil {
panic("no scratch memory available; forgot to declare usesScratch for Op?")
}
a.Type = obj.TYPE_MEM
a.Name = obj.NAME_AUTO
a.Sym = s.ScratchFpMem.Sym.Linksym()
a.Reg = int16(thearch.REGSP)
a.Offset = s.ScratchFpMem.Xoffset
}
// Call returns a new CALL instruction for the SSA value v.
// It uses PrepareCall to prepare the call.
func (s *SSAGenState) Call(v *ssa.Value) *obj.Prog {
s.PrepareCall(v)
p := s.Prog(obj.ACALL)
p.Pos = v.Pos
if sym, ok := v.Aux.(*obj.LSym); ok {
p.To.Type = obj.TYPE_MEM
p.To.Name = obj.NAME_EXTERN
p.To.Sym = sym
} else {
// TODO(mdempsky): Can these differences be eliminated?
switch thearch.LinkArch.Family {
case sys.AMD64, sys.I386, sys.PPC64, sys.S390X, sys.Wasm:
p.To.Type = obj.TYPE_REG
case sys.ARM, sys.ARM64, sys.MIPS, sys.MIPS64:
p.To.Type = obj.TYPE_MEM
default:
Fatalf("unknown indirect call family")
}
p.To.Reg = v.Args[0].Reg()
}
return p
}
// PrepareCall prepares to emit a CALL instruction for v and does call-related bookkeeping.
// It must be called immediately before emitting the actual CALL instruction,
// since it emits PCDATA for the stack map at the call (calls are safe points).
func (s *SSAGenState) PrepareCall(v *ssa.Value) {
idx := s.livenessMap.Get(v)
if !idx.Valid() {
// typedmemclr and typedmemmove are write barriers and
// deeply non-preemptible. They are unsafe points and
// hence should not have liveness maps.
if sym, _ := v.Aux.(*obj.LSym); !(sym == typedmemclr || sym == typedmemmove) {
Fatalf("missing stack map index for %v", v.LongString())
}
}
if sym, _ := v.Aux.(*obj.LSym); sym == Deferreturn {
// Deferred calls will appear to be returning to
// the CALL deferreturn(SB) that we are about to emit.
// However, the stack trace code will show the line
// of the instruction byte before the return PC.
// To avoid that being an unrelated instruction,
// insert an actual hardware NOP that will have the right line number.
// This is different from obj.ANOP, which is a virtual no-op
// that doesn't make it into the instruction stream.
thearch.Ginsnopdefer(s.pp)
}
if sym, ok := v.Aux.(*obj.LSym); ok {
cmd/compile: improve coverage of nowritebarrierrec check The current go:nowritebarrierrec checker has two problems that limit its coverage: 1. It doesn't understand that systemstack calls its argument, which means there are several cases where we fail to detect prohibited write barriers. 2. It only observes calls in the AST, so calls constructed during lowering by SSA aren't followed. This CL completely rewrites this checker to address these issues. The current checker runs entirely after walk and uses visitBottomUp, which introduces several problems for checking across systemstack. First, visitBottomUp itself doesn't understand systemstack calls, so the callee may be ordered after the caller, causing the checker to fail to propagate constraints. Second, many systemstack calls are passed a closure, which is quite difficult to resolve back to the function definition after transformclosure and walk have run. Third, visitBottomUp works exclusively on the AST, so it can't observe calls created by SSA. To address these problems, this commit splits the check into two phases and rewrites it to use a call graph generated during SSA lowering. The first phase runs before transformclosure/walk and simply records systemstack arguments when they're easy to get. Then, it modifies genssa to record static call edges at the point where we're lowering to Progs (which is the latest point at which position information is conveniently available). Finally, the second phase runs after all functions have been lowered and uses a direct BFS walk of the call graph (combining systemstack calls with static calls) to find prohibited write barriers and construct nice error messages. Fixes #22384. For #22460. Change-Id: I39668f7f2366ab3c1ab1a71eaf25484d25349540 Reviewed-on: https://go-review.googlesource.com/72773 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-10-22 16:36:27 -04:00
// Record call graph information for nowritebarrierrec
// analysis.
if nowritebarrierrecCheck != nil {
nowritebarrierrecCheck.recordCall(s.pp.curfn, sym, v.Pos)
}
}
if s.maxarg < v.AuxInt {
s.maxarg = v.AuxInt
}
}
// UseArgs records the fact that an instruction needs a certain amount of
// callee args space for its use.
func (s *SSAGenState) UseArgs(n int64) {
if s.maxarg < n {
s.maxarg = n
}
}
// fieldIdx finds the index of the field referred to by the ODOT node n.
func fieldIdx(n *Node) int {
t := n.Left.Type
f := n.Sym
if !t.IsStruct() {
panic("ODOT's LHS is not a struct")
}
var i int
for _, t1 := range t.Fields().Slice() {
if t1.Sym != f {
i++
continue
}
if t1.Offset != n.Xoffset {
panic("field offset doesn't match")
}
return i
}
panic(fmt.Sprintf("can't find field in expr %v\n", n))
// TODO: keep the result of this function somewhere in the ODOT Node
// so we don't have to recompute it each time we need it.
}
// ssafn holds frontend information about a function that the backend is processing.
// It also exports a bunch of compiler services for the ssa backend.
type ssafn struct {
curfn *Node
strings map[string]interface{} // map from constant string to data symbols
scratchFpMem *Node // temp for floating point register / memory moves on some architectures
stksize int64 // stack size for current frame
stkptrsize int64 // prefix of stack containing pointers
log bool // print ssa debug to the stdout
}
// StringData returns a symbol (a *types.Sym wrapped in an interface) which
// is the data component of a global string constant containing s.
func (e *ssafn) StringData(s string) interface{} {
if aux, ok := e.strings[s]; ok {
return aux
}
if e.strings == nil {
e.strings = make(map[string]interface{})
}
data := stringsym(e.curfn.Pos, s)
e.strings[s] = data
return data
}
[dev.ssa] cmd/compile/ssa: separate logging, work in progress, and fatal errors The SSA implementation logs for three purposes: * debug logging * fatal errors * unimplemented features Separating these three uses lets us attempt an SSA implementation for all functions, not just _ssa functions. This turns the entire standard library into a compilation test, and makes it easy to figure out things like "how much coverage does SSA have now" and "what should we do next to get more coverage?". Functions called _ssa are still special. They log profusely by default and the output of the SSA implementation is used. For all other functions, logging is off, and the implementation is built and discarded, due to lack of support for the runtime. While we're here, fix a few minor bugs and add some extra Unimplementeds to allow all.bash to pass. As of now, SSA handles 20.79% of the functions in the standard library (689 of 3314). The top missing features are: 10.03% 2597 SSA unimplemented: zero for type error not implemented 7.79% 2016 SSA unimplemented: addr: bad op DOTPTR 7.33% 1898 SSA unimplemented: unhandled expr EQ 6.10% 1579 SSA unimplemented: unhandled expr OROR 4.91% 1271 SSA unimplemented: unhandled expr NE 4.49% 1163 SSA unimplemented: unhandled expr LROT 4.00% 1036 SSA unimplemented: unhandled expr LEN 3.56% 923 SSA unimplemented: unhandled stmt CALLFUNC 2.37% 615 SSA unimplemented: zero for type []byte not implemented 1.90% 492 SSA unimplemented: unhandled stmt CALLMETH 1.74% 450 SSA unimplemented: unhandled expr CALLINTER 1.74% 450 SSA unimplemented: unhandled expr DOT 1.71% 444 SSA unimplemented: unhandled expr ANDAND 1.65% 426 SSA unimplemented: unhandled expr CLOSUREVAR 1.54% 400 SSA unimplemented: unhandled expr CALLMETH 1.51% 390 SSA unimplemented: unhandled stmt SWITCH 1.47% 380 SSA unimplemented: unhandled expr CONV 1.33% 345 SSA unimplemented: addr: bad op * 1.30% 336 SSA unimplemented: unhandled OLITERAL 6 Change-Id: I4ca07951e276714dc13c31de28640aead17a1be7 Reviewed-on: https://go-review.googlesource.com/11160 Reviewed-by: Keith Randall <khr@golang.org>
2015-06-12 11:01:13 -07:00
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
func (e *ssafn) Auto(pos src.XPos, t *types.Type) ssa.GCNode {
n := tempAt(pos, e.curfn, t) // Note: adds new auto to e.curfn.Func.Dcl list
return n
}
func (e *ssafn) SplitString(name ssa.LocalSlot) (ssa.LocalSlot, ssa.LocalSlot) {
cmd/compile: better job of naming compound types Compound AUTO types weren't named previously. That was because live variable analysis (plive.go) doesn't handle spilling to compound types. It can't handle them because there is no valid place to put VARDEFs when regalloc is spilling compound types. compound types = multiword builtin types: complex, string, slice, and interface. Instead, we split named AUTOs into individual one-word variables. For example, a string s gets split into a byte ptr s.ptr and an integer s.len. Those two variables can be spilled to / restored from independently. As a result, live variable analysis can handle them because they are one-word objects. This CL will change how AUTOs are described in DWARF information. Consider the code: func f(s string, i int) int { x := s[i:i+5] g() return lookup(x) } The old compiler would spill x to two consecutive slots on the stack, both named x (at offsets 0 and 8). The new compiler spills the pointer of x to a slot named x.ptr. It doesn't spill x.len at all, as it is a constant (5) and can be rematerialized for the call to lookup. So compound objects may not be spilled in their entirety, and even if they are they won't necessarily be contiguous. Such is the price of optimization. Re-enable live variable analysis tests. One test remains disabled, it fails because of #14904. Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801 Reviewed-on: https://go-review.googlesource.com/21233 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-03-28 11:25:17 -07:00
n := name.N.(*Node)
ptrType := types.NewPtr(types.Types[TUINT8])
lenType := types.Types[TINT]
cmd/compile: move Node.Class to flags Put it at position zero, since it is fairly hot. This shrinks gc.Node into a smaller size class on 64 bit systems. name old time/op new time/op delta Template 193ms ± 5% 192ms ± 3% ~ (p=0.353 n=94+93) Unicode 86.1ms ± 5% 85.0ms ± 4% -1.23% (p=0.000 n=95+98) GoTypes 546ms ± 3% 544ms ± 4% -0.40% (p=0.007 n=94+97) Compiler 2.56s ± 3% 2.54s ± 3% -0.67% (p=0.000 n=99+97) SSA 5.13s ± 2% 5.10s ± 3% -0.55% (p=0.000 n=94+98) Flate 122ms ± 6% 121ms ± 4% -0.75% (p=0.002 n=97+95) GoParser 144ms ± 5% 144ms ± 4% ~ (p=0.298 n=98+97) Reflect 348ms ± 4% 349ms ± 4% ~ (p=0.350 n=98+97) Tar 105ms ± 5% 104ms ± 5% ~ (p=0.154 n=96+98) XML 200ms ± 5% 198ms ± 4% -0.71% (p=0.015 n=97+98) [Geo mean] 330ms 328ms -0.52% name old user-time/op new user-time/op delta Template 229ms ±11% 224ms ± 7% -2.16% (p=0.001 n=100+87) Unicode 109ms ± 5% 109ms ± 6% ~ (p=0.897 n=96+91) GoTypes 712ms ± 4% 709ms ± 4% ~ (p=0.085 n=96+98) Compiler 3.41s ± 3% 3.36s ± 3% -1.43% (p=0.000 n=98+98) SSA 7.46s ± 3% 7.31s ± 3% -2.02% (p=0.000 n=100+99) Flate 145ms ± 6% 143ms ± 6% -1.11% (p=0.001 n=99+97) GoParser 177ms ± 5% 176ms ± 5% -0.78% (p=0.018 n=95+95) Reflect 432ms ± 7% 435ms ± 9% ~ (p=0.296 n=100+100) Tar 121ms ± 7% 121ms ± 5% ~ (p=0.072 n=100+95) XML 241ms ± 4% 239ms ± 5% ~ (p=0.085 n=97+99) [Geo mean] 413ms 410ms -0.73% name old alloc/op new alloc/op delta Template 38.4MB ± 0% 37.7MB ± 0% -1.85% (p=0.008 n=5+5) Unicode 30.1MB ± 0% 28.8MB ± 0% -4.09% (p=0.008 n=5+5) GoTypes 112MB ± 0% 110MB ± 0% -1.69% (p=0.008 n=5+5) Compiler 470MB ± 0% 461MB ± 0% -1.91% (p=0.008 n=5+5) SSA 1.13GB ± 0% 1.11GB ± 0% -1.70% (p=0.008 n=5+5) Flate 25.0MB ± 0% 24.6MB ± 0% -1.67% (p=0.008 n=5+5) GoParser 31.6MB ± 0% 31.1MB ± 0% -1.66% (p=0.008 n=5+5) Reflect 77.1MB ± 0% 75.8MB ± 0% -1.69% (p=0.008 n=5+5) Tar 26.3MB ± 0% 25.7MB ± 0% -2.06% (p=0.008 n=5+5) XML 41.9MB ± 0% 41.1MB ± 0% -1.93% (p=0.008 n=5+5) [Geo mean] 73.5MB 72.0MB -2.03% name old allocs/op new allocs/op delta Template 383k ± 0% 383k ± 0% ~ (p=0.690 n=5+5) Unicode 343k ± 0% 343k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.16M ± 0% 1.16M ± 0% ~ (p=0.310 n=5+5) Compiler 4.43M ± 0% 4.42M ± 0% -0.17% (p=0.008 n=5+5) SSA 9.85M ± 0% 9.85M ± 0% ~ (p=0.310 n=5+5) Flate 236k ± 0% 236k ± 1% ~ (p=0.841 n=5+5) GoParser 320k ± 0% 320k ± 0% ~ (p=0.421 n=5+5) Reflect 988k ± 0% 987k ± 0% ~ (p=0.690 n=5+5) Tar 252k ± 0% 251k ± 0% ~ (p=0.095 n=5+5) XML 399k ± 0% 399k ± 0% ~ (p=1.000 n=5+5) [Geo mean] 741k 740k -0.07% Change-Id: I9e952b58a98e30a12494304db9ce50d0a85e459c Reviewed-on: https://go-review.googlesource.com/41797 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
2017-04-25 18:14:12 -07:00
if n.Class() == PAUTO && !n.Addrtaken() {
cmd/compile: better job of naming compound types Compound AUTO types weren't named previously. That was because live variable analysis (plive.go) doesn't handle spilling to compound types. It can't handle them because there is no valid place to put VARDEFs when regalloc is spilling compound types. compound types = multiword builtin types: complex, string, slice, and interface. Instead, we split named AUTOs into individual one-word variables. For example, a string s gets split into a byte ptr s.ptr and an integer s.len. Those two variables can be spilled to / restored from independently. As a result, live variable analysis can handle them because they are one-word objects. This CL will change how AUTOs are described in DWARF information. Consider the code: func f(s string, i int) int { x := s[i:i+5] g() return lookup(x) } The old compiler would spill x to two consecutive slots on the stack, both named x (at offsets 0 and 8). The new compiler spills the pointer of x to a slot named x.ptr. It doesn't spill x.len at all, as it is a constant (5) and can be rematerialized for the call to lookup. So compound objects may not be spilled in their entirety, and even if they are they won't necessarily be contiguous. Such is the price of optimization. Re-enable live variable analysis tests. One test remains disabled, it fails because of #14904. Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801 Reviewed-on: https://go-review.googlesource.com/21233 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-03-28 11:25:17 -07:00
// Split this string up into two separate variables.
p := e.splitSlot(&name, ".ptr", 0, ptrType)
l := e.splitSlot(&name, ".len", ptrType.Size(), lenType)
return p, l
cmd/compile: better job of naming compound types Compound AUTO types weren't named previously. That was because live variable analysis (plive.go) doesn't handle spilling to compound types. It can't handle them because there is no valid place to put VARDEFs when regalloc is spilling compound types. compound types = multiword builtin types: complex, string, slice, and interface. Instead, we split named AUTOs into individual one-word variables. For example, a string s gets split into a byte ptr s.ptr and an integer s.len. Those two variables can be spilled to / restored from independently. As a result, live variable analysis can handle them because they are one-word objects. This CL will change how AUTOs are described in DWARF information. Consider the code: func f(s string, i int) int { x := s[i:i+5] g() return lookup(x) } The old compiler would spill x to two consecutive slots on the stack, both named x (at offsets 0 and 8). The new compiler spills the pointer of x to a slot named x.ptr. It doesn't spill x.len at all, as it is a constant (5) and can be rematerialized for the call to lookup. So compound objects may not be spilled in their entirety, and even if they are they won't necessarily be contiguous. Such is the price of optimization. Re-enable live variable analysis tests. One test remains disabled, it fails because of #14904. Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801 Reviewed-on: https://go-review.googlesource.com/21233 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-03-28 11:25:17 -07:00
}
// Return the two parts of the larger variable.
return ssa.LocalSlot{N: n, Type: ptrType, Off: name.Off}, ssa.LocalSlot{N: n, Type: lenType, Off: name.Off + int64(Widthptr)}
cmd/compile: better job of naming compound types Compound AUTO types weren't named previously. That was because live variable analysis (plive.go) doesn't handle spilling to compound types. It can't handle them because there is no valid place to put VARDEFs when regalloc is spilling compound types. compound types = multiword builtin types: complex, string, slice, and interface. Instead, we split named AUTOs into individual one-word variables. For example, a string s gets split into a byte ptr s.ptr and an integer s.len. Those two variables can be spilled to / restored from independently. As a result, live variable analysis can handle them because they are one-word objects. This CL will change how AUTOs are described in DWARF information. Consider the code: func f(s string, i int) int { x := s[i:i+5] g() return lookup(x) } The old compiler would spill x to two consecutive slots on the stack, both named x (at offsets 0 and 8). The new compiler spills the pointer of x to a slot named x.ptr. It doesn't spill x.len at all, as it is a constant (5) and can be rematerialized for the call to lookup. So compound objects may not be spilled in their entirety, and even if they are they won't necessarily be contiguous. Such is the price of optimization. Re-enable live variable analysis tests. One test remains disabled, it fails because of #14904. Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801 Reviewed-on: https://go-review.googlesource.com/21233 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-03-28 11:25:17 -07:00
}
func (e *ssafn) SplitInterface(name ssa.LocalSlot) (ssa.LocalSlot, ssa.LocalSlot) {
cmd/compile: better job of naming compound types Compound AUTO types weren't named previously. That was because live variable analysis (plive.go) doesn't handle spilling to compound types. It can't handle them because there is no valid place to put VARDEFs when regalloc is spilling compound types. compound types = multiword builtin types: complex, string, slice, and interface. Instead, we split named AUTOs into individual one-word variables. For example, a string s gets split into a byte ptr s.ptr and an integer s.len. Those two variables can be spilled to / restored from independently. As a result, live variable analysis can handle them because they are one-word objects. This CL will change how AUTOs are described in DWARF information. Consider the code: func f(s string, i int) int { x := s[i:i+5] g() return lookup(x) } The old compiler would spill x to two consecutive slots on the stack, both named x (at offsets 0 and 8). The new compiler spills the pointer of x to a slot named x.ptr. It doesn't spill x.len at all, as it is a constant (5) and can be rematerialized for the call to lookup. So compound objects may not be spilled in their entirety, and even if they are they won't necessarily be contiguous. Such is the price of optimization. Re-enable live variable analysis tests. One test remains disabled, it fails because of #14904. Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801 Reviewed-on: https://go-review.googlesource.com/21233 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-03-28 11:25:17 -07:00
n := name.N.(*Node)
u := types.Types[TUINTPTR]
t := types.NewPtr(types.Types[TUINT8])
cmd/compile: move Node.Class to flags Put it at position zero, since it is fairly hot. This shrinks gc.Node into a smaller size class on 64 bit systems. name old time/op new time/op delta Template 193ms ± 5% 192ms ± 3% ~ (p=0.353 n=94+93) Unicode 86.1ms ± 5% 85.0ms ± 4% -1.23% (p=0.000 n=95+98) GoTypes 546ms ± 3% 544ms ± 4% -0.40% (p=0.007 n=94+97) Compiler 2.56s ± 3% 2.54s ± 3% -0.67% (p=0.000 n=99+97) SSA 5.13s ± 2% 5.10s ± 3% -0.55% (p=0.000 n=94+98) Flate 122ms ± 6% 121ms ± 4% -0.75% (p=0.002 n=97+95) GoParser 144ms ± 5% 144ms ± 4% ~ (p=0.298 n=98+97) Reflect 348ms ± 4% 349ms ± 4% ~ (p=0.350 n=98+97) Tar 105ms ± 5% 104ms ± 5% ~ (p=0.154 n=96+98) XML 200ms ± 5% 198ms ± 4% -0.71% (p=0.015 n=97+98) [Geo mean] 330ms 328ms -0.52% name old user-time/op new user-time/op delta Template 229ms ±11% 224ms ± 7% -2.16% (p=0.001 n=100+87) Unicode 109ms ± 5% 109ms ± 6% ~ (p=0.897 n=96+91) GoTypes 712ms ± 4% 709ms ± 4% ~ (p=0.085 n=96+98) Compiler 3.41s ± 3% 3.36s ± 3% -1.43% (p=0.000 n=98+98) SSA 7.46s ± 3% 7.31s ± 3% -2.02% (p=0.000 n=100+99) Flate 145ms ± 6% 143ms ± 6% -1.11% (p=0.001 n=99+97) GoParser 177ms ± 5% 176ms ± 5% -0.78% (p=0.018 n=95+95) Reflect 432ms ± 7% 435ms ± 9% ~ (p=0.296 n=100+100) Tar 121ms ± 7% 121ms ± 5% ~ (p=0.072 n=100+95) XML 241ms ± 4% 239ms ± 5% ~ (p=0.085 n=97+99) [Geo mean] 413ms 410ms -0.73% name old alloc/op new alloc/op delta Template 38.4MB ± 0% 37.7MB ± 0% -1.85% (p=0.008 n=5+5) Unicode 30.1MB ± 0% 28.8MB ± 0% -4.09% (p=0.008 n=5+5) GoTypes 112MB ± 0% 110MB ± 0% -1.69% (p=0.008 n=5+5) Compiler 470MB ± 0% 461MB ± 0% -1.91% (p=0.008 n=5+5) SSA 1.13GB ± 0% 1.11GB ± 0% -1.70% (p=0.008 n=5+5) Flate 25.0MB ± 0% 24.6MB ± 0% -1.67% (p=0.008 n=5+5) GoParser 31.6MB ± 0% 31.1MB ± 0% -1.66% (p=0.008 n=5+5) Reflect 77.1MB ± 0% 75.8MB ± 0% -1.69% (p=0.008 n=5+5) Tar 26.3MB ± 0% 25.7MB ± 0% -2.06% (p=0.008 n=5+5) XML 41.9MB ± 0% 41.1MB ± 0% -1.93% (p=0.008 n=5+5) [Geo mean] 73.5MB 72.0MB -2.03% name old allocs/op new allocs/op delta Template 383k ± 0% 383k ± 0% ~ (p=0.690 n=5+5) Unicode 343k ± 0% 343k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.16M ± 0% 1.16M ± 0% ~ (p=0.310 n=5+5) Compiler 4.43M ± 0% 4.42M ± 0% -0.17% (p=0.008 n=5+5) SSA 9.85M ± 0% 9.85M ± 0% ~ (p=0.310 n=5+5) Flate 236k ± 0% 236k ± 1% ~ (p=0.841 n=5+5) GoParser 320k ± 0% 320k ± 0% ~ (p=0.421 n=5+5) Reflect 988k ± 0% 987k ± 0% ~ (p=0.690 n=5+5) Tar 252k ± 0% 251k ± 0% ~ (p=0.095 n=5+5) XML 399k ± 0% 399k ± 0% ~ (p=1.000 n=5+5) [Geo mean] 741k 740k -0.07% Change-Id: I9e952b58a98e30a12494304db9ce50d0a85e459c Reviewed-on: https://go-review.googlesource.com/41797 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
2017-04-25 18:14:12 -07:00
if n.Class() == PAUTO && !n.Addrtaken() {
cmd/compile: better job of naming compound types Compound AUTO types weren't named previously. That was because live variable analysis (plive.go) doesn't handle spilling to compound types. It can't handle them because there is no valid place to put VARDEFs when regalloc is spilling compound types. compound types = multiword builtin types: complex, string, slice, and interface. Instead, we split named AUTOs into individual one-word variables. For example, a string s gets split into a byte ptr s.ptr and an integer s.len. Those two variables can be spilled to / restored from independently. As a result, live variable analysis can handle them because they are one-word objects. This CL will change how AUTOs are described in DWARF information. Consider the code: func f(s string, i int) int { x := s[i:i+5] g() return lookup(x) } The old compiler would spill x to two consecutive slots on the stack, both named x (at offsets 0 and 8). The new compiler spills the pointer of x to a slot named x.ptr. It doesn't spill x.len at all, as it is a constant (5) and can be rematerialized for the call to lookup. So compound objects may not be spilled in their entirety, and even if they are they won't necessarily be contiguous. Such is the price of optimization. Re-enable live variable analysis tests. One test remains disabled, it fails because of #14904. Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801 Reviewed-on: https://go-review.googlesource.com/21233 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-03-28 11:25:17 -07:00
// Split this interface up into two separate variables.
f := ".itab"
if n.Type.IsEmptyInterface() {
cmd/compile: better job of naming compound types Compound AUTO types weren't named previously. That was because live variable analysis (plive.go) doesn't handle spilling to compound types. It can't handle them because there is no valid place to put VARDEFs when regalloc is spilling compound types. compound types = multiword builtin types: complex, string, slice, and interface. Instead, we split named AUTOs into individual one-word variables. For example, a string s gets split into a byte ptr s.ptr and an integer s.len. Those two variables can be spilled to / restored from independently. As a result, live variable analysis can handle them because they are one-word objects. This CL will change how AUTOs are described in DWARF information. Consider the code: func f(s string, i int) int { x := s[i:i+5] g() return lookup(x) } The old compiler would spill x to two consecutive slots on the stack, both named x (at offsets 0 and 8). The new compiler spills the pointer of x to a slot named x.ptr. It doesn't spill x.len at all, as it is a constant (5) and can be rematerialized for the call to lookup. So compound objects may not be spilled in their entirety, and even if they are they won't necessarily be contiguous. Such is the price of optimization. Re-enable live variable analysis tests. One test remains disabled, it fails because of #14904. Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801 Reviewed-on: https://go-review.googlesource.com/21233 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-03-28 11:25:17 -07:00
f = ".type"
}
c := e.splitSlot(&name, f, 0, u) // see comment in plive.go:onebitwalktype1.
d := e.splitSlot(&name, ".data", u.Size(), t)
return c, d
cmd/compile: better job of naming compound types Compound AUTO types weren't named previously. That was because live variable analysis (plive.go) doesn't handle spilling to compound types. It can't handle them because there is no valid place to put VARDEFs when regalloc is spilling compound types. compound types = multiword builtin types: complex, string, slice, and interface. Instead, we split named AUTOs into individual one-word variables. For example, a string s gets split into a byte ptr s.ptr and an integer s.len. Those two variables can be spilled to / restored from independently. As a result, live variable analysis can handle them because they are one-word objects. This CL will change how AUTOs are described in DWARF information. Consider the code: func f(s string, i int) int { x := s[i:i+5] g() return lookup(x) } The old compiler would spill x to two consecutive slots on the stack, both named x (at offsets 0 and 8). The new compiler spills the pointer of x to a slot named x.ptr. It doesn't spill x.len at all, as it is a constant (5) and can be rematerialized for the call to lookup. So compound objects may not be spilled in their entirety, and even if they are they won't necessarily be contiguous. Such is the price of optimization. Re-enable live variable analysis tests. One test remains disabled, it fails because of #14904. Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801 Reviewed-on: https://go-review.googlesource.com/21233 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-03-28 11:25:17 -07:00
}
// Return the two parts of the larger variable.
return ssa.LocalSlot{N: n, Type: u, Off: name.Off}, ssa.LocalSlot{N: n, Type: t, Off: name.Off + int64(Widthptr)}
cmd/compile: better job of naming compound types Compound AUTO types weren't named previously. That was because live variable analysis (plive.go) doesn't handle spilling to compound types. It can't handle them because there is no valid place to put VARDEFs when regalloc is spilling compound types. compound types = multiword builtin types: complex, string, slice, and interface. Instead, we split named AUTOs into individual one-word variables. For example, a string s gets split into a byte ptr s.ptr and an integer s.len. Those two variables can be spilled to / restored from independently. As a result, live variable analysis can handle them because they are one-word objects. This CL will change how AUTOs are described in DWARF information. Consider the code: func f(s string, i int) int { x := s[i:i+5] g() return lookup(x) } The old compiler would spill x to two consecutive slots on the stack, both named x (at offsets 0 and 8). The new compiler spills the pointer of x to a slot named x.ptr. It doesn't spill x.len at all, as it is a constant (5) and can be rematerialized for the call to lookup. So compound objects may not be spilled in their entirety, and even if they are they won't necessarily be contiguous. Such is the price of optimization. Re-enable live variable analysis tests. One test remains disabled, it fails because of #14904. Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801 Reviewed-on: https://go-review.googlesource.com/21233 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-03-28 11:25:17 -07:00
}
func (e *ssafn) SplitSlice(name ssa.LocalSlot) (ssa.LocalSlot, ssa.LocalSlot, ssa.LocalSlot) {
cmd/compile: better job of naming compound types Compound AUTO types weren't named previously. That was because live variable analysis (plive.go) doesn't handle spilling to compound types. It can't handle them because there is no valid place to put VARDEFs when regalloc is spilling compound types. compound types = multiword builtin types: complex, string, slice, and interface. Instead, we split named AUTOs into individual one-word variables. For example, a string s gets split into a byte ptr s.ptr and an integer s.len. Those two variables can be spilled to / restored from independently. As a result, live variable analysis can handle them because they are one-word objects. This CL will change how AUTOs are described in DWARF information. Consider the code: func f(s string, i int) int { x := s[i:i+5] g() return lookup(x) } The old compiler would spill x to two consecutive slots on the stack, both named x (at offsets 0 and 8). The new compiler spills the pointer of x to a slot named x.ptr. It doesn't spill x.len at all, as it is a constant (5) and can be rematerialized for the call to lookup. So compound objects may not be spilled in their entirety, and even if they are they won't necessarily be contiguous. Such is the price of optimization. Re-enable live variable analysis tests. One test remains disabled, it fails because of #14904. Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801 Reviewed-on: https://go-review.googlesource.com/21233 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-03-28 11:25:17 -07:00
n := name.N.(*Node)
ptrType := types.NewPtr(name.Type.Elem())
lenType := types.Types[TINT]
cmd/compile: move Node.Class to flags Put it at position zero, since it is fairly hot. This shrinks gc.Node into a smaller size class on 64 bit systems. name old time/op new time/op delta Template 193ms ± 5% 192ms ± 3% ~ (p=0.353 n=94+93) Unicode 86.1ms ± 5% 85.0ms ± 4% -1.23% (p=0.000 n=95+98) GoTypes 546ms ± 3% 544ms ± 4% -0.40% (p=0.007 n=94+97) Compiler 2.56s ± 3% 2.54s ± 3% -0.67% (p=0.000 n=99+97) SSA 5.13s ± 2% 5.10s ± 3% -0.55% (p=0.000 n=94+98) Flate 122ms ± 6% 121ms ± 4% -0.75% (p=0.002 n=97+95) GoParser 144ms ± 5% 144ms ± 4% ~ (p=0.298 n=98+97) Reflect 348ms ± 4% 349ms ± 4% ~ (p=0.350 n=98+97) Tar 105ms ± 5% 104ms ± 5% ~ (p=0.154 n=96+98) XML 200ms ± 5% 198ms ± 4% -0.71% (p=0.015 n=97+98) [Geo mean] 330ms 328ms -0.52% name old user-time/op new user-time/op delta Template 229ms ±11% 224ms ± 7% -2.16% (p=0.001 n=100+87) Unicode 109ms ± 5% 109ms ± 6% ~ (p=0.897 n=96+91) GoTypes 712ms ± 4% 709ms ± 4% ~ (p=0.085 n=96+98) Compiler 3.41s ± 3% 3.36s ± 3% -1.43% (p=0.000 n=98+98) SSA 7.46s ± 3% 7.31s ± 3% -2.02% (p=0.000 n=100+99) Flate 145ms ± 6% 143ms ± 6% -1.11% (p=0.001 n=99+97) GoParser 177ms ± 5% 176ms ± 5% -0.78% (p=0.018 n=95+95) Reflect 432ms ± 7% 435ms ± 9% ~ (p=0.296 n=100+100) Tar 121ms ± 7% 121ms ± 5% ~ (p=0.072 n=100+95) XML 241ms ± 4% 239ms ± 5% ~ (p=0.085 n=97+99) [Geo mean] 413ms 410ms -0.73% name old alloc/op new alloc/op delta Template 38.4MB ± 0% 37.7MB ± 0% -1.85% (p=0.008 n=5+5) Unicode 30.1MB ± 0% 28.8MB ± 0% -4.09% (p=0.008 n=5+5) GoTypes 112MB ± 0% 110MB ± 0% -1.69% (p=0.008 n=5+5) Compiler 470MB ± 0% 461MB ± 0% -1.91% (p=0.008 n=5+5) SSA 1.13GB ± 0% 1.11GB ± 0% -1.70% (p=0.008 n=5+5) Flate 25.0MB ± 0% 24.6MB ± 0% -1.67% (p=0.008 n=5+5) GoParser 31.6MB ± 0% 31.1MB ± 0% -1.66% (p=0.008 n=5+5) Reflect 77.1MB ± 0% 75.8MB ± 0% -1.69% (p=0.008 n=5+5) Tar 26.3MB ± 0% 25.7MB ± 0% -2.06% (p=0.008 n=5+5) XML 41.9MB ± 0% 41.1MB ± 0% -1.93% (p=0.008 n=5+5) [Geo mean] 73.5MB 72.0MB -2.03% name old allocs/op new allocs/op delta Template 383k ± 0% 383k ± 0% ~ (p=0.690 n=5+5) Unicode 343k ± 0% 343k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.16M ± 0% 1.16M ± 0% ~ (p=0.310 n=5+5) Compiler 4.43M ± 0% 4.42M ± 0% -0.17% (p=0.008 n=5+5) SSA 9.85M ± 0% 9.85M ± 0% ~ (p=0.310 n=5+5) Flate 236k ± 0% 236k ± 1% ~ (p=0.841 n=5+5) GoParser 320k ± 0% 320k ± 0% ~ (p=0.421 n=5+5) Reflect 988k ± 0% 987k ± 0% ~ (p=0.690 n=5+5) Tar 252k ± 0% 251k ± 0% ~ (p=0.095 n=5+5) XML 399k ± 0% 399k ± 0% ~ (p=1.000 n=5+5) [Geo mean] 741k 740k -0.07% Change-Id: I9e952b58a98e30a12494304db9ce50d0a85e459c Reviewed-on: https://go-review.googlesource.com/41797 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
2017-04-25 18:14:12 -07:00
if n.Class() == PAUTO && !n.Addrtaken() {
cmd/compile: better job of naming compound types Compound AUTO types weren't named previously. That was because live variable analysis (plive.go) doesn't handle spilling to compound types. It can't handle them because there is no valid place to put VARDEFs when regalloc is spilling compound types. compound types = multiword builtin types: complex, string, slice, and interface. Instead, we split named AUTOs into individual one-word variables. For example, a string s gets split into a byte ptr s.ptr and an integer s.len. Those two variables can be spilled to / restored from independently. As a result, live variable analysis can handle them because they are one-word objects. This CL will change how AUTOs are described in DWARF information. Consider the code: func f(s string, i int) int { x := s[i:i+5] g() return lookup(x) } The old compiler would spill x to two consecutive slots on the stack, both named x (at offsets 0 and 8). The new compiler spills the pointer of x to a slot named x.ptr. It doesn't spill x.len at all, as it is a constant (5) and can be rematerialized for the call to lookup. So compound objects may not be spilled in their entirety, and even if they are they won't necessarily be contiguous. Such is the price of optimization. Re-enable live variable analysis tests. One test remains disabled, it fails because of #14904. Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801 Reviewed-on: https://go-review.googlesource.com/21233 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-03-28 11:25:17 -07:00
// Split this slice up into three separate variables.
p := e.splitSlot(&name, ".ptr", 0, ptrType)
l := e.splitSlot(&name, ".len", ptrType.Size(), lenType)
c := e.splitSlot(&name, ".cap", ptrType.Size()+lenType.Size(), lenType)
return p, l, c
cmd/compile: better job of naming compound types Compound AUTO types weren't named previously. That was because live variable analysis (plive.go) doesn't handle spilling to compound types. It can't handle them because there is no valid place to put VARDEFs when regalloc is spilling compound types. compound types = multiword builtin types: complex, string, slice, and interface. Instead, we split named AUTOs into individual one-word variables. For example, a string s gets split into a byte ptr s.ptr and an integer s.len. Those two variables can be spilled to / restored from independently. As a result, live variable analysis can handle them because they are one-word objects. This CL will change how AUTOs are described in DWARF information. Consider the code: func f(s string, i int) int { x := s[i:i+5] g() return lookup(x) } The old compiler would spill x to two consecutive slots on the stack, both named x (at offsets 0 and 8). The new compiler spills the pointer of x to a slot named x.ptr. It doesn't spill x.len at all, as it is a constant (5) and can be rematerialized for the call to lookup. So compound objects may not be spilled in their entirety, and even if they are they won't necessarily be contiguous. Such is the price of optimization. Re-enable live variable analysis tests. One test remains disabled, it fails because of #14904. Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801 Reviewed-on: https://go-review.googlesource.com/21233 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-03-28 11:25:17 -07:00
}
// Return the three parts of the larger variable.
return ssa.LocalSlot{N: n, Type: ptrType, Off: name.Off},
ssa.LocalSlot{N: n, Type: lenType, Off: name.Off + int64(Widthptr)},
ssa.LocalSlot{N: n, Type: lenType, Off: name.Off + int64(2*Widthptr)}
cmd/compile: better job of naming compound types Compound AUTO types weren't named previously. That was because live variable analysis (plive.go) doesn't handle spilling to compound types. It can't handle them because there is no valid place to put VARDEFs when regalloc is spilling compound types. compound types = multiword builtin types: complex, string, slice, and interface. Instead, we split named AUTOs into individual one-word variables. For example, a string s gets split into a byte ptr s.ptr and an integer s.len. Those two variables can be spilled to / restored from independently. As a result, live variable analysis can handle them because they are one-word objects. This CL will change how AUTOs are described in DWARF information. Consider the code: func f(s string, i int) int { x := s[i:i+5] g() return lookup(x) } The old compiler would spill x to two consecutive slots on the stack, both named x (at offsets 0 and 8). The new compiler spills the pointer of x to a slot named x.ptr. It doesn't spill x.len at all, as it is a constant (5) and can be rematerialized for the call to lookup. So compound objects may not be spilled in their entirety, and even if they are they won't necessarily be contiguous. Such is the price of optimization. Re-enable live variable analysis tests. One test remains disabled, it fails because of #14904. Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801 Reviewed-on: https://go-review.googlesource.com/21233 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-03-28 11:25:17 -07:00
}
func (e *ssafn) SplitComplex(name ssa.LocalSlot) (ssa.LocalSlot, ssa.LocalSlot) {
cmd/compile: better job of naming compound types Compound AUTO types weren't named previously. That was because live variable analysis (plive.go) doesn't handle spilling to compound types. It can't handle them because there is no valid place to put VARDEFs when regalloc is spilling compound types. compound types = multiword builtin types: complex, string, slice, and interface. Instead, we split named AUTOs into individual one-word variables. For example, a string s gets split into a byte ptr s.ptr and an integer s.len. Those two variables can be spilled to / restored from independently. As a result, live variable analysis can handle them because they are one-word objects. This CL will change how AUTOs are described in DWARF information. Consider the code: func f(s string, i int) int { x := s[i:i+5] g() return lookup(x) } The old compiler would spill x to two consecutive slots on the stack, both named x (at offsets 0 and 8). The new compiler spills the pointer of x to a slot named x.ptr. It doesn't spill x.len at all, as it is a constant (5) and can be rematerialized for the call to lookup. So compound objects may not be spilled in their entirety, and even if they are they won't necessarily be contiguous. Such is the price of optimization. Re-enable live variable analysis tests. One test remains disabled, it fails because of #14904. Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801 Reviewed-on: https://go-review.googlesource.com/21233 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-03-28 11:25:17 -07:00
n := name.N.(*Node)
s := name.Type.Size() / 2
var t *types.Type
cmd/compile: better job of naming compound types Compound AUTO types weren't named previously. That was because live variable analysis (plive.go) doesn't handle spilling to compound types. It can't handle them because there is no valid place to put VARDEFs when regalloc is spilling compound types. compound types = multiword builtin types: complex, string, slice, and interface. Instead, we split named AUTOs into individual one-word variables. For example, a string s gets split into a byte ptr s.ptr and an integer s.len. Those two variables can be spilled to / restored from independently. As a result, live variable analysis can handle them because they are one-word objects. This CL will change how AUTOs are described in DWARF information. Consider the code: func f(s string, i int) int { x := s[i:i+5] g() return lookup(x) } The old compiler would spill x to two consecutive slots on the stack, both named x (at offsets 0 and 8). The new compiler spills the pointer of x to a slot named x.ptr. It doesn't spill x.len at all, as it is a constant (5) and can be rematerialized for the call to lookup. So compound objects may not be spilled in their entirety, and even if they are they won't necessarily be contiguous. Such is the price of optimization. Re-enable live variable analysis tests. One test remains disabled, it fails because of #14904. Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801 Reviewed-on: https://go-review.googlesource.com/21233 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-03-28 11:25:17 -07:00
if s == 8 {
t = types.Types[TFLOAT64]
cmd/compile: better job of naming compound types Compound AUTO types weren't named previously. That was because live variable analysis (plive.go) doesn't handle spilling to compound types. It can't handle them because there is no valid place to put VARDEFs when regalloc is spilling compound types. compound types = multiword builtin types: complex, string, slice, and interface. Instead, we split named AUTOs into individual one-word variables. For example, a string s gets split into a byte ptr s.ptr and an integer s.len. Those two variables can be spilled to / restored from independently. As a result, live variable analysis can handle them because they are one-word objects. This CL will change how AUTOs are described in DWARF information. Consider the code: func f(s string, i int) int { x := s[i:i+5] g() return lookup(x) } The old compiler would spill x to two consecutive slots on the stack, both named x (at offsets 0 and 8). The new compiler spills the pointer of x to a slot named x.ptr. It doesn't spill x.len at all, as it is a constant (5) and can be rematerialized for the call to lookup. So compound objects may not be spilled in their entirety, and even if they are they won't necessarily be contiguous. Such is the price of optimization. Re-enable live variable analysis tests. One test remains disabled, it fails because of #14904. Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801 Reviewed-on: https://go-review.googlesource.com/21233 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-03-28 11:25:17 -07:00
} else {
t = types.Types[TFLOAT32]
cmd/compile: better job of naming compound types Compound AUTO types weren't named previously. That was because live variable analysis (plive.go) doesn't handle spilling to compound types. It can't handle them because there is no valid place to put VARDEFs when regalloc is spilling compound types. compound types = multiword builtin types: complex, string, slice, and interface. Instead, we split named AUTOs into individual one-word variables. For example, a string s gets split into a byte ptr s.ptr and an integer s.len. Those two variables can be spilled to / restored from independently. As a result, live variable analysis can handle them because they are one-word objects. This CL will change how AUTOs are described in DWARF information. Consider the code: func f(s string, i int) int { x := s[i:i+5] g() return lookup(x) } The old compiler would spill x to two consecutive slots on the stack, both named x (at offsets 0 and 8). The new compiler spills the pointer of x to a slot named x.ptr. It doesn't spill x.len at all, as it is a constant (5) and can be rematerialized for the call to lookup. So compound objects may not be spilled in their entirety, and even if they are they won't necessarily be contiguous. Such is the price of optimization. Re-enable live variable analysis tests. One test remains disabled, it fails because of #14904. Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801 Reviewed-on: https://go-review.googlesource.com/21233 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-03-28 11:25:17 -07:00
}
cmd/compile: move Node.Class to flags Put it at position zero, since it is fairly hot. This shrinks gc.Node into a smaller size class on 64 bit systems. name old time/op new time/op delta Template 193ms ± 5% 192ms ± 3% ~ (p=0.353 n=94+93) Unicode 86.1ms ± 5% 85.0ms ± 4% -1.23% (p=0.000 n=95+98) GoTypes 546ms ± 3% 544ms ± 4% -0.40% (p=0.007 n=94+97) Compiler 2.56s ± 3% 2.54s ± 3% -0.67% (p=0.000 n=99+97) SSA 5.13s ± 2% 5.10s ± 3% -0.55% (p=0.000 n=94+98) Flate 122ms ± 6% 121ms ± 4% -0.75% (p=0.002 n=97+95) GoParser 144ms ± 5% 144ms ± 4% ~ (p=0.298 n=98+97) Reflect 348ms ± 4% 349ms ± 4% ~ (p=0.350 n=98+97) Tar 105ms ± 5% 104ms ± 5% ~ (p=0.154 n=96+98) XML 200ms ± 5% 198ms ± 4% -0.71% (p=0.015 n=97+98) [Geo mean] 330ms 328ms -0.52% name old user-time/op new user-time/op delta Template 229ms ±11% 224ms ± 7% -2.16% (p=0.001 n=100+87) Unicode 109ms ± 5% 109ms ± 6% ~ (p=0.897 n=96+91) GoTypes 712ms ± 4% 709ms ± 4% ~ (p=0.085 n=96+98) Compiler 3.41s ± 3% 3.36s ± 3% -1.43% (p=0.000 n=98+98) SSA 7.46s ± 3% 7.31s ± 3% -2.02% (p=0.000 n=100+99) Flate 145ms ± 6% 143ms ± 6% -1.11% (p=0.001 n=99+97) GoParser 177ms ± 5% 176ms ± 5% -0.78% (p=0.018 n=95+95) Reflect 432ms ± 7% 435ms ± 9% ~ (p=0.296 n=100+100) Tar 121ms ± 7% 121ms ± 5% ~ (p=0.072 n=100+95) XML 241ms ± 4% 239ms ± 5% ~ (p=0.085 n=97+99) [Geo mean] 413ms 410ms -0.73% name old alloc/op new alloc/op delta Template 38.4MB ± 0% 37.7MB ± 0% -1.85% (p=0.008 n=5+5) Unicode 30.1MB ± 0% 28.8MB ± 0% -4.09% (p=0.008 n=5+5) GoTypes 112MB ± 0% 110MB ± 0% -1.69% (p=0.008 n=5+5) Compiler 470MB ± 0% 461MB ± 0% -1.91% (p=0.008 n=5+5) SSA 1.13GB ± 0% 1.11GB ± 0% -1.70% (p=0.008 n=5+5) Flate 25.0MB ± 0% 24.6MB ± 0% -1.67% (p=0.008 n=5+5) GoParser 31.6MB ± 0% 31.1MB ± 0% -1.66% (p=0.008 n=5+5) Reflect 77.1MB ± 0% 75.8MB ± 0% -1.69% (p=0.008 n=5+5) Tar 26.3MB ± 0% 25.7MB ± 0% -2.06% (p=0.008 n=5+5) XML 41.9MB ± 0% 41.1MB ± 0% -1.93% (p=0.008 n=5+5) [Geo mean] 73.5MB 72.0MB -2.03% name old allocs/op new allocs/op delta Template 383k ± 0% 383k ± 0% ~ (p=0.690 n=5+5) Unicode 343k ± 0% 343k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.16M ± 0% 1.16M ± 0% ~ (p=0.310 n=5+5) Compiler 4.43M ± 0% 4.42M ± 0% -0.17% (p=0.008 n=5+5) SSA 9.85M ± 0% 9.85M ± 0% ~ (p=0.310 n=5+5) Flate 236k ± 0% 236k ± 1% ~ (p=0.841 n=5+5) GoParser 320k ± 0% 320k ± 0% ~ (p=0.421 n=5+5) Reflect 988k ± 0% 987k ± 0% ~ (p=0.690 n=5+5) Tar 252k ± 0% 251k ± 0% ~ (p=0.095 n=5+5) XML 399k ± 0% 399k ± 0% ~ (p=1.000 n=5+5) [Geo mean] 741k 740k -0.07% Change-Id: I9e952b58a98e30a12494304db9ce50d0a85e459c Reviewed-on: https://go-review.googlesource.com/41797 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
2017-04-25 18:14:12 -07:00
if n.Class() == PAUTO && !n.Addrtaken() {
cmd/compile: better job of naming compound types Compound AUTO types weren't named previously. That was because live variable analysis (plive.go) doesn't handle spilling to compound types. It can't handle them because there is no valid place to put VARDEFs when regalloc is spilling compound types. compound types = multiword builtin types: complex, string, slice, and interface. Instead, we split named AUTOs into individual one-word variables. For example, a string s gets split into a byte ptr s.ptr and an integer s.len. Those two variables can be spilled to / restored from independently. As a result, live variable analysis can handle them because they are one-word objects. This CL will change how AUTOs are described in DWARF information. Consider the code: func f(s string, i int) int { x := s[i:i+5] g() return lookup(x) } The old compiler would spill x to two consecutive slots on the stack, both named x (at offsets 0 and 8). The new compiler spills the pointer of x to a slot named x.ptr. It doesn't spill x.len at all, as it is a constant (5) and can be rematerialized for the call to lookup. So compound objects may not be spilled in their entirety, and even if they are they won't necessarily be contiguous. Such is the price of optimization. Re-enable live variable analysis tests. One test remains disabled, it fails because of #14904. Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801 Reviewed-on: https://go-review.googlesource.com/21233 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-03-28 11:25:17 -07:00
// Split this complex up into two separate variables.
r := e.splitSlot(&name, ".real", 0, t)
i := e.splitSlot(&name, ".imag", t.Size(), t)
return r, i
cmd/compile: better job of naming compound types Compound AUTO types weren't named previously. That was because live variable analysis (plive.go) doesn't handle spilling to compound types. It can't handle them because there is no valid place to put VARDEFs when regalloc is spilling compound types. compound types = multiword builtin types: complex, string, slice, and interface. Instead, we split named AUTOs into individual one-word variables. For example, a string s gets split into a byte ptr s.ptr and an integer s.len. Those two variables can be spilled to / restored from independently. As a result, live variable analysis can handle them because they are one-word objects. This CL will change how AUTOs are described in DWARF information. Consider the code: func f(s string, i int) int { x := s[i:i+5] g() return lookup(x) } The old compiler would spill x to two consecutive slots on the stack, both named x (at offsets 0 and 8). The new compiler spills the pointer of x to a slot named x.ptr. It doesn't spill x.len at all, as it is a constant (5) and can be rematerialized for the call to lookup. So compound objects may not be spilled in their entirety, and even if they are they won't necessarily be contiguous. Such is the price of optimization. Re-enable live variable analysis tests. One test remains disabled, it fails because of #14904. Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801 Reviewed-on: https://go-review.googlesource.com/21233 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-03-28 11:25:17 -07:00
}
// Return the two parts of the larger variable.
return ssa.LocalSlot{N: n, Type: t, Off: name.Off}, ssa.LocalSlot{N: n, Type: t, Off: name.Off + s}
cmd/compile: better job of naming compound types Compound AUTO types weren't named previously. That was because live variable analysis (plive.go) doesn't handle spilling to compound types. It can't handle them because there is no valid place to put VARDEFs when regalloc is spilling compound types. compound types = multiword builtin types: complex, string, slice, and interface. Instead, we split named AUTOs into individual one-word variables. For example, a string s gets split into a byte ptr s.ptr and an integer s.len. Those two variables can be spilled to / restored from independently. As a result, live variable analysis can handle them because they are one-word objects. This CL will change how AUTOs are described in DWARF information. Consider the code: func f(s string, i int) int { x := s[i:i+5] g() return lookup(x) } The old compiler would spill x to two consecutive slots on the stack, both named x (at offsets 0 and 8). The new compiler spills the pointer of x to a slot named x.ptr. It doesn't spill x.len at all, as it is a constant (5) and can be rematerialized for the call to lookup. So compound objects may not be spilled in their entirety, and even if they are they won't necessarily be contiguous. Such is the price of optimization. Re-enable live variable analysis tests. One test remains disabled, it fails because of #14904. Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801 Reviewed-on: https://go-review.googlesource.com/21233 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-03-28 11:25:17 -07:00
}
func (e *ssafn) SplitInt64(name ssa.LocalSlot) (ssa.LocalSlot, ssa.LocalSlot) {
n := name.N.(*Node)
var t *types.Type
if name.Type.IsSigned() {
t = types.Types[TINT32]
} else {
t = types.Types[TUINT32]
}
cmd/compile: move Node.Class to flags Put it at position zero, since it is fairly hot. This shrinks gc.Node into a smaller size class on 64 bit systems. name old time/op new time/op delta Template 193ms ± 5% 192ms ± 3% ~ (p=0.353 n=94+93) Unicode 86.1ms ± 5% 85.0ms ± 4% -1.23% (p=0.000 n=95+98) GoTypes 546ms ± 3% 544ms ± 4% -0.40% (p=0.007 n=94+97) Compiler 2.56s ± 3% 2.54s ± 3% -0.67% (p=0.000 n=99+97) SSA 5.13s ± 2% 5.10s ± 3% -0.55% (p=0.000 n=94+98) Flate 122ms ± 6% 121ms ± 4% -0.75% (p=0.002 n=97+95) GoParser 144ms ± 5% 144ms ± 4% ~ (p=0.298 n=98+97) Reflect 348ms ± 4% 349ms ± 4% ~ (p=0.350 n=98+97) Tar 105ms ± 5% 104ms ± 5% ~ (p=0.154 n=96+98) XML 200ms ± 5% 198ms ± 4% -0.71% (p=0.015 n=97+98) [Geo mean] 330ms 328ms -0.52% name old user-time/op new user-time/op delta Template 229ms ±11% 224ms ± 7% -2.16% (p=0.001 n=100+87) Unicode 109ms ± 5% 109ms ± 6% ~ (p=0.897 n=96+91) GoTypes 712ms ± 4% 709ms ± 4% ~ (p=0.085 n=96+98) Compiler 3.41s ± 3% 3.36s ± 3% -1.43% (p=0.000 n=98+98) SSA 7.46s ± 3% 7.31s ± 3% -2.02% (p=0.000 n=100+99) Flate 145ms ± 6% 143ms ± 6% -1.11% (p=0.001 n=99+97) GoParser 177ms ± 5% 176ms ± 5% -0.78% (p=0.018 n=95+95) Reflect 432ms ± 7% 435ms ± 9% ~ (p=0.296 n=100+100) Tar 121ms ± 7% 121ms ± 5% ~ (p=0.072 n=100+95) XML 241ms ± 4% 239ms ± 5% ~ (p=0.085 n=97+99) [Geo mean] 413ms 410ms -0.73% name old alloc/op new alloc/op delta Template 38.4MB ± 0% 37.7MB ± 0% -1.85% (p=0.008 n=5+5) Unicode 30.1MB ± 0% 28.8MB ± 0% -4.09% (p=0.008 n=5+5) GoTypes 112MB ± 0% 110MB ± 0% -1.69% (p=0.008 n=5+5) Compiler 470MB ± 0% 461MB ± 0% -1.91% (p=0.008 n=5+5) SSA 1.13GB ± 0% 1.11GB ± 0% -1.70% (p=0.008 n=5+5) Flate 25.0MB ± 0% 24.6MB ± 0% -1.67% (p=0.008 n=5+5) GoParser 31.6MB ± 0% 31.1MB ± 0% -1.66% (p=0.008 n=5+5) Reflect 77.1MB ± 0% 75.8MB ± 0% -1.69% (p=0.008 n=5+5) Tar 26.3MB ± 0% 25.7MB ± 0% -2.06% (p=0.008 n=5+5) XML 41.9MB ± 0% 41.1MB ± 0% -1.93% (p=0.008 n=5+5) [Geo mean] 73.5MB 72.0MB -2.03% name old allocs/op new allocs/op delta Template 383k ± 0% 383k ± 0% ~ (p=0.690 n=5+5) Unicode 343k ± 0% 343k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.16M ± 0% 1.16M ± 0% ~ (p=0.310 n=5+5) Compiler 4.43M ± 0% 4.42M ± 0% -0.17% (p=0.008 n=5+5) SSA 9.85M ± 0% 9.85M ± 0% ~ (p=0.310 n=5+5) Flate 236k ± 0% 236k ± 1% ~ (p=0.841 n=5+5) GoParser 320k ± 0% 320k ± 0% ~ (p=0.421 n=5+5) Reflect 988k ± 0% 987k ± 0% ~ (p=0.690 n=5+5) Tar 252k ± 0% 251k ± 0% ~ (p=0.095 n=5+5) XML 399k ± 0% 399k ± 0% ~ (p=1.000 n=5+5) [Geo mean] 741k 740k -0.07% Change-Id: I9e952b58a98e30a12494304db9ce50d0a85e459c Reviewed-on: https://go-review.googlesource.com/41797 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
2017-04-25 18:14:12 -07:00
if n.Class() == PAUTO && !n.Addrtaken() {
// Split this int64 up into two separate variables.
if thearch.LinkArch.ByteOrder == binary.BigEndian {
return e.splitSlot(&name, ".hi", 0, t), e.splitSlot(&name, ".lo", t.Size(), types.Types[TUINT32])
}
return e.splitSlot(&name, ".hi", t.Size(), t), e.splitSlot(&name, ".lo", 0, types.Types[TUINT32])
}
// Return the two parts of the larger variable.
if thearch.LinkArch.ByteOrder == binary.BigEndian {
return ssa.LocalSlot{N: n, Type: t, Off: name.Off}, ssa.LocalSlot{N: n, Type: types.Types[TUINT32], Off: name.Off + 4}
}
return ssa.LocalSlot{N: n, Type: t, Off: name.Off + 4}, ssa.LocalSlot{N: n, Type: types.Types[TUINT32], Off: name.Off}
}
func (e *ssafn) SplitStruct(name ssa.LocalSlot, i int) ssa.LocalSlot {
n := name.N.(*Node)
st := name.Type
ft := st.FieldType(i)
var offset int64
for f := 0; f < i; f++ {
offset += st.FieldType(f).Size()
}
cmd/compile: move Node.Class to flags Put it at position zero, since it is fairly hot. This shrinks gc.Node into a smaller size class on 64 bit systems. name old time/op new time/op delta Template 193ms ± 5% 192ms ± 3% ~ (p=0.353 n=94+93) Unicode 86.1ms ± 5% 85.0ms ± 4% -1.23% (p=0.000 n=95+98) GoTypes 546ms ± 3% 544ms ± 4% -0.40% (p=0.007 n=94+97) Compiler 2.56s ± 3% 2.54s ± 3% -0.67% (p=0.000 n=99+97) SSA 5.13s ± 2% 5.10s ± 3% -0.55% (p=0.000 n=94+98) Flate 122ms ± 6% 121ms ± 4% -0.75% (p=0.002 n=97+95) GoParser 144ms ± 5% 144ms ± 4% ~ (p=0.298 n=98+97) Reflect 348ms ± 4% 349ms ± 4% ~ (p=0.350 n=98+97) Tar 105ms ± 5% 104ms ± 5% ~ (p=0.154 n=96+98) XML 200ms ± 5% 198ms ± 4% -0.71% (p=0.015 n=97+98) [Geo mean] 330ms 328ms -0.52% name old user-time/op new user-time/op delta Template 229ms ±11% 224ms ± 7% -2.16% (p=0.001 n=100+87) Unicode 109ms ± 5% 109ms ± 6% ~ (p=0.897 n=96+91) GoTypes 712ms ± 4% 709ms ± 4% ~ (p=0.085 n=96+98) Compiler 3.41s ± 3% 3.36s ± 3% -1.43% (p=0.000 n=98+98) SSA 7.46s ± 3% 7.31s ± 3% -2.02% (p=0.000 n=100+99) Flate 145ms ± 6% 143ms ± 6% -1.11% (p=0.001 n=99+97) GoParser 177ms ± 5% 176ms ± 5% -0.78% (p=0.018 n=95+95) Reflect 432ms ± 7% 435ms ± 9% ~ (p=0.296 n=100+100) Tar 121ms ± 7% 121ms ± 5% ~ (p=0.072 n=100+95) XML 241ms ± 4% 239ms ± 5% ~ (p=0.085 n=97+99) [Geo mean] 413ms 410ms -0.73% name old alloc/op new alloc/op delta Template 38.4MB ± 0% 37.7MB ± 0% -1.85% (p=0.008 n=5+5) Unicode 30.1MB ± 0% 28.8MB ± 0% -4.09% (p=0.008 n=5+5) GoTypes 112MB ± 0% 110MB ± 0% -1.69% (p=0.008 n=5+5) Compiler 470MB ± 0% 461MB ± 0% -1.91% (p=0.008 n=5+5) SSA 1.13GB ± 0% 1.11GB ± 0% -1.70% (p=0.008 n=5+5) Flate 25.0MB ± 0% 24.6MB ± 0% -1.67% (p=0.008 n=5+5) GoParser 31.6MB ± 0% 31.1MB ± 0% -1.66% (p=0.008 n=5+5) Reflect 77.1MB ± 0% 75.8MB ± 0% -1.69% (p=0.008 n=5+5) Tar 26.3MB ± 0% 25.7MB ± 0% -2.06% (p=0.008 n=5+5) XML 41.9MB ± 0% 41.1MB ± 0% -1.93% (p=0.008 n=5+5) [Geo mean] 73.5MB 72.0MB -2.03% name old allocs/op new allocs/op delta Template 383k ± 0% 383k ± 0% ~ (p=0.690 n=5+5) Unicode 343k ± 0% 343k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.16M ± 0% 1.16M ± 0% ~ (p=0.310 n=5+5) Compiler 4.43M ± 0% 4.42M ± 0% -0.17% (p=0.008 n=5+5) SSA 9.85M ± 0% 9.85M ± 0% ~ (p=0.310 n=5+5) Flate 236k ± 0% 236k ± 1% ~ (p=0.841 n=5+5) GoParser 320k ± 0% 320k ± 0% ~ (p=0.421 n=5+5) Reflect 988k ± 0% 987k ± 0% ~ (p=0.690 n=5+5) Tar 252k ± 0% 251k ± 0% ~ (p=0.095 n=5+5) XML 399k ± 0% 399k ± 0% ~ (p=1.000 n=5+5) [Geo mean] 741k 740k -0.07% Change-Id: I9e952b58a98e30a12494304db9ce50d0a85e459c Reviewed-on: https://go-review.googlesource.com/41797 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
2017-04-25 18:14:12 -07:00
if n.Class() == PAUTO && !n.Addrtaken() {
// Note: the _ field may appear several times. But
// have no fear, identically-named but distinct Autos are
// ok, albeit maybe confusing for a debugger.
return e.splitSlot(&name, "."+st.FieldName(i), offset, ft)
}
return ssa.LocalSlot{N: n, Type: ft, Off: name.Off + st.FieldOff(i)}
}
func (e *ssafn) SplitArray(name ssa.LocalSlot) ssa.LocalSlot {
n := name.N.(*Node)
at := name.Type
if at.NumElem() != 1 {
Fatalf("bad array size")
}
et := at.Elem()
cmd/compile: move Node.Class to flags Put it at position zero, since it is fairly hot. This shrinks gc.Node into a smaller size class on 64 bit systems. name old time/op new time/op delta Template 193ms ± 5% 192ms ± 3% ~ (p=0.353 n=94+93) Unicode 86.1ms ± 5% 85.0ms ± 4% -1.23% (p=0.000 n=95+98) GoTypes 546ms ± 3% 544ms ± 4% -0.40% (p=0.007 n=94+97) Compiler 2.56s ± 3% 2.54s ± 3% -0.67% (p=0.000 n=99+97) SSA 5.13s ± 2% 5.10s ± 3% -0.55% (p=0.000 n=94+98) Flate 122ms ± 6% 121ms ± 4% -0.75% (p=0.002 n=97+95) GoParser 144ms ± 5% 144ms ± 4% ~ (p=0.298 n=98+97) Reflect 348ms ± 4% 349ms ± 4% ~ (p=0.350 n=98+97) Tar 105ms ± 5% 104ms ± 5% ~ (p=0.154 n=96+98) XML 200ms ± 5% 198ms ± 4% -0.71% (p=0.015 n=97+98) [Geo mean] 330ms 328ms -0.52% name old user-time/op new user-time/op delta Template 229ms ±11% 224ms ± 7% -2.16% (p=0.001 n=100+87) Unicode 109ms ± 5% 109ms ± 6% ~ (p=0.897 n=96+91) GoTypes 712ms ± 4% 709ms ± 4% ~ (p=0.085 n=96+98) Compiler 3.41s ± 3% 3.36s ± 3% -1.43% (p=0.000 n=98+98) SSA 7.46s ± 3% 7.31s ± 3% -2.02% (p=0.000 n=100+99) Flate 145ms ± 6% 143ms ± 6% -1.11% (p=0.001 n=99+97) GoParser 177ms ± 5% 176ms ± 5% -0.78% (p=0.018 n=95+95) Reflect 432ms ± 7% 435ms ± 9% ~ (p=0.296 n=100+100) Tar 121ms ± 7% 121ms ± 5% ~ (p=0.072 n=100+95) XML 241ms ± 4% 239ms ± 5% ~ (p=0.085 n=97+99) [Geo mean] 413ms 410ms -0.73% name old alloc/op new alloc/op delta Template 38.4MB ± 0% 37.7MB ± 0% -1.85% (p=0.008 n=5+5) Unicode 30.1MB ± 0% 28.8MB ± 0% -4.09% (p=0.008 n=5+5) GoTypes 112MB ± 0% 110MB ± 0% -1.69% (p=0.008 n=5+5) Compiler 470MB ± 0% 461MB ± 0% -1.91% (p=0.008 n=5+5) SSA 1.13GB ± 0% 1.11GB ± 0% -1.70% (p=0.008 n=5+5) Flate 25.0MB ± 0% 24.6MB ± 0% -1.67% (p=0.008 n=5+5) GoParser 31.6MB ± 0% 31.1MB ± 0% -1.66% (p=0.008 n=5+5) Reflect 77.1MB ± 0% 75.8MB ± 0% -1.69% (p=0.008 n=5+5) Tar 26.3MB ± 0% 25.7MB ± 0% -2.06% (p=0.008 n=5+5) XML 41.9MB ± 0% 41.1MB ± 0% -1.93% (p=0.008 n=5+5) [Geo mean] 73.5MB 72.0MB -2.03% name old allocs/op new allocs/op delta Template 383k ± 0% 383k ± 0% ~ (p=0.690 n=5+5) Unicode 343k ± 0% 343k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.16M ± 0% 1.16M ± 0% ~ (p=0.310 n=5+5) Compiler 4.43M ± 0% 4.42M ± 0% -0.17% (p=0.008 n=5+5) SSA 9.85M ± 0% 9.85M ± 0% ~ (p=0.310 n=5+5) Flate 236k ± 0% 236k ± 1% ~ (p=0.841 n=5+5) GoParser 320k ± 0% 320k ± 0% ~ (p=0.421 n=5+5) Reflect 988k ± 0% 987k ± 0% ~ (p=0.690 n=5+5) Tar 252k ± 0% 251k ± 0% ~ (p=0.095 n=5+5) XML 399k ± 0% 399k ± 0% ~ (p=1.000 n=5+5) [Geo mean] 741k 740k -0.07% Change-Id: I9e952b58a98e30a12494304db9ce50d0a85e459c Reviewed-on: https://go-review.googlesource.com/41797 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
2017-04-25 18:14:12 -07:00
if n.Class() == PAUTO && !n.Addrtaken() {
return e.splitSlot(&name, "[0]", 0, et)
}
return ssa.LocalSlot{N: n, Type: et, Off: name.Off}
}
func (e *ssafn) DerefItab(it *obj.LSym, offset int64) *obj.LSym {
return itabsym(it, offset)
}
// splitSlot returns a slot representing the data of parent starting at offset.
func (e *ssafn) splitSlot(parent *ssa.LocalSlot, suffix string, offset int64, t *types.Type) ssa.LocalSlot {
s := &types.Sym{Name: parent.N.(*Node).Sym.Name + suffix, Pkg: localpkg}
n := &Node{
Name: new(Name),
Op: ONAME,
Pos: parent.N.(*Node).Pos,
}
n.Orig = n
s.Def = asTypesNode(n)
asNode(s.Def).Name.SetUsed(true)
cmd/compile: better job of naming compound types Compound AUTO types weren't named previously. That was because live variable analysis (plive.go) doesn't handle spilling to compound types. It can't handle them because there is no valid place to put VARDEFs when regalloc is spilling compound types. compound types = multiword builtin types: complex, string, slice, and interface. Instead, we split named AUTOs into individual one-word variables. For example, a string s gets split into a byte ptr s.ptr and an integer s.len. Those two variables can be spilled to / restored from independently. As a result, live variable analysis can handle them because they are one-word objects. This CL will change how AUTOs are described in DWARF information. Consider the code: func f(s string, i int) int { x := s[i:i+5] g() return lookup(x) } The old compiler would spill x to two consecutive slots on the stack, both named x (at offsets 0 and 8). The new compiler spills the pointer of x to a slot named x.ptr. It doesn't spill x.len at all, as it is a constant (5) and can be rematerialized for the call to lookup. So compound objects may not be spilled in their entirety, and even if they are they won't necessarily be contiguous. Such is the price of optimization. Re-enable live variable analysis tests. One test remains disabled, it fails because of #14904. Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801 Reviewed-on: https://go-review.googlesource.com/21233 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-03-28 11:25:17 -07:00
n.Sym = s
n.Type = t
cmd/compile: move Node.Class to flags Put it at position zero, since it is fairly hot. This shrinks gc.Node into a smaller size class on 64 bit systems. name old time/op new time/op delta Template 193ms ± 5% 192ms ± 3% ~ (p=0.353 n=94+93) Unicode 86.1ms ± 5% 85.0ms ± 4% -1.23% (p=0.000 n=95+98) GoTypes 546ms ± 3% 544ms ± 4% -0.40% (p=0.007 n=94+97) Compiler 2.56s ± 3% 2.54s ± 3% -0.67% (p=0.000 n=99+97) SSA 5.13s ± 2% 5.10s ± 3% -0.55% (p=0.000 n=94+98) Flate 122ms ± 6% 121ms ± 4% -0.75% (p=0.002 n=97+95) GoParser 144ms ± 5% 144ms ± 4% ~ (p=0.298 n=98+97) Reflect 348ms ± 4% 349ms ± 4% ~ (p=0.350 n=98+97) Tar 105ms ± 5% 104ms ± 5% ~ (p=0.154 n=96+98) XML 200ms ± 5% 198ms ± 4% -0.71% (p=0.015 n=97+98) [Geo mean] 330ms 328ms -0.52% name old user-time/op new user-time/op delta Template 229ms ±11% 224ms ± 7% -2.16% (p=0.001 n=100+87) Unicode 109ms ± 5% 109ms ± 6% ~ (p=0.897 n=96+91) GoTypes 712ms ± 4% 709ms ± 4% ~ (p=0.085 n=96+98) Compiler 3.41s ± 3% 3.36s ± 3% -1.43% (p=0.000 n=98+98) SSA 7.46s ± 3% 7.31s ± 3% -2.02% (p=0.000 n=100+99) Flate 145ms ± 6% 143ms ± 6% -1.11% (p=0.001 n=99+97) GoParser 177ms ± 5% 176ms ± 5% -0.78% (p=0.018 n=95+95) Reflect 432ms ± 7% 435ms ± 9% ~ (p=0.296 n=100+100) Tar 121ms ± 7% 121ms ± 5% ~ (p=0.072 n=100+95) XML 241ms ± 4% 239ms ± 5% ~ (p=0.085 n=97+99) [Geo mean] 413ms 410ms -0.73% name old alloc/op new alloc/op delta Template 38.4MB ± 0% 37.7MB ± 0% -1.85% (p=0.008 n=5+5) Unicode 30.1MB ± 0% 28.8MB ± 0% -4.09% (p=0.008 n=5+5) GoTypes 112MB ± 0% 110MB ± 0% -1.69% (p=0.008 n=5+5) Compiler 470MB ± 0% 461MB ± 0% -1.91% (p=0.008 n=5+5) SSA 1.13GB ± 0% 1.11GB ± 0% -1.70% (p=0.008 n=5+5) Flate 25.0MB ± 0% 24.6MB ± 0% -1.67% (p=0.008 n=5+5) GoParser 31.6MB ± 0% 31.1MB ± 0% -1.66% (p=0.008 n=5+5) Reflect 77.1MB ± 0% 75.8MB ± 0% -1.69% (p=0.008 n=5+5) Tar 26.3MB ± 0% 25.7MB ± 0% -2.06% (p=0.008 n=5+5) XML 41.9MB ± 0% 41.1MB ± 0% -1.93% (p=0.008 n=5+5) [Geo mean] 73.5MB 72.0MB -2.03% name old allocs/op new allocs/op delta Template 383k ± 0% 383k ± 0% ~ (p=0.690 n=5+5) Unicode 343k ± 0% 343k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.16M ± 0% 1.16M ± 0% ~ (p=0.310 n=5+5) Compiler 4.43M ± 0% 4.42M ± 0% -0.17% (p=0.008 n=5+5) SSA 9.85M ± 0% 9.85M ± 0% ~ (p=0.310 n=5+5) Flate 236k ± 0% 236k ± 1% ~ (p=0.841 n=5+5) GoParser 320k ± 0% 320k ± 0% ~ (p=0.421 n=5+5) Reflect 988k ± 0% 987k ± 0% ~ (p=0.690 n=5+5) Tar 252k ± 0% 251k ± 0% ~ (p=0.095 n=5+5) XML 399k ± 0% 399k ± 0% ~ (p=1.000 n=5+5) [Geo mean] 741k 740k -0.07% Change-Id: I9e952b58a98e30a12494304db9ce50d0a85e459c Reviewed-on: https://go-review.googlesource.com/41797 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
2017-04-25 18:14:12 -07:00
n.SetClass(PAUTO)
cmd/compile: pack bool fields in Node, Name, Func and Type structs to bitsets This reduces compiler memory usage by up to 4% - see compilebench results below. name old time/op new time/op delta Template 245ms ± 4% 241ms ± 2% -1.88% (p=0.029 n=10+10) Unicode 126ms ± 3% 124ms ± 3% ~ (p=0.105 n=10+10) GoTypes 805ms ± 2% 813ms ± 3% ~ (p=0.515 n=8+10) Compiler 3.95s ± 2% 3.83s ± 1% -2.96% (p=0.000 n=9+10) MakeBash 47.4s ± 4% 46.6s ± 1% -1.59% (p=0.028 n=9+10) name old user-ns/op new user-ns/op delta Template 324M ± 5% 326M ± 3% ~ (p=0.935 n=10+10) Unicode 186M ± 5% 178M ±10% ~ (p=0.067 n=9+10) GoTypes 1.08G ± 7% 1.09G ± 4% ~ (p=0.956 n=10+10) Compiler 5.34G ± 4% 5.31G ± 1% ~ (p=0.501 n=10+8) name old alloc/op new alloc/op delta Template 41.0MB ± 0% 39.8MB ± 0% -3.03% (p=0.000 n=10+10) Unicode 32.3MB ± 0% 31.0MB ± 0% -4.13% (p=0.000 n=10+10) GoTypes 119MB ± 0% 116MB ± 0% -2.39% (p=0.000 n=10+10) Compiler 499MB ± 0% 487MB ± 0% -2.48% (p=0.000 n=10+10) name old allocs/op new allocs/op delta Template 380k ± 1% 379k ± 1% ~ (p=0.436 n=10+10) Unicode 324k ± 1% 324k ± 0% ~ (p=0.853 n=10+10) GoTypes 1.15M ± 0% 1.15M ± 0% ~ (p=0.481 n=10+10) Compiler 4.41M ± 0% 4.41M ± 0% -0.12% (p=0.007 n=10+10) name old text-bytes new text-bytes delta HelloSize 623k ± 0% 623k ± 0% ~ (all equal) CmdGoSize 6.64M ± 0% 6.64M ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 5.81k ± 0% 5.81k ± 0% ~ (all equal) CmdGoSize 238k ± 0% 238k ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 134k ± 0% 134k ± 0% ~ (all equal) CmdGoSize 152k ± 0% 152k ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 967k ± 0% 967k ± 0% ~ (all equal) CmdGoSize 10.2M ± 0% 10.2M ± 0% ~ (all equal) Change-Id: I1f40af738254892bd6c8ba2eb43390b175753d52 Reviewed-on: https://go-review.googlesource.com/37445 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-27 19:56:38 +02:00
n.SetAddable(true)
cmd/compile: better job of naming compound types Compound AUTO types weren't named previously. That was because live variable analysis (plive.go) doesn't handle spilling to compound types. It can't handle them because there is no valid place to put VARDEFs when regalloc is spilling compound types. compound types = multiword builtin types: complex, string, slice, and interface. Instead, we split named AUTOs into individual one-word variables. For example, a string s gets split into a byte ptr s.ptr and an integer s.len. Those two variables can be spilled to / restored from independently. As a result, live variable analysis can handle them because they are one-word objects. This CL will change how AUTOs are described in DWARF information. Consider the code: func f(s string, i int) int { x := s[i:i+5] g() return lookup(x) } The old compiler would spill x to two consecutive slots on the stack, both named x (at offsets 0 and 8). The new compiler spills the pointer of x to a slot named x.ptr. It doesn't spill x.len at all, as it is a constant (5) and can be rematerialized for the call to lookup. So compound objects may not be spilled in their entirety, and even if they are they won't necessarily be contiguous. Such is the price of optimization. Re-enable live variable analysis tests. One test remains disabled, it fails because of #14904. Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801 Reviewed-on: https://go-review.googlesource.com/21233 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-03-28 11:25:17 -07:00
n.Esc = EscNever
n.Name.Curfn = e.curfn
e.curfn.Func.Dcl = append(e.curfn.Func.Dcl, n)
dowidth(t)
return ssa.LocalSlot{N: n, Type: t, Off: 0, SplitOf: parent, SplitOffset: offset}
cmd/compile: better job of naming compound types Compound AUTO types weren't named previously. That was because live variable analysis (plive.go) doesn't handle spilling to compound types. It can't handle them because there is no valid place to put VARDEFs when regalloc is spilling compound types. compound types = multiword builtin types: complex, string, slice, and interface. Instead, we split named AUTOs into individual one-word variables. For example, a string s gets split into a byte ptr s.ptr and an integer s.len. Those two variables can be spilled to / restored from independently. As a result, live variable analysis can handle them because they are one-word objects. This CL will change how AUTOs are described in DWARF information. Consider the code: func f(s string, i int) int { x := s[i:i+5] g() return lookup(x) } The old compiler would spill x to two consecutive slots on the stack, both named x (at offsets 0 and 8). The new compiler spills the pointer of x to a slot named x.ptr. It doesn't spill x.len at all, as it is a constant (5) and can be rematerialized for the call to lookup. So compound objects may not be spilled in their entirety, and even if they are they won't necessarily be contiguous. Such is the price of optimization. Re-enable live variable analysis tests. One test remains disabled, it fails because of #14904. Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801 Reviewed-on: https://go-review.googlesource.com/21233 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-03-28 11:25:17 -07:00
}
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
func (e *ssafn) CanSSA(t *types.Type) bool {
return canSSAType(t)
}
func (e *ssafn) Line(pos src.XPos) string {
[dev.inline] cmd/internal/src: replace src.Pos with syntax.Pos This replaces the src.Pos LineHist-based position tracking with the syntax.Pos implementation and updates all uses. The LineHist table is not used anymore - the respective code is still there but should be removed eventually. CL forthcoming. Passes toolstash -cmp when comparing to the master repo (with the exception of a couple of swapped assembly instructions, likely due to different instruction scheduling because the line-based sorting has changed; though this is won't affect correctness). The sizes of various important compiler data structures have increased significantly (see the various sizes_test.go files); this is probably the reason for an increase of compilation times (to be addressed). Here are the results of compilebench -count 5, run on a "quiet" machine (no apps running besides a terminal): name old time/op new time/op delta Template 256ms ± 1% 280ms ±15% +9.54% (p=0.008 n=5+5) Unicode 132ms ± 1% 132ms ± 1% ~ (p=0.690 n=5+5) GoTypes 891ms ± 1% 917ms ± 2% +2.88% (p=0.008 n=5+5) Compiler 3.84s ± 2% 3.99s ± 2% +3.95% (p=0.016 n=5+5) MakeBash 47.1s ± 1% 47.2s ± 2% ~ (p=0.841 n=5+5) name old user-ns/op new user-ns/op delta Template 309M ± 1% 326M ± 2% +5.18% (p=0.008 n=5+5) Unicode 165M ± 1% 168M ± 4% ~ (p=0.421 n=5+5) GoTypes 1.14G ± 2% 1.18G ± 1% +3.47% (p=0.008 n=5+5) Compiler 5.00G ± 1% 5.16G ± 1% +3.12% (p=0.008 n=5+5) Change-Id: I241c4246cdff627d7ecb95cac23060b38f9775ec Reviewed-on: https://go-review.googlesource.com/34273 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-12-09 17:15:05 -08:00
return linestr(pos)
}
[dev.ssa] cmd/compile/ssa: separate logging, work in progress, and fatal errors The SSA implementation logs for three purposes: * debug logging * fatal errors * unimplemented features Separating these three uses lets us attempt an SSA implementation for all functions, not just _ssa functions. This turns the entire standard library into a compilation test, and makes it easy to figure out things like "how much coverage does SSA have now" and "what should we do next to get more coverage?". Functions called _ssa are still special. They log profusely by default and the output of the SSA implementation is used. For all other functions, logging is off, and the implementation is built and discarded, due to lack of support for the runtime. While we're here, fix a few minor bugs and add some extra Unimplementeds to allow all.bash to pass. As of now, SSA handles 20.79% of the functions in the standard library (689 of 3314). The top missing features are: 10.03% 2597 SSA unimplemented: zero for type error not implemented 7.79% 2016 SSA unimplemented: addr: bad op DOTPTR 7.33% 1898 SSA unimplemented: unhandled expr EQ 6.10% 1579 SSA unimplemented: unhandled expr OROR 4.91% 1271 SSA unimplemented: unhandled expr NE 4.49% 1163 SSA unimplemented: unhandled expr LROT 4.00% 1036 SSA unimplemented: unhandled expr LEN 3.56% 923 SSA unimplemented: unhandled stmt CALLFUNC 2.37% 615 SSA unimplemented: zero for type []byte not implemented 1.90% 492 SSA unimplemented: unhandled stmt CALLMETH 1.74% 450 SSA unimplemented: unhandled expr CALLINTER 1.74% 450 SSA unimplemented: unhandled expr DOT 1.71% 444 SSA unimplemented: unhandled expr ANDAND 1.65% 426 SSA unimplemented: unhandled expr CLOSUREVAR 1.54% 400 SSA unimplemented: unhandled expr CALLMETH 1.51% 390 SSA unimplemented: unhandled stmt SWITCH 1.47% 380 SSA unimplemented: unhandled expr CONV 1.33% 345 SSA unimplemented: addr: bad op * 1.30% 336 SSA unimplemented: unhandled OLITERAL 6 Change-Id: I4ca07951e276714dc13c31de28640aead17a1be7 Reviewed-on: https://go-review.googlesource.com/11160 Reviewed-by: Keith Randall <khr@golang.org>
2015-06-12 11:01:13 -07:00
// Log logs a message from the compiler.
func (e *ssafn) Logf(msg string, args ...interface{}) {
if e.log {
[dev.ssa] cmd/compile/ssa: separate logging, work in progress, and fatal errors The SSA implementation logs for three purposes: * debug logging * fatal errors * unimplemented features Separating these three uses lets us attempt an SSA implementation for all functions, not just _ssa functions. This turns the entire standard library into a compilation test, and makes it easy to figure out things like "how much coverage does SSA have now" and "what should we do next to get more coverage?". Functions called _ssa are still special. They log profusely by default and the output of the SSA implementation is used. For all other functions, logging is off, and the implementation is built and discarded, due to lack of support for the runtime. While we're here, fix a few minor bugs and add some extra Unimplementeds to allow all.bash to pass. As of now, SSA handles 20.79% of the functions in the standard library (689 of 3314). The top missing features are: 10.03% 2597 SSA unimplemented: zero for type error not implemented 7.79% 2016 SSA unimplemented: addr: bad op DOTPTR 7.33% 1898 SSA unimplemented: unhandled expr EQ 6.10% 1579 SSA unimplemented: unhandled expr OROR 4.91% 1271 SSA unimplemented: unhandled expr NE 4.49% 1163 SSA unimplemented: unhandled expr LROT 4.00% 1036 SSA unimplemented: unhandled expr LEN 3.56% 923 SSA unimplemented: unhandled stmt CALLFUNC 2.37% 615 SSA unimplemented: zero for type []byte not implemented 1.90% 492 SSA unimplemented: unhandled stmt CALLMETH 1.74% 450 SSA unimplemented: unhandled expr CALLINTER 1.74% 450 SSA unimplemented: unhandled expr DOT 1.71% 444 SSA unimplemented: unhandled expr ANDAND 1.65% 426 SSA unimplemented: unhandled expr CLOSUREVAR 1.54% 400 SSA unimplemented: unhandled expr CALLMETH 1.51% 390 SSA unimplemented: unhandled stmt SWITCH 1.47% 380 SSA unimplemented: unhandled expr CONV 1.33% 345 SSA unimplemented: addr: bad op * 1.30% 336 SSA unimplemented: unhandled OLITERAL 6 Change-Id: I4ca07951e276714dc13c31de28640aead17a1be7 Reviewed-on: https://go-review.googlesource.com/11160 Reviewed-by: Keith Randall <khr@golang.org>
2015-06-12 11:01:13 -07:00
fmt.Printf(msg, args...)
}
}
func (e *ssafn) Log() bool {
return e.log
}
[dev.ssa] cmd/compile/ssa: separate logging, work in progress, and fatal errors The SSA implementation logs for three purposes: * debug logging * fatal errors * unimplemented features Separating these three uses lets us attempt an SSA implementation for all functions, not just _ssa functions. This turns the entire standard library into a compilation test, and makes it easy to figure out things like "how much coverage does SSA have now" and "what should we do next to get more coverage?". Functions called _ssa are still special. They log profusely by default and the output of the SSA implementation is used. For all other functions, logging is off, and the implementation is built and discarded, due to lack of support for the runtime. While we're here, fix a few minor bugs and add some extra Unimplementeds to allow all.bash to pass. As of now, SSA handles 20.79% of the functions in the standard library (689 of 3314). The top missing features are: 10.03% 2597 SSA unimplemented: zero for type error not implemented 7.79% 2016 SSA unimplemented: addr: bad op DOTPTR 7.33% 1898 SSA unimplemented: unhandled expr EQ 6.10% 1579 SSA unimplemented: unhandled expr OROR 4.91% 1271 SSA unimplemented: unhandled expr NE 4.49% 1163 SSA unimplemented: unhandled expr LROT 4.00% 1036 SSA unimplemented: unhandled expr LEN 3.56% 923 SSA unimplemented: unhandled stmt CALLFUNC 2.37% 615 SSA unimplemented: zero for type []byte not implemented 1.90% 492 SSA unimplemented: unhandled stmt CALLMETH 1.74% 450 SSA unimplemented: unhandled expr CALLINTER 1.74% 450 SSA unimplemented: unhandled expr DOT 1.71% 444 SSA unimplemented: unhandled expr ANDAND 1.65% 426 SSA unimplemented: unhandled expr CLOSUREVAR 1.54% 400 SSA unimplemented: unhandled expr CALLMETH 1.51% 390 SSA unimplemented: unhandled stmt SWITCH 1.47% 380 SSA unimplemented: unhandled expr CONV 1.33% 345 SSA unimplemented: addr: bad op * 1.30% 336 SSA unimplemented: unhandled OLITERAL 6 Change-Id: I4ca07951e276714dc13c31de28640aead17a1be7 Reviewed-on: https://go-review.googlesource.com/11160 Reviewed-by: Keith Randall <khr@golang.org>
2015-06-12 11:01:13 -07:00
// Fatal reports a compiler error and exits.
func (e *ssafn) Fatalf(pos src.XPos, msg string, args ...interface{}) {
[dev.inline] cmd/internal/src: replace src.Pos with syntax.Pos This replaces the src.Pos LineHist-based position tracking with the syntax.Pos implementation and updates all uses. The LineHist table is not used anymore - the respective code is still there but should be removed eventually. CL forthcoming. Passes toolstash -cmp when comparing to the master repo (with the exception of a couple of swapped assembly instructions, likely due to different instruction scheduling because the line-based sorting has changed; though this is won't affect correctness). The sizes of various important compiler data structures have increased significantly (see the various sizes_test.go files); this is probably the reason for an increase of compilation times (to be addressed). Here are the results of compilebench -count 5, run on a "quiet" machine (no apps running besides a terminal): name old time/op new time/op delta Template 256ms ± 1% 280ms ±15% +9.54% (p=0.008 n=5+5) Unicode 132ms ± 1% 132ms ± 1% ~ (p=0.690 n=5+5) GoTypes 891ms ± 1% 917ms ± 2% +2.88% (p=0.008 n=5+5) Compiler 3.84s ± 2% 3.99s ± 2% +3.95% (p=0.016 n=5+5) MakeBash 47.1s ± 1% 47.2s ± 2% ~ (p=0.841 n=5+5) name old user-ns/op new user-ns/op delta Template 309M ± 1% 326M ± 2% +5.18% (p=0.008 n=5+5) Unicode 165M ± 1% 168M ± 4% ~ (p=0.421 n=5+5) GoTypes 1.14G ± 2% 1.18G ± 1% +3.47% (p=0.008 n=5+5) Compiler 5.00G ± 1% 5.16G ± 1% +3.12% (p=0.008 n=5+5) Change-Id: I241c4246cdff627d7ecb95cac23060b38f9775ec Reviewed-on: https://go-review.googlesource.com/34273 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-12-09 17:15:05 -08:00
lineno = pos
nargs := append([]interface{}{e.curfn.funcname()}, args...)
Fatalf("'%s': "+msg, nargs...)
[dev.ssa] cmd/compile/ssa: separate logging, work in progress, and fatal errors The SSA implementation logs for three purposes: * debug logging * fatal errors * unimplemented features Separating these three uses lets us attempt an SSA implementation for all functions, not just _ssa functions. This turns the entire standard library into a compilation test, and makes it easy to figure out things like "how much coverage does SSA have now" and "what should we do next to get more coverage?". Functions called _ssa are still special. They log profusely by default and the output of the SSA implementation is used. For all other functions, logging is off, and the implementation is built and discarded, due to lack of support for the runtime. While we're here, fix a few minor bugs and add some extra Unimplementeds to allow all.bash to pass. As of now, SSA handles 20.79% of the functions in the standard library (689 of 3314). The top missing features are: 10.03% 2597 SSA unimplemented: zero for type error not implemented 7.79% 2016 SSA unimplemented: addr: bad op DOTPTR 7.33% 1898 SSA unimplemented: unhandled expr EQ 6.10% 1579 SSA unimplemented: unhandled expr OROR 4.91% 1271 SSA unimplemented: unhandled expr NE 4.49% 1163 SSA unimplemented: unhandled expr LROT 4.00% 1036 SSA unimplemented: unhandled expr LEN 3.56% 923 SSA unimplemented: unhandled stmt CALLFUNC 2.37% 615 SSA unimplemented: zero for type []byte not implemented 1.90% 492 SSA unimplemented: unhandled stmt CALLMETH 1.74% 450 SSA unimplemented: unhandled expr CALLINTER 1.74% 450 SSA unimplemented: unhandled expr DOT 1.71% 444 SSA unimplemented: unhandled expr ANDAND 1.65% 426 SSA unimplemented: unhandled expr CLOSUREVAR 1.54% 400 SSA unimplemented: unhandled expr CALLMETH 1.51% 390 SSA unimplemented: unhandled stmt SWITCH 1.47% 380 SSA unimplemented: unhandled expr CONV 1.33% 345 SSA unimplemented: addr: bad op * 1.30% 336 SSA unimplemented: unhandled OLITERAL 6 Change-Id: I4ca07951e276714dc13c31de28640aead17a1be7 Reviewed-on: https://go-review.googlesource.com/11160 Reviewed-by: Keith Randall <khr@golang.org>
2015-06-12 11:01:13 -07:00
}
// Warnl reports a "warning", which is usually flag-triggered
// logging output for the benefit of tests.
func (e *ssafn) Warnl(pos src.XPos, fmt_ string, args ...interface{}) {
[dev.inline] cmd/internal/src: replace src.Pos with syntax.Pos This replaces the src.Pos LineHist-based position tracking with the syntax.Pos implementation and updates all uses. The LineHist table is not used anymore - the respective code is still there but should be removed eventually. CL forthcoming. Passes toolstash -cmp when comparing to the master repo (with the exception of a couple of swapped assembly instructions, likely due to different instruction scheduling because the line-based sorting has changed; though this is won't affect correctness). The sizes of various important compiler data structures have increased significantly (see the various sizes_test.go files); this is probably the reason for an increase of compilation times (to be addressed). Here are the results of compilebench -count 5, run on a "quiet" machine (no apps running besides a terminal): name old time/op new time/op delta Template 256ms ± 1% 280ms ±15% +9.54% (p=0.008 n=5+5) Unicode 132ms ± 1% 132ms ± 1% ~ (p=0.690 n=5+5) GoTypes 891ms ± 1% 917ms ± 2% +2.88% (p=0.008 n=5+5) Compiler 3.84s ± 2% 3.99s ± 2% +3.95% (p=0.016 n=5+5) MakeBash 47.1s ± 1% 47.2s ± 2% ~ (p=0.841 n=5+5) name old user-ns/op new user-ns/op delta Template 309M ± 1% 326M ± 2% +5.18% (p=0.008 n=5+5) Unicode 165M ± 1% 168M ± 4% ~ (p=0.421 n=5+5) GoTypes 1.14G ± 2% 1.18G ± 1% +3.47% (p=0.008 n=5+5) Compiler 5.00G ± 1% 5.16G ± 1% +3.12% (p=0.008 n=5+5) Change-Id: I241c4246cdff627d7ecb95cac23060b38f9775ec Reviewed-on: https://go-review.googlesource.com/34273 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-12-09 17:15:05 -08:00
Warnl(pos, fmt_, args...)
}
func (e *ssafn) Debug_checknil() bool {
return Debug_checknil != 0
}
func (e *ssafn) UseWriteBarrier() bool {
return use_writebarrier
}
func (e *ssafn) Syslook(name string) *obj.LSym {
switch name {
case "goschedguarded":
return goschedguarded
case "writeBarrier":
return writeBarrier
cmd/compile: compiler support for buffered write barrier This CL implements the compiler support for calling the buffered write barrier added by the previous CL. Since the buffered write barrier is only implemented on amd64 right now, this still supports the old, eager write barrier as well. There's little overhead to supporting both and this way a few tests in test/fixedbugs that expect to have liveness maps at write barrier calls can easily opt-in to the old, eager barrier. This significantly improves the performance of the write barrier: name old time/op new time/op delta WriteBarrier-12 73.5ns ±20% 19.2ns ±27% -73.90% (p=0.000 n=19+18) It also reduces the size of binaries because the write barrier call is more compact: name old object-bytes new object-bytes delta Template 398k ± 0% 393k ± 0% -1.14% (p=0.008 n=5+5) Unicode 208k ± 0% 206k ± 0% -1.00% (p=0.008 n=5+5) GoTypes 1.18M ± 0% 1.15M ± 0% -2.00% (p=0.008 n=5+5) Compiler 4.05M ± 0% 3.88M ± 0% -4.26% (p=0.008 n=5+5) SSA 8.25M ± 0% 8.11M ± 0% -1.59% (p=0.008 n=5+5) Flate 228k ± 0% 224k ± 0% -1.83% (p=0.008 n=5+5) GoParser 295k ± 0% 284k ± 0% -3.62% (p=0.008 n=5+5) Reflect 1.00M ± 0% 0.99M ± 0% -0.70% (p=0.008 n=5+5) Tar 339k ± 0% 333k ± 0% -1.67% (p=0.008 n=5+5) XML 404k ± 0% 395k ± 0% -2.10% (p=0.008 n=5+5) [Geo mean] 704k 690k -2.00% name old exe-bytes new exe-bytes delta HelloSize 1.05M ± 0% 1.04M ± 0% -1.55% (p=0.008 n=5+5) https://perf.golang.org/search?q=upload:20171027.1 (Amusingly, this also reduces compiler allocations by 0.75%, which, combined with the better write barrier, speeds up the compiler overall by 2.10%. See the perf link.) It slightly improves the performance of most of the go1 benchmarks and improves the performance of the x/benchmarks: name old time/op new time/op delta BinaryTree17-12 2.40s ± 1% 2.47s ± 1% +2.69% (p=0.000 n=19+19) Fannkuch11-12 2.95s ± 0% 2.95s ± 0% +0.21% (p=0.000 n=20+19) FmtFprintfEmpty-12 41.8ns ± 4% 41.4ns ± 2% -1.03% (p=0.014 n=20+20) FmtFprintfString-12 68.7ns ± 2% 67.5ns ± 1% -1.75% (p=0.000 n=20+17) FmtFprintfInt-12 79.0ns ± 3% 77.1ns ± 1% -2.40% (p=0.000 n=19+17) FmtFprintfIntInt-12 127ns ± 1% 123ns ± 3% -3.42% (p=0.000 n=20+20) FmtFprintfPrefixedInt-12 152ns ± 1% 150ns ± 1% -1.02% (p=0.000 n=18+17) FmtFprintfFloat-12 211ns ± 1% 209ns ± 0% -0.99% (p=0.000 n=20+16) FmtManyArgs-12 500ns ± 0% 496ns ± 0% -0.73% (p=0.000 n=17+20) GobDecode-12 6.44ms ± 1% 6.53ms ± 0% +1.28% (p=0.000 n=20+19) GobEncode-12 5.46ms ± 0% 5.46ms ± 1% ~ (p=0.550 n=19+20) Gzip-12 220ms ± 1% 216ms ± 0% -1.75% (p=0.000 n=19+19) Gunzip-12 38.8ms ± 0% 38.6ms ± 0% -0.30% (p=0.000 n=18+19) HTTPClientServer-12 79.0µs ± 1% 78.2µs ± 1% -1.01% (p=0.000 n=20+20) JSONEncode-12 11.9ms ± 0% 11.9ms ± 0% -0.29% (p=0.000 n=20+19) JSONDecode-12 52.6ms ± 0% 52.2ms ± 0% -0.68% (p=0.000 n=19+20) Mandelbrot200-12 3.69ms ± 0% 3.68ms ± 0% -0.36% (p=0.000 n=20+20) GoParse-12 3.13ms ± 1% 3.18ms ± 1% +1.67% (p=0.000 n=19+20) RegexpMatchEasy0_32-12 73.2ns ± 1% 72.3ns ± 1% -1.19% (p=0.000 n=19+18) RegexpMatchEasy0_1K-12 241ns ± 0% 239ns ± 0% -0.83% (p=0.000 n=17+16) RegexpMatchEasy1_32-12 68.6ns ± 1% 69.0ns ± 1% +0.47% (p=0.015 n=18+16) RegexpMatchEasy1_1K-12 364ns ± 0% 361ns ± 0% -0.67% (p=0.000 n=16+17) RegexpMatchMedium_32-12 104ns ± 1% 103ns ± 1% -0.79% (p=0.001 n=20+15) RegexpMatchMedium_1K-12 33.8µs ± 3% 34.0µs ± 2% ~ (p=0.267 n=20+19) RegexpMatchHard_32-12 1.64µs ± 1% 1.62µs ± 2% -1.25% (p=0.000 n=19+18) RegexpMatchHard_1K-12 49.2µs ± 0% 48.7µs ± 1% -0.93% (p=0.000 n=19+18) Revcomp-12 391ms ± 5% 396ms ± 7% ~ (p=0.154 n=19+19) Template-12 63.1ms ± 0% 59.5ms ± 0% -5.76% (p=0.000 n=18+19) TimeParse-12 307ns ± 0% 306ns ± 0% -0.39% (p=0.000 n=19+17) TimeFormat-12 325ns ± 0% 323ns ± 0% -0.50% (p=0.000 n=19+19) [Geo mean] 47.3µs 46.9µs -0.67% https://perf.golang.org/search?q=upload:20171026.1 name old time/op new time/op delta Garbage/benchmem-MB=64-12 2.25ms ± 1% 2.20ms ± 1% -2.31% (p=0.000 n=18+18) HTTP-12 12.6µs ± 0% 12.6µs ± 0% -0.72% (p=0.000 n=18+17) JSON-12 11.0ms ± 0% 11.0ms ± 1% -0.68% (p=0.000 n=17+19) https://perf.golang.org/search?q=upload:20171026.2 Updates #14951. Updates #22460. Change-Id: Id4c0932890a1d41020071bec73b8522b1367d3e7 Reviewed-on: https://go-review.googlesource.com/73712 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-10-26 12:33:04 -04:00
case "gcWriteBarrier":
return gcWriteBarrier
case "typedmemmove":
return typedmemmove
case "typedmemclr":
return typedmemclr
}
Fatalf("unknown Syslook func %v", name)
return nil
}
func (e *ssafn) SetWBPos(pos src.XPos) {
e.curfn.Func.setWBPos(pos)
}
cmd/compile: change ssa.Type into *types.Type When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
func (n *Node) Typ() *types.Type {
return n.Type
}
func (n *Node) StorageClass() ssa.StorageClass {
switch n.Class() {
case PPARAM:
return ssa.ClassParam
case PPARAMOUT:
return ssa.ClassParamOut
case PAUTO:
return ssa.ClassAuto
default:
Fatalf("untranslatable storage class for %v: %s", n, n.Class())
return 0
}
}