go/src/cmd/compile/internal/gc/plive.go

// Copyright 2013 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

// Garbage collector liveness bitmap generation.
// The command line flag -live causes this code to print debug information.
// The levels are:
//
//	-live (aka -live=1): print liveness lists as code warnings at safe points
//	-live=2: print an assembly listing with liveness annotations
//
// Each level includes the earlier output as well.
package gc

import (
	"cmd/compile/internal/ssa"
	"cmd/compile/internal/types"
	"cmd/internal/obj"
	"cmd/internal/objabi"
	"cmd/internal/src"
	"crypto/md5"
	"crypto/sha1"
	"fmt"
	"os"
	"strings"
)

// OpVarDef is an annotation for the liveness analysis, marking a place
// where a complete initialization (definition) of a variable begins.
// Since the liveness analysis can see initialization of single-word
// variables quite easily, OpVarDef is only needed for multi-word
// variables satisfying isfat(n.Type). For simplicity though, buildssa
// emits OpVarDef regardless of variable width.
//
// An 'OpVarDef x' annotation in the instruction stream tells the liveness
// analysis to behave as though the variable x is being initialized at that
// point in the instruction stream. The OpVarDef must appear before the
// actual (multi-instruction) initialization, and it must also appear after
// any uses of the previous value, if any. For example, if compiling:
//
//	x = x[1:]
//
// it is important to generate code like:
//
//	base, len, cap = pieces of x[1:]
//	OpVarDef x
//	x = {base, len, cap}
//
// If instead the generated code looked like:
//
//	OpVarDef x
//	base, len, cap = pieces of x[1:]
//	x = {base, len, cap}
//
// then the liveness analysis would decide the previous value of x was
// unnecessary even though it is about to be used by the x[1:] computation.
// Similarly, if the generated code looked like:
//
//	base, len, cap = pieces of x[1:]
//	x = {base, len, cap}
//	OpVarDef x
//
// then the liveness analysis would not preserve the new value of x, because
// the OpVarDef appears to have "overwritten" it.
//
// OpVarDef is a bit of a kludge to work around the fact that the instruction
// stream is working on single-word values but the liveness analysis
// wants to work on individual variables, which might be multi-word
// aggregates. It might make sense at some point to look into letting
// the liveness analysis work on single-word values as well, although
// there are complications around interface values, slices, and strings,
// all of which cannot be treated as individual words.
//
// OpVarKill is the opposite of OpVarDef: it marks a value as no longer needed,
// even if its address has been taken. That is, an OpVarKill annotation asserts
// that its argument is certainly dead, for use when the liveness analysis
// would not otherwise be able to deduce that fact.

// BlockEffects summarizes the liveness effects on an SSA block.
type BlockEffects struct {
	lastbitmapindex int // for Liveness.epilogue

	// Computed during Liveness.prologue using only the content of
	// individual blocks:
	//
	//	uevar: upward exposed variables (used before set in block)
	//	varkill: killed variables (set in block)
	//	avarinit: addrtaken variables set or used (proof of initialization)
	uevar    bvec
	varkill  bvec
	avarinit bvec

	// Computed during Liveness.solve using control flow information:
	//
	//	livein: variables live at block entry
	//	liveout: variables live at block exit
	//	avarinitany: addrtaken variables possibly initialized at block exit
	//		(initialized in block or at exit from any predecessor block)
	//	avarinitall: addrtaken variables certainly initialized at block exit
	//		(initialized in block or at exit from all predecessor blocks)
	livein      bvec
	liveout     bvec
	avarinitany bvec
	avarinitall bvec
}
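
/*
Illustrative sketch, not part of the compiler: the uevar, varkill, livein,
and liveout sets described above can be related by the textbook backward
dataflow equations for liveness. The standalone program below is a minimal
sketch under stated assumptions: the block type, the solve function, and the
use of a single uint64 as a bit set are all hypothetical simplifications and
do not reflect the actual bvec or Liveness.solve code in this file.

```go
package main

import "fmt"

// block is a hypothetical, simplified stand-in for an SSA block.
// Each uint64 is a bit set in which bit i stands for variable i.
type block struct {
	uevar   uint64 // variables used before being set in this block
	varkill uint64 // variables set in this block
	succs   []int  // indices of successor blocks
}

// solve iterates the backward dataflow equations to a fixed point:
//
//	liveout[b] = union of livein[s] over all successors s of b
//	livein[b]  = uevar[b] | (liveout[b] &^ varkill[b])
func solve(blocks []block) (livein, liveout []uint64) {
	livein = make([]uint64, len(blocks))
	liveout = make([]uint64, len(blocks))
	for changed := true; changed; {
		changed = false
		// Liveness flows backward, so visit blocks last to first.
		for i := len(blocks) - 1; i >= 0; i-- {
			b := blocks[i]
			var out uint64
			for _, s := range b.succs {
				out |= livein[s]
			}
			in := b.uevar | (out &^ b.varkill)
			if in != livein[i] || out != liveout[i] {
				livein[i], liveout[i] = in, out
				changed = true
			}
		}
	}
	return livein, liveout
}

func main() {
	// Variables: bit 0 is x, bit 1 is y.
	// b0: x = ...   (sets x)          -> b1
	// b1: y = x     (uses x, sets y)  -> b2
	// b2: use y                       (exit)
	blocks := []block{
		{uevar: 0, varkill: 1 << 0, succs: []int{1}},
		{uevar: 1 << 0, varkill: 1 << 1, succs: []int{2}},
		{uevar: 1 << 1, varkill: 0},
	}
	livein, liveout := solve(blocks)
	for i := range blocks {
		fmt.Printf("b%d: livein=%02b liveout=%02b\n", i, livein[i], liveout[i])
	}
	// Prints:
	// b0: livein=00 liveout=01
	// b1: livein=01 liveout=10
	// b2: livein=10 liveout=00
}
```

In this toy example x is live only between its definition in b0 and its use
in b1, and y only between b1 and b2, which is the kind of per-block summary
the livein and liveout fields are meant to hold.
*/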

// A collection of global state used by liveness analysis.
type Liveness struct {
	fn         *Node
	f          *ssa.Func
	vars       []*Node
	idx        map[*Node]int32
	stkptrsize int64
	be         []BlockEffects

	// unsafePoints bit i is set if Value ID i is not a safe point.
	unsafePoints bvec

	// An array with a bit vector for each safe point tracking live variables.
	// Indexed sequentially by safe points in Block and Value order.
	livevars []bvec

	// livenessMap maps from safe points (i.e., CALLs) to their
	// liveness map indexes.
	//
	// TODO(austin): Now that we have liveness at almost every PC,
	// should this be a dense structure?
	livenessMap LivenessMap
	stackMaps   []bvec

	cache progeffectscache
}

// LivenessMap maps from *ssa.Value to LivenessIndex.
type LivenessMap struct {
	m map[*ssa.Value]LivenessIndex
}

// Get returns the liveness index recorded for v,
// or LivenessInvalid if v is not a safe point.
func (m LivenessMap) Get(v *ssa.Value) LivenessIndex {
	if i, ok := m.m[v]; ok {
		return i
	}
	// Not a safe point.
	return LivenessInvalid
}

// LivenessIndex stores the liveness map index for a safe-point.
type LivenessIndex struct {
	stackMapIndex int
}

// LivenessInvalid indicates an unsafe point.
var LivenessInvalid = LivenessIndex{-1}

// Valid reports whether idx holds a usable (non-negative) stack map index.
func (idx LivenessIndex) Valid() bool {
	return idx.stackMapIndex >= 0
}

type progeffectscache struct {
	textavarinit []int32
	retuevar     []int32
	tailuevar    []int32
	initialized  bool
}

// livenessShouldTrack reports whether the liveness analysis
// should track the variable n.
// We don't care about variables that have no pointers,
// nor do we care about non-local variables,
// nor do we care about empty structs (handled by the pointer check),
// nor do we care about the fake PAUTOHEAP variables.
func livenessShouldTrack(n *Node) bool {
	return n.Op == ONAME && (n.Class() == PAUTO || n.Class() == PPARAM || n.Class() == PPARAMOUT) && types.Haspointers(n.Type)
}

// getvariables returns the list of on-stack variables that we need to track
// and a map for looking up indices by *Node.
func getvariables(fn *Node) ([]*Node, map[*Node]int32) {
	var vars []*Node
	for _, n := range fn.Func.Dcl {
		if livenessShouldTrack(n) {
			vars = append(vars, n)
		}
	}
	idx := make(map[*Node]int32, len(vars))
	for i, n := range vars {
		idx[n] = int32(i)
	}
	return vars, idx
}
func (lv *Liveness) initcache() {
if lv.cache.initialized {
Fatalf("liveness cache initialized twice")
return
}
lv.cache.initialized = true
for i, node := range lv.vars {
switch node.Class() {
case PPARAM:
// A return instruction with a p.to is a tail return, which brings
// the stack pointer back up (if it ever went down) and then jumps
// to a new function entirely. That form of instruction must read
// all the parameters for correctness, and similarly it must not
// read the out arguments - they won't be set until the new
// function runs.
lv.cache.tailuevar = append(lv.cache.tailuevar, int32(i))
if node.Addrtaken() {
lv.cache.textavarinit = append(lv.cache.textavarinit, int32(i))
}
case PPARAMOUT:
// If the result had its address taken, it is being tracked
// by the avarinit code, which does not use uevar.
// If we added it to uevar too, we'd not see any kill
// and decide that the variable was live at entry, which it is not.
// So only use uevar in the non-addrtaken case.
// The p.to.type == obj.TYPE_NONE limits the bvset to
// non-tail-call return instructions; see note below for details.
if !node.Addrtaken() {
lv.cache.retuevar = append(lv.cache.retuevar, int32(i))
}
}
}
}
// A liveEffect is a set of flags that describe an instruction's
// liveness effects on a variable.
//
// The possible flags are:
// uevar - used by the instruction
// varkill - killed by the instruction
// for variables without address taken, means variable was set
// for variables with address taken, means variable was marked dead
// avarinit - initialized or referred to by the instruction,
// only for variables with address taken but not escaping to heap
//
// The avarinit output serves as a signal that the data has been
// initialized, because any use of a variable must come after its
// initialization.
type liveEffect int
const (
uevar liveEffect = 1 << iota
varkill
avarinit
)
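// The flags above are independent bits, so a single liveEffect value can
// carry any combination of them. The following is a minimal standalone
// sketch of how such a value might be built and inspected. The describe
// helper is hypothetical and is not part of the compiler; the type and
// constants are copied here only so the example compiles on its own.

```go
package main

import (
	"fmt"
	"strings"
)

// liveEffect mirrors the flag type declared above.
type liveEffect int

const (
	uevar liveEffect = 1 << iota // 1
	varkill                      // 2
	avarinit                     // 4
)

// describe is a hypothetical helper that lists the flags set in e.
func describe(e liveEffect) string {
	if e == 0 {
		return "none"
	}
	var parts []string
	if e&uevar != 0 {
		parts = append(parts, "uevar")
	}
	if e&varkill != 0 {
		parts = append(parts, "varkill")
	}
	if e&avarinit != 0 {
		parts = append(parts, "avarinit")
	}
	return strings.Join(parts, "|")
}

func main() {
	// An instruction that both reads and overwrites a variable.
	fmt.Println(describe(uevar | varkill))
	// An instruction with no effect on any tracked variable.
	fmt.Println(describe(0))
}
```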
// valueEffects returns the index of a variable in lv.vars and the
// liveness effects v has on that variable.
// If v does not affect any tracked variables, it returns -1, 0.
func (lv *Liveness) valueEffects(v *ssa.Value) (int32, liveEffect) {
n, e := affectedNode(v)
if e == 0 || n == nil || n.Op != ONAME { // cheapest checks first
return -1, 0
}
// AllocFrame has dropped unused variables from
// lv.fn.Func.Dcl, but they might still be referenced by
// OpVarFoo pseudo-ops. Ignore them to prevent "lost track of
// variable" ICEs (issue 19632).
switch v.Op {
case ssa.OpVarDef, ssa.OpVarKill, ssa.OpVarLive, ssa.OpKeepAlive:
if !n.Name.Used() {
return -1, 0
}
}
var effect liveEffect
if n.Addrtaken() {
if v.Op != ssa.OpVarKill {
effect |= avarinit
}
if v.Op == ssa.OpVarDef || v.Op == ssa.OpVarKill {
effect |= varkill
}
} else {
// Read is a read, obviously.
// Addr by itself is also implicitly a read.
//
// Addr|Write means that the address is being taken
// but only so that the instruction can write to the value.
// It is not a read.
if e&ssa.SymRead != 0 || e&(ssa.SymAddr|ssa.SymWrite) == ssa.SymAddr {
effect |= uevar
}
if e&ssa.SymWrite != 0 && (!isfat(n.Type) || v.Op == ssa.OpVarDef) {
effect |= varkill
}
}
if effect == 0 {
return -1, 0
}
if pos, ok := lv.idx[n]; ok {
return pos, effect
}
return -1, 0
}
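// The non-addrtaken branch of valueEffects can be hard to read at a glance,
// so here is a minimal standalone sketch of the same bit tests. It assumes
// a symEffect type whose Read, Write, and Addr bits behave like
// ssa.SymEffect; the names symEffect and effectsFor, and the boolean
// parameters fat and isVarDef (standing in for isfat(n.Type) and
// v.Op == ssa.OpVarDef), are hypothetical and exist only for illustration.

```go
package main

import "fmt"

// symEffect is a stand-in for ssa.SymEffect (assumed bit layout).
type symEffect int

const (
	symRead symEffect = 1 << iota
	symWrite
	symAddr
)

type liveEffect int

const (
	uevar liveEffect = 1 << iota
	varkill
	avarinit
)

// effectsFor reproduces the bit tests used for variables whose address
// has not been taken.
func effectsFor(e symEffect, fat, isVarDef bool) liveEffect {
	var effect liveEffect
	// A read, or an address taken without an accompanying write, is a use.
	if e&symRead != 0 || e&(symAddr|symWrite) == symAddr {
		effect |= uevar
	}
	// A write kills the variable only if it covers the whole variable:
	// either the variable is not fat, or this is an explicit VarDef.
	if e&symWrite != 0 && (!fat || isVarDef) {
		effect |= varkill
	}
	return effect
}

func main() {
	fmt.Println(effectsFor(symAddr, false, false) == uevar)            // address alone counts as a read
	fmt.Println(effectsFor(symAddr|symWrite, false, false) == varkill) // address taken only to write
	fmt.Println(effectsFor(symWrite, true, false) == 0)                // partial write to a fat variable
}
```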
// affectedNode returns the *Node affected by v, together with the
// symbol effect (read, write, or address-taken) that v has on it.
func affectedNode(v *ssa.Value) (*Node, ssa.SymEffect) {
// Special cases.
switch v.Op {
case ssa.OpLoadReg:
n, _ := AutoVar(v.Args[0])
return n, ssa.SymRead
case ssa.OpStoreReg:
n, _ := AutoVar(v)
return n, ssa.SymWrite
case ssa.OpVarLive:
return v.Aux.(*Node), ssa.SymRead
case ssa.OpVarDef, ssa.OpVarKill:
return v.Aux.(*Node), ssa.SymWrite
case ssa.OpKeepAlive:
n, _ := AutoVar(v.Args[0])
return n, ssa.SymRead
}
e := v.Op.SymEffect()
if e == 0 {
return nil, 0
}
switch a := v.Aux.(type) {
case nil, *obj.LSym:
// ok, but no node
return nil, e
case *Node:
return a, e
default:
Fatalf("weird aux: %s", v.LongString())
return nil, e
}
}
// newliveness constructs a new Liveness structure used to hold the global
// state of the liveness computation. The vars argument is a slice of
// *Nodes, and idx maps each tracked *Node to its variable index.
func newliveness(fn *Node, f *ssa.Func, vars []*Node, idx map[*Node]int32, stkptrsize int64) *Liveness {
lv := &Liveness{
fn: fn,
f: f,
vars: vars,
idx: idx,
stkptrsize: stkptrsize,
be: make([]BlockEffects, f.NumBlocks()),
}
nblocks := int32(len(f.Blocks))
nvars := int32(len(vars))
bulk := bvbulkalloc(nvars, nblocks*7)
for _, b := range f.Blocks {
be := lv.blockEffects(b)
be.uevar = bulk.next()
be.varkill = bulk.next()
be.livein = bulk.next()
be.liveout = bulk.next()
be.avarinit = bulk.next()
be.avarinitany = bulk.next()
be.avarinitall = bulk.next()
}
lv.markUnsafePoints()
return lv
}
func (lv *Liveness) blockEffects(b *ssa.Block) *BlockEffects {
return &lv.be[b.ID]
}
// NOTE: The bitmap for a specific type t could be cached in t after
// the first run and then simply copied into bv at the correct offset
// on future calls with the same type t.
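//
// Worked example (an illustrative sketch, not taken from the compiler's
// tests; assumes a 64-bit target, so Widthptr == 8): for the hypothetical type
//
//	struct { p *int; n int; s string; e interface{} }
//
// walked at off == 0, the fields start at byte offsets 0, 8, 16, and 32.
// The walk sets bit 0 (p), bit 2 (the data pointer of s), and bit 5 (the
// data word of e, one slot past its type word at bit 4). All other bits,
// including those for n and the length of s, stay clear.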
func onebitwalktype1(t *types.Type, off int64, bv bvec) {
if t.Align > 0 && off&int64(t.Align-1) != 0 {
Fatalf("onebitwalktype1: invalid initial alignment: type %v has alignment %d, but offset is %v", t, t.Align, off)
}
switch t.Etype {
case TINT8, TUINT8, TINT16, TUINT16,
TINT32, TUINT32, TINT64, TUINT64,
TINT, TUINT, TUINTPTR, TBOOL,
TFLOAT32, TFLOAT64, TCOMPLEX64, TCOMPLEX128:
case TPTR32, TPTR64, TUNSAFEPTR, TFUNC, TCHAN, TMAP:
if off&int64(Widthptr-1) != 0 {
Fatalf("onebitwalktype1: invalid alignment, %v", t)
}
bv.Set(int32(off / int64(Widthptr))) // pointer
case TSTRING:
// struct { byte *str; intgo len; }
if off&int64(Widthptr-1) != 0 {
Fatalf("onebitwalktype1: invalid alignment, %v", t)
}
bv.Set(int32(off / int64(Widthptr))) // pointer in first slot
case TINTER:
// struct { Itab *tab; void *data; }
// or, when isnilinter(t)==true:
// struct { Type *type; void *data; }
if off&int64(Widthptr-1) != 0 {
Fatalf("onebitwalktype1: invalid alignment, %v", t)
}
// The first word of an interface is a pointer, but we don't
// treat it as such.
// 1. If it is a non-empty interface, the pointer points to an itab
// which is always in persistentalloc space.
// 2. If it is an empty interface, the pointer points to a _type.
// a. If it is a compile-time-allocated type, it points into
// the read-only data section.
// b. If it is a reflect-allocated type, it points into the Go heap.
// Reflect is responsible for keeping a reference to
// the underlying type so it won't be GCd.
// If we ever have a moving GC, we need to change this for 2b (as
// well as scan itabs to update their itab._type fields).
bv.Set(int32(off/int64(Widthptr) + 1)) // pointer in second slot
case TSLICE:
// struct { byte *array; uintgo len; uintgo cap; }
if off&int64(Widthptr-1) != 0 {
Fatalf("onebitwalktype1: invalid TSLICE alignment, %v", t)
}
bv.Set(int32(off / int64(Widthptr))) // pointer in first slot (BitsPointer)
case TARRAY:
elt := t.Elem()
if elt.Width == 0 {
// Short-circuit for #20739.
break
}
for i := int64(0); i < t.NumElem(); i++ {
onebitwalktype1(elt, off, bv)
off += elt.Width
}
case TSTRUCT:
for _, f := range t.Fields().Slice() {
onebitwalktype1(f.Type, off+f.Offset, bv)
}
default:
Fatalf("onebitwalktype1: unexpected type, %v", t)
}
}
// Generates live pointer value maps for arguments and local variables. The
// receiver ("this") argument and the input arguments are always assumed live.
// The vars argument is a slice of *Nodes.
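//
// For illustration (hypothetical numbers, assuming a 64-bit target and that
// Xoffset of a PAUTO is negative, measured down from the top of the locals):
// with lv.stkptrsize == 24, a live *int local at Xoffset == -8 is walked at
// offset 16 and sets bit 2 of locals, while a live *int parameter at
// Xoffset == 8 sets bit 1 of args.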
func (lv *Liveness) pointerMap(liveout bvec, vars []*Node, args, locals bvec) {
for i := int32(0); ; i++ {
i = liveout.Next(i)
if i < 0 {
break
}
node := vars[i]
switch node.Class() {
case PAUTO:
onebitwalktype1(node.Type, node.Xoffset+lv.stkptrsize, locals)
case PPARAM, PPARAMOUT:
onebitwalktype1(node.Type, node.Xoffset, args)
}
}
}
// markUnsafePoints finds unsafe points and computes lv.unsafePoints.
func (lv *Liveness) markUnsafePoints() {
if compiling_runtime || lv.f.NoSplit {
// No complex analysis necessary. Do this on the fly
// in issafepoint.
return
}
lv.unsafePoints = bvalloc(int32(lv.f.NumValues()))
// Mark write barrier unsafe points.
for _, wbBlock := range lv.f.WBLoads {
// Check that we have the expected diamond shape.
if len(wbBlock.Succs) != 2 {
lv.f.Fatalf("expected branch at write barrier block %v", wbBlock)
}
s0, s1 := wbBlock.Succs[0].Block(), wbBlock.Succs[1].Block()
if s0.Kind != ssa.BlockPlain || s1.Kind != ssa.BlockPlain {
lv.f.Fatalf("expected successors of write barrier block %v to be plain", wbBlock)
}
if s0.Succs[0].Block() != s1.Succs[0].Block() {
lv.f.Fatalf("expected successors of write barrier block %v to converge", wbBlock)
}
// Flow backwards from the control value to find the
// flag load. We don't know what lowered ops we're
// looking for, but all current arches produce a
// single op that does the memory load from the flag
// address, so we look for that.
var load *ssa.Value
v := wbBlock.Control
for {
if sym, ok := v.Aux.(*obj.LSym); ok && sym == writeBarrier {
load = v
break
}
switch v.Op {
case ssa.Op386TESTL:
// 386 lowers Neq32 to (TESTL cond cond).
if v.Args[0] == v.Args[1] {
v = v.Args[0]
continue
}
case ssa.OpPPC64MOVWZload, ssa.Op386MOVLload:
// Args[0] is the address of the write
// barrier control. Ignore Args[1],
// which is the mem operand.
v = v.Args[0]
continue
}
// Common case: just flow backwards.
if len(v.Args) != 1 {
v.Fatalf("write barrier control value has more than one argument: %s", v.LongString())
}
v = v.Args[0]
}
// Mark everything after the load unsafe.
found := false
for _, v := range wbBlock.Values {
found = found || v == load
if found {
lv.unsafePoints.Set(int32(v.ID))
}
}
// Mark the two successor blocks unsafe. These come
// back together immediately after the direct write in
// one successor and the last write barrier call in
// the other, so there's no need to be more precise.
for _, succ := range wbBlock.Succs {
for _, v := range succ.Block().Values {
lv.unsafePoints.Set(int32(v.ID))
}
}
}
// Find uintptr -> unsafe.Pointer conversions and flood
// unsafeness back to a call (which is always a safe point).
//
// Looking for the uintptr -> unsafe.Pointer conversion has a
// few advantages over looking for unsafe.Pointer -> uintptr
// conversions:
//
// 1. We avoid needlessly blocking safe-points for
// unsafe.Pointer -> uintptr conversions that never go back to
// a Pointer.
//
// 2. We don't have to detect calls to reflect.Value.Pointer,
// reflect.Value.UnsafeAddr, and reflect.Value.InterfaceData,
// which are implicit unsafe.Pointer -> uintptr conversions.
// We can't even reliably detect this if there's an indirect
// call to one of these methods.
//
// TODO: For trivial unsafe.Pointer arithmetic, it would be
// nice to only flood as far as the unsafe.Pointer -> uintptr
// conversion, but it's hard to know which argument of an Add
// or Sub to follow.
var flooded bvec
var flood func(b *ssa.Block, vi int)
flood = func(b *ssa.Block, vi int) {
if flooded.n == 0 {
flooded = bvalloc(int32(lv.f.NumBlocks()))
}
if flooded.Get(int32(b.ID)) {
return
}
for i := vi - 1; i >= 0; i-- {
v := b.Values[i]
if v.Op.IsCall() {
// Uintptrs must not contain live
// pointers across calls, so stop
// flooding.
return
}
lv.unsafePoints.Set(int32(v.ID))
}
if vi == len(b.Values) {
// We marked all values in this block, so no
// need to flood this block again.
flooded.Set(int32(b.ID))
}
for _, pred := range b.Preds {
flood(pred.Block(), len(pred.Block().Values))
}
}
for _, b := range lv.f.Blocks {
for i, v := range b.Values {
if !(v.Op == ssa.OpConvert && v.Type.IsPtrShaped()) {
continue
}
// Flood the unsafe-ness of this backwards
// until we hit a call.
flood(b, i+1)
}
}
}
// Returns true for instructions that are safe points that must be annotated
// with liveness information.
func (lv *Liveness) issafepoint(v *ssa.Value) bool {
// The runtime was written with the assumption that
// safe-points only appear at call sites (because that's how
// it used to be). We could and should improve that, but for
// now keep the old safe-point rules in the runtime.
//
// go:nosplit functions are similar. Since safe points used to
// be coupled with stack checks, go:nosplit often actually
// means "no safe points in this function".
if compiling_runtime || lv.f.NoSplit {
return v.Op.IsCall()
}
switch v.Op {
case ssa.OpInitMem, ssa.OpArg, ssa.OpSP, ssa.OpSB,
ssa.OpSelect0, ssa.OpSelect1, ssa.OpGetG,
ssa.OpVarDef, ssa.OpVarLive, ssa.OpKeepAlive,
ssa.OpPhi:
// These don't produce code (see genssa).
return false
}
return !lv.unsafePoints.Get(int32(v.ID))
}
// Initializes the sets for solving the live variables. Visits all the
// instructions in each basic block to summarize the information for that
// block.
func (lv *Liveness) prologue() {
lv.initcache()
for _, b := range lv.f.Blocks {
be := lv.blockEffects(b)
// Walk the block instructions backward and update the block
// effects with each value's effects.
for j := len(b.Values) - 1; j >= 0; j-- {
pos, e := lv.valueEffects(b.Values[j])
if e&varkill != 0 {
be.varkill.Set(pos)
be.uevar.Unset(pos)
}
if e&uevar != 0 {
be.uevar.Set(pos)
}
}
// Walk the block instructions forward to update avarinit bits.
// avarinit describes the effect at the end of the block, not the beginning.
for _, val := range b.Values {
pos, e := lv.valueEffects(val)
if e&varkill != 0 {
be.avarinit.Unset(pos)
}
if e&avarinit != 0 {
be.avarinit.Set(pos)
}
}
}
}
// Solve the liveness dataflow equations.
func (lv *Liveness) solve() {
// These temporary bitvectors exist to avoid successive allocations and
// frees within the loop.
newlivein := bvalloc(int32(len(lv.vars)))
newliveout := bvalloc(int32(len(lv.vars)))
any := bvalloc(int32(len(lv.vars)))
all := bvalloc(int32(len(lv.vars)))
// Push avarinitall, avarinitany forward.
// avarinitall says the addressed var is initialized along all paths reaching the block exit.
// avarinitany says the addressed var is initialized along some path reaching the block exit.
for _, b := range lv.f.Blocks {
be := lv.blockEffects(b)
if b == lv.f.Entry {
be.avarinitall.Copy(be.avarinit)
} else {
be.avarinitall.Clear()
be.avarinitall.Not()
}
be.avarinitany.Copy(be.avarinit)
}
// Walk blocks in the general direction of propagation (RPO
// for avarinit{any,all}, and PO for live{in,out}). This
// improves convergence.
po := lv.f.Postorder()
for change := true; change; {
change = false
for i := len(po) - 1; i >= 0; i-- {
b := po[i]
be := lv.blockEffects(b)
lv.avarinitanyall(b, any, all)
any.AndNot(any, be.varkill)
all.AndNot(all, be.varkill)
any.Or(any, be.avarinit)
all.Or(all, be.avarinit)
if !any.Eq(be.avarinitany) {
change = true
be.avarinitany.Copy(any)
}
if !all.Eq(be.avarinitall) {
change = true
be.avarinitall.Copy(all)
}
}
}
// Iterate through the blocks in reverse round-robin fashion. A work
// queue might be slightly faster. As is, the number of iterations is
// so low that it hardly seems to be worth the complexity.
for change := true; change; {
change = false
for _, b := range po {
be := lv.blockEffects(b)
newliveout.Clear()
switch b.Kind {
case ssa.BlockRet:
for _, pos := range lv.cache.retuevar {
newliveout.Set(pos)
}
case ssa.BlockRetJmp:
for _, pos := range lv.cache.tailuevar {
newliveout.Set(pos)
}
case ssa.BlockExit:
// nothing to do
default:
// A variable is live on output from this block
// if it is live on input to some successor.
//
// out[b] = \bigcup_{s \in succ[b]} in[s]
newliveout.Copy(lv.blockEffects(b.Succs[0].Block()).livein)
for _, succ := range b.Succs[1:] {
newliveout.Or(newliveout, lv.blockEffects(succ.Block()).livein)
}
}
if !be.liveout.Eq(newliveout) {
change = true
be.liveout.Copy(newliveout)
}
// A variable is live on input to this block
// if it is live on output from this block and
// not set by the code in this block.
//
// in[b] = uevar[b] \cup (out[b] \setminus varkill[b])
newlivein.AndNot(be.liveout, be.varkill)
be.livein.Or(newlivein, be.uevar)
}
}
}
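/*
The backward half of solve can be illustrated in isolation. The program
below is a minimal sketch, not the compiler's code: it assumes at most 64
variables so that a uint64 can stand in for bvec, and the block type and
solveLiveness function are hypothetical names introduced for illustration.
It iterates in[b] = uevar[b] | (out[b] &^ varkill[b]) and
out[b] = union of in[s] over successors s until nothing changes.

```go
package main

import "fmt"

// block is a hypothetical basic block. uevar and varkill are bitsets over
// at most 64 variables; succs holds the indices of successor blocks.
type block struct {
	uevar, varkill  uint64
	succs           []int
	livein, liveout uint64
}

// solveLiveness iterates the backward dataflow equations to a fixed point.
func solveLiveness(blocks []block) {
	for changed := true; changed; {
		changed = false
		// Visit blocks last to first. When blocks are listed in rough
		// control-flow order this approximates a postorder walk.
		for i := len(blocks) - 1; i >= 0; i-- {
			b := &blocks[i]
			var out uint64
			for _, s := range b.succs {
				out |= blocks[s].livein
			}
			in := b.uevar | (out &^ b.varkill)
			if in != b.livein || out != b.liveout {
				changed = true
				b.livein, b.liveout = in, out
			}
		}
	}
}

func main() {
	// Variables: bit 0 = x, bit 1 = y.
	// b0: x = ...         (kills x)         -> b1
	// b1: use x; y = ...  (uses x, kills y) -> b1 (loop), b2
	// b2: use y                             -> exit
	blocks := []block{
		{varkill: 1 << 0, succs: []int{1}},
		{uevar: 1 << 0, varkill: 1 << 1, succs: []int{1, 2}},
		{uevar: 1 << 1},
	}
	solveLiveness(blocks)
	for i, b := range blocks {
		fmt.Printf("b%d: in=%02b out=%02b\n", i, b.livein, b.liveout)
	}
}
```

In this sketch x stays live around the loop in b1 because b1 is its own
successor, which is why a single pass is not enough and the loop repeats
until no set changes.
*/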
// Visits all instructions in a basic block and computes a bit vector of live
// variables at each safe point location.
func (lv *Liveness) epilogue() {
nvars := int32(len(lv.vars))
liveout := bvalloc(nvars)
any := bvalloc(nvars)
all := bvalloc(nvars)
livedefer := bvalloc(nvars) // always-live variables
// If there is a defer (that could recover), then all output
// parameters are live all the time. In addition, any locals
// that are pointers to heap-allocated output parameters are
// also always live (post-deferreturn code needs these
// pointers to copy values back to the stack).
// TODO: if the output parameter is heap-allocated, then we
// don't need to keep the stack copy live?
if lv.fn.Func.HasDefer() {
for i, n := range lv.vars {
if n.Class() == PPARAMOUT {
if n.IsOutputParamHeapAddr() {
// Just to be paranoid. Heap addresses are PAUTOs.
Fatalf("variable %v both output param and heap output param", n)
}
if n.Name.Param.Heapaddr != nil {
// If this variable moved to the heap, then
// its stack copy is not live.
continue
}
// Note: zeroing is handled by zeroResults in walk.go.
livedefer.Set(int32(i))
}
if n.IsOutputParamHeapAddr() {
n.Name.SetNeedzero(true)
livedefer.Set(int32(i))
}
}
}
{
// Reserve an entry for function entry.
live := bvalloc(nvars)
for _, pos := range lv.cache.textavarinit {
live.Set(pos)
}
lv.livevars = append(lv.livevars, live)
}
for _, b := range lv.f.Blocks {
be := lv.blockEffects(b)
// Compute avarinitany and avarinitall for entry to block.
// This duplicates information known during Liveness.solve
// but avoids storing two more vectors for each block.
lv.avarinitanyall(b, any, all)
// Walk forward through the basic block instructions and
// allocate liveness maps for those instructions that need them.
// Seed the maps with information about the addrtaken variables.
for _, v := range b.Values {
pos, e := lv.valueEffects(v)
if e&varkill != 0 {
any.Unset(pos)
all.Unset(pos)
}
if e&avarinit != 0 {
any.Set(pos)
all.Set(pos)
}
if !lv.issafepoint(v) {
continue
}
// Annotate ambiguously live variables so that they can
// be zeroed at function entry and at VARKILL points.
// liveout is dead here and used as a temporary.
liveout.AndNot(any, all)
if !liveout.IsEmpty() {
for pos := int32(0); pos < liveout.n; pos++ {
if !liveout.Get(pos) {
continue
}
all.Set(pos) // silence future warnings in this block
n := lv.vars[pos]
if !n.Name.Needzero() {
n.Name.SetNeedzero(true)
if debuglive >= 1 {
Warnl(v.Pos, "%v: %L is ambiguously live", lv.fn.Func.Nname, n)
}
}
}
}
// Live stuff first.
live := bvalloc(nvars)
live.Copy(any)
lv.livevars = append(lv.livevars, live)
}
be.lastbitmapindex = len(lv.livevars) - 1
}
for _, b := range lv.f.Blocks {
be := lv.blockEffects(b)
// walk backward, construct maps at each safe point
index := int32(be.lastbitmapindex)
if index < 0 {
// the first block we encounter should have the ATEXT so
// at no point should index ever be less than zero.
Fatalf("livenessepilogue")
}
liveout.Copy(be.liveout)
for i := len(b.Values) - 1; i >= 0; i-- {
v := b.Values[i]
if lv.issafepoint(v) {
// Found an interesting instruction, record the
// corresponding liveness information.
live := lv.livevars[index]
live.Or(live, liveout)
live.Or(live, livedefer) // only for non-entry safe points
index--
}
// Update liveness information.
pos, e := lv.valueEffects(v)
if e&varkill != 0 {
liveout.Unset(pos)
}
if e&uevar != 0 {
liveout.Set(pos)
}
}
if b == lv.f.Entry {
if index != 0 {
Fatalf("bad index for entry point: %v", index)
}
// Record live variables.
live := lv.livevars[index]
live.Or(live, liveout)
}
}
// Useful sanity check: on entry to the function,
// the only things that can possibly be live are the
// input parameters.
for j, n := range lv.vars {
if n.Class() != PPARAM && lv.livevars[0].Get(int32(j)) {
Fatalf("internal error: %v %L recorded as live on entry", lv.fn.Func.Nname, n)
}
}
}
func (lv *Liveness) clobber() {
// The clobberdead experiment inserts code to clobber all the dead variables (locals and args)
// before and after every safepoint. This experiment is useful for debugging the generation
// of live pointer bitmaps.
if objabi.Clobberdead_enabled == 0 {
return
}
var varSize int64
for _, n := range lv.vars {
varSize += n.Type.Size()
}
if len(lv.stackMaps) > 1000 || varSize > 10000 {
// Be careful to avoid doing too much work.
// Bail if >1000 safepoints or >10000 bytes of variables.
// Otherwise, giant functions make this experiment generate too much code.
return
}
if h := os.Getenv("GOCLOBBERDEADHASH"); h != "" {
// Clobber only functions where the hash of the function name matches a pattern.
// Useful for binary searching for a miscompiled function.
hstr := ""
for _, b := range sha1.Sum([]byte(lv.fn.funcname())) {
hstr += fmt.Sprintf("%08b", b)
}
if !strings.HasSuffix(hstr, h) {
return
}
fmt.Printf("\t\t\tCLOBBERDEAD %s\n", lv.fn.funcname())
}
if lv.f.Name == "forkAndExecInChild" {
// forkAndExecInChild calls vfork (on linux/amd64, anyway).
// The code we add here clobbers parts of the stack in the child.
// When the parent resumes, it is using the same stack frame. But the
// child has clobbered stack variables that the parent needs. Boom!
// In particular, the sys argument gets clobbered.
// Note to self: GOCLOBBERDEADHASH=011100101110
return
}
var oldSched []*ssa.Value
for _, b := range lv.f.Blocks {
// Copy block's values to a temporary.
oldSched = append(oldSched[:0], b.Values...)
b.Values = b.Values[:0]
// Clobber all dead variables at entry.
if b == lv.f.Entry {
for len(oldSched) > 0 && len(oldSched[0].Args) == 0 {
// Skip argless ops. We need to skip at least
// the lowered ClosurePtr op, because it
// really wants to be first. This will also
// skip ops like InitMem and SP, which are ok.
b.Values = append(b.Values, oldSched[0])
oldSched = oldSched[1:]
}
clobber(lv, b, lv.stackMaps[0])
}
// Copy values into schedule, adding clobbering around safepoints.
for _, v := range oldSched {
if !lv.issafepoint(v) {
b.Values = append(b.Values, v)
continue
}
before := true
if v.Op.IsCall() && v.Aux != nil && v.Aux.(*obj.LSym) == typedmemmove {
// Can't put clobber code before the call to typedmemmove.
// The variable to-be-copied is marked as dead
// at the callsite. That is ok, though, as typedmemmove
// is marked as nosplit, and the first thing it does
// is to call memmove (also nosplit), after which
// the source value is dead.
// See issue 16026.
before = false
}
if before {
clobber(lv, b, lv.stackMaps[lv.livenessMap.Get(v).stackMapIndex])
}
b.Values = append(b.Values, v)
clobber(lv, b, lv.stackMaps[lv.livenessMap.Get(v).stackMapIndex])
}
}
}
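/*
The GOCLOBBERDEADHASH filter in clobber can be tried outside the compiler.
The following is a minimal sketch under stated assumptions: nameBits and
selected are hypothetical helper names, and the function name "main.f" is
an arbitrary example. It expands the SHA-1 digest of a name into a string
of bits and accepts the name when that string ends with the pattern.

```go
package main

import (
	"crypto/sha1"
	"fmt"
	"strings"
)

// nameBits returns the SHA-1 digest of name as 160 '0'/'1' characters.
func nameBits(name string) string {
	var sb strings.Builder
	for _, b := range sha1.Sum([]byte(name)) {
		fmt.Fprintf(&sb, "%08b", b)
	}
	return sb.String()
}

// selected reports whether the bit string for name ends with pattern.
// An empty pattern selects every name.
func selected(name, pattern string) bool {
	return strings.HasSuffix(nameBits(name), pattern)
}

func main() {
	bits := nameBits("main.f")
	fmt.Println(len(bits))
	// The last four bits of a name's own digest always select it.
	fmt.Println(selected("main.f", bits[len(bits)-4:]))
}
```

Each extra bit prepended to the pattern splits the currently selected names
into two groups, which is what makes the pattern usable for bisecting
toward a single miscompiled function.
*/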
// clobber generates code to clobber all dead variables (those not marked in live).
// Clobbering instructions are added to the end of b.Values.
func clobber(lv *Liveness, b *ssa.Block, live bvec) {
for i, n := range lv.vars {
if !live.Get(int32(i)) {
clobberVar(b, n)
}
}
}
// clobberVar generates code to trash the pointers in v.
// Clobbering instructions are added to the end of b.Values.
func clobberVar(b *ssa.Block, v *Node) {
clobberWalk(b, v, 0, v.Type)
}
// b = block to which we append instructions
// v = variable
// offset = offset of (sub-portion of) variable to clobber (in bytes)
// t = type of sub-portion of v.
func clobberWalk(b *ssa.Block, v *Node, offset int64, t *types.Type) {
if !types.Haspointers(t) {
return
}
switch t.Etype {
case TPTR32,
TPTR64,
TUNSAFEPTR,
TFUNC,
TCHAN,
TMAP:
clobberPtr(b, v, offset)
case TSTRING:
// struct { byte *str; int len; }
clobberPtr(b, v, offset)
case TINTER:
// struct { Itab *tab; void *data; }
// or, when isnilinter(t)==true:
// struct { Type *type; void *data; }
// Note: the first word isn't a pointer. See comment in plive.go:onebitwalktype1.
clobberPtr(b, v, offset+int64(Widthptr))
case TSLICE:
// struct { byte *array; int len; int cap; }
clobberPtr(b, v, offset)
case TARRAY:
for i := int64(0); i < t.NumElem(); i++ {
clobberWalk(b, v, offset+i*t.Elem().Size(), t.Elem())
}
case TSTRUCT:
for _, t1 := range t.Fields().Slice() {
clobberWalk(b, v, offset+t1.Offset, t1.Type)
}
default:
Fatalf("clobberWalk: unexpected type, %v", t)
}
}
// clobberPtr generates a clobber of the pointer at offset offset in v.
// The clobber instruction is added at the end of b.
func clobberPtr(b *ssa.Block, v *Node, offset int64) {
b.NewValue0IA(src.NoXPos, ssa.OpClobber, types.TypeVoid, offset, v)
}
func (lv *Liveness) avarinitanyall(b *ssa.Block, any, all bvec) {
if len(b.Preds) == 0 {
any.Clear()
all.Clear()
for _, pos := range lv.cache.textavarinit {
any.Set(pos)
all.Set(pos)
}
return
}
be := lv.blockEffects(b.Preds[0].Block())
any.Copy(be.avarinitany)
all.Copy(be.avarinitall)
for _, pred := range b.Preds[1:] {
be := lv.blockEffects(pred.Block())
any.Or(any, be.avarinitany)
all.And(all, be.avarinitall)
}
}
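
/*
Illustrative sketch, not part of the compiler: avarinitanyall combines the
predecessors' sets with a union for "any" and an intersection for "all".
The version below uses uint64 bitsets in place of bvec; blockOut, meet and
the entry parameter are hypothetical names chosen for the example.

```go
package main

import "fmt"

// blockOut holds the two sets a predecessor block exports.
type blockOut struct {
	anyOut uint64 // variables initialized on at least one path
	allOut uint64 // variables initialized on every path
}

// meet returns the incoming sets for a block. With no predecessors both
// sets equal entry, mirroring the function-entry case above.
func meet(entry uint64, preds []blockOut) (anyIn, allIn uint64) {
	if len(preds) == 0 {
		return entry, entry
	}
	anyIn, allIn = preds[0].anyOut, preds[0].allOut
	for _, p := range preds[1:] {
		anyIn |= p.anyOut // union across predecessors
		allIn &= p.allOut // intersection across predecessors
	}
	return anyIn, allIn
}

func main() {
	preds := []blockOut{{anyOut: 3, allOut: 3}, {anyOut: 6, allOut: 2}}
	anyIn, allIn := meet(0, preds)
	fmt.Printf("any=%03b all=%03b\n", anyIn, allIn)
}
```
*/
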
// FNV-1 hash function constants.
const (
H0 = 2166136261
Hp = 16777619
)
func hashbitmap(h uint32, bv bvec) uint32 {
n := int((bv.n + 31) / 32)
for i := 0; i < n; i++ {
w := bv.b[i]
h = (h * Hp) ^ (w & 0xff)
h = (h * Hp) ^ ((w >> 8) & 0xff)
h = (h * Hp) ^ ((w >> 16) & 0xff)
h = (h * Hp) ^ ((w >> 24) & 0xff)
}
return h
}
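
/*
Illustrative sketch, not part of the compiler: hashbitmap feeds each 32-bit
word to FNV-1 one byte at a time, low byte first. Assuming that reading is
right, the same result should come from the standard library's hash/fnv
(New32 is the FNV-1 variant) run over the little-endian bytes of the words.
hashWords and stdlibFNV1 are hypothetical names, and a plain []uint32
stands in for bvec.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

const (
	fnvOffset32 = 2166136261 // same value as H0
	fnvPrime32  = 16777619   // same value as Hp
)

// hashWords mixes each word into h byte by byte, low byte first.
func hashWords(h uint32, words []uint32) uint32 {
	for _, w := range words {
		for shift := uint(0); shift < 32; shift += 8 {
			h = (h * fnvPrime32) ^ ((w >> shift) & 0xff)
		}
	}
	return h
}

// stdlibFNV1 hashes the little-endian bytes of words with hash/fnv.
func stdlibFNV1(words []uint32) uint32 {
	h := fnv.New32()
	var buf [4]byte
	for _, w := range words {
		binary.LittleEndian.PutUint32(buf[:], w)
		h.Write(buf[:])
	}
	return h.Sum32()
}

func main() {
	words := []uint32{0x00000005, 0xdeadbeef}
	fmt.Println(hashWords(fnvOffset32, words) == stdlibFNV1(words))
}
```
*/
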
// Compact liveness information by coalescing identical per-call-site bitmaps.
// The merging only happens for a single function, not across the entire binary.
//
// There are actually two lists of bitmaps, one list for the local variables and one
// list for the function arguments. Both lists are indexed by the same PCDATA
// index, so the corresponding pairs must be considered together when
// merging duplicates. The argument bitmaps change much less often during
// function execution than the local variable bitmaps, so it is possible that
// we could introduce a separate PCDATA index for arguments vs locals and
// then compact the set of argument bitmaps separately from the set of
// local variable bitmaps. As of 2014-04-02, doing this to the godoc binary
// is actually a net loss: we save about 50k of argument bitmaps but the new
// PCDATA tables cost about 100k. So for now we keep using a single index for
// both bitmap lists.
func (lv *Liveness) compact() {
// Linear probing hash table of bitmaps seen so far.
// The hash table has 4n entries to keep the linear
// scan short. An entry of -1 indicates an empty slot.
n := len(lv.livevars)
tablesize := 4 * n
table := make([]int, tablesize)
for i := range table {
table[i] = -1
}
// remap[i] = the new index of the old bit vector #i.
remap := make([]int, n)
for i := range remap {
remap[i] = -1
}
// Consider bit vectors in turn.
// If new, assign the next index (len(lv.stackMaps)),
// record it in remap, append the bit vector to lv.stackMaps,
// and add an entry to the hash table.
// If already seen, record the earlier index in remap.
Outer:
for i, live := range lv.livevars {
h := hashbitmap(H0, live) % uint32(tablesize)
for {
j := table[h]
if j < 0 {
break
}
jlive := lv.stackMaps[j]
if live.Eq(jlive) {
remap[i] = j
continue Outer
}
h++
if h == uint32(tablesize) {
h = 0
}
}
table[h] = len(lv.stackMaps)
remap[i] = len(lv.stackMaps)
lv.stackMaps = append(lv.stackMaps, live)
}
// Clear lv.livevars to allow GC of duplicate maps and to
// prevent accidental use.
lv.livevars = nil
// Record compacted stack map indexes for each value.
// These will later become PCDATA instructions.
lv.showlive(nil, lv.stackMaps[0])
pos := 1
lv.livenessMap = LivenessMap{make(map[*ssa.Value]LivenessIndex)}
for _, b := range lv.f.Blocks {
for _, v := range b.Values {
if lv.issafepoint(v) {
lv.showlive(v, lv.stackMaps[remap[pos]])
lv.livenessMap.m[v] = LivenessIndex{remap[pos]}
pos++
}
}
}
}
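
/*
Illustrative sketch, not part of the compiler: the deduplication loop in
compact can be tried in isolation. Strings stand in for bit vectors here,
and dedup and lenHash are hypothetical names. lenHash is deliberately weak
so that distinct keys collide and the linear probe is exercised.

```go
package main

import "fmt"

// lenHash is an intentionally poor hash that forces collisions.
func lenHash(s string) uint32 { return uint32(len(s)) }

// dedup returns the distinct keys in first-seen order, plus remap, where
// remap[i] is the index in the returned slice that keys[i] maps to.
func dedup(keys []string, hash func(string) uint32) (uniq []string, remap []int) {
	n := len(keys)
	tablesize := 4 * n // sparse table keeps probe sequences short
	table := make([]int, tablesize)
	for i := range table {
		table[i] = -1 // -1 marks an empty slot
	}
	remap = make([]int, n)
Outer:
	for i, k := range keys {
		h := hash(k) % uint32(tablesize)
		for {
			j := table[h]
			if j < 0 {
				break // empty slot: k has not been seen before
			}
			if uniq[j] == k {
				remap[i] = j // duplicate: reuse the earlier index
				continue Outer
			}
			h++
			if h == uint32(tablesize) {
				h = 0
			}
		}
		table[h] = len(uniq)
		remap[i] = len(uniq)
		uniq = append(uniq, k)
	}
	return uniq, remap
}

func main() {
	uniq, remap := dedup([]string{"a", "b", "a", "cc", "b"}, lenHash)
	fmt.Println(uniq, remap)
}
```

Because the table holds 4n slots for at most n entries, an empty slot
always exists and the probe loop terminates.
*/
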
func (lv *Liveness) showlive(v *ssa.Value, live bvec) {
if debuglive == 0 || lv.fn.funcname() == "init" || strings.HasPrefix(lv.fn.funcname(), ".") {
return
}
if !(v == nil || v.Op.IsCall()) {
// Historically we only printed this information at
// calls. Keep doing so.
return
}
if live.IsEmpty() {
return
}
pos := lv.fn.Func.Nname.Pos
if v != nil {
pos = v.Pos
}
s := "live at "
if v == nil {
s += fmt.Sprintf("entry to %s:", lv.fn.funcname())
} else if sym, ok := v.Aux.(*obj.LSym); ok {
fn := sym.Name
if pos := strings.Index(fn, "."); pos >= 0 {
fn = fn[pos+1:]
}
s += fmt.Sprintf("call to %s:", fn)
} else {
s += "indirect call:"
}
for j, n := range lv.vars {
if live.Get(int32(j)) {
s += fmt.Sprintf(" %v", n)
}
}
Warnl(pos, s)
}
func (lv *Liveness) printbvec(printed bool, name string, live bvec) bool {
started := false
for i, n := range lv.vars {
if !live.Get(int32(i)) {
continue
}
if !started {
if !printed {
fmt.Printf("\t")
} else {
fmt.Printf(" ")
}
started = true
printed = true
fmt.Printf("%s=", name)
} else {
fmt.Printf(",")
}
fmt.Printf("%s", n.Sym.Name)
}
return printed
}
// printeffect is like printbvec, but for a single variable.
func (lv *Liveness) printeffect(printed bool, name string, pos int32, x bool) bool {
if !x {
return printed
}
if !printed {
fmt.Printf("\t")
} else {
fmt.Printf(" ")
}
fmt.Printf("%s=%s", name, lv.vars[pos].Sym.Name)
return true
}
// Prints the computed liveness information and inputs, for debugging.
// This format synthesizes the information used during the multiple passes
// into a single presentation.
func (lv *Liveness) printDebug() {
fmt.Printf("liveness: %s\n", lv.fn.funcname())
pcdata := 0
for i, b := range lv.f.Blocks {
if i > 0 {
fmt.Printf("\n")
}
// bb#0 pred=1,2 succ=3,4
fmt.Printf("bb#%d pred=", b.ID)
for j, pred := range b.Preds {
if j > 0 {
fmt.Printf(",")
}
fmt.Printf("%d", pred.Block().ID)
}
fmt.Printf(" succ=")
for j, succ := range b.Succs {
if j > 0 {
fmt.Printf(",")
}
fmt.Printf("%d", succ.Block().ID)
}
fmt.Printf("\n")
be := lv.blockEffects(b)
// initial settings
printed := false
printed = lv.printbvec(printed, "uevar", be.uevar)
printed = lv.printbvec(printed, "livein", be.livein)
if printed {
fmt.Printf("\n")
}
// program listing, with individual effects listed
if b == lv.f.Entry {
live := lv.stackMaps[pcdata]
fmt.Printf("(%s) function entry\n", linestr(lv.fn.Func.Nname.Pos))
fmt.Printf("\tlive=")
printed = false
for j, n := range lv.vars {
if !live.Get(int32(j)) {
continue
}
if printed {
fmt.Printf(",")
}
fmt.Printf("%v", n)
printed = true
}
fmt.Printf("\n")
}
for _, v := range b.Values {
fmt.Printf("(%s) %v\n", linestr(v.Pos), v.LongString())
if pos := lv.livenessMap.Get(v); pos.Valid() {
pcdata = pos.stackMapIndex
}
pos, effect := lv.valueEffects(v)
printed = false
printed = lv.printeffect(printed, "uevar", pos, effect&uevar != 0)
printed = lv.printeffect(printed, "varkill", pos, effect&varkill != 0)
printed = lv.printeffect(printed, "avarinit", pos, effect&avarinit != 0)
if printed {
fmt.Printf("\n")
}
if !lv.issafepoint(v) {
continue
}
live := lv.stackMaps[pcdata]
fmt.Printf("\tlive=")
printed = false
for j, n := range lv.vars {
if !live.Get(int32(j)) {
continue
}
if printed {
fmt.Printf(",")
}
fmt.Printf("%v", n)
printed = true
}
fmt.Printf("\n")
}
// bb bitsets
fmt.Printf("end\n")
printed = false
printed = lv.printbvec(printed, "varkill", be.varkill)
printed = lv.printbvec(printed, "liveout", be.liveout)
printed = lv.printbvec(printed, "avarinit", be.avarinit)
printed = lv.printbvec(printed, "avarinitany", be.avarinitany)
printed = lv.printbvec(printed, "avarinitall", be.avarinitall)
if printed {
fmt.Printf("\n")
}
}
fmt.Printf("\n")
}
// Dumps a slice of bitmaps to a symbol as a sequence of uint32 values. The
// first word dumped is the total number of bitmaps. The second word is the
// length of the bitmaps. All bitmaps are assumed to be of equal length. The
// remaining bytes are the raw bitmaps.
func (lv *Liveness) emit(argssym, livesym *obj.LSym) {
// Size args bitmaps to be just large enough to hold the largest pointer.
// First, find the largest Xoffset node we care about.
// (Nodes without pointers aren't in lv.vars; see livenessShouldTrack.)
var maxArgNode *Node
for _, n := range lv.vars {
switch n.Class() {
case PPARAM, PPARAMOUT:
if maxArgNode == nil || n.Xoffset > maxArgNode.Xoffset {
maxArgNode = n
}
}
}
// Next, find the offset of the largest pointer in the largest node.
var maxArgs int64
if maxArgNode != nil {
maxArgs = maxArgNode.Xoffset + typeptrdata(maxArgNode.Type)
}
// Size locals bitmaps to cover stkptrsize bytes.
// We cannot shrink them to only hold the largest pointer,
// because their size is used to calculate the beginning
// of the local variables frame.
// Further discussion in https://golang.org/cl/104175.
// TODO: consider trimming leading zeros.
// This would require shifting all bitmaps.
maxLocals := lv.stkptrsize
args := bvalloc(int32(maxArgs / int64(Widthptr)))
aoff := duint32(argssym, 0, uint32(len(lv.stackMaps))) // number of bitmaps
aoff = duint32(argssym, aoff, uint32(args.n)) // number of bits in each bitmap
locals := bvalloc(int32(maxLocals / int64(Widthptr)))
loff := duint32(livesym, 0, uint32(len(lv.stackMaps))) // number of bitmaps
loff = duint32(livesym, loff, uint32(locals.n)) // number of bits in each bitmap
for _, live := range lv.stackMaps {
args.Clear()
locals.Clear()
lv.pointerMap(live, lv.vars, args, locals)
aoff = dbvec(argssym, aoff, args)
loff = dbvec(livesym, loff, locals)
}
// Give these LSyms content-addressable names,
// so that they can be de-duplicated.
// This provides significant binary size savings.
// It is safe to rename these LSyms because
// they are tracked separately from ctxt.hash.
argssym.Name = fmt.Sprintf("gclocals·%x", md5.Sum(argssym.P))
livesym.Name = fmt.Sprintf("gclocals·%x", md5.Sum(livesym.P))
}
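
/*
Illustrative sketch, not part of the compiler: emit writes a two-word
header followed by the bitmaps, then names the symbol after a hash of its
contents so that identical payloads can be merged. The encoding below is
an assumption made for the example (little-endian headers, bitmaps already
packed as bytes); encode and contentName are hypothetical names, and the
real layout is produced by duint32 and dbvec, which this section does not
show.

```go
package main

import (
	"crypto/md5"
	"encoding/binary"
	"fmt"
)

// encode lays out: number of bitmaps, bits per bitmap, then each bitmap.
func encode(nbits uint32, bitmaps [][]byte) []byte {
	var hdr [8]byte
	binary.LittleEndian.PutUint32(hdr[0:], uint32(len(bitmaps)))
	binary.LittleEndian.PutUint32(hdr[4:], nbits)
	out := append([]byte(nil), hdr[:]...)
	for _, bm := range bitmaps {
		out = append(out, bm...)
	}
	return out
}

// contentName derives a name from the payload alone, so equal payloads
// always receive equal names.
func contentName(payload []byte) string {
	return fmt.Sprintf("gclocals·%x", md5.Sum(payload))
}

func main() {
	a := encode(3, [][]byte{{0x05}})
	b := encode(3, [][]byte{{0x05}})
	c := encode(3, [][]byte{{0x04}})
	fmt.Println(contentName(a) == contentName(b), contentName(a) == contentName(c))
}
```
*/
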
// Entry point for liveness analysis. Solves for the liveness of
// pointer variables in the function and emits a runtime data
// structure read by the garbage collector.
// Returns a map from GC safe points to their corresponding stack map index.
func liveness(e *ssafn, f *ssa.Func) LivenessMap {
// Construct the global liveness state.
vars, idx := getvariables(e.curfn)
lv := newliveness(e.curfn, f, vars, idx, e.stkptrsize)
// Run the dataflow framework.
lv.prologue()
lv.solve()
lv.epilogue()
lv.compact()
lv.clobber()
if debuglive >= 2 {
lv.printDebug()
}
// Emit the live pointer map data structures
if ls := e.curfn.Func.lsym; ls != nil {
lv.emit(&ls.Func.GCArgs, &ls.Func.GCLocals)
}
return lv.livenessMap
}