By-hand rebase of earlier CL, because that was easier than
letting git try to figure things out.
This will naively insert self-moves; in the case that these
involve memory, the expander detects these and removes them
and their vardefs.
Change-Id: Icf72575eb7ae4a186b0de462bc8cf0bedc84d3e9
Reviewed-on: https://go-review.googlesource.com/c/go/+/279519
Trust: David Chase <drchase@google.com>
Reviewed-by: Jeremy Faller <jeremy@golang.org>
This is a selected copy from the register ABI experiment CL, focused
on the files and data structures that handle spilling around morestack.
Unnecessary code from the experiment was removed, other code was adapted.
Would it make sense to leave comments in the experiment as pieces are
brought over?
Experiment CL (for comparison purposes)
https://go-review.googlesource.com/c/go/+/28832
Change-Id: I92136f070351d4fcca1407b52ecf9b80898fed95
Reviewed-on: https://go-review.googlesource.com/c/go/+/279520
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Jeremy Faller <jeremy@golang.org>
It's currently hard to automate refactorings around the Value.Aux
field, because we don't have any static typing information for it.
Adding a tag interface will make subsequent CLs easier and safer.
Passes buildall w/ toolstash -cmp.
Updates #42982.
Change-Id: I41ae8e411a66bda3195a0957b60c2fe8a8002893
Reviewed-on: https://go-review.googlesource.com/c/go/+/275756
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Matthew Dempsky <mdempsky@google.com>
Still needs to generate the calls that will need lowering.
Change-Id: Ifd4e510193441a5e27c462c1f1d704f07bf6dec3
Reviewed-on: https://go-review.googlesource.com/c/go/+/242359
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
It was not necessarily consistent before, we were just lucky.
Change-Id: I3a92dc724e0af7b4d810a6a0b7b1d58844eb8f87
Reviewed-on: https://go-review.googlesource.com/c/go/+/251440
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Turns out if your failure is in a function with a name like "Reset()"
there will be a lot of hits on the same hashcode. Adding package sensitivity
solves this problem.
In additionm, it turned out that in the case that a logfile was specified
for the GOSSAHASH logging, that it was opened in create mode, which meant
that multiple compiler invocations would reset the file to zero length.
Opening in append mode works better; the automated harness
(github.com/dr2chase/gossahash) takes care of truncating the file before use.
Change-Id: I5601bc280faa94cbd507d302448831849db6c842
Reviewed-on: https://go-review.googlesource.com/c/go/+/246937
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
I noticed that there is a Todo comment here. This variable is only used for filename when dump a function's ssa passes result in details. It is no problem to print a function alone, but may be edited by not only one goroutine if dump multiple functions at the same time. Although it looks only dump one function's ssa passes now. As far as I am concerned this variable can be a member variable of the struct Func. I'm not sure if this change is necessary. Looking forward to your advices, thank you very much.
Change-Id: I35dd7247889e0cc7f19c0b400b597206592dee75
Reviewed-on: https://go-review.googlesource.com/c/go/+/244918
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Taking the live variable set from the last return point is problematic.
See #40629 for details, but there may not be a return point, or it may
be before the final defer.
Additionally, keeping track of the last call as a *Value doesn't quite
work. If it is dead-code eliminated, the storage for the Value is reused
for some other random instruction. Its live variable information,
if it is available at all, is wrong.
Instead, just mark all the open-defer argument slots as live
throughout the function. (They are already zero-initialized.)
Fixes#40629
Change-Id: Ie456c7db3082d0de57eaa5234a0f32525a1cce13
Reviewed-on: https://go-review.googlesource.com/c/go/+/247522
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dan Scales <danscales@google.com>
Once defined, a stack slot holding an open-coded defer arg should always be marked
live, since it may be used at any time if there is a panic. These stack slots are
typically kept live naturally by the open-defer code inlined at each return/exit point.
However, we need to do extra work to make sure that they are kept live if a
function has an infinite loop or a panic exit.
For this fix, only in the case of a function that is using open-coded defers, we
compute the set of blocks (most often empty) that cannot reach a return or a
BlockExit (panic) because of an infinite loop. Then, for each block b which
cannot reach a return or BlockExit or is a BlockExit block, we mark each defer arg
slot as live, as long as the definition of the defer arg slot dominates block b.
For this change, had to export (*Func).sdom (-> Sdom) and SparseTree.isAncestorEq
(-> IsAncestorEq)
Updates #35277
Change-Id: I7b53c9bd38ba384a3794386dd0eb94e4cbde4eb1
Reviewed-on: https://go-review.googlesource.com/c/go/+/204802
Run-TryBot: Dan Scales <danscales@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Generate inline code at defer time to save the args of defer calls to unique
(autotmp) stack slots, and generate inline code at exit time to check which defer
calls were made and make the associated function/method/interface calls. We
remember that a particular defer statement was reached by storing in the deferBits
variable (always stored on the stack). At exit time, we check the bits of the
deferBits variable to determine which defer function calls to make (in reverse
order). These low-cost defers are only used for functions where no defers
appear in loops. In addition, we don't do these low-cost defers if there are too
many defer statements or too many exits in a function (to limit code increase).
When a function uses open-coded defers, we produce extra
FUNCDATA_OpenCodedDeferInfo information that specifies the number of defers, and
for each defer, the stack slots where the closure and associated args have been
stored. The funcdata also includes the location of the deferBits variable.
Therefore, for panics, we can use this funcdata to determine exactly which defers
are active, and call the appropriate functions/methods/closures with the correct
arguments for each active defer.
In order to unwind the stack correctly after a recover(), we need to add an extra
code segment to functions with open-coded defers that simply calls deferreturn()
and returns. This segment is not reachable by the normal function, but is returned
to by the runtime during recovery. We set the liveness information of this
deferreturn() to be the same as the liveness at the first function call during the
last defer exit code (so all return values and all stack slots needed by the defer
calls will be live).
I needed to increase the stackguard constant from 880 to 896, because of a small
amount of new code in deferreturn().
The -N flag disables open-coded defers. '-d defer' prints out the kind of defer
being used at each defer statement (heap-allocated, stack-allocated, or
open-coded).
Cost of defer statement [ go test -run NONE -bench BenchmarkDefer$ runtime ]
With normal (stack-allocated) defers only: 35.4 ns/op
With open-coded defers: 5.6 ns/op
Cost of function call alone (remove defer keyword): 4.4 ns/op
Text size increase (including funcdata) for go binary without/with open-coded defers: 0.09%
The average size increase (including funcdata) for only the functions that use
open-coded defers is 1.1%.
The cost of a panic followed by a recover got noticeably slower, since panic
processing now requires a scan of the stack for open-coded defer frames. This scan
is required, even if no frames are using open-coded defers:
Cost of panic and recover [ go test -run NONE -bench BenchmarkPanicRecover runtime ]
Without open-coded defers: 62.0 ns/op
With open-coded defers: 255 ns/op
A CGO Go-to-C-to-Go benchmark got noticeably faster because of open-coded defers:
CGO Go-to-C-to-Go benchmark [cd misc/cgo/test; go test -run NONE -bench BenchmarkCGoCallback ]
Without open-coded defers: 443 ns/op
With open-coded defers: 347 ns/op
Updates #14939 (defer performance)
Updates #34481 (design doc)
Change-Id: I63b1a60d1ebf28126f55ee9fd7ecffe9cb23d1ff
Reviewed-on: https://go-review.googlesource.com/c/go/+/202340
Reviewed-by: Austin Clements <austin@google.com>
Generate inline code at defer time to save the args of defer calls to unique
(autotmp) stack slots, and generate inline code at exit time to check which defer
calls were made and make the associated function/method/interface calls. We
remember that a particular defer statement was reached by storing in the deferBits
variable (always stored on the stack). At exit time, we check the bits of the
deferBits variable to determine which defer function calls to make (in reverse
order). These low-cost defers are only used for functions where no defers
appear in loops. In addition, we don't do these low-cost defers if there are too
many defer statements or too many exits in a function (to limit code increase).
When a function uses open-coded defers, we produce extra
FUNCDATA_OpenCodedDeferInfo information that specifies the number of defers, and
for each defer, the stack slots where the closure and associated args have been
stored. The funcdata also includes the location of the deferBits variable.
Therefore, for panics, we can use this funcdata to determine exactly which defers
are active, and call the appropriate functions/methods/closures with the correct
arguments for each active defer.
In order to unwind the stack correctly after a recover(), we need to add an extra
code segment to functions with open-coded defers that simply calls deferreturn()
and returns. This segment is not reachable by the normal function, but is returned
to by the runtime during recovery. We set the liveness information of this
deferreturn() to be the same as the liveness at the first function call during the
last defer exit code (so all return values and all stack slots needed by the defer
calls will be live).
I needed to increase the stackguard constant from 880 to 896, because of a small
amount of new code in deferreturn().
The -N flag disables open-coded defers. '-d defer' prints out the kind of defer
being used at each defer statement (heap-allocated, stack-allocated, or
open-coded).
Cost of defer statement [ go test -run NONE -bench BenchmarkDefer$ runtime ]
With normal (stack-allocated) defers only: 35.4 ns/op
With open-coded defers: 5.6 ns/op
Cost of function call alone (remove defer keyword): 4.4 ns/op
Text size increase (including funcdata) for go cmd without/with open-coded defers: 0.09%
The average size increase (including funcdata) for only the functions that use
open-coded defers is 1.1%.
The cost of a panic followed by a recover got noticeably slower, since panic
processing now requires a scan of the stack for open-coded defer frames. This scan
is required, even if no frames are using open-coded defers:
Cost of panic and recover [ go test -run NONE -bench BenchmarkPanicRecover runtime ]
Without open-coded defers: 62.0 ns/op
With open-coded defers: 255 ns/op
A CGO Go-to-C-to-Go benchmark got noticeably faster because of open-coded defers:
CGO Go-to-C-to-Go benchmark [cd misc/cgo/test; go test -run NONE -bench BenchmarkCGoCallback ]
Without open-coded defers: 443 ns/op
With open-coded defers: 347 ns/op
Updates #14939 (defer performance)
Updates #34481 (design doc)
Change-Id: I51a389860b9676cfa1b84722f5fb84d3c4ee9e28
Reviewed-on: https://go-review.googlesource.com/c/go/+/190098
Reviewed-by: Austin Clements <austin@google.com>
This reduces allocations and also resolves some
lurking inliner/inlinee line-number match problems.
However, it does add about 1.5% to compile time.
This fixes compiler OOMs seen compiling some large protobuf-
derived inputs. For compiling the compiler itself,
compilebench -pkg cmd/compile/internal/ssa -memprofile withcl.prof
the numberlines-related memory consumption is reduced from 129MB
to 29MB (about a 5% overall reduction in allocation).
Additionally modified after going over changes with Austin
to remove unused code (nobody called size()) and correct
the cache-clearing code.
I've attempted to speed this up by not using maps, and have
not succeeded. I'd rather get correct code in now, speed it
up later if I can.
Updates #27739.
Fixes#29279.
Change-Id: I098005de4e45196a5f5b10c0886a49f88e9f8fd5
Reviewed-on: https://go-review.googlesource.com/c/go/+/154617
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
noteRule is useful when you're trying to debug
a particular rule, or get a general sense for
how often a rule fires overall.
It is less useful if you're trying to figure
out which functions might be useful to benchmark
to ascertain the impact of a newly added rule.
Enter countRule. You use it like noteRule,
except that you get per-function summaries.
Sample output:
# runtime
(*mspan).sweep: idx1=1
evacuate_faststr: idx1=1
evacuate_fast32: idx1=1
evacuate: idx1=2
evacuate_fast64: idx1=1
sweepone: idx1=1
purgecachedstats: idx1=1
mProf_Free: idx1=1
This suggests that the map benchmarks
might be good to run for this added rule.
Change-Id: Id471c3231f1736165f2020f6979ff01c29677808
Reviewed-on: https://go-review.googlesource.com/c/go/+/167088
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
A few examples (for accessing a slice of length 3):
s[-1] runtime error: index out of range [-1]
s[3] runtime error: index out of range [3] with length 3
s[-1:0] runtime error: slice bounds out of range [-1:]
s[3:0] runtime error: slice bounds out of range [3:0]
s[3:-1] runtime error: slice bounds out of range [:-1]
s[3:4] runtime error: slice bounds out of range [:4] with capacity 3
s[0:3:4] runtime error: slice bounds out of range [::4] with capacity 3
Note that in cases where there are multiple things wrong with the
indexes (e.g. s[3:-1]), we report one of those errors kind of
arbitrarily, currently the rightmost one.
An exhaustive set of examples is in issue30116[u].out in the CL.
The message text has the same prefix as the old message text. That
leads to slightly awkward phrasing but hopefully minimizes the chance
that code depending on the error text will break.
Increases the size of the go binary by 0.5% (amd64). The panic functions
take arguments in registers in order to keep the size of the compiled code
as small as possible.
Fixes#30116
Change-Id: Idb99a827b7888822ca34c240eca87b7e44a04fdd
Reviewed-on: https://go-review.googlesource.com/c/go/+/161477
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
This CL adds CFGs to ssa.html.
It execs dot to generate SVG,
which then gets inlined into the html.
Some standard naming and javascript hacks
enable integration with the rest of ssa.html.
Clicking on blocks highlights the relevant
part of the CFG, and vice versa.
Sample output and screenshots can be seen in #28177.
CFGs can be turned on with the suffix mask:
:* - dump CFG for every phase
:lower - just the lower phase
:lower-layout - lower through layout
:w,x-y - phases w and x through y
Calling dot after every pass is noticeably slow,
instead use the range of phases.
Dead blocks are not displayed on CFG.
User can zoom and pan individual CFG
when the automatic adjustment has failed.
Dot-related errors are reported
without bringing down the process.
Fixes#28177
Change-Id: Id52c42d86c4559ca737288aa10561b67a119c63d
Reviewed-on: https://go-review.googlesource.com/c/142517
Run-TryBot: Yury Smolsky <yury@smolsky.by>
Reviewed-by: David Chase <drchase@google.com>
Go documentation style for boolean funcs is to say:
// Foo reports whether ...
func Foo() bool
(rather than "returns true if")
This CL also replaces 4 uses of "iff" with the same "reports whether"
wording, which doesn't lose any meaning, and will prevent people from
sending typo fixes when they don't realize it's "if and only if". In
the past I think we've had the typo CLs updated to just say "reports
whether". So do them all at once.
(Inspired by the addition of another "returns true if" in CL 146938
in fd_plan9.go)
Created with:
$ perl -i -npe 's/returns true if/reports whether/' $(git grep -l "returns true iff" | grep -v vendor)
$ perl -i -npe 's/returns true if/reports whether/' $(git grep -l "returns true if" | grep -v vendor)
Change-Id: Ided502237f5ab0d25cb625dbab12529c361a8b9f
Reviewed-on: https://go-review.googlesource.com/c/147037
Reviewed-by: Ian Lance Taylor <iant@golang.org>
This restores the printing of vXX and bYY in the left-hand
edge of the last column of ssa.html, where the generated
progs appear.
Change-Id: I81ab9b2fa5ae28e6e5de1b77665cfbed8d14e000
Reviewed-on: https://go-review.googlesource.com/c/141277
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Yury Smolsky <yury@smolsky.by>
Lack of a well-defined order between VarDef and related
address operations sometimes causes problems with store order
and write barrier transformations; glitches in the order are
made irreparable (by later optimizations) if the two parts of
the glitch straddle a split in the original block caused by
insertion of a write barrier diamond.
Fix this by creating a LocalAddr for addresses of locals
(what VarDef matters for) that takes a memory input to
help make the order explicit. Addr is modified to only
be legal for SB operand, so there is no overlap between
Addr and LocalAddr uses (there may be some downstream
cleanup from this).
Changes to generic.rules and rewrite.go ensure that codegen
tests continue to pass; CSE of LocalAddr is impaired, not
quite sure of the cost.
Fixes#26105.
Change-Id: Id4192b4440aa4e9d7ba54a465c456df9b530b515
Reviewed-on: https://go-review.googlesource.com/122483
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
This modifies issafepoint in liveness analysis to report almost every
operation as a safe point. There are four things we don't mark as
safe-points:
1. Runtime code (other than at calls).
2. go:nosplit functions (other than at calls).
3. Instructions between the load of the write barrier-enabled flag and
the write.
4. Instructions leading up to a uintptr -> unsafe.Pointer conversion.
We'll optimize this in later CLs:
name old time/op new time/op delta
Template 185ms ± 2% 190ms ± 2% +2.95% (p=0.000 n=10+10)
Unicode 96.3ms ± 3% 96.4ms ± 1% ~ (p=0.905 n=10+9)
GoTypes 658ms ± 0% 669ms ± 1% +1.72% (p=0.000 n=10+9)
Compiler 3.14s ± 1% 3.18s ± 1% +1.56% (p=0.000 n=9+10)
SSA 7.41s ± 2% 7.59s ± 1% +2.48% (p=0.000 n=9+10)
Flate 126ms ± 1% 128ms ± 1% +2.08% (p=0.000 n=10+10)
GoParser 153ms ± 1% 157ms ± 2% +2.38% (p=0.000 n=10+10)
Reflect 437ms ± 1% 442ms ± 1% +0.98% (p=0.001 n=10+10)
Tar 178ms ± 1% 179ms ± 1% +0.67% (p=0.035 n=10+9)
XML 223ms ± 1% 229ms ± 1% +2.58% (p=0.000 n=10+10)
[Geo mean] 394ms 401ms +1.75%
No effect on binary size because we're not yet emitting these extra
safe points.
For #24543.
Change-Id: I16a1eebb9183cad7cef9d53c0fd21a973cad6859
Reviewed-on: https://go-review.googlesource.com/109348
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
In prove, reuse posets between different functions by storing them
in the per-worker cache.
Allocation count regression caused by prove improvements is down
from 5% to 3% after this CL.
Updates #25179
Change-Id: I6d14003109833d9b3ef5165fdea00aa9c9e952e8
Reviewed-on: https://go-review.googlesource.com/110455
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
A new pass run after ssa building (before any other
optimization) identifies the "first" ssa node for each
statement. Other "noise" nodes are tagged as being never
appropriate for a statement boundary (e.g., VarKill, VarDef,
Phi).
Rewrite, deadcode, cse, and nilcheck are modified to move
the statement boundaries forward whenever possible if a
boundary-tagged ssa value is removed; never-boundary nodes
are ignored in this search (some operations involving
constants are also tagged as never-boundary and also ignored
because they are likely to be moved or removed during
optimization).
Code generation treats all nodes except those explicitly
marked as statement boundaries as "not statement" nodes,
and floats statement boundaries to the beginning of each
same-line run of instructions found within a basic block.
Line number html conversion was modified to make statement
boundary nodes a bit more obvious by prepending a "+".
The code in fuse.go that glued together the value slices
of two blocks produced a result that depended on the
former capacities (not lengths) of the two slices. This
causes differences in the 386 bootstrap, and also can
sometimes put values into an order that does a worse job
of preserving statement boundaries when values are removed.
Portions of two delve tests that had caught problems were
incorporated into ssa/debug_test.go. There are some
opportunities to do better with optimized code, but the
next-ing is not lying or overly jumpy.
Over 4 CLs, compilebench geomean measured binary size
increase of 3.5% and compile user time increase of 3.8%
(this is after optimization to reuse a sparse map instead
of creating multiple maps.)
This CL worsens the optimized-debugging experience with
Delve; we need to work with the delve team so that
they can use the is_stmt marks that we're emitting now.
The reference output changes from time to time depending
on other changes in the compiler, sometimes better,
sometimes worse.
This CL now includes a test ensuring that 99+% of the lines
in the Go command itself (a handy optimized binary) include
is_stmt markers.
Change-Id: I359c94e06843f1eb41f9da437bd614885aa9644a
Reviewed-on: https://go-review.googlesource.com/102435
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
ssa's pos parameter on the Const* funcs is unused, so remove it.
ld's alloc parameter on elfnote is always true, so remove the arguments
and simplify the code.
Finally, arm's addpltreloc never has its return parameter used, so
remove it.
Change-Id: I63387ecf6ab7b5f7c20df36be823322bb98427b8
Reviewed-on: https://go-review.googlesource.com/104456
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Allow the compiler to generate code like CMPQ 16(AX), $7
It's tricky because it's difficult to spill such a comparison during
flagalloc, because the same memory state might not be available at
the restore locations.
Solve this problem by decomposing the compare+load back into its parts
if it needs to be spilled.
The big win is that the write barrier test goes from:
MOVL runtime.writeBarrier(SB), CX
TESTL CX, CX
JNE 60
to
CMPL runtime.writeBarrier(SB), $0
JNE 59
It's one instruction and one byte smaller.
Fixes#19485Fixes#15245
Update #22460
Binaries are about 0.15% smaller.
Change-Id: I4fd8d1111b6b9924d52f9a0901ca1b2e5cce0836
Reviewed-on: https://go-review.googlesource.com/86035
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
The ssa backend is aggressive about placing constants and
certain other values in the Entry block. It's implausible
that the original line numbers for these constants makes
any sort of sense when it appears to a user stepping in a
debugger, and they're also not that useful in dumps since
entry-block instructions tend to be constants (i.e.,
unlikely to be the cause of a crash).
Therefore, use src.NoXPos for any values that are explicitly
inserted into a function's entry block.
Passes all tests, including ssa/debug_test.go with both
gdb and a fairly recent dlv. Hand-verified that it solves
the reported problem; constructed a test that reproduced
a problem, and fixed it.
Modified test harness to allow injection of slightly more
interesting inputs.
Fixes#22558.
Change-Id: I4476927067846bc4366da7793d2375c111694c55
Reviewed-on: https://go-review.googlesource.com/81215
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Most write barrier calls are inserted by SSA, but copy and append are
lowered to runtime.typedslicecopy during walk. Fix these to set
Func.WBPos and emit the "write barrier" warning, as done for the write
barriers inserted by SSA. As part of this, we refactor setting WBPos
and emitting this warning into the frontend so it can be shared by
both walk and SSA.
Change-Id: I5fe9997d9bdb55e03e01dd58aee28908c35f606b
Reviewed-on: https://go-review.googlesource.com/73411
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
When we remove a nil check, add it back to the free Value pool immediately.
Fixes#18732
Change-Id: I8d644faabbfb52157d3f2d071150ff0342ac28dc
Reviewed-on: https://go-review.googlesource.com/58810
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
When package ssa was created, Type was in package gc.
To avoid circular dependencies, we used an interface (ssa.Type)
to represent type information in SSA.
In the Go 1.9 cycle, gri extricated the Type type from package gc.
As a result, we can now use it in package ssa.
Now, instead of package types depending on package ssa,
it is the other way.
This is a more sensible dependency tree,
and helps compiler performance a bit.
Though this is a big CL, most of the changes are
mechanical and uninteresting.
Interesting bits:
* Add new singleton globals to package types for the special
SSA types Memory, Void, Invalid, Flags, and Int128.
* Add two new Types, TSSA for the special types,
and TTUPLE, for SSA tuple types.
ssa.MakeTuple is now types.NewTuple.
* Move type comparison result constants CMPlt, CMPeq, and CMPgt
to package types.
* We had picked the name "types" in our rules for the handy
list of types provided by ssa.Config. That conflicted with
the types package name, so change it to "typ".
* Update the type comparison routine to handle tuples and special
types inline.
* Teach gc/fmt.go how to print special types.
* We can now eliminate ElemTypes in favor of just Elem,
and probably also some other duplicated Type methods
designed to return ssa.Type instead of *types.Type.
* The ssa tests were using their own dummy types,
and they were not particularly careful about types in general.
Of necessity, this CL switches them to use *types.Type;
it does not make them more type-accurate.
Unfortunately, using types.Type means initializing a bit
of the types universe.
This is prime for refactoring and improvement.
This shrinks ssa.Value; it now fits in a smaller size class
on 64 bit systems. This doesn't have a giant impact,
though, since most Values are preallocated in a chunk.
name old alloc/op new alloc/op delta
Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8)
Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10)
GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10)
Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10)
GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9)
Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8)
Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10)
XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10)
[Geo mean] 40.5MB 40.3MB -0.68%
name old allocs/op new allocs/op delta
Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9)
Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10)
GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10)
Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10)
GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9)
Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8)
Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10)
XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10)
[Geo mean] 428k 428k -0.01%
Removing all the interface calls helps non-trivially with CPU, though.
name old time/op new time/op delta
Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96)
Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96)
GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96)
Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99)
GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97)
Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99)
Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94)
XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95)
[Geo mean] 178ms 173ms -2.65%
name old user-time/op new user-time/op delta
Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99)
Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95)
GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99)
Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96)
GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100)
Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92)
Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100)
XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97)
[Geo mean] 220ms 213ms -2.76%
Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1
Reviewed-on: https://go-review.googlesource.com/42145
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
This reduces the size of the ssa export data
by 10%, from 76154 to 67886.
It doesn't appear that #20084, which would do this automatically,
is going to be fixed soon. Do it manually for now.
This speeds up compiling cmd/compile/internal/amd64
and presumably its comrades as well:
name old time/op new time/op delta
CompileAMD64 89.6ms ± 6% 86.7ms ± 5% -3.29% (p=0.000 n=49+47)
name old user-time/op new user-time/op delta
CompileAMD64 116ms ± 5% 112ms ± 5% -3.51% (p=0.000 n=45+42)
name old alloc/op new alloc/op delta
CompileAMD64 26.7MB ± 0% 25.8MB ± 0% -3.26% (p=0.008 n=5+5)
name old allocs/op new allocs/op delta
CompileAMD64 223k ± 0% 213k ± 0% -4.46% (p=0.008 n=5+5)
Updates #20084
Change-Id: I49e8951c5bfce63ad2b7f4fc3bfa0868c53114f9
Reviewed-on: https://go-review.googlesource.com/41493
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Prior to this CL, the SSA backend reported violations
of the //go:nowritebarrier annotation immediately.
This necessitated emitting errors during SSA compilation,
which is not compatible with a concurrent backend.
Instead, check for such violations later.
We already save the data required to do a late check
for violations of the //go:nowritebarrierrec annotation.
Use the same data, and check //go:nowritebarrier at the same time.
One downside to doing this is that now only a single
violation will be reported per function.
Given that this is for the runtime only,
and violations are rare, this seems an acceptable cost.
While we are here, remove several 'nerrors != 0' checks
that are rendered pointless.
Updates #15756Fixes#19250 (as much as it ever can be)
Change-Id: Ia44c4ad5b6fd6f804d9f88d9571cec8d23665cb3
Reviewed-on: https://go-review.googlesource.com/38973
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Suggested by mdempsky in CL 38232.
This allows us to use the Frontend field
to associate frontend state and information
with a function.
See the following CL in the series for examples.
This is a giant CL, but it is almost entirely routine refactoring.
The ssa test API is starting to feel a bit unwieldy.
I will clean it up separately, once the dust has settled.
Passes toolstash -cmp.
Updates #15756
Change-Id: I71c573bd96ff7251935fce1391b06b1f133c3caf
Reviewed-on: https://go-review.googlesource.com/38327
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
This makes ssa.Func, ssa.Cache, and ssa.Config fulfill
the roles laid out for them in CL 38160.
The only non-trivial change in this CL is how cached
values and blocks get IDs. Prior to this CL, their IDs were
assigned as part of resetting the cache, and only modified
IDs were reset. This required knowing how many values and
blocks were modified, which required a tight coupling between
ssa.Func and ssa.Config. To eliminate that coupling,
we now zero values and blocks during reset,
and assign their IDs when they are used.
Since unused values and blocks have ID == 0,
we can efficiently find the last used value/block,
to avoid zeroing everything.
Bulk zeroing is efficient, but not efficient enough
to obviate the need to avoid zeroing everything every time.
As a happy side-effect, ssa.Func.Free is no longer necessary.
DebugHashMatch and friends now belong in func.go.
They have been left in place for clarity and review.
I will move them in a subsequent CL.
Passes toolstash -cmp. No compiler performance impact.
No change in 'go test cmd/compile/internal/ssa' execution time.
Change-Id: I2eb7af58da067ef6a36e815a6f386cfe8634d098
Reviewed-on: https://go-review.googlesource.com/38167
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Remove size AuxInt in Store, and alignment in Move/Zero. We still
pass size AuxInt to Move/Zero, as it is used for partial Move/Zero
lowering (e.g. cmd/compile/internal/ssa/gen/386.rules:288).
SizeAndAlign is gone.
Passes "toolstash -cmp" on std.
Change-Id: I1ca34652b65dd30de886940e789fcf41d521475d
Reviewed-on: https://go-review.googlesource.com/38150
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Now that the write barrier insertion is moved to SSA, the SSA
building code can be simplified.
Updates #17583.
Change-Id: I5cacc034b11aa90b0abe6f8dd97e4e3994e2bc25
Reviewed-on: https://go-review.googlesource.com/36840
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
The line between ssa.Func and ssa.Config has blurred.
Concurrent compilation in the backend will require more precision.
This CL lays out an (aspirational) organization.
The implementation will come in follow-up CLs,
once the organization is settled.
ssa.Config holds basic compiler configuration,
mostly arch-specific information.
It is configured once, early on, and is readonly,
so it is safe for concurrent use.
ssa.Func is a single-shot object used for
compiling a single Func. It is not concurrency-safe
and not re-usable.
ssa.Cache is a multi-use object used to avoid
expensive allocations during compilation.
Each ssa.Func is given an ssa.Cache to use.
ssa.Cache is not concurrency-safe.
Change-Id: Id02809b6f3541541cac6c27bbb598834888ce1cc
Reviewed-on: https://go-review.googlesource.com/38160
Reviewed-by: Keith Randall <khr@golang.org>
Previously we always issued a spill right after the op
that was being spilled. This CL pushes spills father away
from the generator, hopefully pushing them into unlikely branches.
For example:
x = ...
if unlikely {
call ...
}
... use x ...
Used to compile to
x = ...
spill x
if unlikely {
call ...
restore x
}
It now compiles to
x = ...
if unlikely {
spill x
call ...
restore x
}
This is particularly useful for code which appends, as the only
call is an unlikely call to growslice. It also helps for the
spills needed around write barrier calls.
The basic algorithm is walk down the dominator tree following a
path where the block still dominates all of the restores. We're
looking for a block that:
1) dominates all restores
2) has the value being spilled in a register
3) has a loop depth no deeper than the value being spilled
The walking-down code is iterative. I was forced to limit it to
searching 100 blocks so it doesn't become O(n^2). Maybe one day
we'll find a better way.
I had to delete most of David's code which pushed spills out of loops.
I suspect this CL subsumes most of the cases that his code handled.
Generally positive performance improvements, but hard to tell for sure
with all the noise. (compilebench times are unchanged.)
name old time/op new time/op delta
BinaryTree17-12 2.91s ±15% 2.80s ±12% ~ (p=0.063 n=10+10)
Fannkuch11-12 3.47s ± 0% 3.30s ± 4% -4.91% (p=0.000 n=9+10)
FmtFprintfEmpty-12 48.0ns ± 1% 47.4ns ± 1% -1.32% (p=0.002 n=9+9)
FmtFprintfString-12 85.6ns ±11% 79.4ns ± 3% -7.27% (p=0.005 n=10+10)
FmtFprintfInt-12 91.8ns ±10% 85.9ns ± 4% ~ (p=0.203 n=10+9)
FmtFprintfIntInt-12 135ns ±13% 127ns ± 1% -5.72% (p=0.025 n=10+9)
FmtFprintfPrefixedInt-12 167ns ± 1% 168ns ± 2% ~ (p=0.580 n=9+10)
FmtFprintfFloat-12 249ns ±11% 230ns ± 1% -7.32% (p=0.000 n=10+10)
FmtManyArgs-12 504ns ± 7% 506ns ± 1% ~ (p=0.198 n=9+9)
GobDecode-12 6.95ms ± 1% 7.04ms ± 1% +1.37% (p=0.001 n=10+10)
GobEncode-12 6.32ms ±13% 6.04ms ± 1% ~ (p=0.063 n=10+10)
Gzip-12 233ms ± 1% 235ms ± 0% +1.01% (p=0.000 n=10+9)
Gunzip-12 40.1ms ± 1% 39.6ms ± 0% -1.12% (p=0.000 n=10+8)
HTTPClientServer-12 227µs ± 9% 221µs ± 5% ~ (p=0.114 n=9+8)
JSONEncode-12 16.1ms ± 2% 15.8ms ± 1% -2.09% (p=0.002 n=9+8)
JSONDecode-12 61.8ms ±11% 57.9ms ± 1% -6.30% (p=0.000 n=10+9)
Mandelbrot200-12 4.30ms ± 3% 4.28ms ± 1% ~ (p=0.203 n=10+8)
GoParse-12 3.18ms ± 2% 3.18ms ± 2% ~ (p=0.579 n=10+10)
RegexpMatchEasy0_32-12 76.7ns ± 1% 77.5ns ± 1% +0.92% (p=0.002 n=9+8)
RegexpMatchEasy0_1K-12 239ns ± 3% 239ns ± 1% ~ (p=0.204 n=10+10)
RegexpMatchEasy1_32-12 71.4ns ± 1% 70.6ns ± 0% -1.15% (p=0.000 n=10+9)
RegexpMatchEasy1_1K-12 383ns ± 2% 390ns ±10% ~ (p=0.181 n=8+9)
RegexpMatchMedium_32-12 114ns ± 0% 113ns ± 1% -0.88% (p=0.000 n=9+8)
RegexpMatchMedium_1K-12 36.3µs ± 1% 36.8µs ± 1% +1.59% (p=0.000 n=10+8)
RegexpMatchHard_32-12 1.90µs ± 1% 1.90µs ± 1% ~ (p=0.341 n=10+10)
RegexpMatchHard_1K-12 59.4µs ±11% 57.8µs ± 1% ~ (p=0.968 n=10+9)
Revcomp-12 461ms ± 1% 462ms ± 1% ~ (p=1.000 n=9+9)
Template-12 67.5ms ± 1% 66.3ms ± 1% -1.77% (p=0.000 n=10+8)
TimeParse-12 314ns ± 3% 309ns ± 0% -1.56% (p=0.000 n=9+8)
TimeFormat-12 340ns ± 2% 331ns ± 1% -2.79% (p=0.000 n=10+10)
The go binary is 0.2% larger. Not really sure why the size
would change.
Change-Id: Ia5116e53a3aeb025ef350ffc51c14ae5cc17871c
Reviewed-on: https://go-review.googlesource.com/34822
Reviewed-by: David Chase <drchase@google.com>
Minor fix, because it's the right thing to do.
No significant impact.
Change-Id: I2138285d397494daa9a88c414149c2a7860edd7e
Reviewed-on: https://go-review.googlesource.com/38001
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Values have an Aux and an AuxInt.
We're setting AuxInt, not Aux.
Say so.
Change-Id: I41aa783273bb7e1ba47c941aa4233f818e37dadd
Reviewed-on: https://go-review.googlesource.com/37997
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Rather than collecting static data nodes to be written out later, just
write them out immediately.
Change-Id: I51708b690e94bc3e288b4d6ba3307bf738a80f64
Reviewed-on: https://go-review.googlesource.com/36352
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>