Only return a pointer p to the new slices backing array from makeslice.
Makeslice callers then construct sliceheader{p, len, cap} explictly
instead of makeslice returning the slice.
Reduces go binary size by ~0.2%.
Removes 92 (~3.5%) panicindex calls from go binary.
Change-Id: I29b7c3b5fe8b9dcec96e2c43730575071cfe8a94
Reviewed-on: https://go-review.googlesource.com/c/141822
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Avoids allocating an ONAME for OLABEL, OGOTO, and named OBREAK and
OCONTINUE nodes.
Passes toolstash-check.
Change-Id: I359142cd48e8987b5bf29ac100752f8c497261c1
Reviewed-on: https://go-review.googlesource.com/c/145200
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
Named constants are represented as OLITERAL with n.Sym != nil.
Change-Id: If6bc8c507ef8c3e4e47f586d86fd1d0f20bf8974
Reviewed-on: https://go-review.googlesource.com/c/145198
Reviewed-by: Robert Griesemer <gri@golang.org>
The goal of this change is to move work from walk to SSA,
and simplify things along the way.
This is hard to accomplish cleanly with small incremental changes,
so this large commit message aims to provide a roadmap to the diff.
High level description:
Prior to this change, walk was responsible for constructing (most of) the stack for function calls.
ascompatte gathered variadic arguments into a slice.
It also rewrote n.List from a list of arguments to a list of assignments to stack slots.
ascompatte was called multiple times to handle the receiver in a method call.
reorder1 then introduced temporaries into n.List as needed to avoid smashing the stack.
adjustargs then made extra stack space for go/defer args as needed.
Node to SSA construction evaluated all the statements in n.List,
and issued the function call, assuming that the stack was correctly constructed.
Intrinsic calls had to dig around inside n.List to extract the arguments,
since intrinsics don't use the stack to make function calls.
This change moves stack construction to the SSA construction phase.
ascompatte, now called walkParams, does all the work that ascompatte and reorder1 did.
It handles variadic arguments, inserts the method receiver if needed, and allocates temporaries.
It does not, however, make any assignments to stack slots.
Instead, it moves the function arguments to n.Rlist, leaving assignments to temporaries in n.List.
(It would be better to use Ninit instead of List; future work.)
During SSA construction, after doing all the temporary assignments in n.List,
the function arguments are assigned to stack slots by
constructing the appropriate SSA Value, using (*state).storeArg.
SSA construction also now handles adjustments for go/defer args.
This change also simplifies intrinsic calls, since we no longer need to undo walk's work.
Along the way, we simplify nodarg by pushing the fp==1 case to its callers, where it fits nicely.
Generated code differences:
There were a few optimizations applied along the way, the old way.
f(g()) was rewritten to do a block copy of function results to function arguments.
And reorder1 avoided introducing the final "save the stack" temporary in n.List.
The f(g()) block copy optimization never actually triggered; the order pass rewrote away g(), so that has been removed.
SSA optimizations mostly obviated the need for reorder1's optimization of avoiding the final temporary.
The exception was when the temporary's type was not SSA-able;
in that case, we got a Move into an autotmp and then an immediate Move onto the stack,
with the autotmp never read or used again.
This change introduces a new rewrite rule to detect such pointless double Moves
and collapse them into a single Move.
This is actually more powerful than the original optimization,
since the original optimization relied on the imprecise Node.HasCall calculation.
The other significant difference in the generated code is that the stack is now constructed
completely in SP-offset order. Prior to this change, the stack was constructed somewhat
haphazardly: first the final argument that Node.HasCall deemed to require a temporary,
then other arguments, then the method receiver, then the defer/go args.
SP-offset is probably a good default order. See future work.
There are a few minor object file size changes as a result of this change.
I investigated some regressions in early versions of this change.
One regression (in archive/tar) was the addition of a single CMPQ instruction,
which would be eliminated were this TODO from flagalloc to be done:
// TODO: Remove original instructions if they are never used.
One regression (in text/template) was an ADDQconstmodify that is now
a regular MOVQLoad+ADDQconst+MOVQStore, due to an unlucky change
in the order in which arguments are written. The argument change
order can also now be luckier, so this appears to be a wash.
All in all, though there will be minor winners and losers,
this change appears to be performance neutral.
Future work:
Move loading the result of function calls to SSA construction; eliminate OINDREGSP.
Consider pushing stack construction deeper into SSA world, perhaps in an arch-specific pass.
Among other benefits, this would make it easier to transition to a new calling convention.
This would require rethinking the handling of stack conflicts and is non-trivial.
Figure out some clean way to indicate that stack construction Stores/Moves
do not alias each other, so that subsequent passes may do things like
CSE+tighten shared stack setup, do DSE using non-first Stores, etc.
This would allow us to eliminate the minor text/template regression.
Possibly make assignments to stack slots not treated as statements by DWARF.
Compiler benchmarks:
name old time/op new time/op delta
Template 182ms ± 2% 179ms ± 2% -1.69% (p=0.000 n=47+48)
Unicode 86.3ms ± 5% 85.1ms ± 4% -1.36% (p=0.001 n=50+50)
GoTypes 646ms ± 1% 642ms ± 1% -0.63% (p=0.000 n=49+48)
Compiler 2.89s ± 1% 2.86s ± 2% -1.36% (p=0.000 n=48+50)
SSA 8.47s ± 1% 8.37s ± 2% -1.22% (p=0.000 n=47+50)
Flate 122ms ± 2% 121ms ± 2% -0.66% (p=0.000 n=47+45)
GoParser 147ms ± 2% 146ms ± 2% -0.53% (p=0.006 n=46+49)
Reflect 406ms ± 2% 403ms ± 2% -0.76% (p=0.000 n=48+43)
Tar 162ms ± 3% 162ms ± 4% ~ (p=0.191 n=46+50)
XML 223ms ± 2% 222ms ± 2% -0.37% (p=0.031 n=45+49)
[Geo mean] 382ms 378ms -0.89%
name old user-time/op new user-time/op delta
Template 219ms ± 3% 216ms ± 3% -1.56% (p=0.000 n=50+48)
Unicode 109ms ± 6% 109ms ± 5% ~ (p=0.190 n=50+49)
GoTypes 836ms ± 2% 828ms ± 2% -0.96% (p=0.000 n=49+48)
Compiler 3.87s ± 2% 3.80s ± 1% -1.81% (p=0.000 n=49+46)
SSA 12.0s ± 1% 11.8s ± 1% -2.01% (p=0.000 n=48+50)
Flate 142ms ± 3% 141ms ± 3% -0.85% (p=0.003 n=50+48)
GoParser 178ms ± 4% 175ms ± 4% -1.66% (p=0.000 n=48+46)
Reflect 520ms ± 2% 512ms ± 2% -1.44% (p=0.000 n=45+48)
Tar 200ms ± 3% 198ms ± 4% -0.61% (p=0.037 n=47+50)
XML 277ms ± 3% 275ms ± 3% -0.85% (p=0.000 n=49+48)
[Geo mean] 482ms 476ms -1.23%
name old alloc/op new alloc/op delta
Template 36.1MB ± 0% 35.3MB ± 0% -2.18% (p=0.008 n=5+5)
Unicode 29.8MB ± 0% 29.3MB ± 0% -1.58% (p=0.008 n=5+5)
GoTypes 125MB ± 0% 123MB ± 0% -2.13% (p=0.008 n=5+5)
Compiler 531MB ± 0% 513MB ± 0% -3.40% (p=0.008 n=5+5)
SSA 2.00GB ± 0% 1.93GB ± 0% -3.34% (p=0.008 n=5+5)
Flate 24.5MB ± 0% 24.3MB ± 0% -1.18% (p=0.008 n=5+5)
GoParser 29.4MB ± 0% 28.7MB ± 0% -2.34% (p=0.008 n=5+5)
Reflect 87.1MB ± 0% 86.0MB ± 0% -1.33% (p=0.008 n=5+5)
Tar 35.3MB ± 0% 34.8MB ± 0% -1.44% (p=0.008 n=5+5)
XML 47.9MB ± 0% 47.1MB ± 0% -1.86% (p=0.008 n=5+5)
[Geo mean] 82.8MB 81.1MB -2.08%
name old allocs/op new allocs/op delta
Template 352k ± 0% 347k ± 0% -1.32% (p=0.008 n=5+5)
Unicode 342k ± 0% 339k ± 0% -0.66% (p=0.008 n=5+5)
GoTypes 1.29M ± 0% 1.27M ± 0% -1.30% (p=0.008 n=5+5)
Compiler 4.98M ± 0% 4.87M ± 0% -2.14% (p=0.008 n=5+5)
SSA 15.7M ± 0% 15.2M ± 0% -2.86% (p=0.008 n=5+5)
Flate 233k ± 0% 231k ± 0% -0.83% (p=0.008 n=5+5)
GoParser 296k ± 0% 291k ± 0% -1.54% (p=0.016 n=5+4)
Reflect 1.05M ± 0% 1.04M ± 0% -0.65% (p=0.008 n=5+5)
Tar 343k ± 0% 339k ± 0% -0.97% (p=0.008 n=5+5)
XML 432k ± 0% 426k ± 0% -1.19% (p=0.008 n=5+5)
[Geo mean] 815k 804k -1.35%
name old object-bytes new object-bytes delta
Template 505kB ± 0% 505kB ± 0% -0.01% (p=0.008 n=5+5)
Unicode 224kB ± 0% 224kB ± 0% ~ (all equal)
GoTypes 1.82MB ± 0% 1.83MB ± 0% +0.06% (p=0.008 n=5+5)
Flate 324kB ± 0% 324kB ± 0% +0.00% (p=0.008 n=5+5)
GoParser 402kB ± 0% 402kB ± 0% +0.04% (p=0.008 n=5+5)
Reflect 1.39MB ± 0% 1.39MB ± 0% -0.01% (p=0.008 n=5+5)
Tar 449kB ± 0% 449kB ± 0% -0.02% (p=0.008 n=5+5)
XML 598kB ± 0% 597kB ± 0% -0.05% (p=0.008 n=5+5)
Change-Id: Ifc9d5c1bd01f90171414b8fb18ffe2290d271143
Reviewed-on: https://go-review.googlesource.com/c/114797
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Interface and string comparisons don't need separate Ops any more than
struct or array comparisons do.
Removing them requires shuffling some code around in walk (and a
little in order), but overall allows simplifying things a bit.
Passes toolstash-check.
Change-Id: I084b8a6c089b768dc76d220379f4daed8a35db15
Reviewed-on: https://go-review.googlesource.com/c/141637
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The previous CL introduced stack objects. This CL removes the old
ambiguously live liveness analysis. After this CL we're relying
on stack objects exclusively.
Update a bunch of liveness tests to reflect the new world.
Fixes#22350
Change-Id: I739b26e015882231011ce6bc1a7f426049e59f31
Reviewed-on: https://go-review.googlesource.com/c/134156
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Currently, range loops over slices and arrays are compiled roughly
like:
for i, x := range s { b }
⇓
for i, _n, _p := 0, len(s), &s[0]; i < _n; i, _p = i+1, _p + unsafe.Sizeof(s[0]) { b }
⇓
i, _n, _p := 0, len(s), &s[0]
goto cond
body:
{ b }
i, _p = i+1, _p + unsafe.Sizeof(s[0])
cond:
if i < _n { goto body } else { goto end }
end:
The problem with this lowering is that _p may temporarily point past
the end of the allocation the moment before the loop terminates. Right
now this isn't a problem because there's never a safe-point during
this brief moment.
We're about to introduce safe-points everywhere, so this bad pointer
is going to be a problem. We could mark the increment as an unsafe
block, but this inhibits reordering opportunities and could result in
infrequent safe-points if the body is short.
Instead, this CL fixes this by changing how we compile range loops to
never produce this past-the-end pointer. It changes the lowering to
roughly:
i, _n, _p := 0, len(s), &s[0]
if i < _n { goto body } else { goto end }
top:
_p += unsafe.Sizeof(s[0])
body:
{ b }
i++
if i < _n { goto top } else { goto end }
end:
Notably, the increment is split into two parts: we increment the index
before checking the condition, but increment the pointer only *after*
the condition check has succeeded.
The implementation builds on the OFORUNTIL construct that was
introduced during the loop preemption experiments, since OFORUNTIL
places the increment and condition after the loop body. To support the
extra "late increment" step, we further define OFORUNTIL's "List"
field to contain the late increment statements. This makes all of this
a relatively small change.
This depends on the improvements to the prove pass in CL 102603. With
the current lowering, bounds-check elimination knows that i < _n in
the body because the body block is dominated by the cond block. In the
new lowering, deriving this fact requires detecting that i < _n on
*both* paths into body and hence is true in body. CL 102603 made prove
able to detect this.
The code size effect of this is minimal. The cmd/go binary on
linux/amd64 increases by 0.17%. Performance-wise, this actually
appears to be a net win, though it's mostly noise:
name old time/op new time/op delta
BinaryTree17-12 2.80s ± 0% 2.61s ± 1% -6.88% (p=0.000 n=20+18)
Fannkuch11-12 2.41s ± 0% 2.42s ± 0% +0.05% (p=0.005 n=20+20)
FmtFprintfEmpty-12 41.6ns ± 5% 41.4ns ± 6% ~ (p=0.765 n=20+19)
FmtFprintfString-12 69.4ns ± 3% 69.3ns ± 1% ~ (p=0.084 n=19+17)
FmtFprintfInt-12 76.1ns ± 1% 77.3ns ± 1% +1.57% (p=0.000 n=19+19)
FmtFprintfIntInt-12 122ns ± 2% 123ns ± 3% +0.95% (p=0.015 n=20+20)
FmtFprintfPrefixedInt-12 153ns ± 2% 151ns ± 3% -1.27% (p=0.013 n=20+20)
FmtFprintfFloat-12 215ns ± 0% 216ns ± 0% +0.47% (p=0.000 n=20+16)
FmtManyArgs-12 486ns ± 1% 498ns ± 0% +2.40% (p=0.000 n=20+17)
GobDecode-12 6.43ms ± 0% 6.50ms ± 0% +1.08% (p=0.000 n=18+19)
GobEncode-12 5.43ms ± 1% 5.47ms ± 0% +0.76% (p=0.000 n=20+20)
Gzip-12 218ms ± 1% 218ms ± 1% ~ (p=0.883 n=20+20)
Gunzip-12 38.8ms ± 0% 38.9ms ± 0% ~ (p=0.644 n=19+19)
HTTPClientServer-12 76.2µs ± 1% 76.4µs ± 2% ~ (p=0.218 n=20+20)
JSONEncode-12 12.2ms ± 0% 12.3ms ± 1% +0.45% (p=0.000 n=19+19)
JSONDecode-12 54.2ms ± 1% 53.3ms ± 0% -1.67% (p=0.000 n=20+20)
Mandelbrot200-12 3.71ms ± 0% 3.71ms ± 0% ~ (p=0.143 n=19+20)
GoParse-12 3.22ms ± 0% 3.19ms ± 1% -0.72% (p=0.000 n=20+20)
RegexpMatchEasy0_32-12 76.7ns ± 1% 75.8ns ± 1% -1.19% (p=0.000 n=20+17)
RegexpMatchEasy0_1K-12 245ns ± 1% 243ns ± 0% -0.72% (p=0.000 n=18+17)
RegexpMatchEasy1_32-12 71.9ns ± 0% 71.7ns ± 1% -0.39% (p=0.006 n=12+18)
RegexpMatchEasy1_1K-12 358ns ± 1% 354ns ± 1% -1.13% (p=0.000 n=20+19)
RegexpMatchMedium_32-12 105ns ± 2% 105ns ± 1% -0.63% (p=0.007 n=19+20)
RegexpMatchMedium_1K-12 31.9µs ± 1% 31.9µs ± 1% ~ (p=1.000 n=17+17)
RegexpMatchHard_32-12 1.51µs ± 1% 1.52µs ± 2% +0.46% (p=0.042 n=18+18)
RegexpMatchHard_1K-12 45.3µs ± 1% 45.5µs ± 2% +0.44% (p=0.029 n=18+19)
Revcomp-12 388ms ± 1% 385ms ± 0% -0.57% (p=0.000 n=19+18)
Template-12 63.0ms ± 1% 63.3ms ± 0% +0.50% (p=0.000 n=19+20)
TimeParse-12 309ns ± 1% 307ns ± 0% -0.62% (p=0.000 n=20+20)
TimeFormat-12 328ns ± 0% 333ns ± 0% +1.35% (p=0.000 n=19+19)
[Geo mean] 47.0µs 46.9µs -0.20%
(https://perf.golang.org/search?q=upload:20180326.1)
For #10958.
For #24543.
Change-Id: Icbd52e711fdbe7938a1fea3e6baca1104b53ac3a
Reviewed-on: https://go-review.googlesource.com/102604
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Insert appropriate race/msan calls before each memory operation during
SSA construction.
This is conceptually simple, but subtle because we need to be careful
that inserted instrumentation calls don't clobber arguments that are
currently being prepared for a user function call.
reorder1 already handles introducing temporary variables for arguments
in some cases. This CL changes it to use them for all arguments when
instrumenting.
Also, we can't SSA struct types with more than one field while
instrumenting. Otherwise, concurrent uses of disjoint fields within an
SSA-able struct can introduce false races.
This is both somewhat better and somewhat worse than the old racewalk
instrumentation pass. We're now able to easily recognize cases like
constructing non-escaping closures on the stack or accessing closure
variables don't need instrumentation calls. On the other hand,
spilling escaping parameters to the heap now results in an
instrumentation call.
Overall, this CL results in a small net reduction in the number of
instrumentation calls, but a small net increase in binary size for
instrumented executables. cmd/go ends up with 5.6% fewer calls, but a
2.4% larger binary.
Fixes#19054.
Change-Id: I70d1dd32ad6340e6fdb691e6d5a01452f58e97f3
Reviewed-on: https://go-review.googlesource.com/102817
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Inl, Inldcl, and InlCost are only applicable to functions with bodies
that can be inlined, so pull them out into a separate Inline type to
make understanding them easier.
A side benefit is that we can check if a function can be inlined by
just checking if n.Func.Inl is non-nil, which simplifies handling of
empty function bodies.
While here, remove some unnecessary Curfn twiddling, and make imported
functions use Inl.Dcl instead of Func.Dcl for consistency for local
functions.
Passes toolstash-check.
Change-Id: Ifd4a80349d85d9e8e4484952b38ec4a63182e81f
Reviewed-on: https://go-review.googlesource.com/104756
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
The main thing is we now eagerly create the ODCLFUNC node for
closures, immediately cross-link them, and assign fields (e.g., Nbody,
Dcl, Parents, Marks) directly on the ODCLFUNC (previously they were
assigned on the OCLOSURE and later moved to the ODCLFUNC).
This allows us to set Curfn to the ODCLFUNC instead of the OCLOSURE,
which makes things more consistent with normal function declarations.
(Notably, this means Cvars now hang off the ODCLFUNC instead of the
OCLOSURE.)
Assignment of xfunc symbol names also now happens before typechecking
their body, which means debugging output now provides a more helpful
name than "<S>".
In golang.org/cl/66810, we changed "x := y" statements to avoid
creating false closure variables for x, but we still create them for
struct literals like "s{f: x}". Update comment in capturevars
accordingly.
More opportunity for cleanups still, but this makes some substantial
progress, IMO.
Passes toolstash-check.
Change-Id: I65a4efc91886e3dcd1000561348af88297775cd7
Reviewed-on: https://go-review.googlesource.com/100197
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
Add helper methods that validate n.Op and convert to/from the
appropriate type.
Notably, there was a lot of code in walk.go that thought setting
Etype=1 on an OADDR node affected escape analysis.
Passes toolstash-check.
TBR=marvin
Change-Id: Ieae7c67225c1459c9719f9e6a748a25b975cf758
Reviewed-on: https://go-review.googlesource.com/99535
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
There were only two large classes of use for these variables:
1) Testing "funcdepth != 0" or "funcdepth > 0", which is equivalent to
checking "Curfn != nil".
2) In oldname, detecting whether a closure variable has been created
for the current function, which can be handled by instead testing
"n.Name.Curfn != Curfn".
Lastly, merge funcstart into funchdr, since it's only called once, and
it better matches up with funcbody now.
Passes toolstash-check.
Change-Id: I8fe159a9d37ef7debc4cd310354cea22a8b23394
Reviewed-on: https://go-review.googlesource.com/99076
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
changedVars was functionally a set, but couldn't be iterated over
efficiently. In functions with many variables, the wasted iteration was
costly. Use a sparseSet instead.
(*gc.Node).String() is very expensive: it calls Sprintf, which does
reflection, etc, etc. Instead, just look at .Sym.Name, which is all we
care about.
Change-Id: Ib61cd7b5c796e1813b8859135e85da5bfe2ac686
Reviewed-on: https://go-review.googlesource.com/92402
Reviewed-by: David Chase <drchase@google.com>
The NoFramePointer function flag is no longer used, so this CL
eliminates it. This cleans up some confusion between the compiler's
NoFramePointer flag and obj's NOFRAME flag. NoFramePointer was
intended to eliminate the saved base pointer on x86, but it was
translated into obj's NOFRAME flag. On x86, NOFRAME does mean to omit
the saved base pointer, but on ppc64 and s390x it has a more general
meaning of omitting *everything* from the frame, including the saved
LR and ppc64's "fixed frame". Hence, on ppc64 and s390x there are far
fewer situations where it is safe to set this flag.
Change-Id: If68991310b4d00638128c296bdd57f4ed731b46d
Reviewed-on: https://go-review.googlesource.com/92036
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Compiler and linker changes to support DWARF inlined instances,
see https://go.googlesource.com/proposal/+/HEAD/design/22080-dwarf-inlining.md
for design details.
This functionality is gated via the cmd/compile option -gendwarfinl=N,
where N={0,1,2}, where a value of 0 disables dwarf inline generation,
a value of 1 turns on dwarf generation without tracking of formal/local
vars from inlined routines, and a value of 2 enables inlines with
variable tracking.
Updates #22080
Change-Id: I69309b3b815d9fed04aebddc0b8d33d0dbbfad6e
Reviewed-on: https://go-review.googlesource.com/75550
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Now possible, since stringer just got the -trimprefix flag added.
While at it, simplify a few Op stringifications since we can now use %v,
and no longer have to worry about o<len(opnames).
Passes toolstash -cmp on std cmd.
Fixes#15462.
Change-Id: Icdcde0b0a5eb165d18488918175024da274f782b
Reviewed-on: https://go-review.googlesource.com/76790
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Dave Cheney <dave@cheney.net>
Previously, anytime we exported a function or method declaration
(which includes methods for every type transitively exported), we
included the inline function bodies, if any. However, in many cases,
it's impossible (or at least very unlikely) for the importing package
to call the method.
For example:
package p
type T int
func (t T) M() { t.u() }
func (t T) u() {}
func (t T) v() {}
T.M and T.u are inlineable, and they're both reachable through calls
to T.M, which is exported. However, t.v is also inlineable, but cannot
be reached.
Exception: if p.T is embedded in another type q.U, p.T.v will be
promoted to q.U.v, and the generated wrapper function could have
inlined the call to p.T.v. However, in practice, this doesn't happen,
and a missed inlining opportunity doesn't affect correctness.
To implement this, this CL introduces an extra flood fill pass before
exporting to mark inline bodies that are actually reachable, so the
exporter can skip over methods like t.v.
This reduces Kubernetes build time (as measured by "time go build -a
k8s.io/kubernetes/cmd/...") on an HP Z620 measurably:
== before ==
real 0m44.658s
user 11m19.136s
sys 0m53.844s
== after ==
real 0m41.702s
user 10m29.732s
sys 0m50.908s
It also significantly cuts down the cost of enabling mid-stack
inlining (-l=4):
== before (-l=4) ==
real 1m19.236s
user 20m6.528s
sys 1m17.328s
== after (-l=4) ==
real 0m59.100s
user 13m12.808s
sys 0m58.776s
Updates #19348.
Change-Id: Iade58233ca42af823a1630517a53848b5d3c7a7e
Reviewed-on: https://go-review.googlesource.com/74110
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
Most write barrier calls are inserted by SSA, but copy and append are
lowered to runtime.typedslicecopy during walk. Fix these to set
Func.WBPos and emit the "write barrier" warning, as done for the write
barriers inserted by SSA. As part of this, we refactor setting WBPos
and emitting this warning into the frontend so it can be shared by
both walk and SSA.
Change-Id: I5fe9997d9bdb55e03e01dd58aee28908c35f606b
Reviewed-on: https://go-review.googlesource.com/73411
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
The current go:nowritebarrierrec checker has two problems that limit
its coverage:
1. It doesn't understand that systemstack calls its argument, which
means there are several cases where we fail to detect prohibited write
barriers.
2. It only observes calls in the AST, so calls constructed during
lowering by SSA aren't followed.
This CL completely rewrites this checker to address these issues.
The current checker runs entirely after walk and uses visitBottomUp,
which introduces several problems for checking across systemstack.
First, visitBottomUp itself doesn't understand systemstack calls, so
the callee may be ordered after the caller, causing the checker to
fail to propagate constraints. Second, many systemstack calls are
passed a closure, which is quite difficult to resolve back to the
function definition after transformclosure and walk have run. Third,
visitBottomUp works exclusively on the AST, so it can't observe calls
created by SSA.
To address these problems, this commit splits the check into two
phases and rewrites it to use a call graph generated during SSA
lowering. The first phase runs before transformclosure/walk and simply
records systemstack arguments when they're easy to get. Then, it
modifies genssa to record static call edges at the point where we're
lowering to Progs (which is the latest point at which position
information is conveniently available). Finally, the second phase runs
after all functions have been lowered and uses a direct BFS walk of
the call graph (combining systemstack calls with static calls) to find
prohibited write barriers and construct nice error messages.
Fixes#22384.
For #22460.
Change-Id: I39668f7f2366ab3c1ab1a71eaf25484d25349540
Reviewed-on: https://go-review.googlesource.com/72773
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Not perfect, but it's a start.
Change-Id: I3283385223a39ea73567b0130290bfe4de69d018
Reviewed-on: https://go-review.googlesource.com/73552
Reviewed-by: Robert Griesemer <gri@golang.org>
Eliminates lots of ad hoc code for recognizing the same thing in
different ways.
Passes toolstash-check.
Change-Id: Ic0bb005308e96331b4ef30f455b860e476725b61
Reviewed-on: https://go-review.googlesource.com/73190
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
racewalk's "foreach" function applies a function to all of a Node's
immediate children, but with a non-idiomatic signature.
This CL reworks it to recursively iterate over the entire subtree
rooted at Node and provides a way to short-circuit iteration.
Passes toolstash -cmp for std cmd with -race.
Change-Id: I738b73953d608709802c97945b7e0f4e4940d3f4
Reviewed-on: https://go-review.googlesource.com/71911
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
Calls to a closure held in a local, non-escaping,
variable can be inlined, provided the closure body
can be inlined and the variable is never written to.
The current implementation has the following limitations:
- closures with captured variables are not inlined because
doing so naively triggers invariant violation in the SSA
phase
- re-assignment check is currently approximated by checking
the Addrtaken property of the variable which should be safe
but may miss optimization opportunities if the address is
not used for a write before the invocation
Updates #15561
Change-Id: I508cad5d28f027bd7e933b1f793c14dcfef8b5a1
Reviewed-on: https://go-review.googlesource.com/65071
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Hugues Bruant <hugues.bruant@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
OTYPESW Op comment says
// List = Left.(type)
this seems to be wrong. Adding
fmt.Printf("n: %v\n", cond)
fmt.Printf(" n.List: %v\n", cond.List)
fmt.Printf(" n.Left: %v\n", cond.Left)
fmt.Printf(" n.Right: %v\n", cond.Right)
to (s *typeSwitch) walk(sw *Node), and compiling the following code
snippet
var y interface{}
switch x := y.(type) {
default:
println(x)
}
prints
n: <node TYPESW>
n.List:
n.Left: x
n.Right: y
The correct OTYPESW Node field positions are
// Left = Right.(type)
This is confirmed by the fact that, further in the code,
typeSwitch.walk() checks that Right (and not Left) is of type
interface:
cond.Right = walkexpr(cond.Right, &sw.Ninit)
if !cond.Right.Type.IsInterface() {
yyerror("type switch must be on an interface")
return
}
This patch fixes the OTYPESW comment.
Change-Id: Ief1e409cfabb7640d7f7b0d4faabbcffaf605450
Reviewed-on: https://go-review.googlesource.com/69112
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Previously, we used OXFALL vs OFALL to distinguish fallthrough
statements that had been validated. Because in the Node AST we flatten
statement blocks, OXCASE and OXFALL needed to keep track of their
block scopes for this purpose.
Now that we have an AST that keeps these separate, we can just perform
the validation earlier.
Passes toolstash-check.
Fixes#14540.
Change-Id: I8421eaba16c2b3b72c9c5483b5cf20b14261385e
Reviewed-on: https://go-review.googlesource.com/61130
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Change compiler and linker to emit DWARF lexical blocks in .debug_info
section when compiling with -N -l.
Version of debug_info is updated from DWARF v2 to DWARF v3 since
version 2 does not allow lexical blocks with discontinuous PC ranges.
Remaining open problems:
- scope information is removed from inlined functions
- variables records do not have DW_AT_start_scope attributes so a
variable will shadow other variables with the same name as soon as its
containing scope begins, even before its declaration.
Updates #6913.
Updates #12899.
Change-Id: Idc6808788512ea20e7e45bcf782453acb416fb49
Reviewed-on: https://go-review.googlesource.com/40095
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Generated hash and eq routines don't need nil checks.
Prior to this CL, this was accomplished by
temporarily incrementing the global variable disable_checknil.
However, that increment lasted only the lifetime of the
call to funccompile. After CL 41503, funccompile may
do nothing but enqueue the function for compilation,
resulting in nil checks being generated.
Fix this by adding an explicit flag to a function
indicating whether nil checks should be disabled
for that function.
While we're here, allow concurrent compilation
with the -w and -W flags, since that was needed
to investigate this issue.
Fixes#20242
Change-Id: Ib9140c22c49e9a09e62fa3cf350f5d3eff18e2bd
Reviewed-on: https://go-review.googlesource.com/42591
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
Reviewed-by: Robert Griesemer <gri@golang.org>
Node.Used was written to from the backend
concurrently with reads of Node.Class
for the same ONAME Nodes.
I do not know why it was not failing consistently
under the race detector, but it is a race.
This is likely also a problem with Node.HasVal and Node.HasOpt.
They will be handled in a separate CL.
Fix Used by moving it to gc.Name and making it a separate bool.
There was one non-Name use of Used, marking OLABELs as used.
That is no longer needed, now that goto and label checking
happens early in the front end.
Leave the getters and setters in place,
to ease changing the representation in the future
(or changing to an interface!).
Updates #20144
Change-Id: I9bec7c6d33dcb129a4cfa9d338462ea33087f9f7
Reviewed-on: https://go-review.googlesource.com/42015
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Node.Walkdef is 0, 1, or 2, so it only requires two bits.
Add support for 2-bit values to bitset,
and use it for Node.Walkdef.
Class, Embedded, Typecheck, and Initorder will follow suit
in subsequent CLs.
The multi-bit flags will go at the beginning,
since that generates (marginally) more efficient code.
Change-Id: Id6e2e66e437f10aaa05b8a6e1652efb327d06128
Reviewed-on: https://go-review.googlesource.com/41791
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
In addition to being more compact,
this makes the code a lot clearer.
Change-Id: Ibcb70526c2e5913dcf34904fda194e3585228c3f
Reviewed-on: https://go-review.googlesource.com/41761
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
node.Likely may once have held -1/0/+1,
but it is now only 0/1.
With improved SSA heuristics,
it may someday go away entirely.
Change-Id: I6451d17fd7fb47e67fea4d39df302b6db00ea57b
Reviewed-on: https://go-review.googlesource.com/41760
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This reverts commit c8b889cc48.
Reason for revert: broke noopt build, compiler performance regression, new Curfn uses
Let's fix those and then try this again.
Change-Id: Icc3cad1365d04cac8fd09da9dbb0bbf55c13ef44
Reviewed-on: https://go-review.googlesource.com/39991
Reviewed-by: Robert Griesemer <gri@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Change compiler and linker to emit DWARF lexical blocks in debug_info.
Version of debug_info is updated from DWARF v.2 to DWARF v.3 since version 2
does not allow lexical blocks with discontinuous ranges.
Second attempt at https://go-review.googlesource.com/#/c/29591/
Remaining open problems:
- scope information is removed from inlined functions
- variables in debug_info do not have DW_AT_start_scope attributes so a
variable will shadow other variables with the same name as soon as its
containing scope begins, before its declaration.
Updates golang/go#12899, golang/go#6913
Change-Id: I0e260a45b564d14a87b88974eb16c5387cb410a5
Reviewed-on: https://go-review.googlesource.com/36879
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
- created new package cmd/compile/internal/types
- moved Pkg, Sym, Type to new package
- to break cycles, for now we need the (ugly) types/utils.go
file which contains a handful of functions that must be installed
early by the gc frontend
- to break cycles, for now we need two functions to convert between
*gc.Node and *types.Node (the latter is a dummy type)
- adjusted the gc's code to use the new package and the conversion
functions as needed
- made several Pkg, Sym, and Type methods functions as needed
- renamed constructors typ, typPtr, typArray, etc. to types.New,
types.NewPtr, types.NewArray, etc.
Passes toolstash-check -all.
Change-Id: I8adfa5e85c731645d0a7fd2030375ed6ebf54b72
Reviewed-on: https://go-review.googlesource.com/39855
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
A few gc.Node ops may be shared across functions.
The compiler is (mostly) already careful to avoid mutating them.
However, from a concurrency perspective, replacing (say)
an empty list with an empty list still counts as a mutation.
One place this occurs is orderinit. Avoid it.
This requires fixing one spot where shared nodes were mutated.
It doesn't result in any functional or performance changes.
Passes toolstash-check.
Updates #15756
Change-Id: I63c93b31baeeac62d7574804acb6b7f2bc9d14a9
Reviewed-on: https://go-review.googlesource.com/39196
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>