It's no longer conditional.
Change-Id: I697bb0e9ffe9644ec4d2766f7e8be8b82d3b0638
Reviewed-on: https://go-review.googlesource.com/c/go/+/286013
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
This was already documented as always being an ONAME, so it just
needed a few type assertion changes.
Passes buildall w/ toolstash -cmp.
Updates #42982.
Change-Id: I61f4b6ebd57c43b41977f4b37b81fe94fb11a723
Reviewed-on: https://go-review.googlesource.com/c/go/+/275757
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Russ Cox <rsc@golang.org>
Trust: Matthew Dempsky <mdempsky@google.com>
The plan is to introduce a Node interface that replaces the old *Node pointer-to-struct.
The previous CL defined an interface INode modeling a *Node.
This CL:
- Changes all references outside internal/ir to use INode,
along with many references inside internal/ir as well.
- Renames Node to node.
- Renames INode to Node
So now ir.Node is an interface implemented by *ir.node, which is otherwise inaccessible,
and the code outside package ir is now (clearly) using only the interface.
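The end state described above (an exported interface backed by an unexported struct) can be sketched in a few lines. This is only an illustration: the method set, the fields, and the NewNode constructor are hypothetical stand-ins, not the real ir.Node API.

```go
package main

import "fmt"

// Node is the interface that code outside the package would use.
// The methods here are hypothetical; the real ir.Node is much larger.
type Node interface {
	Op() string
	Left() Node
}

// node is the unexported concrete type. Code in other packages
// cannot name it, so it can only work through the Node interface.
type node struct {
	op   string
	left Node
}

func (n *node) Op() string { return n.op }
func (n *node) Left() Node { return n.left }

// Compile-time check that *node satisfies Node, in the same spirit as
// the "var _ Node = (*node)(nil)" line in this commit's sed script.
var _ Node = (*node)(nil)

// NewNode is a hypothetical constructor that hands out the interface.
func NewNode(op string, left Node) Node {
	return &node{op: op, left: left}
}

func main() {
	n := NewNode("ADD", NewNode("NAME", nil))
	fmt.Println(n.Op(), n.Left().Op())
}
```

Because callers hold a Node rather than a *node, the concrete representation can later be split into smaller structs without touching those callers.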
The usual rule is never to redefine an existing name with a new meaning,
so that old code that hasn't been updated gets an "unknown name" error
instead of more mysterious errors or silent misbehavior. That rule would
caution against replacing Node-the-struct with Node-the-interface,
as in this CL, because code that says *Node would now be using a pointer
to an interface. But this CL is being landed at the same time as another that
moves Node from gc to ir. So the net effect is to replace *gc.Node with ir.Node,
which does follow the rule: any lingering references to gc.Node will be told
it's gone, not silently start using pointers to interfaces. So the rule is followed
by the CL sequence, just not this specific CL.
Overall, the loss of inlining caused by using interfaces cuts the compiler speed
by about 6%, a not insignificant amount. However, as we convert the representation
to concrete structs that are not the giant Node over the next weeks, that speed
should come back as more of the compiler starts operating directly on concrete types
and the memory taken up by the graph of Nodes drops due to the more precise
structs. Honestly, I was expecting worse.
% benchstat bench.old bench.new
name                      old time/op  new time/op  delta
Template                   168ms ± 4%   182ms ± 2%   +8.34%  (p=0.000 n=9+9)
Unicode                   72.2ms ±10%  82.5ms ± 6%  +14.38%  (p=0.000 n=9+9)
GoTypes                    563ms ± 8%   598ms ± 2%   +6.14%  (p=0.006 n=9+9)
Compiler                   2.89s ± 4%   3.04s ± 2%   +5.37%  (p=0.000 n=10+9)
SSA                        6.45s ± 4%   7.25s ± 5%  +12.41%  (p=0.000 n=9+10)
Flate                      105ms ± 2%   115ms ± 1%   +9.66%  (p=0.000 n=10+8)
GoParser                   144ms ±10%   152ms ± 2%   +5.79%  (p=0.011 n=9+8)
Reflect                    345ms ± 9%   370ms ± 4%   +7.28%  (p=0.001 n=10+9)
Tar                        149ms ± 9%   161ms ± 5%   +8.05%  (p=0.001 n=10+9)
XML                        190ms ± 3%   209ms ± 2%   +9.54%  (p=0.000 n=9+8)
LinkCompiler               327ms ± 2%   325ms ± 2%     ~     (p=0.382 n=8+8)
ExternalLinkCompiler       1.77s ± 4%   1.73s ± 6%     ~     (p=0.113 n=9+10)
LinkWithoutDebugCompiler   214ms ± 4%   211ms ± 2%     ~     (p=0.360 n=10+8)
StdCmd                     14.8s ± 3%   15.9s ± 1%   +6.98%  (p=0.000 n=10+9)
[Geo mean]                 480ms        510ms        +6.31%

name                      old user-time/op  new user-time/op  delta
Template                        223ms ± 3%        237ms ± 3%   +6.16%  (p=0.000 n=9+10)
Unicode                         103ms ± 6%        113ms ± 3%   +9.53%  (p=0.000 n=9+9)
GoTypes                         758ms ± 8%        800ms ± 2%   +5.55%  (p=0.003 n=10+9)
Compiler                        3.95s ± 2%        4.12s ± 2%   +4.34%  (p=0.000 n=10+9)
SSA                             9.43s ± 1%        9.74s ± 4%   +3.25%  (p=0.000 n=8+10)
Flate                           132ms ± 2%        141ms ± 2%   +6.89%  (p=0.000 n=9+9)
GoParser                        177ms ± 9%        183ms ± 4%     ~     (p=0.050 n=9+9)
Reflect                         467ms ±10%        495ms ± 7%   +6.17%  (p=0.029 n=10+10)
Tar                             183ms ± 9%        197ms ± 5%   +7.92%  (p=0.001 n=10+10)
XML                             249ms ± 5%        268ms ± 4%   +7.82%  (p=0.000 n=10+9)
LinkCompiler                    544ms ± 5%        544ms ± 6%     ~     (p=0.863 n=9+9)
ExternalLinkCompiler            1.79s ± 4%        1.75s ± 6%     ~     (p=0.075 n=10+10)
LinkWithoutDebugCompiler        248ms ± 6%        246ms ± 2%     ~     (p=0.965 n=10+8)
[Geo mean]                      483ms             504ms        +4.41%
[git-generate]
cd src/cmd/compile/internal/ir
: # We need to do the conversion in multiple steps, so we introduce
: # a temporary type alias that will start out meaning the pointer-to-struct
: # and then change to mean the interface.
rf '
mv Node OldNode
add node.go \
type Node = *OldNode
'
: # It should work to do this ex in ir, but it misses test files, due to a bug in rf.
: # Run the command in gc to handle gc's tests, and then again in ssa for ssa's tests.
cd ../gc
rf '
ex . ../arm ../riscv64 ../arm64 ../mips64 ../ppc64 ../mips ../wasm {
import "cmd/compile/internal/ir"
*ir.OldNode -> ir.Node
}
'
cd ../ssa
rf '
ex {
import "cmd/compile/internal/ir"
*ir.OldNode -> ir.Node
}
'
: # Back in ir, finish conversion clumsily with sed,
: # because type checking and circular aliases do not mix.
cd ../ir
sed -i '' '
/type Node = \*OldNode/d
s/\*OldNode/Node/g
s/^func (n Node)/func (n *OldNode)/
s/OldNode/node/g
s/type INode interface/type Node interface/
s/var _ INode = (Node)(nil)/var _ Node = (*node)(nil)/
' *.go
gofmt -w *.go
sed -i '' '
s/{Func{}, 136, 248}/{Func{}, 152, 280}/
s/{Name{}, 32, 56}/{Name{}, 44, 80}/
s/{Param{}, 24, 48}/{Param{}, 44, 88}/
s/{node{}, 76, 128}/{node{}, 88, 152}/
' sizeof_test.go
cd ../ssa
sed -i '' '
s/{LocalSlot{}, 28, 40}/{LocalSlot{}, 32, 48}/
' sizeof_test.go
cd ../gc
sed -i '' 's/\*ir.Node/ir.Node/' mkbuiltin.go
cd ../../../..
go install std cmd
cd cmd/compile
go test -u || go test -u
Change-Id: I196bbe3b648e4701662e4a2bada40bf155e2a553
Reviewed-on: https://go-review.googlesource.com/c/go/+/272935
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
The cycle hacks existed because gc needed to import ssa
which needs to know about gc.Node. But now that's ir.Node,
and there's no cycle anymore.
Don't know how much it matters but LocalSlot is now
one word shorter than before, because it holds a pointer
instead of an interface for the *Node. That won't last long.
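The one-word difference comes from how Go lays out the two kinds of field: a pointer occupies one machine word, while an interface value occupies two (a type word and a data word). A small sketch, using hypothetical stand-in types rather than the real LocalSlot, makes the difference measurable:

```go
package main

import (
	"fmt"
	"unsafe"
)

// Named and variable are hypothetical stand-ins for the node types.
type Named interface{ Name() string }

type variable struct{ name string }

func (v *variable) Name() string { return v.name }

// slotWithPointer and slotWithInterface are hypothetical stand-ins for
// LocalSlot; only the type of the first field differs.
type slotWithPointer struct {
	N   *variable
	Off int
}

type slotWithInterface struct {
	N   Named
	Off int
}

// extraWords reports how many machine words larger the interface-holding
// struct is than the pointer-holding one.
func extraWords() uintptr {
	word := unsafe.Sizeof(uintptr(0))
	return (unsafe.Sizeof(slotWithInterface{}) - unsafe.Sizeof(slotWithPointer{})) / word
}

func main() {
	fmt.Println("extra words with interface field:", extraWords())
}
```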
Now that they're not necessary for interface satisfaction,
IsSynthetic and IsAutoTmp can move to top-level ir functions.
Change-Id: Ie511e93466cfa2b17d9a91afc4bd8d53fdb80453
Reviewed-on: https://go-review.googlesource.com/c/go/+/272931
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
As it says, delay expansion of OpArg to the expand_calls phase,
to enable (eventually) interprocedural SSA optimizations, and
(sooner) change to a register ABI.
Includes a round of cleanup to function names and comments,
largely to match the expanded scope of the functions.
This CL removes the per-function dependence on GOSSAHASH,
but the go116lateCallExpansion kill switch remains (and was
tested locally to ensure it worked).
Two functions in expand_calls.go that performed overlapping
things were combined into a single function that is called
twice.
Fixes #42236.
For #40724.
Change-Id: Icbb78947eaa39f17f2c1210d5c2caef20abd6571
Reviewed-on: https://go-review.googlesource.com/c/go/+/262117
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
This adds a pass to detect common selection operations,
to avoid generating duplicates. Duplicate offsets are
also detected.
All aggregate types are now handled; there is some freedom in where
expand_calls is run, though it must run before softfloat.
Debug-name-maintenance is now incremental both in decompose builtin
and in expand_calls; it might be good to push this into all the
decompose passes.
(this is a smash of 5 CLs that rewrote some of the same code several
times to deal with phase-ordering problems, and included an abandoned
attempt.)
For #40724.
Change-Id: I2a0c32f20660bf8b99e2bcecd33545d97d2bd3c6
Reviewed-on: https://go-review.googlesource.com/c/go/+/249458
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Two part fix:
1) bring the type "correction" forward from a later CL in the expand calls series
2) when a leaf-select is rewritten in place, update the type (it might have been
changed by the type correction in 1).
Fixes #41736.
Change-Id: Id097efd10481bf0ad92aaead81a7207221c144b5
Reviewed-on: https://go-review.googlesource.com/c/go/+/259203
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
My last 387 CL. So sad ... ... ... ... not!
Fixes #40255
Change-Id: I8d4ddb744b234b8adc735db2f7c3c7b6d8bbdfa4
Reviewed-on: https://go-review.googlesource.com/c/go/+/258957
Trust: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Not a fix, but things will work while I fix it.
Credit @andybons "for we revert switches for scary stuff".
Updates #41736
Change-Id: I55f90860eae919765aac4f6d9f108a54139027e1
Reviewed-on: https://go-review.googlesource.com/c/go/+/258897
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
This change incorporates the decision that it should be possible to
run call expansion relatively late in the optimization chain, so that
(1) calls themselves can be exposed to useful optimizations
(2) the effect of selectors on aggregates is seen at the rewrite,
so that assignment of parts into registers is less complicated
(at least I hope it works that way).
That means that selectors feeding into SelectN need to be processed,
and Make* feeding into call parameters need to be processed.
This does however require that call expansion run before decompose
builtins.
This doesn't yet handle rewrites of strings, slices, interfaces,
and complex numbers.
Passes run.bash and race.bash
Change-Id: I71ff23d3c491043beb30e926949970c4f63ef1a4
Reviewed-on: https://go-review.googlesource.com/c/go/+/245133
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Introduce GOOS=ios for iOS systems. GOOS=ios matches "darwin"
build tag, like GOOS=android matches "linux" and GOOS=illumos
matches "solaris". Only ios/arm64 is supported (ios/amd64 is
not).
GOOS=ios and GOOS=darwin remain essentially the same at this
point. They will diverge at a later time, to differentiate macOS
and iOS.
Uses of GOOS=="darwin" are changed to (GOOS=="darwin" || GOOS=="ios"),
except if it clearly means macOS (e.g. GOOS=="darwin" && GOARCH=="amd64"),
it remains GOOS=="darwin".
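The rewrite rule above can be pictured as two predicates. The helper names below are hypothetical and exist only to show the two cases: checks that mean "any Apple platform" now accept both values, while checks that clearly mean macOS stay as they were.

```go
package main

import (
	"fmt"
	"runtime"
)

// isDarwinLike reports whether goos names macOS or iOS.
// (Hypothetical helper; the CL edits the comparisons in place.)
func isDarwinLike(goos string) bool {
	return goos == "darwin" || goos == "ios"
}

// isMacOSAMD64 reports whether the target is clearly macOS on amd64.
// Such checks keep comparing against "darwin" only.
func isMacOSAMD64(goos, goarch string) bool {
	return goos == "darwin" && goarch == "amd64"
}

func main() {
	fmt.Println(isDarwinLike(runtime.GOOS), isMacOSAMD64(runtime.GOOS, runtime.GOARCH))
}
```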
Updates #38485.
Change-Id: I4faacdc1008f42434599efb3c3ad90763a83b67c
Reviewed-on: https://go-review.googlesource.com/c/go/+/254740
Trust: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Turns out if your failure is in a function with a name like "Reset()"
there will be a lot of hits on the same hashcode. Adding package sensitivity
solves this problem.
In addition, when a logfile was specified for the GOSSAHASH logging,
it was opened in create mode, which meant that multiple compiler
invocations would reset the file to zero length.
Opening in append mode works better; the automated harness
(github.com/dr2chase/gossahash) takes care of truncating the file before use.
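The difference between the two open modes is easy to reproduce outside the compiler. The sketch below is not the compiler's logging code; appendLine and demo are hypothetical helpers that simulate two separate invocations writing to the same log file in append mode, so the second write does not wipe out the first.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// appendLine opens path in append mode (creating it if needed) and
// writes one line. With os.Create instead, each call would truncate
// the file and lose earlier output.
func appendLine(path, line string) error {
	f, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_APPEND, 0o644)
	if err != nil {
		return err
	}
	if _, err := f.WriteString(line + "\n"); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

// demo simulates two invocations logging to one file and returns the
// file's final contents.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "hashlog")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "hash.log")
	for _, line := range []string{"first invocation", "second invocation"} {
		if err := appendLine(path, line); err != nil {
			return "", err
		}
	}
	data, err := os.ReadFile(path)
	return string(data), err
}

func main() {
	out, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```

Since nothing truncates the file in this mode, whatever drives the compiler has to empty the log before a run, which is the job the harness takes on.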
Change-Id: I5601bc280faa94cbd507d302448831849db6c842
Reviewed-on: https://go-review.googlesource.com/c/go/+/246937
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Right now the Aux and AuxInt fields of ssa.Values are typed as
interface{} and int64, respectively. Each rule that uses these values
must cast them to the type they actually are (*obj.LSym, or int32, or
ValAndOff, etc.), use them, and then cast them back to interface{} or
int64.
We know for each opcode what the types of the Aux and AuxInt fields
should be. So let's modify the rule generator to declare the types to
be what we know they should be, autoconverting to and from the generic
types for us. That way we can make the rules more type safe.
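To make the "autoconverting" idea concrete, here is a rough sketch of what a typed rule body could look like once the conversions are inserted for it. The value struct and the converter names are assumptions made for this illustration and are not taken from the generated code.

```go
package main

import "fmt"

// value is a hypothetical stand-in for ssa.Value: AuxInt is stored
// generically as an int64 regardless of what it really means.
type value struct {
	AuxInt int64
}

// Converters of this shape are what a rule generator could insert
// automatically when it knows the opcode's AuxInt is really an int32.
// (Illustrative names.)
func auxIntToInt32(i int64) int32 { return int32(i) }
func int32ToAuxInt(i int32) int64 { return int64(i) }

// addConst folds c into the value's constant. The arithmetic happens
// on int32, so overflow wraps the way a 32-bit constant should.
func addConst(v *value, c int32) {
	x := auxIntToInt32(v.AuxInt)
	v.AuxInt = int32ToAuxInt(x + c)
}

func main() {
	v := &value{AuxInt: 2147483647} // max int32
	addConst(v, 1)
	fmt.Println(v.AuxInt)
}
```

The rule author only writes the typed arithmetic; forgetting a cast, or doing the addition at int64 width by accident, is no longer possible.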
It's difficult to make a single CL for this, so I've coopted the "=>"
token to indicate a rule that is strongly typed. "->" rules are
processed as before. That will let us migrate a few rules at a time in
separate CLs. Hopefully we can reach a state where all rules are
strongly typed and we can drop the distinction.
This CL changes just a few rules to get a feel for what this
transition would look like.
I've decided not to put explicit types in the rules. I think it
makes the rules somewhat clearer, but definitely more verbose.
In particular, the passthrough rules that don't modify the fields
in question are verbose for no real reason.
Change-Id: I63a1b789ac5702e7caf7934cd49f784235d1d73d
Reviewed-on: https://go-review.googlesource.com/c/go/+/190197
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Based on riscv-go port.
Updates #27532
Change-Id: Ia329daa243db63ff334053b8807ea96b97ce3acf
Reviewed-on: https://go-review.googlesource.com/c/go/+/204631
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
CL 137156 introduces an intrinsic on AMD64 that executes vfmadd231sd
when feature detection is successful. However, because floating-point
isn't allowed in note handler, the builder disables SSE instructions,
and fails when attempting to execute this instruction. This change
disables FMA on plan9 to immediately use the software fallback.
Fixes #35063.
Change-Id: I87d8f0995bd2f15013d203e618938f5079c9eed2
Reviewed-on: https://go-review.googlesource.com/c/go/+/202617
Reviewed-by: Keith Randall <khr@golang.org>
Part 1: CL 199499 (GOOS nacl)
Part 2: CL 200077 (amd64p32 files, toolchain)
Part 3: stuff that arguably should've been part of Part 2, but I forgot
one of my grep patterns when splitting the original CL up into
two parts.
This one might also have interesting stuff to resurrect for any future
x32 ABI support.
Updates #30439
Change-Id: I2b4143374a253a003666f3c69e776b7e456bdb9c
Reviewed-on: https://go-review.googlesource.com/c/go/+/200318
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
This is part two of the nacl removal. Part 1 was CL 199499.
This CL removes amd64p32 support, which might be useful in the future
if we implement the x32 ABI. It also removes the nacl bits in the
toolchain, and some remaining nacl bits.
Updates #30439
Change-Id: I2475d5bb066d1b474e00e40d95b520e7c2e286e1
Reviewed-on: https://go-review.googlesource.com/c/go/+/200077
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Before this change, wasm only used float variables with a size of 64 bit
and applied rounding to 32 bit precision where necessary. This change
adds proper 32 bit float variables.
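The old scheme, a 64-bit variable plus explicit rounding, can be mimicked in ordinary Go. This is only a model of the idea, with hypothetical helper names; it says nothing about the wasm instructions actually emitted.

```go
package main

import "fmt"

// roundTo32 models "a float32 kept in a 64-bit variable": the value is
// stored as float64 but rounded to float32 precision.
func roundTo32(x float64) float64 {
	return float64(float32(x))
}

// addEmulated adds two float32 values the old way: compute in float64,
// then apply the rounding step.
func addEmulated(a, b float32) float32 {
	sum := roundTo32(float64(a) + float64(b))
	return float32(sum)
}

// addNative adds two float32 values directly, as a proper 32-bit
// float variable allows.
func addNative(a, b float32) float32 {
	return a + b
}

func main() {
	fmt.Println(addEmulated(0.1, 0.2) == addNative(0.1, 0.2))
}
```

For a single addition the two approaches agree, so the benefit of native 32-bit variables is in dropping the extra conversions rather than in changing results.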
Reduces the size of pkg/js_wasm by 254 bytes.
Change-Id: Ieabe846a8cb283d66def3cdf11e2523b3b31f345
Reviewed-on: https://go-review.googlesource.com/c/go/+/195117
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Flagalloc has the unenviable task of splitting
flag-generating ops that have been merged with loads
when the flags need to be "spilled" (i.e. regenerated).
Since there weren't very many of them, there was a hard-coded list
of ops and bespoke code written to split them.
This change migrates load splitting into rewrite rules,
to make them easier to maintain.
Change-Id: I7750eafb888a802206c410f9c341b3133e7748b8
Reviewed-on: https://go-review.googlesource.com/c/go/+/166978
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Go documentation style for boolean funcs is to say:
// Foo reports whether ...
func Foo() bool
(rather than "returns true if")
This CL also replaces 4 uses of "iff" with the same "reports whether"
wording, which doesn't lose any meaning, and will prevent people from
sending typo fixes when they don't realize it's "if and only if". In
the past I think we've had the typo CLs updated to just say "reports
whether". So do them all at once.
(Inspired by the addition of another "returns true if" in CL 146938
in fd_plan9.go)
Created with:
$ perl -i -npe 's/returns true if/reports whether/' $(git grep -l "returns true iff" | grep -v vendor)
$ perl -i -npe 's/returns true if/reports whether/' $(git grep -l "returns true if" | grep -v vendor)
Change-Id: Ided502237f5ab0d25cb625dbab12529c361a8b9f
Reviewed-on: https://go-review.googlesource.com/c/147037
Reviewed-by: Ian Lance Taylor <iant@golang.org>
The goal of this change is to move work from walk to SSA,
and simplify things along the way.
This is hard to accomplish cleanly with small incremental changes,
so this large commit message aims to provide a roadmap to the diff.
High level description:
Prior to this change, walk was responsible for constructing (most of) the stack for function calls.
ascompatte gathered variadic arguments into a slice.
It also rewrote n.List from a list of arguments to a list of assignments to stack slots.
ascompatte was called multiple times to handle the receiver in a method call.
reorder1 then introduced temporaries into n.List as needed to avoid smashing the stack.
adjustargs then made extra stack space for go/defer args as needed.
Node to SSA construction evaluated all the statements in n.List,
and issued the function call, assuming that the stack was correctly constructed.
Intrinsic calls had to dig around inside n.List to extract the arguments,
since intrinsics don't use the stack to make function calls.
This change moves stack construction to the SSA construction phase.
ascompatte, now called walkParams, does all the work that ascompatte and reorder1 did.
It handles variadic arguments, inserts the method receiver if needed, and allocates temporaries.
It does not, however, make any assignments to stack slots.
Instead, it moves the function arguments to n.Rlist, leaving assignments to temporaries in n.List.
(It would be better to use Ninit instead of List; future work.)
During SSA construction, after doing all the temporary assignments in n.List,
the function arguments are assigned to stack slots by
constructing the appropriate SSA Value, using (*state).storeArg.
SSA construction also now handles adjustments for go/defer args.
This change also simplifies intrinsic calls, since we no longer need to undo walk's work.
Along the way, we simplify nodarg by pushing the fp==1 case to its callers, where it fits nicely.
Generated code differences:
There were a few optimizations applied along the way, the old way.
f(g()) was rewritten to do a block copy of function results to function arguments.
And reorder1 avoided introducing the final "save the stack" temporary in n.List.
The f(g()) block copy optimization never actually triggered; the order pass rewrote away g(), so that has been removed.
SSA optimizations mostly obviated the need for reorder1's optimization of avoiding the final temporary.
The exception was when the temporary's type was not SSA-able;
in that case, we got a Move into an autotmp and then an immediate Move onto the stack,
with the autotmp never read or used again.
This change introduces a new rewrite rule to detect such pointless double Moves
and collapse them into a single Move.
This is actually more powerful than the original optimization,
since the original optimization relied on the imprecise Node.HasCall calculation.
The other significant difference in the generated code is that the stack is now constructed
completely in SP-offset order. Prior to this change, the stack was constructed somewhat
haphazardly: first the final argument that Node.HasCall deemed to require a temporary,
then other arguments, then the method receiver, then the defer/go args.
SP-offset is probably a good default order. See future work.
There are a few minor object file size changes as a result of this change.
I investigated some regressions in early versions of this change.
One regression (in archive/tar) was the addition of a single CMPQ instruction,
which would be eliminated were this TODO from flagalloc to be done:
// TODO: Remove original instructions if they are never used.
One regression (in text/template) was an ADDQconstmodify that is now
a regular MOVQLoad+ADDQconst+MOVQStore, due to an unlucky change
in the order in which arguments are written. The argument change
order can also now be luckier, so this appears to be a wash.
All in all, though there will be minor winners and losers,
this change appears to be performance neutral.
Future work:
Move loading the result of function calls to SSA construction; eliminate OINDREGSP.
Consider pushing stack construction deeper into SSA world, perhaps in an arch-specific pass.
Among other benefits, this would make it easier to transition to a new calling convention.
This would require rethinking the handling of stack conflicts and is non-trivial.
Figure out some clean way to indicate that stack construction Stores/Moves
do not alias each other, so that subsequent passes may do things like
CSE+tighten shared stack setup, do DSE using non-first Stores, etc.
This would allow us to eliminate the minor text/template regression.
Possibly make assignments to stack slots not treated as statements by DWARF.
Compiler benchmarks:
name        old time/op  new time/op  delta
Template     182ms ± 2%   179ms ± 2%  -1.69%  (p=0.000 n=47+48)
Unicode     86.3ms ± 5%  85.1ms ± 4%  -1.36%  (p=0.001 n=50+50)
GoTypes      646ms ± 1%   642ms ± 1%  -0.63%  (p=0.000 n=49+48)
Compiler     2.89s ± 1%   2.86s ± 2%  -1.36%  (p=0.000 n=48+50)
SSA          8.47s ± 1%   8.37s ± 2%  -1.22%  (p=0.000 n=47+50)
Flate        122ms ± 2%   121ms ± 2%  -0.66%  (p=0.000 n=47+45)
GoParser     147ms ± 2%   146ms ± 2%  -0.53%  (p=0.006 n=46+49)
Reflect      406ms ± 2%   403ms ± 2%  -0.76%  (p=0.000 n=48+43)
Tar          162ms ± 3%   162ms ± 4%    ~     (p=0.191 n=46+50)
XML          223ms ± 2%   222ms ± 2%  -0.37%  (p=0.031 n=45+49)
[Geo mean]   382ms        378ms       -0.89%

name        old user-time/op  new user-time/op  delta
Template          219ms ± 3%        216ms ± 3%  -1.56%  (p=0.000 n=50+48)
Unicode           109ms ± 6%        109ms ± 5%    ~     (p=0.190 n=50+49)
GoTypes           836ms ± 2%        828ms ± 2%  -0.96%  (p=0.000 n=49+48)
Compiler          3.87s ± 2%        3.80s ± 1%  -1.81%  (p=0.000 n=49+46)
SSA               12.0s ± 1%        11.8s ± 1%  -2.01%  (p=0.000 n=48+50)
Flate             142ms ± 3%        141ms ± 3%  -0.85%  (p=0.003 n=50+48)
GoParser          178ms ± 4%        175ms ± 4%  -1.66%  (p=0.000 n=48+46)
Reflect           520ms ± 2%        512ms ± 2%  -1.44%  (p=0.000 n=45+48)
Tar               200ms ± 3%        198ms ± 4%  -0.61%  (p=0.037 n=47+50)
XML               277ms ± 3%        275ms ± 3%  -0.85%  (p=0.000 n=49+48)
[Geo mean]        482ms             476ms       -1.23%

name        old alloc/op  new alloc/op  delta
Template     36.1MB ± 0%   35.3MB ± 0%  -2.18%  (p=0.008 n=5+5)
Unicode      29.8MB ± 0%   29.3MB ± 0%  -1.58%  (p=0.008 n=5+5)
GoTypes       125MB ± 0%    123MB ± 0%  -2.13%  (p=0.008 n=5+5)
Compiler      531MB ± 0%    513MB ± 0%  -3.40%  (p=0.008 n=5+5)
SSA          2.00GB ± 0%   1.93GB ± 0%  -3.34%  (p=0.008 n=5+5)
Flate        24.5MB ± 0%   24.3MB ± 0%  -1.18%  (p=0.008 n=5+5)
GoParser     29.4MB ± 0%   28.7MB ± 0%  -2.34%  (p=0.008 n=5+5)
Reflect      87.1MB ± 0%   86.0MB ± 0%  -1.33%  (p=0.008 n=5+5)
Tar          35.3MB ± 0%   34.8MB ± 0%  -1.44%  (p=0.008 n=5+5)
XML          47.9MB ± 0%   47.1MB ± 0%  -1.86%  (p=0.008 n=5+5)
[Geo mean]   82.8MB        81.1MB       -2.08%

name        old allocs/op  new allocs/op  delta
Template        352k ± 0%      347k ± 0%  -1.32%  (p=0.008 n=5+5)
Unicode         342k ± 0%      339k ± 0%  -0.66%  (p=0.008 n=5+5)
GoTypes        1.29M ± 0%     1.27M ± 0%  -1.30%  (p=0.008 n=5+5)
Compiler       4.98M ± 0%     4.87M ± 0%  -2.14%  (p=0.008 n=5+5)
SSA            15.7M ± 0%     15.2M ± 0%  -2.86%  (p=0.008 n=5+5)
Flate           233k ± 0%      231k ± 0%  -0.83%  (p=0.008 n=5+5)
GoParser        296k ± 0%      291k ± 0%  -1.54%  (p=0.016 n=5+4)
Reflect        1.05M ± 0%     1.04M ± 0%  -0.65%  (p=0.008 n=5+5)
Tar             343k ± 0%      339k ± 0%  -0.97%  (p=0.008 n=5+5)
XML             432k ± 0%      426k ± 0%  -1.19%  (p=0.008 n=5+5)
[Geo mean]      815k           804k       -1.35%

name        old object-bytes  new object-bytes  delta
Template          505kB ± 0%        505kB ± 0%  -0.01%  (p=0.008 n=5+5)
Unicode           224kB ± 0%        224kB ± 0%    ~     (all equal)
GoTypes          1.82MB ± 0%       1.83MB ± 0%  +0.06%  (p=0.008 n=5+5)
Flate             324kB ± 0%        324kB ± 0%  +0.00%  (p=0.008 n=5+5)
GoParser          402kB ± 0%        402kB ± 0%  +0.04%  (p=0.008 n=5+5)
Reflect          1.39MB ± 0%       1.39MB ± 0%  -0.01%  (p=0.008 n=5+5)
Tar               449kB ± 0%        449kB ± 0%  -0.02%  (p=0.008 n=5+5)
XML               598kB ± 0%        597kB ± 0%  -0.05%  (p=0.008 n=5+5)
Change-Id: Ifc9d5c1bd01f90171414b8fb18ffe2290d271143
Reviewed-on: https://go-review.googlesource.com/c/114797
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
When compiling with -race, we insert calls to racefuncentry
into every function. Add a rule that removes them in leaf functions
without instrumented loads/stores.
Shaves ~30kb from "-race" version of go tool:
file difference:
go_old 15626192
go_new 15597520 [-28672 bytes]
section differences:
global text (code) = -24513 bytes (-0.358598%)
read-only data = -5849 bytes (-0.167064%)
Total difference -30362 bytes (-0.097928%)
Fixes #24662
Change-Id: Ia63bf1827f4cf2c25e3e28dcd097c150994ade0a
Reviewed-on: https://go-review.googlesource.com/121235
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
The test runs far too long for -short mode (4 seconds).
Also removed useless test of now-disconnected knob
(GO_SSA_PHI_LOC_CUTOFF), which cuts 4 seconds to 2 seconds (which
is still too long), and finished removing the disconnected knob.
Updates #26469.
Change-Id: I6c594227c4a5aaffee46832049bdbbf570d86e60
Reviewed-on: https://go-review.googlesource.com/125075
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
For register maps, we need a dense numbering of registers that may
contain pointers of interest to the garbage collector. Add this to
Register and compute it from the GP register set.
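One way to picture a dense numbering derived from a register set is shown below. The function name, the bitmask representation, and the use of -1 for "not tracked" are assumptions for this sketch and do not describe the actual fields added to Register.

```go
package main

import "fmt"

// gcRegNums assigns consecutive numbers 0, 1, 2, ... to the registers
// whose bits are set in gpMask, in increasing register order.
// Registers outside the mask get -1, meaning they never hold pointers
// the garbage collector needs to see. (Hypothetical helper.)
func gcRegNums(numRegs int, gpMask uint64) []int {
	nums := make([]int, numRegs)
	next := 0
	for r := 0; r < numRegs; r++ {
		if gpMask&(1<<uint(r)) != 0 {
			nums[r] = next
			next++
		} else {
			nums[r] = -1
		}
	}
	return nums
}

func main() {
	// Registers 0, 2, 3 and 5 are general purpose in this made-up machine.
	fmt.Println(gcRegNums(6, 0x2D))
}
```

A dense numbering like this lets a register map be a bitmap with one bit per pointer-capable register, with no holes for floating-point or special registers.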
For #24543.
Change-Id: If6f0521effca5eca4d17895468b1fc52d67e0f32
Reviewed-on: https://go-review.googlesource.com/109351
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Initialization of t.UInt is missing from SetTypPtrs in config.go,
preventing rules that use it from matching when they should.
This adds the initialization to allow those rules to work.
Updated test/codegen/rotate.go to test for this case, which
appears in math/bits RotateLeft32 and RotateLeft64. There had been
a testcase for this in go 1.10 but that went away when asm_test.go
was removed.
Change-Id: I82fc825ad8364df6fc36a69a1e448214d2e24ed5
Reviewed-on: https://go-review.googlesource.com/112518
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
This commit adds the wasm architecture to the compile command.
A later commit will contain the corresponding linker changes.
Design doc: https://docs.google.com/document/d/131vjr4DH6JFnb-blm_uRdaC0_Nv3OUwjEY5qVCxCup4
The following files are generated:
- src/cmd/compile/internal/ssa/opGen.go
- src/cmd/compile/internal/ssa/rewriteWasm.go
- src/cmd/internal/obj/wasm/anames.go
Updates #18892
Change-Id: Ifb4a96a3e427aac2362a1c97967d5667450fba3b
Reviewed-on: https://go-review.googlesource.com/103295
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
This commit allows architectures to disable optimizations that need the
Avg* and Hmul* operations.
WebAssembly has no such operations, so using them as an optimization
but then having to emulate them with multiple instructions makes no
sense, especially since the WebAssembly compiler may do the same
optimizations internally.
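As an illustration of the kind of optimization that depends on a high-multiply operation, consider unsigned division by a constant, which is commonly strength-reduced to a multiply by a "magic" number followed by a shift. The sketch below writes that transformation by hand for division by 3; it is a standard textbook instance chosen for this example, not a transcription of the compiler's rules.

```go
package main

import (
	"fmt"
	"math/bits"
)

// div3 computes x/3 without a divide instruction: take the high 32 bits
// of x * 0xAAAAAAAB (which is ceil(2^33 / 3)) and shift right by one.
func div3(x uint32) uint32 {
	hi, _ := bits.Mul32(x, 0xAAAAAAAB)
	return hi >> 1
}

func main() {
	for _, x := range []uint32{0, 1, 2, 3, 100, 0xFFFFFFFF} {
		fmt.Println(x, div3(x), x/3)
	}
}
```

On a target with a cheap high-multiply this beats a hardware divide. On a target where the high multiply itself has to be emulated with several instructions, the trade is much less attractive, which is the situation the commit describes for WebAssembly.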
Updates #18892
Change-Id: If57b59e3235482a9e0ec334a7312b3e3b5fc2b61
Reviewed-on: https://go-review.googlesource.com/103256
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
changedVars was functionally a set, but couldn't be iterated over
efficiently. In functions with many variables, the wasted iteration was
costly. Use a sparseSet instead.
(*gc.Node).String() is very expensive: it calls Sprintf, which does
reflection, etc, etc. Instead, just look at .Sym.Name, which is all we
care about.
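For readers unfamiliar with the data structure, a sparse set keeps a dense slice of members alongside an index array, so membership tests are O(1) and iteration touches only the members. The version below is a minimal sketch written for this note, not the compiler's sparseSet.

```go
package main

import "fmt"

// sparseSet holds integers in the range [0, n).
// dense lists the members; sparse[x] is x's position in dense, if present.
type sparseSet struct {
	dense  []int
	sparse []int
}

func newSparseSet(n int) *sparseSet {
	return &sparseSet{sparse: make([]int, n)}
}

// contains reports whether x is in the set.
func (s *sparseSet) contains(x int) bool {
	i := s.sparse[x]
	return i < len(s.dense) && s.dense[i] == x
}

// add inserts x if it is not already present.
func (s *sparseSet) add(x int) {
	if !s.contains(x) {
		s.sparse[x] = len(s.dense)
		s.dense = append(s.dense, x)
	}
}

// contents returns the members in insertion order. Iterating over it
// costs time proportional to the number of members, not to n.
func (s *sparseSet) contents() []int { return s.dense }

// clear empties the set in O(1) without touching sparse.
func (s *sparseSet) clear() { s.dense = s.dense[:0] }

func main() {
	s := newSparseSet(10)
	for _, x := range []int{7, 2, 7, 5} {
		s.add(x)
	}
	fmt.Println(s.contents(), s.contains(2), s.contains(3))
}
```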
Change-Id: Ib61cd7b5c796e1813b8859135e85da5bfe2ac686
Reviewed-on: https://go-review.googlesource.com/92402
Reviewed-by: David Chase <drchase@google.com>
On nacl/arm, R12 is clobbered by the RET instruction in a function
that has a frame. runtime.udiv doesn't have a frame, so it does
not clobber R12.
Change-Id: I0de448749f615908f6659e92d201ba3eb2f8266d
Reviewed-on: https://go-review.googlesource.com/93116
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Now that the buffered write barrier is implemented for all
architectures, we can remove the old eager write barrier
implementation. This CL removes the implementation from the runtime,
support in the compiler for calling it, and updates some compiler
tests that relied on the old eager barrier support. It also makes sure
that all of the useful comments from the old write barrier
implementation still have a place to live.
Fixes #22460.
Updates #21640 since this fixes the layering concerns of the write
barrier (but not the other things in that issue).
Change-Id: I580f93c152e89607e0a72fe43370237ba97bae74
Reviewed-on: https://go-review.googlesource.com/92705
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Most write barrier calls are inserted by SSA, but copy and append are
lowered to runtime.typedslicecopy during walk. Fix these to set
Func.WBPos and emit the "write barrier" warning, as done for the write
barriers inserted by SSA. As part of this, we refactor setting WBPos
and emitting this warning into the frontend so it can be shared by
both walk and SSA.
Change-Id: I5fe9997d9bdb55e03e01dd58aee28908c35f606b
Reviewed-on: https://go-review.googlesource.com/73411
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
We used to have {Arg,Auto,Extern}Symbol structs with which we wrapped
a *gc.Node or *obj.LSym before storing them in the Aux field
of an ssa.Value. This let the SSA part of the compiler distinguish
between autos and args, for example. We no longer need the wrappers
as we can query the underlying objects directly.
There was also some sloppy usage, where VarDef had a *gc.Node
directly in its Aux field, whereas the use of that variable had
that *gc.Node wrapped in an AutoSymbol. Thus the Aux fields didn't
match (using ==) when they probably should.
This sloppy usage cleanup is the only thing in the CL that changes the
generated code - we can get rid of some more unused auto variables if
the matching happens reliably.
Removing this wrapper also lets us get rid of the varsyms cache
(which was used to prevent wrapping the same *gc.Node twice).
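The mismatch described above follows directly from how Go compares interface values: two values are equal only if they have the same dynamic type and equal dynamic values. The sketch below uses hypothetical stand-ins (variable, autoSymbol) for the compiler types to show both the bug and why a cache was needed while wrappers existed.

```go
package main

import "fmt"

// variable stands in for *gc.Node; autoSymbol stands in for the old
// wrapper type. Both are hypothetical simplifications.
type variable struct{ name string }

type autoSymbol struct{ node *variable }

// sameAux compares two Aux-style values the way == on interfaces does.
func sameAux(a, b interface{}) bool { return a == b }

func main() {
	v := &variable{name: "x"}

	// A definition storing the raw pointer and a use storing a wrapper
	// never compare equal, even though they name the same variable.
	fmt.Println(sameAux(v, &autoSymbol{node: v}))

	// Two fresh wrappers around the same variable are also unequal,
	// which is why wrappers had to be cached and reused.
	fmt.Println(sameAux(&autoSymbol{node: v}, &autoSymbol{node: v}))

	// With no wrapper, every reference holds the same pointer.
	fmt.Println(sameAux(v, v))
}
```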
Change-Id: I0dedf8f82f84bfee413d310342b777316bd1d478
Reviewed-on: https://go-review.googlesource.com/64452
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
This adds rules to match the code in math/bits RotateLeft,
RotateLeft32, and RotateLeft64 to allow them to be inlined.
The rules are complicated because the code in these functions
uses different types, and the non-const version of these
shifts generate Mask and Carry instructions that become
subexpressions during the match process.
Also adds a testcase to asm_test.go.
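For reference, the kind of source pattern such rules have to recognize looks roughly like the sketch below. rotl32 is a hypothetical helper written for illustration; it is assumed to resemble the shift-or form used for non-constant rotates, and it is checked against bits.RotateLeft32.

```go
package main

import (
	"fmt"
	"math/bits"
)

// rotl32 spells out the shift-or form of a 32-bit rotate by a
// non-constant amount. The mask keeps the shift count in [0, 31].
func rotl32(x uint32, k int) uint32 {
	const n = 32
	s := uint(k) & (n - 1)
	return x<<s | x>>(n-s)
}

func main() {
	x := uint32(0x80000001)
	fmt.Printf("%#x\n", rotl32(x, 1))
	// The hand-written form agrees with the library function.
	fmt.Println(rotl32(x, 7) == bits.RotateLeft32(x, 7))
}
```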
Improvement in math/bits:
BenchmarkRotateLeft-16 1.57 1.32 -15.92%
BenchmarkRotateLeft32-16 1.60 1.37 -14.37%
BenchmarkRotateLeft64-16 1.57 1.32 -15.92%
Updates #21390
Change-Id: Ib6f17669ecc9cab54f18d690be27e2225ca654a4
Reviewed-on: https://go-review.googlesource.com/59932
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
CL 54410 and CL 56250 recently added use of the MOVOstore
instruction to improve performance.
However, we can't use the MOVOstore instruction on Plan 9,
because floating point operations are not allowed in the
note handler.
This change adds a configuration flag useSSE to enable the
use of SSE instructions for non-floating point operations.
This flag is enabled by default and disabled on Plan 9.
When this flag is disabled, the MOVOstore instruction is
not used and the MOVQstoreconst instruction is used instead.
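A minimal sketch of how such a flag could gate instruction selection follows. Only the flag name useSSE and the two instruction names come from this message; the config type, newConfig, and zeroStoreOp are hypothetical and do not reflect the actual rewrite-rule machinery.

```go
package main

import "fmt"

// config is a hypothetical, simplified stand-in for the SSA configuration.
type config struct {
	useSSE bool // allow SSE instructions for non-floating-point operations
}

// newConfig enables useSSE by default and disables it on Plan 9.
func newConfig(goos string) *config {
	return &config{useSSE: goos != "plan9"}
}

// zeroStoreOp names the store instruction chosen for zeroing memory.
func (c *config) zeroStoreOp() string {
	if c.useSSE {
		return "MOVOstore"
	}
	return "MOVQstoreconst"
}

func main() {
	for _, goos := range []string{"linux", "plan9"} {
		fmt.Println(goos, newConfig(goos).zeroStoreOp())
	}
}
```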
Fixes #21599
Change-Id: Ie609e5d9b82ec0092ae874bab4ce01caa5bc8fb8
Reviewed-on: https://go-review.googlesource.com/58850
Reviewed-by: Keith Randall <khr@golang.org>
When package ssa was created, Type was in package gc.
To avoid circular dependencies, we used an interface (ssa.Type)
to represent type information in SSA.
In the Go 1.9 cycle, gri extricated the Type type from package gc.
As a result, we can now use it in package ssa.
Now, instead of package types depending on package ssa,
it is the other way around.
This is a more sensible dependency tree,
and helps compiler performance a bit.
Though this is a big CL, most of the changes are
mechanical and uninteresting.
Interesting bits:
* Add new singleton globals to package types for the special
SSA types Memory, Void, Invalid, Flags, and Int128.
* Add two new Types, TSSA for the special types,
and TTUPLE, for SSA tuple types.
ssa.MakeTuple is now types.NewTuple.
* Move type comparison result constants CMPlt, CMPeq, and CMPgt
to package types.
* We had picked the name "types" in our rules for the handy
list of types provided by ssa.Config. That conflicted with
the types package name, so change it to "typ".
* Update the type comparison routine to handle tuples and special
types inline.
* Teach gc/fmt.go how to print special types.
* We can now eliminate ElemTypes in favor of just Elem,
and probably also some other duplicated Type methods
designed to return ssa.Type instead of *types.Type.
* The ssa tests were using their own dummy types,
and they were not particularly careful about types in general.
Of necessity, this CL switches them to use *types.Type;
it does not make them more type-accurate.
Unfortunately, using types.Type means initializing a bit
of the types universe.
This is prime for refactoring and improvement.
This shrinks ssa.Value; it now fits in a smaller size class
on 64 bit systems. This doesn't have a giant impact,
though, since most Values are preallocated in a chunk.
name       old alloc/op    new alloc/op    delta
Template   37.9MB ± 0%     37.7MB ± 0%     -0.57%  (p=0.000 n=10+8)
Unicode    28.9MB ± 0%     28.7MB ± 0%     -0.52%  (p=0.000 n=10+10)
GoTypes    110MB ± 0%      109MB ± 0%      -0.88%  (p=0.000 n=10+10)
Flate      24.7MB ± 0%     24.6MB ± 0%     -0.66%  (p=0.000 n=10+10)
GoParser   31.1MB ± 0%     30.9MB ± 0%     -0.61%  (p=0.000 n=10+9)
Reflect    73.9MB ± 0%     73.4MB ± 0%     -0.62%  (p=0.000 n=10+8)
Tar        25.8MB ± 0%     25.6MB ± 0%     -0.77%  (p=0.000 n=9+10)
XML        41.2MB ± 0%     40.9MB ± 0%     -0.80%  (p=0.000 n=10+10)
[Geo mean] 40.5MB          40.3MB          -0.68%
name       old allocs/op   new allocs/op   delta
Template   385k ± 0%       386k ± 0%       ~       (p=0.356 n=10+9)
Unicode    343k ± 1%       344k ± 0%       ~       (p=0.481 n=10+10)
GoTypes    1.16M ± 0%      1.16M ± 0%      -0.16%  (p=0.004 n=10+10)
Flate      238k ± 1%       238k ± 1%       ~       (p=0.853 n=10+10)
GoParser   320k ± 0%       320k ± 0%       ~       (p=0.720 n=10+9)
Reflect    957k ± 0%       957k ± 0%       ~       (p=0.460 n=10+8)
Tar        252k ± 0%       252k ± 0%       ~       (p=0.133 n=9+10)
XML        400k ± 0%       400k ± 0%       ~       (p=0.796 n=10+10)
[Geo mean] 428k            428k            -0.01%
Removing all the interface calls helps non-trivially with CPU, though.
name       old time/op     new time/op     delta
Template   178ms ± 4%      173ms ± 3%      -2.90%  (p=0.000 n=94+96)
Unicode    85.0ms ± 4%     83.9ms ± 4%     -1.23%  (p=0.000 n=96+96)
GoTypes    543ms ± 3%      528ms ± 3%      -2.73%  (p=0.000 n=98+96)
Flate      116ms ± 3%      113ms ± 4%      -2.34%  (p=0.000 n=96+99)
GoParser   144ms ± 3%      140ms ± 4%      -2.80%  (p=0.000 n=99+97)
Reflect    344ms ± 3%      334ms ± 4%      -3.02%  (p=0.000 n=100+99)
Tar        106ms ± 5%      103ms ± 4%      -3.30%  (p=0.000 n=98+94)
XML        198ms ± 5%      192ms ± 4%      -2.88%  (p=0.000 n=92+95)
[Geo mean] 178ms           173ms           -2.65%
name       old user-time/op  new user-time/op  delta
Template   229ms ± 5%        224ms ± 5%        -2.36%  (p=0.000 n=95+99)
Unicode    107ms ± 6%        106ms ± 5%        -1.13%  (p=0.001 n=93+95)
GoTypes    696ms ± 4%        679ms ± 4%        -2.45%  (p=0.000 n=97+99)
Flate      137ms ± 4%        134ms ± 5%        -2.66%  (p=0.000 n=99+96)
GoParser   176ms ± 5%        172ms ± 8%        -2.27%  (p=0.000 n=98+100)
Reflect    430ms ± 6%        411ms ± 5%        -4.46%  (p=0.000 n=100+92)
Tar        128ms ±13%        123ms ±13%        -4.21%  (p=0.000 n=100+100)
XML        239ms ± 6%        233ms ± 6%        -2.50%  (p=0.000 n=95+97)
[Geo mean] 220ms             213ms             -2.76%
Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1
Reviewed-on: https://go-review.googlesource.com/42145
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Now only cmd/asm and cmd/compile depend on cmd/internal/obj. Changing
the assembler backends no longer requires reinstalling cmd/link or
cmd/addr2line.
There's also now one canonical definition of the object file format in
cmd/internal/objabi/doc.go, with a warning to update all three
implementations.
objabi is still something of a grab bag of unrelated code (e.g., flag
and environment variable handling probably belong in a separate "tool"
package), but this is still progress.
Fixes #15165.
Fixes #20026.
Change-Id: Ic4b92fac7d0d35438e0d20c9579aad4085c5534c
Reviewed-on: https://go-review.googlesource.com/40972
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Prior to this CL, the SSA backend reported violations
of the //go:nowritebarrier annotation immediately.
This necessitated emitting errors during SSA compilation,
which is not compatible with a concurrent backend.
Instead, check for such violations later.
We already save the data required to do a late check
for violations of the //go:nowritebarrierrec annotation.
Use the same data, and check //go:nowritebarrier at the same time.
One downside to doing this is that now only a single
violation will be reported per function.
Given that this is for the runtime only,
and violations are rare, this seems an acceptable cost.
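The record-now, check-later approach can be sketched as follows. The types and function names (fn, recordWB, checkLate) and the error text are hypothetical; the sketch only shows why a single saved position per function yields at most one reported violation per function.

```go
package main

import "fmt"

// fn is a hypothetical, simplified function record.
type fn struct {
	name           string
	noWriteBarrier bool // function carries //go:nowritebarrier
	wbPos          int  // line of the first write barrier; 0 means none
}

// recordWB runs during compilation. It only saves the first position
// and never reports an error, so it is safe for a concurrent backend.
func (f *fn) recordWB(line int) {
	if f.wbPos == 0 {
		f.wbPos = line
	}
}

// checkLate runs once compilation is done and reports violations
// from the saved data, at most one per function.
func checkLate(fns []*fn) []string {
	var errs []string
	for _, f := range fns {
		if f.noWriteBarrier && f.wbPos != 0 {
			errs = append(errs, fmt.Sprintf("%s:%d: write barrier prohibited", f.name, f.wbPos))
		}
	}
	return errs
}

func main() {
	f := &fn{name: "f", noWriteBarrier: true}
	f.recordWB(12)
	f.recordWB(30) // a second barrier in f is not recorded
	g := &fn{name: "g"}
	g.recordWB(5) // g is not annotated, so this is fine
	for _, e := range checkLate([]*fn{f, g}) {
		fmt.Println(e)
	}
}
```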
While we are here, remove several 'nerrors != 0' checks
that are rendered pointless.
Updates #15756
Fixes #19250 (as much as it ever can be)
Change-Id: Ia44c4ad5b6fd6f804d9f88d9571cec8d23665cb3
Reviewed-on: https://go-review.googlesource.com/38973
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Concurrent compilation requires providing an
explicit position and curfn to temp.
This implementation of tempAt temporarily
continues to use the globals lineno and Curfn,
so as not to collide with mdempsky's
work for #19683 eliminating the Curfn dependency
from func nod.
Updates #15756
Updates #19683
Change-Id: Ib3149ca4b0740e9f6eea44babc6f34cdd63028a9
Reviewed-on: https://go-review.googlesource.com/38592
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Starting in go1.9, the minimum processor requirement for ppc64 is POWER8.
Therefore, the checks for OldArch and the code enabled by it are not necessary
anymore.
Updates #19074
Change-Id: I33d6a78b2462c80d57c5dbcba2e13424630afab4
Reviewed-on: https://go-review.googlesource.com/38404
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
This reduces the number of calls back into the
gc Type routines, which will help performance
in a concurrent backend.
It also reduces the number of callsites
that must be considered in making the transition.
Passes toolstash-check -all. No compiler performance changes.
Updates #15756
Change-Id: Ic7a8f1daac7e01a21658ae61ac118b2a70804117
Reviewed-on: https://go-review.googlesource.com/38340
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Prior to this CL, the ssa.Frontend field was responsible
for providing types to the backend during compilation.
However, the types needed by the backend are few and static.
It makes more sense to use a struct for them
and to hang that struct off the ssa.Config,
which is the correct home for readonly data.
Now that Types is a struct, we can clean up the names a bit as well.
This has the added benefit of allowing early construction
of all types needed by the backend.
This will be useful for concurrent backend compilation.
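A minimal sketch of the layout described above: a small struct of types built once and hung off a config, then read without locking from several goroutines. The type and field names here are hypothetical and much smaller than the real set.

```go
package main

import (
	"fmt"
	"sync"
)

// typ is a hypothetical stand-in for a compiler type.
type typ struct {
	name string
	size int64
}

// types holds the few static types the backend needs.
type types struct {
	Bool, Int64, Uintptr *typ
}

// config carries read-only data shared by all compilations.
type config struct {
	Types types
}

// newConfig constructs every type up front, before any worker starts.
func newConfig() *config {
	return &config{Types: types{
		Bool:    &typ{name: "bool", size: 1},
		Int64:   &typ{name: "int64", size: 8},
		Uintptr: &typ{name: "uintptr", size: 8},
	}}
}

func main() {
	c := newConfig()
	sizes := make([]int64, 4)
	var wg sync.WaitGroup
	for i := range sizes {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Workers only read c.Types, so no synchronization is needed.
			sizes[i] = c.Types.Int64.size
		}(i)
	}
	wg.Wait()
	fmt.Println(sizes)
}
```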
Passes toolstash-check -all. No compiler performance change.
Updates #15756
Change-Id: I021658c8cf2836d6a22bbc20cc828ac38c7da08a
Reviewed-on: https://go-review.googlesource.com/38336
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Suggested by mdempsky in CL 38232.
This allows us to use the Frontend field
to associate frontend state and information
with a function.
See the following CL in the series for examples.
This is a giant CL, but it is almost entirely routine refactoring.
The ssa test API is starting to feel a bit unwieldy.
I will clean it up separately, once the dust has settled.
Passes toolstash -cmp.
Updates #15756
Change-Id: I71c573bd96ff7251935fce1391b06b1f133c3caf
Reviewed-on: https://go-review.googlesource.com/38327
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>