Commit graph

202 commits

Author SHA1 Message Date
Michael Pratt
2c6aaaea64 cmd/compile/internal/ssa: drop overwritten regalloc basic block input requirements
For the following description, consider the following basic block graph:

      b1 ───┐┌──── b2
            ││
            ││
            ▼▼
            b3

For register allocator transitions between basic blocks, there are two
key passes (significant paraphrasing):

First, each basic block is visited in some predetermined visit order.
This is the core visitOrder range loop in regAllocState.regalloc. The
specific ordering heuristics aren't important here, except that the
order guarantees that when visiting a basic block at least one of its
predecessors has already been visited.

Upon visiting a basic block, that block sets its expected starting
register state (regAllocState.startRegs) based on the ending register
state (regAlloc.State.endRegs) of one of its predecessors. (How it
chooses which predecessor to use is not important here.)

From that starting state, registers are assigned for all values in the
block, ultimately resulting in some ending register state.

After all blocks have been visited, the shuffle pass
(regAllocState.shuffle) ensures that for each edge, endRegs of the
predecessor == startRegs of the successor. That is, it makes sure that
the startRegs assumptions actually hold true for each edge. It does this
by adding moves to the end of the predecessor block to place values in
the expected register for the successor block. These may be moves from
other registers, or from memory if the value is spilled.

Now on to the actual problem:

Assume that b1 places some value v1 into register R10, and thus ends
with endRegs containing R10 = v1.

When b3 is visited, it selects b1 as its model predecessor and sets
startRegs with R10 = v1.

b2 does not have v1 in R10, so later in the shuffle pass, we will add a
move of v1 into R10 to the end of b2 to ensure it is available for b3.

This is all perfectly fine and exactly how things should work.

Now suppose that b3 does not use v1. It does need to use some other
value v2, which is not currently in a register. When assigning v2 to a
register, it finds all registers are already in use and it needs to dump
a value. Ultimately, it decides to dump v1 from R10 and replace it with
v2.

This is fine, but it has downstream effects on shuffle in b2. b3's
startRegs still state that R10 = v1, so b2 will add a move to R10 even
though b3 will unconditionally overwrite it. i.e., the move at the end
of b2 is completely useless and can result in code like:

// end of b2
MOV n(SP), R10 // R10 = v1 <-- useless
// start of b3
MOV m(SP), R10 // R10 = v2

This is precisely what happened in #58298.

This CL addresses this problem by dropping registers from startRegs if
they are never used in the basic block prior to getting dumped. This
allows the shuffle pass to avoid placing those useless values into the
register.

There is a significant limitation to this CL, which is that it only
impacts the immediate predecessors of an overwriting block. We can
discuss this by zooming out a bit on the previous graph:

b4 ───┐┌──── b5
      ││
      ││
      ▼▼
      b1 ───┐┌──── b2
            ││
            ││
            ▼▼
            b3

Here we have the same graph, except we can see the two predecessors of
b1.

Now suppose that rather than b1 assigning R10 = v1 as above, the
assignment is done in b4. b1 has startRegs R10 = v1, doesn't use the
value at all, and simply passes it through to endRegs R10 = v1.

Now the shuffle pass will require both b2 and b5 to add a move to
assigned R10 = v1, because that is specified in their successor
startRegs.

With this CL, b3 drops R10 = v1 from startRegs, but there is no
backwards propagation, so b1 still has R10 = v1 in startRegs, and b5
still needs to add a useless move.

Extending this CL with such propagation may significantly increase the
number of useless moves we can remove, though it will add complexity to
maintenance and could potentially impact build performance depending on
how efficiently we could implement the propagation (something I haven't
considered carefully).

As-is, this optimization does not impact much code. In bent .text size
geomean is -0.02%. In the container/heap test binary, 18 of ~2500
functions are impacted by this CL. Bent and sweet do not show a
noticeable performance impact one way or another, however #58298 does
show a case where this can have impact if the useless instructions end
up in the hot path of a tight loop.

For #58298.

Change-Id: I2fcef37c955159d068fa0725f995a1848add8a5f
Reviewed-on: https://go-review.googlesource.com/c/go/+/471158
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
2023-03-17 19:01:54 +00:00
Keith Randall
4a0e84a1be cmd/compile: improve register overwrite decision for resultInArg0 ops
When we're compiling a resultInArg0 op, we need to clobber the
register containing the input value. So we first make a register copy
of the input value. We can then clobber either of the two registers
the value is in and still have the original input value in a register
for future uses.

Before this CL, we always clobbered the original, not the copy.
But that's not always the right decision - if the original is already
in a specific register that it needs to be in later (typically, a
return value register), clobber the copy instead.

This optimization can remove a mov instruction. It saves 1376 bytes
of instructions in cmd/go.

Redo of CL 460656, reverted at CL 463475, with a fix for s390x.

The new code just ensures that the copied value is in a register
which is a valid input register for the instruction.

Change-Id: Id570b8a60a6d2da9090de80a90b6bb0266e9e38a
Reviewed-on: https://go-review.googlesource.com/c/go/+/463221
Auto-Submit: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-01-26 00:07:40 +00:00
Keith Randall
2bf0f54bbd Revert "cmd/compile: improve register overwrite decision for resultInArg0 ops"
This reverts CL 460656
Reason for revert: This breaks s390x.

Change-Id: I8fada14fabc90593b8033ed11188c04963d2da75
Reviewed-on: https://go-review.googlesource.com/c/go/+/463475
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
2023-01-25 19:25:48 +00:00
Keith Randall
3c7884a21b cmd/compile: improve register overwrite decision for resultInArg0 ops
When we're compiling a resultInArg0 op, we need to clobber the
register containing the input value. So we first make a register copy
of the input value. We can then clobber either of the two registers
the value is in and still have the original input value in a register
for future uses.

Before this CL, we always clobbered the original, not the copy.
But that's not always the right decision - if the original is already
in a specific register that it needs to be in later (typically, a
return value register), clobber the copy instead.

This optimization can remove a mov instruction. It saves 1376 bytes
of instructions in cmd/go.

Change-Id: I162870c84b9a180da6715bb24c296a902974fed3
Reviewed-on: https://go-review.googlesource.com/c/go/+/460656
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
2023-01-25 18:50:22 +00:00
Keith Randall
9088c691da cmd/compile: ensure temp register mask isn't empty
We need to avoid nospill registers at this point in regalloc.
Make sure that we don't restrict our register set to avoid registers
desired by other instructions, if the resulting set includes only
nospill registers.

Fixes #57846

Change-Id: I05478e4513c484755dc2e8621d73dac868e45a27
Reviewed-on: https://go-review.googlesource.com/c/go/+/461685
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-01-17 18:21:06 +00:00
Keith Randall
5f7abeca5a cmd/compile: teach regalloc about temporary registers
Temporary registers are sometimes needed for an architecture backend
which needs to use several machine instructions to implement a single
SSA instruction.

Mark such instructions so that regalloc can reserve the temporary register
for it. That way we don't have to reserve a fixed register like we do now.

Convert the temp-register-using instructions on amd64 to use this
new mechanism. Other archs can follow as needed.

Change-Id: I1d0c8588afdad5cd18b4398eb5a0f755be5dead7
Reviewed-on: https://go-review.googlesource.com/c/go/+/398556
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
2022-11-17 18:53:13 +00:00
liu-xuewen
e81263c791 cmd/compile: remove issueSpill
Remove the useless issueSpill and continue directly.

Change-Id: I085e566be6f7200235e1bfe1f56a8e959316386a
GitHub-Last-Rev: 84db90cf34
GitHub-Pull-Request: golang/go#56520
Reviewed-on: https://go-review.googlesource.com/c/go/+/447195
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
Auto-Submit: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2022-11-03 15:30:34 +00:00
Keith Randall
68bd383368 cmd/compile: add cache of sizeable objects so they can be reused
We kind of have this mechanism already, just normalizing it and
using it in a bunch of places. Previously a bunch of places cached
slices only for the duration of a single function compilation. Now
we can reuse slices across a whole compiler run.

Use a sync.Pool of powers-of-two sizes. This lets us use not
too much memory, and avoid holding onto memory we're no longer
using when a GC happens.

There's a few different types we need, so generate the code for it.
Generics would be useful here, but we can't use generics in the
compiler because of bootstrapping.

Change-Id: I6cf37e7b7b2e802882aaa723a0b29770511ccd82
Reviewed-on: https://go-review.googlesource.com/c/go/+/444820
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2022-10-31 21:41:20 +00:00
Keith Randall
7ddc45263c cmd/compile: separate out sparsemaps that need position
Make them a separate type, so the normal sparse maps don't
need the extra storage.

Change-Id: I3a0219487c35ea63723499723b0c742e321d97c4
Reviewed-on: https://go-review.googlesource.com/c/go/+/444819
Reviewed-by: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2022-10-31 21:41:06 +00:00
Johan Brandhorst-Satzkorn
85196fc982 cmd/internal/ssa: correct references to _gen folder
The gen folder was renamed to _gen in CL 435472, but references in code
and docs were not updated. This updates the references.

Change-Id: Ibadc0cdcb5bed145c3257b58465a8df370487ae5
Reviewed-on: https://go-review.googlesource.com/c/go/+/444355
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Johan Brandhorst-Satzkorn <johan.brandhorst@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-10-23 17:42:11 +00:00
Keith Randall
bdd0f0f780 cmd/compile: better propagation of desired registers
This fixes two independent problems:

We normally propagate desired registers backwards through opcodes that
are marked resultInArg0. Unfortunately for the desired register
computation, ADDQconst is not marked as resultInArg0. This is because
the amd64 backend can write it out as LEAQ instead if the input and
output registers don't match. For desired register purposes, we want
to treat ADDQconst as resultInArg0, so that we get an ADDQ instead of
a LEAQ if we can.

Desired registers don't currently work for tuple-generating opcodes.
Declare that the desired register applies to the first element of the
tuple, and propagate the desired register back through Select0.

Noticed when fixing #51964

Change-Id: I83346b988882cd58c2d7e7e5b419a2b9a244ab66
Reviewed-on: https://go-review.googlesource.com/c/go/+/396035
Trust: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-03-31 20:19:48 +00:00
Joel Sing
6458b2e8db all: add support for c-archive and c-shared on linux/riscv64
This provides the runtime glue (_rt0_riscv64_linux_lib) for c-archive and c-shared
support, along with enabling both of these buildmodes on linux/riscv64.

Both misc/cgo/testcarchive and misc/cgo/testcshared now pass on this platform.

Fixes #47100

Change-Id: I7ad75b23ae1d592dbac60d15bba557668287711f
Reviewed-on: https://go-review.googlesource.com/c/go/+/334872
Trust: Joel Sing <joel@sing.id.au>
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-11-03 09:23:34 +00:00
Cherry Mui
e35b5b25d9 cmd/compile: fix typo in comment in CL 358435
Change-Id: I0d8128668fc7a80b29aabc58dbc9a2929b889ec9
Reviewed-on: https://go-review.googlesource.com/c/go/+/358614
Trust: Cherry Mui <cherryyz@google.com>
Trust: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
2021-10-25 21:02:43 +00:00
Cherry Mui
9012996a9a cmd/compile: don't clobber LR for tail calls
When doing a tail call the link register is live as the callee
will directly return to the caller (of the function that does the
tail call). Don't allocate or clobber the link register.

Fixes #49032.

Change-Id: I2d60f2354e5b6c14aa285c8983a9786687b90223
Reviewed-on: https://go-review.googlesource.com/c/go/+/358435
Trust: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
2021-10-25 17:46:41 +00:00
Cuong Manh Le
f686f6a963 cmd/compile: remove Value.RemoveArg
It's only used in two places:

 - The one in regalloc.go can be replaced with v.resetArgs()
 - The one in rewrite.go can be open coded

and can cause wrong usage like the bug that CL 358117 fixed.

Change-Id: I125baf237db159d056fe4b1c73072331eea4d06a
Reviewed-on: https://go-review.googlesource.com/c/go/+/357965
Trust: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2021-10-25 02:59:35 +00:00
Joel Sing
dcee007aad cmd/compile: sort regalloc switch by architecture
Also tweak comment for the arm64 case.

Change-Id: I073405bd2acf901dcaaf33a034a84b6a09dd4a83
Reviewed-on: https://go-review.googlesource.com/c/go/+/334869
Trust: Joel Sing <joel@sing.id.au>
Reviewed-by: Meng Zhuo <mzh@golangcn.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-08-21 08:58:07 +00:00
cuiweixie
ea8298e2f5 cmd/compile/internal/ssa: delete unused code
Fixes #46186

Change-Id: Idb0674079f9484593e07cca172dfbb19be0e594d
GitHub-Last-Rev: 615fc53655
GitHub-Pull-Request: golang/go#46185
Reviewed-on: https://go-review.googlesource.com/c/go/+/320111
Reviewed-by: Ben Shi <powerman1st@163.com>
Reviewed-by: Keith Randall <khr@golang.org>
Trust: David Chase <drchase@google.com>
2021-08-16 14:50:20 +00:00
Than McIntosh
56af34f875 cmd/compile: place reg spills after OpArg{Int,Float}Reg ops
Tweak the register allocator to maintain the invariant that
OpArg{Int,Float}Reg values are placed together at the start of the
entry block, before any other non-pseudo-op values. Without this
change, when the register allocator adds spills we can wind up with an
interleaving of OpArg*Reg and stores, which complicates debug location
analysis.

Updates #40724.

Change-Id: Icf30dd814a9e25263ecbea2e48feb840a6e7f2bd
Reviewed-on: https://go-review.googlesource.com/c/go/+/322630
Trust: Than McIntosh <thanm@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-05-27 18:47:37 +00:00
Cherry Zhang
5b328c4a2f cmd/compile: use desired register only if it satisfies register mask
In the register allocator, if possible, we allocate a value to its
desired register (the ideal register for its next use). In some
cases the desired register does not satisfies the value's output
register mask. We should not use the register in this case.

In the following example, v33 is going to be returned as a
function result, so it is allocated to its desired register AX.
However, its Op cannot use AX as output, causing miscompilation.

v33 = CMOVQEQF <int> v24 v28 v29 : AX (~R0[int])
v35 = MakeResult <int,int,mem> v33 v26 v18
Ret v35

Change-Id: Id0f4f27c4b233ee297e83077e3c8494fe193e664
Reviewed-on: https://go-review.googlesource.com/c/go/+/314630
Trust: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
2021-04-28 16:13:40 +00:00
Cherry Zhang
6b8e3e2d06 cmd/compile: reduce redundant register moves for regabi calls
Currently, if we have AX=a and BX=b, and we want to make a call
F(1, a, b), to move arguments into the desired registers it emits

	MOVQ AX, CX
	MOVL $1, AX // AX=1
	MOVQ BX, DX
	MOVQ CX, BX // BX=a
	MOVQ DX, CX // CX=b

This has a few redundant moves.

This is because we process inputs in order. First, allocate 1 to
AX, which kicks out a (in AX) to CX (a free register at the
moment). Then, allocate a to BX, which kicks out b (in BX) to DX.
Finally, put b to CX.

Notice that if we start with allocating CX=b, then BX=a, AX=1,
we will not have redundant moves. This CL reduces redundant moves
by allocating them in different order: First, for inpouts that are
already in place, keep them there. Then allocate free registers.
Then everything else.

                             before       after
cmd/compile binary size     23703888    23609680
            text size        8565899     8533291

(with regabiargs enabled.)

Change-Id: I69e1bdf745f2c90bb791f6d7c45b37384af1e874
Reviewed-on: https://go-review.googlesource.com/c/go/+/311371
Trust: Cherry Zhang <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
2021-04-19 16:21:39 +00:00
Russ Cox
95ed5c3800 internal/buildcfg: move build configuration out of cmd/internal/objabi
The go/build package needs access to this configuration,
so move it into a new package available to the standard library.

Change-Id: I868a94148b52350c76116451f4ad9191246adcff
Reviewed-on: https://go-review.googlesource.com/c/go/+/310731
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Jay Conrod <jayconrod@google.com>
2021-04-16 19:20:53 +00:00
Cherry Zhang
6ae3b70ef2 cmd/compile: add clobberdeadreg mode
When -clobberdeadreg flag is set, the compiler inserts code that
clobbers integer registers at call sites. This may be helpful for
debugging register ABI.

Only implemented on AMD64 for now.

Change-Id: Ia203d3f891c30fd95d0103489056fe01d63a2899
Reviewed-on: https://go-review.googlesource.com/c/go/+/302809
Trust: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2021-03-19 23:21:21 +00:00
Austin Clements
0c93b16d01 cmd: move experiment flags into objabi.Experiment
This moves all remaining GOEXPERIMENT flags into the objabi.Experiment
struct, drops the "_enabled" from their name, and makes them all bool
typed.

We also drop DebugFlags.Fieldtrack because the previous CL shifted the
one test that used it to use GOEXPERIMENT instead.

Change-Id: I3406fe62b1c300bb4caeaffa6ca5ce56a70497fe
Reviewed-on: https://go-review.googlesource.com/c/go/+/302389
Trust: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2021-03-18 21:27:23 +00:00
erifan01
600259b099 cmd/compile: use depth first topological sort algorithm for layout
The current layout algorithm tries to put consecutive blocks together,
so the priority of the successor block is higher than the priority of
the zero indegree block. This algorithm is beneficial for subsequent
register allocation, but will result in more branch instructions.
The depth-first topological sorting algorithm is a well-known layout
algorithm, which has applications in many languages, and it helps to
reduce branch instructions. This CL applies it to the layout pass.
The test results show that it helps to reduce the code size.

This CL also includes the following changes:
1, Removed the primary predecessor mechanism. The new layout algorithm is
  not very friendly to register allocator in some cases, in order to adapt
  to the new layout algorithm, a new primary predecessor selection strategy
  is introduced.
2, Since the new layout implementation may place non-loop blocks between
  loop blocks, some adaptive modifications have also been made to looprotate
  pass.
3, The layout also affects the results of codegen, so this CL also adjusted
  several codegen tests accordingly.

It is inevitable that this CL will cause the code size or performance of a
few functions to decrease, but the number of cases it improves is much larger
than the number of cases it drops.

Statistical data from compilecmp on linux/amd64 is as follow:
name                      old time/op       new time/op       delta
Template                        382ms ± 4%        382ms ± 4%    ~     (p=0.497 n=49+50)
Unicode                         170ms ± 9%        169ms ± 8%    ~     (p=0.344 n=48+50)
GoTypes                         2.01s ± 4%        2.01s ± 4%    ~     (p=0.628 n=50+48)
Compiler                        190ms ±10%        189ms ± 9%    ~     (p=0.734 n=50+50)
SSA                             11.8s ± 2%        11.8s ± 3%    ~     (p=0.877 n=50+50)
Flate                           241ms ± 9%        241ms ± 8%    ~     (p=0.897 n=50+49)
GoParser                        366ms ± 3%        361ms ± 4%  -1.21%  (p=0.004 n=47+50)
Reflect                         835ms ± 3%        838ms ± 3%    ~     (p=0.275 n=50+49)
Tar                             336ms ± 4%        335ms ± 3%    ~     (p=0.454 n=48+48)
XML                             433ms ± 4%        431ms ± 3%    ~     (p=0.071 n=49+48)
LinkCompiler                    706ms ± 4%        705ms ± 4%    ~     (p=0.608 n=50+49)
ExternalLinkCompiler            1.85s ± 3%        1.83s ± 2%  -1.47%  (p=0.000 n=49+48)
LinkWithoutDebugCompiler        437ms ± 5%        437ms ± 6%    ~     (p=0.953 n=49+50)
[Geo mean]                      615ms             613ms       -0.37%

name                      old alloc/op      new alloc/op      delta
Template                       38.7MB ± 1%       38.7MB ± 1%    ~     (p=0.834 n=50+50)
Unicode                        28.1MB ± 0%       28.1MB ± 0%  -0.22%  (p=0.000 n=49+50)
GoTypes                         168MB ± 1%        168MB ± 1%    ~     (p=0.054 n=47+47)
Compiler                       23.0MB ± 1%       23.0MB ± 1%    ~     (p=0.432 n=50+50)
SSA                            1.54GB ± 0%       1.54GB ± 0%  +0.21%  (p=0.000 n=50+50)
Flate                          23.6MB ± 1%       23.6MB ± 1%    ~     (p=0.153 n=43+46)
GoParser                       35.1MB ± 1%       35.1MB ± 2%    ~     (p=0.202 n=50+50)
Reflect                        84.7MB ± 1%       84.7MB ± 1%    ~     (p=0.333 n=48+49)
Tar                            34.5MB ± 1%       34.5MB ± 1%    ~     (p=0.406 n=46+49)
XML                            44.3MB ± 2%       44.2MB ± 3%    ~     (p=0.981 n=50+50)
LinkCompiler                    131MB ± 0%        128MB ± 0%  -2.74%  (p=0.000 n=50+50)
ExternalLinkCompiler            120MB ± 0%        120MB ± 0%  +0.01%  (p=0.007 n=50+50)
LinkWithoutDebugCompiler       77.3MB ± 0%       77.3MB ± 0%  -0.02%  (p=0.000 n=50+50)
[Geo mean]                     69.3MB            69.1MB       -0.22%

file      before    after     Δ        %
addr2line 4104220   4043684   -60536   -1.475%
api       5342502   5249678   -92824   -1.737%
asm       4973785   4858257   -115528  -2.323%
buildid   2667844   2625660   -42184   -1.581%
cgo       4686849   4616313   -70536   -1.505%
compile   23667431  23268406  -399025  -1.686%
cover     4959676   4874108   -85568   -1.725%
dist      3515934   3450422   -65512   -1.863%
doc       3995581   3925469   -70112   -1.755%
fix       3379202   3318522   -60680   -1.796%
link      6743249   6629913   -113336  -1.681%
nm        4047529   3991777   -55752   -1.377%
objdump   4456151   4388151   -68000   -1.526%
pack      2435040   2398072   -36968   -1.518%
pprof     13804080  13565808  -238272  -1.726%
test2json 2690043   2645987   -44056   -1.638%
trace     10418492  10232716  -185776  -1.783%
vet       7258259   7121259   -137000  -1.888%
total     113145867 111204202 -1941665 -1.716%

The situation on linux/arm64 is as follow:
name                      old time/op       new time/op       delta
Template                        280ms ± 1%        282ms ± 1%  +0.75%  (p=0.000 n=46+48)
Unicode                         124ms ± 2%        124ms ± 2%  +0.37%  (p=0.045 n=50+50)
GoTypes                         1.69s ± 1%        1.70s ± 1%  +0.56%  (p=0.000 n=49+50)
Compiler                        122ms ± 1%        123ms ± 1%  +0.93%  (p=0.000 n=50+50)
SSA                             12.6s ± 1%        12.7s ± 0%  +0.72%  (p=0.000 n=50+50)
Flate                           170ms ± 1%        172ms ± 1%  +0.97%  (p=0.000 n=49+49)
GoParser                        262ms ± 1%        263ms ± 1%  +0.39%  (p=0.000 n=49+48)
Reflect                         639ms ± 1%        650ms ± 1%  +1.63%  (p=0.000 n=49+49)
Tar                             243ms ± 1%        245ms ± 1%  +0.82%  (p=0.000 n=50+50)
XML                             324ms ± 1%        327ms ± 1%  +0.72%  (p=0.000 n=50+49)
LinkCompiler                    597ms ± 1%        596ms ± 1%  -0.27%  (p=0.001 n=48+47)
ExternalLinkCompiler            1.90s ± 1%        1.88s ± 1%  -1.00%  (p=0.000 n=50+50)
LinkWithoutDebugCompiler        364ms ± 1%        363ms ± 1%    ~     (p=0.220 n=49+50)
[Geo mean]                      485ms             488ms       +0.49%

name                      old alloc/op      new alloc/op      delta
Template                       38.7MB ± 0%       38.8MB ± 1%    ~     (p=0.093 n=43+49)
Unicode                        28.4MB ± 0%       28.4MB ± 0%  +0.03%  (p=0.000 n=49+45)
GoTypes                         169MB ± 1%        169MB ± 1%  +0.23%  (p=0.010 n=50+50)
Compiler                       23.2MB ± 1%       23.2MB ± 1%  +0.11%  (p=0.000 n=40+44)
SSA                            1.54GB ± 0%       1.55GB ± 0%  +0.45%  (p=0.000 n=47+49)
Flate                          23.8MB ± 2%       23.8MB ± 1%    ~     (p=0.543 n=50+50)
GoParser                       35.3MB ± 1%       35.4MB ± 1%    ~     (p=0.792 n=50+50)
Reflect                        85.2MB ± 1%       85.2MB ± 0%    ~     (p=0.055 n=50+47)
Tar                            34.5MB ± 1%       34.5MB ± 1%  +0.06%  (p=0.015 n=50+50)
XML                            43.8MB ± 2%       43.9MB ± 2%  +0.19%  (p=0.000 n=48+48)
LinkCompiler                    137MB ± 0%        136MB ± 0%  -0.92%  (p=0.000 n=50+50)
ExternalLinkCompiler            127MB ± 0%        127MB ± 0%    ~     (p=0.516 n=50+50)
LinkWithoutDebugCompiler       84.0MB ± 0%       84.0MB ± 0%    ~     (p=0.057 n=50+50)
[Geo mean]                     70.4MB            70.4MB       +0.01%

file      before    after     Δ        %
addr2line 4021557   4002933   -18624   -0.463%
api       5127847   5028503   -99344   -1.937%
asm       5034716   4936836   -97880   -1.944%
buildid   2608118   2594094   -14024   -0.538%
cgo       4488592   4398320   -90272   -2.011%
compile   22501129  22213592  -287537  -1.278%
cover     4742301   4713573   -28728   -0.606%
dist      3388071   3365311   -22760   -0.672%
doc       3802250   3776082   -26168   -0.688%
fix       3306147   3216939   -89208   -2.698%
link      6404483   6363699   -40784   -0.637%
nm        3941026   3921930   -19096   -0.485%
objdump   4383330   4295122   -88208   -2.012%
pack      2404547   2389515   -15032   -0.625%
pprof     12996234  12856818  -139416  -1.073%
test2json 2668500   2586788   -81712   -3.062%
trace     9816276   9609580   -206696  -2.106%
vet       6900682   6787338   -113344  -1.643%
total     108535806 107056973 -1478833 -1.363%

Change-Id: Iaec1cdcaacca8025e9babb0fb8a532fddb70c87d
Reviewed-on: https://go-review.googlesource.com/c/go/+/255239
Reviewed-by: eric fang <eric.fang@arm.com>
Reviewed-by: Keith Randall <khr@golang.org>
Trust: eric fang <eric.fang@arm.com>
2021-03-16 02:44:54 +00:00
Cherry Zhang
c4190fc34d cmd/compile: remove ARMv5 special case in register allocator
The register allocator has a special case that doesn't allocate
LR on ARMv5. This was necessary when softfloat expansion was done
by the assembler. Now softfloat calls are inserted by SSA, so it
works as normal. Remove this special case.

Change-Id: I5502f07597f4d4b675dc16b6b0d7cb47e1e8974b
Reviewed-on: https://go-review.googlesource.com/c/go/+/301792
Trust: Cherry Zhang <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
2021-03-15 18:11:32 +00:00
David Chase
9d88a9e2bf cmd/compile: implement simple register results
at least for ints and strings

includes simple test

For #40724.

Change-Id: Ib8484e5b957b08f961574a67cfd93d3d26551558
Reviewed-on: https://go-review.googlesource.com/c/go/+/295309
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2021-03-04 19:45:11 +00:00
David Chase
4532467c18 cmd/compile: pass register parameters to called function
still needs morestack
still needs results
lots of corner cases also not dealt with.

For #40724.

Change-Id: I03abdf1e8363d75c52969560b427e488a48cd37a
Reviewed-on: https://go-review.googlesource.com/c/go/+/293889
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Jeremy Faller <jeremy@golang.org>
2021-03-04 03:30:55 +00:00
David Chase
9f33dc3ca1 cmd/compile: handle aggregate OpArg in registers
Also handles case where OpArg does not escape but has its address
taken.

May have exposed a lurking bug in 1.16 expandCalls,
if e.g., loading len(someArrayOfstructThing[0].secondStringField)
from a local.  Maybe.

For #40724.

Change-Id: I0298c4ad5d652b5e3d7ed6a62095d59e2d8819c7
Reviewed-on: https://go-review.googlesource.com/c/go/+/293396
Trust: David Chase <drchase@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2021-03-03 15:00:48 +00:00
David Chase
f2df1e3c34 cmd/compile: retrieve Args from registers
in progress; doesn't fully work until they are also passed on
register on the caller side.

For #40724.

Change-Id: I29a6680e60bdbe9d132782530214f2a2b51fb8f6
Reviewed-on: https://go-review.googlesource.com/c/go/+/293394
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2021-03-03 03:02:39 +00:00
David Chase
e25040d162 cmd/compile: change StaticCall to return a "Results"
StaticLECall (multiple value in +mem, multiple value result +mem) ->
StaticCall (multiple ergister value in +mem,
   multiple register-sized-value result +mem) ->
ARCH CallStatic (multiple ergister value in +mem,
   multiple register-sized-value result +mem)

But the architecture-dependent stuff is indifferent to whether
it is mem->mem or (mem)->(mem) until Prog generation.

Deal with OpSelectN -> Prog in ssagen/ssa.go, others, as they
appear.

For #40724.

Change-Id: I1d0436f6371054f1881862641d8e7e418e4a6a16
Reviewed-on: https://go-review.googlesource.com/c/go/+/293391
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2021-02-26 02:52:33 +00:00
Matthew Dempsky
6db970e20a [dev.regabi] cmd/compile: rewrite Aux uses of ir.Node to *ir.Name [generated]
Now that the only remaining ir.Node implementation that is stored
(directly) into ssa.Aux, we can rewrite all of the conversions between
ir.Node and ssa.Aux to use *ir.Name instead.

rf doesn't have a way to rewrite the type switch case clauses, so we
just use sed instead. There's only a handful, and they're the only
times that "case ir.Node" appears anyway.

The next CL will move the tag method declarations so that ir.Node no
longer implements ssa.Aux.

Passes buildall w/ toolstash -cmp.

Updates #42982.

[git-generate]
cd src/cmd/compile/internal
sed -i -e 's/case ir.Node/case *ir.Name/' gc/plive.go */ssa.go

cd ssa
rf '
ex . ../gc {
  import "cmd/compile/internal/ir"

  var v *Value
  v.Aux.(ir.Node) -> v.Aux.(*ir.Name)

  var n ir.Node
  var asAux func(Aux)
  strict n        # only match ir.Node-typed expressions; not *ir.Name
  implicit asAux  # match implicit assignments to ssa.Aux
  asAux(n)        -> n.(*ir.Name)
}
'

Change-Id: I3206ef5f12a7cfa37c5fecc67a1ca02ea4d52b32
Reviewed-on: https://go-review.googlesource.com/c/go/+/275789
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2020-12-08 01:46:57 +00:00
Matthew Dempsky
5ffa275f3c [dev.regabi] cmd/compile: first pass at abstracting Type
Passes toolstash/buildall.

[git-generate]
cd src/cmd/compile/internal/ssa
rf '
ex . ../ir ../gc {
  import "cmd/compile/internal/types"
  var t *types.Type
  t.Etype -> t.Kind()
  t.Sym -> t.GetSym()
  t.Orig -> t.Underlying()
}
'

cd ../types
rf '
mv EType Kind
mv IRNode Object

mv Type.Etype Type.kind
mv Type.Sym Type.sym
mv Type.Orig Type.underlying
mv Type.Cache Type.cache

mv Type.GetSym Type.Sym

mv Bytetype ByteType
mv Runetype RuneType
mv Errortype ErrorType
'

cd ../gc
sed -i 's/Bytetype/ByteType/; s/Runetype/RuneType/' mkbuiltin.go

git codereview gofmt
go install cmd/compile/internal/...
go test cmd/compile -u || go test cmd/compile

Change-Id: Ibecb2d7100d3318a49238eb4a78d70acb49eedca
Reviewed-on: https://go-review.googlesource.com/c/go/+/274437
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
Trust: Matthew Dempsky <mdempsky@google.com>
2020-12-01 19:24:00 +00:00
Russ Cox
41f3af9d04 [dev.regabi] cmd/compile: replace *Node type with an interface Node [generated]
The plan is to introduce a Node interface that replaces the old *Node pointer-to-struct.

The previous CL defined an interface INode modeling a *Node.

This CL:
 - Changes all references outside internal/ir to use INode,
   along with many references inside internal/ir as well.
 - Renames Node to node.
 - Renames INode to Node

So now ir.Node is an interface implemented by *ir.node, which is otherwise inaccessible,
and the code outside package ir is now (clearly) using only the interface.

The usual rule is never to redefine an existing name with a new meaning,
so that old code that hasn't been updated gets a "unknown name" error
instead of more mysterious errors or silent misbehavior. That rule would
caution against replacing Node-the-struct with Node-the-interface,
as in this CL, because code that says *Node would now be using a pointer
to an interface. But this CL is being landed at the same time as another that
moves Node from gc to ir. So the net effect is to replace *gc.Node with ir.Node,
which does follow the rule: any lingering references to gc.Node will be told
it's gone, not silently start using pointers to interfaces. So the rule is followed
by the CL sequence, just not this specific CL.

Overall, the loss of inlining caused by using interfaces cuts the compiler speed
by about 6%, a not insignificant amount. However, as we convert the representation
to concrete structs that are not the giant Node over the next weeks, that speed
should come back as more of the compiler starts operating directly on concrete types
and the memory taken up by the graph of Nodes drops due to the more precise
structs. Honestly, I was expecting worse.

% benchstat bench.old bench.new
name                      old time/op       new time/op       delta
Template                        168ms ± 4%        182ms ± 2%   +8.34%  (p=0.000 n=9+9)
Unicode                        72.2ms ±10%       82.5ms ± 6%  +14.38%  (p=0.000 n=9+9)
GoTypes                         563ms ± 8%        598ms ± 2%   +6.14%  (p=0.006 n=9+9)
Compiler                        2.89s ± 4%        3.04s ± 2%   +5.37%  (p=0.000 n=10+9)
SSA                             6.45s ± 4%        7.25s ± 5%  +12.41%  (p=0.000 n=9+10)
Flate                           105ms ± 2%        115ms ± 1%   +9.66%  (p=0.000 n=10+8)
GoParser                        144ms ±10%        152ms ± 2%   +5.79%  (p=0.011 n=9+8)
Reflect                         345ms ± 9%        370ms ± 4%   +7.28%  (p=0.001 n=10+9)
Tar                             149ms ± 9%        161ms ± 5%   +8.05%  (p=0.001 n=10+9)
XML                             190ms ± 3%        209ms ± 2%   +9.54%  (p=0.000 n=9+8)
LinkCompiler                    327ms ± 2%        325ms ± 2%     ~     (p=0.382 n=8+8)
ExternalLinkCompiler            1.77s ± 4%        1.73s ± 6%     ~     (p=0.113 n=9+10)
LinkWithoutDebugCompiler        214ms ± 4%        211ms ± 2%     ~     (p=0.360 n=10+8)
StdCmd                          14.8s ± 3%        15.9s ± 1%   +6.98%  (p=0.000 n=10+9)
[Geo mean]                      480ms             510ms        +6.31%

name                      old user-time/op  new user-time/op  delta
Template                        223ms ± 3%        237ms ± 3%   +6.16%  (p=0.000 n=9+10)
Unicode                         103ms ± 6%        113ms ± 3%   +9.53%  (p=0.000 n=9+9)
GoTypes                         758ms ± 8%        800ms ± 2%   +5.55%  (p=0.003 n=10+9)
Compiler                        3.95s ± 2%        4.12s ± 2%   +4.34%  (p=0.000 n=10+9)
SSA                             9.43s ± 1%        9.74s ± 4%   +3.25%  (p=0.000 n=8+10)
Flate                           132ms ± 2%        141ms ± 2%   +6.89%  (p=0.000 n=9+9)
GoParser                        177ms ± 9%        183ms ± 4%     ~     (p=0.050 n=9+9)
Reflect                         467ms ±10%        495ms ± 7%   +6.17%  (p=0.029 n=10+10)
Tar                             183ms ± 9%        197ms ± 5%   +7.92%  (p=0.001 n=10+10)
XML                             249ms ± 5%        268ms ± 4%   +7.82%  (p=0.000 n=10+9)
LinkCompiler                    544ms ± 5%        544ms ± 6%     ~     (p=0.863 n=9+9)
ExternalLinkCompiler            1.79s ± 4%        1.75s ± 6%     ~     (p=0.075 n=10+10)
LinkWithoutDebugCompiler        248ms ± 6%        246ms ± 2%     ~     (p=0.965 n=10+8)
[Geo mean]                      483ms             504ms        +4.41%

[git-generate]
cd src/cmd/compile/internal/ir
: # We need to do the conversion in multiple steps, so we introduce
: # a temporary type alias that will start out meaning the pointer-to-struct
: # and then change to mean the interface.
rf '
	mv Node OldNode

	add node.go \
		type Node = *OldNode
'

: # It should work to do this ex in ir, but it misses test files, due to a bug in rf.
: # Run the command in gc to handle gc's tests, and then again in ssa for ssa's tests.
cd ../gc
rf '
        ex .  ../arm ../riscv64 ../arm64 ../mips64 ../ppc64 ../mips ../wasm {
                import "cmd/compile/internal/ir"
                *ir.OldNode -> ir.Node
        }
'
cd ../ssa
rf '
        ex {
                import "cmd/compile/internal/ir"
                *ir.OldNode -> ir.Node
        }
'

: # Back in ir, finish conversion clumsily with sed,
: # because type checking and circular aliases do not mix.
cd ../ir
sed -i '' '
	/type Node = \*OldNode/d
	s/\*OldNode/Node/g
	s/^func (n Node)/func (n *OldNode)/
	s/OldNode/node/g
	s/type INode interface/type Node interface/
	s/var _ INode = (Node)(nil)/var _ Node = (*node)(nil)/
' *.go
gofmt -w *.go

sed -i '' '
	s/{Func{}, 136, 248}/{Func{}, 152, 280}/
	s/{Name{}, 32, 56}/{Name{}, 44, 80}/
	s/{Param{}, 24, 48}/{Param{}, 44, 88}/
	s/{node{}, 76, 128}/{node{}, 88, 152}/
' sizeof_test.go

cd ../ssa
sed -i '' '
	s/{LocalSlot{}, 28, 40}/{LocalSlot{}, 32, 48}/
' sizeof_test.go

cd ../gc
sed -i '' 's/\*ir.Node/ir.Node/' mkbuiltin.go

cd ../../../..
go install std cmd
cd cmd/compile
go test -u || go test -u

Change-Id: I196bbe3b648e4701662e4a2bada40bf155e2a553
Reviewed-on: https://go-review.googlesource.com/c/go/+/272935
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-11-25 17:30:43 +00:00
Russ Cox
048debb224 [dev.regabi] cmd/compile: remove gc ↔ ssa cycle hacks
The cycle hacks existed because gc needed to import ssa
which need to know about gc.Node. But now that's ir.Node,
and there's no cycle anymore.

Don't know how much it matters but LocalSlot is now
one word shorter than before, because it holds a pointer
instead of an interface for the *Node. That won't last long.

Now that they're not necessary for interface satisfaction,
IsSynthetic and IsAutoTmp can move to top-level ir functions.

Change-Id: Ie511e93466cfa2b17d9a91afc4bd8d53fdb80453
Reviewed-on: https://go-review.googlesource.com/c/go/+/272931
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-11-25 17:30:36 +00:00
Russ Cox
9e0e43d84d [dev.regabi] cmd/compile: remove uses of dummy
Per https://developers.google.com/style/inclusive-documentation,
since we are editing some of this code anyway and it is easier
to put the cleanup in a separate CL.

Change-Id: Ib6b851f43f9cc0a57676564477d4ff22abb1cee5
Reviewed-on: https://go-review.googlesource.com/c/go/+/273106
Trust: Russ Cox <rsc@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-11-25 04:35:29 +00:00
eric fang
b2a8317b31 cmd/compile: use desired info when allocating registers for live values
When allocting registers for live values, use desired register if available,
this is helpful for some cases, such as (*entry).delete, which can save a
few of copies.
Besides, this patch allows more debugging information to be printed out.

Test results of compilecmp on Linux/amd64:
name                      old time/op                 new time/op                 delta
Template                    326729362.060000ns +- 3%    329227238.775510ns +- 4%  +0.76%  (p=0.038 n=50+49)
Unicode                     157671860.391304ns +- 6%    156917927.320000ns +- 6%    ~     (p=0.291 n=46+50)
GoTypes                    1065591138.304348ns +- 2%   1063695977.434783ns +- 1%    ~     (p=0.208 n=46+46)
Compiler                   5053424790.760001ns +- 2%   5052729636.551020ns +- 3%    ~     (p=0.908 n=50+49)
SSA                       12392067635.866669ns +- 2%  12319786960.460005ns +- 2%  -0.58%  (p=0.008 n=45+50)
Flate                       212609767.340000ns +- 5%    213011228.085106ns +- 5%    ~     (p=0.685 n=50+47)
GoParser                    266870495.100000ns +- 4%    266962314.280000ns +- 3%    ~     (p=0.975 n=50+50)
Reflect                     660164306.551021ns +- 2%    658284470.729167ns +- 2%    ~     (p=0.069 n=49+48)
Tar                         292805895.720000ns +- 4%    292103626.954545ns +- 2%    ~     (p=0.321 n=50+44)
XML                         386294811.700000ns +- 4%    386665088.820000ns +- 4%    ~     (p=0.786 n=50+50)
LinkCompiler                548495788.659575ns +- 5%    549359489.102041ns +- 4%    ~     (p=0.855 n=47+49)
ExternalLinkCompiler       1810414270.280000ns +- 2%   1806872224.673470ns +- 2%    ~     (p=0.313 n=50+49)
LinkWithoutDebugCompiler    340888843.795918ns +- 5%    340341541.100000ns +- 6%    ~     (p=0.735 n=49+50)
[Geo mean]                   664550174.613777ns          664090221.153575ns       -0.07%

name                      old user-time/op            new user-time/op            delta
Template                    565202800.000000ns +-16%    595351040.000000ns +-16%  +5.33%  (p=0.001 n=50+50)
Unicode                     378444740.000000ns +-14%    373825183.673469ns +-17%    ~     (p=0.458 n=50+49)
GoTypes                    2052073341.463415ns +-12%   2059679864.864865ns +- 7%    ~     (p=0.381 n=41+37)
Compiler                   9913371980.000000ns +-20%   9848836720.000002ns +-19%    ~     (p=0.781 n=50+50)
SSA                       25013846224.489799ns +-17%  24571896183.673466ns +-17%    ~     (p=0.132 n=49+49)
Flate                       314422702.127660ns +-17%    314831666.666667ns +-11%    ~     (p=0.427 n=47+45)
GoParser                    419496060.000000ns +- 9%    417403460.000000ns +-11%    ~     (p=0.512 n=50+50)
Reflect                    1233632469.387755ns +-17%   1193061073.170732ns +-13%  -3.29%  (p=0.030 n=49+41)
Tar                         509855937.500000ns +-10%    508700740.000000ns +-14%    ~     (p=0.890 n=48+50)
XML                         703511425.531915ns +-12%    694007591.836735ns +-11%    ~     (p=0.164 n=47+49)
LinkCompiler                993137687.500000ns +- 6%    991914714.285714ns +- 8%    ~     (p=0.860 n=48+49)
ExternalLinkCompiler       2193851840.000001ns +- 3%   2186672183.673470ns +- 5%    ~     (p=0.320 n=50+49)
LinkWithoutDebugCompiler    420800875.000000ns +-10%    422062640.000000ns +- 9%    ~     (p=0.840 n=48+50)
[Geo mean]                  1145156131.480097ns         1142033233.550961ns       -0.27%

name                      old alloc/op                new alloc/op                delta
Template                                36.3MB +- 0%                36.3MB +- 0%    ~     (p=0.886 n=50+49)
Unicode                                 30.1MB +- 0%                30.1MB +- 0%    ~     (p=0.792 n=50+50)
GoTypes                                  118MB +- 0%                 118MB +- 0%    ~     (p=1.000 n=47+48)
Compiler                                 562MB +- 0%                 562MB +- 0%    ~     (p=0.205 n=50+49)
SSA                                     1.42GB +- 0%                1.42GB +- 0%  -0.12%  (p=0.000 n=50+50)
Flate                                   22.8MB +- 0%                22.8MB +- 0%    ~     (p=0.384 n=50+47)
GoParser                                28.0MB +- 0%                28.0MB +- 0%  -0.02%  (p=0.013 n=50+50)
Reflect                                 78.0MB +- 0%                78.0MB +- 0%    ~     (p=0.384 n=46+48)
Tar                                     34.1MB +- 0%                34.1MB +- 0%    ~     (p=0.072 n=50+50)
XML                                     43.1MB +- 0%                43.1MB +- 0%  -0.04%  (p=0.000 n=49+50)
LinkCompiler                            98.5MB +- 0%                98.5MB +- 0%  +0.01%  (p=0.012 n=50+43)
ExternalLinkCompiler                    89.6MB +- 0%                89.6MB +- 0%    ~     (p=0.762 n=50+50)
LinkWithoutDebugCompiler                56.9MB +- 0%                56.9MB +- 0%    ~     (p=0.268 n=49+48)
[Geo mean]                               77.7MB                      77.7MB       -0.01%

name                      old allocs/op               new allocs/op               delta
Template                                  367k +- 0%                  367k +- 0%  -0.01%  (p=0.002 n=50+49)
Unicode                                   345k +- 0%                  345k +- 0%    ~     (p=0.981 n=50+50)
GoTypes                                  1.28M +- 0%                 1.28M +- 0%  -0.00%  (p=0.002 n=49+50)
Compiler                                 5.39M +- 0%                 5.39M +- 0%  -0.00%  (p=0.000 n=50+50)
SSA                                      13.9M +- 0%                 13.9M +- 0%  +0.01%  (p=0.000 n=50+50)
Flate                                     230k +- 0%                  230k +- 0%    ~     (p=0.815 n=50+50)
GoParser                                  292k +- 0%                  292k +- 0%  -0.01%  (p=0.000 n=50+50)
Reflect                                   977k +- 0%                  977k +- 0%  -0.00%  (p=0.035 n=50+50)
Tar                                       343k +- 0%                  343k +- 0%  -0.01%  (p=0.008 n=48+50)
XML                                       418k +- 0%                  418k +- 0%  -0.01%  (p=0.000 n=50+50)
LinkCompiler                              516k +- 0%                  516k +- 0%  +0.01%  (p=0.002 n=50+48)
ExternalLinkCompiler                      570k +- 0%                  570k +- 0%    ~     (p=0.430 n=46+50)
LinkWithoutDebugCompiler                  169k +- 0%                  169k +- 0%    ~     (p=0.706 n=49+49)
[Geo mean]                                 672k                        672k       -0.00%

name                      old maxRSS/op               new maxRSS/op               delta
Template                                 34.3M +- 5%                 34.7M +- 4%  +1.24%  (p=0.004 n=50+50)
Unicode                                  36.2M +- 5%                 36.1M +- 8%    ~     (p=0.785 n=50+50)
GoTypes                                  75.7M +- 7%                 76.1M +- 6%    ~     (p=0.544 n=50+50)
Compiler                                  304M +- 7%                  304M +- 7%    ~     (p=0.744 n=50+50)
SSA                                       721M +- 6%                  723M +- 7%    ~     (p=0.724 n=49+50)
Flate                                    26.1M +- 3%                 26.1M +- 5%    ~     (p=0.649 n=48+49)
GoParser                                 29.3M +- 5%                 29.3M +- 4%    ~     (p=0.809 n=50+50)
Reflect                                  56.0M +- 6%                 56.3M +- 5%    ~     (p=0.350 n=50+50)
Tar                                      34.1M +- 3%                 33.9M +- 5%    ~     (p=0.121 n=49+50)
XML                                      39.6M +- 5%                 39.9M +- 4%    ~     (p=0.109 n=50+50)
LinkCompiler                              168M +- 1%                  168M +- 1%    ~     (p=0.578 n=49+48)
ExternalLinkCompiler                      179M +- 1%                  179M +- 2%    ~     (p=0.522 n=46+46)
LinkWithoutDebugCompiler                  137M +- 3%                  137M +- 3%    ~     (p=0.463 n=41+50)
[Geo mean]                                79.3M                       79.5M       +0.20%

name                      old text-bytes              new text-bytes              delta
HelloSize                                812kB +- 0%                 811kB +- 0%  -0.05%  (p=0.000 n=50+50)

name                      old data-bytes              new data-bytes              delta
HelloSize                               13.3kB +- 0%                13.3kB +- 0%    ~     (all equal)

name                      old bss-bytes               new bss-bytes               delta
HelloSize                                206kB +- 0%                 206kB +- 0%    ~     (all equal)

name                      old exe-bytes               new exe-bytes               delta
HelloSize                               1.21MB +- 0%                1.21MB +- 0%  +0.02%  (p=0.000 n=50+50)

file      before    after     Δ       %
addr2line 4052949   4052453   -496    -0.012%
api       4948171   4947163   -1008   -0.020%
asm       4888889   4888049   -840    -0.017%
buildid   2617545   2617673   +128    +0.005%
cgo       4521681   4516801   -4880   -0.108%
compile   19139091  19137683  -1408   -0.007%
cover     4843191   4840359   -2832   -0.058%
dist      3473677   3474717   +1040   +0.030%
doc       3821592   3821552   -40     -0.001%
fix       3220587   3220059   -528    -0.016%
link      6587368   6582696   -4672   -0.071%
nm        3999858   3999186   -672    -0.017%
objdump   4409161   4408217   -944    -0.021%
pack      2394038   2393846   -192    -0.008%
pprof     13601271  13602487  +1216   +0.009%
test2json 2645148   2644604   -544    -0.021%
trace     10357878  10356862  -1016   -0.010%
vet       6779482   6778706   -776    -0.011%
total     106301577 106283113 -18464  -0.017%

Change-Id: I63ac6e224e1a4756ddc1bfc4aabbaeb92d7d4273
Reviewed-on: https://go-review.googlesource.com/c/go/+/263599
Run-TryBot: eric fang <eric.fang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: eric fang <eric.fang@arm.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-10-21 19:08:53 +00:00
erifan01
1c0d1f4e97 cmd/compile: optimize regalloc for phi value
When allocating registers for phi value, only the primary predecessor is considered.
Taking into account the allocation status of other predecessors can help reduce
unnecessary copy or spill operations. Many such cases can be found in the standard
library, such as runtime.wirep, moveByType, etc. The test results from benchstat
also show that this change helps reduce the file size.

name                      old time/op       new time/op       delta
Template                        328ms ± 5%        326ms ± 4%    ~     (p=0.254 n=50+47)
Unicode                         156ms ± 7%        158ms ±10%    ~     (p=0.412 n=49+49)
GoTypes                         1.07s ± 3%        1.07s ± 2%    ~     (p=0.664 n=48+49)
Compiler                        4.43s ± 3%        4.44s ± 3%    ~     (p=0.758 n=48+50)
SSA                             10.3s ± 2%        10.4s ± 2%  +0.43%  (p=0.017 n=50+46)
Flate                           208ms ± 9%        209ms ± 7%    ~     (p=0.920 n=49+46)
GoParser                        260ms ± 5%        262ms ± 4%    ~     (p=0.063 n=50+48)
Reflect                         687ms ± 3%        685ms ± 2%    ~     (p=0.459 n=50+48)
Tar                             293ms ± 4%        293ms ± 5%    ~     (p=0.695 n=49+48)
XML                             391ms ± 4%        389ms ± 3%    ~     (p=0.109 n=49+46)
LinkCompiler                    570ms ± 5%        563ms ± 5%  -1.10%  (p=0.006 n=46+47)
ExternalLinkCompiler            1.57s ± 3%        1.56s ± 3%    ~     (p=0.118 n=47+46)
LinkWithoutDebugCompiler        349ms ± 6%        349ms ± 5%    ~     (p=0.726 n=49+47)
[Geo mean]                      645ms             645ms       -0.05%

name                      old user-time/op  new user-time/op  delta
Template                        507ms ±14%        513ms ±14%    ~     (p=0.398 n=48+49)
Unicode                         345ms ±29%        345ms ±38%    ~     (p=0.521 n=47+49)
GoTypes                         1.95s ±16%        1.94s ±19%    ~     (p=0.324 n=50+50)
Compiler                        8.26s ±16%        8.22s ±14%    ~     (p=0.834 n=50+50)
SSA                             19.6s ± 8%        19.2s ±15%    ~     (p=0.056 n=50+50)
Flate                           293ms ± 9%        299ms ±12%    ~     (p=0.057 n=47+50)
GoParser                        388ms ± 9%        387ms ±14%    ~     (p=0.660 n=46+50)
Reflect                         1.15s ±28%        1.12s ±18%    ~     (p=0.648 n=49+48)
Tar                             456ms ±10%        476ms ±15%  +4.48%  (p=0.001 n=46+48)
XML                             648ms ±27%        634ms ±16%    ~     (p=0.685 n=50+46)
LinkCompiler                    1.00s ± 8%        1.00s ± 8%    ~     (p=0.638 n=50+50)
ExternalLinkCompiler            1.96s ± 5%        1.96s ± 5%    ~     (p=0.792 n=50+50)
LinkWithoutDebugCompiler        443ms ±10%        442ms ±11%    ~     (p=0.813 n=50+50)
[Geo mean]                      1.05s             1.05s       -0.09%

name                      old alloc/op      new alloc/op      delta
Template                       36.0MB ± 0%       36.0MB ± 0%    ~     (p=0.599 n=49+50)
Unicode                        29.8MB ± 0%       29.8MB ± 0%    ~     (p=0.739 n=50+50)
GoTypes                         118MB ± 0%        118MB ± 0%    ~     (p=0.436 n=50+50)
Compiler                        562MB ± 0%        562MB ± 0%    ~     (p=0.693 n=50+50)
SSA                            1.42GB ± 0%       1.42GB ± 0%  -0.10%  (p=0.000 n=50+49)
Flate                          22.5MB ± 0%       22.5MB ± 0%    ~     (p=0.429 n=48+49)
GoParser                       27.7MB ± 0%       27.7MB ± 0%    ~     (p=0.705 n=49+48)
Reflect                        77.7MB ± 0%       77.7MB ± 0%  -0.01%  (p=0.043 n=50+50)
Tar                            33.8MB ± 0%       33.8MB ± 0%    ~     (p=0.241 n=49+50)
XML                            42.8MB ± 0%       42.8MB ± 0%    ~     (p=0.677 n=47+49)
LinkCompiler                   98.3MB ± 0%       98.3MB ± 0%    ~     (p=0.157 n=50+50)
ExternalLinkCompiler           89.4MB ± 0%       89.4MB ± 0%    ~     (p=0.683 n=50+50)
LinkWithoutDebugCompiler       56.7MB ± 0%       56.7MB ± 0%    ~     (p=0.155 n=49+49)
[Geo mean]                     77.3MB            77.3MB       -0.01%

name                      old allocs/op     new allocs/op     delta
Template                         367k ± 0%         367k ± 0%    ~     (p=0.863 n=50+50)
Unicode                          345k ± 0%         345k ± 0%    ~     (p=0.744 n=49+49)
GoTypes                         1.28M ± 0%        1.28M ± 0%    ~     (p=0.957 n=48+50)
Compiler                        5.39M ± 0%        5.39M ± 0%  +0.00%  (p=0.012 n=50+49)
SSA                             13.9M ± 0%        13.9M ± 0%  +0.02%  (p=0.000 n=47+49)
Flate                            230k ± 0%         230k ± 0%  -0.01%  (p=0.007 n=47+49)
GoParser                         292k ± 0%         292k ± 0%    ~     (p=0.891 n=50+49)
Reflect                          977k ± 0%         977k ± 0%    ~     (p=0.274 n=50+50)
Tar                              343k ± 0%         343k ± 0%    ~     (p=0.942 n=50+50)
XML                              418k ± 0%         418k ± 0%    ~     (p=0.374 n=50+49)
LinkCompiler                     516k ± 0%         516k ± 0%    ~     (p=0.205 n=49+47)
ExternalLinkCompiler             570k ± 0%         570k ± 0%    ~     (p=0.783 n=49+47)
LinkWithoutDebugCompiler         169k ± 0%         169k ± 0%    ~     (p=0.233 n=50+46)
[Geo mean]                       672k              672k       +0.00%

name                      old maxRSS/op     new maxRSS/op     delta
Template                        34.5M ± 3%        34.4M ± 3%    ~     (p=0.566 n=49+48)
Unicode                         36.0M ± 6%        35.9M ± 6%    ~     (p=0.736 n=50+50)
GoTypes                         75.7M ± 7%        75.4M ± 5%    ~     (p=0.412 n=50+50)
Compiler                         314M ±10%         313M ± 8%    ~     (p=0.708 n=50+50)
SSA                              730M ± 6%         735M ± 6%    ~     (p=0.324 n=50+50)
Flate                           25.8M ± 5%        25.6M ± 6%    ~     (p=0.415 n=49+50)
GoParser                        28.5M ± 3%        28.5M ± 4%    ~     (p=0.977 n=46+50)
Reflect                         57.4M ± 4%        57.2M ± 3%    ~     (p=0.173 n=50+50)
Tar                             33.3M ± 3%        33.2M ± 4%    ~     (p=0.621 n=48+50)
XML                             39.6M ± 5%        39.6M ± 4%    ~     (p=0.997 n=50+50)
LinkCompiler                     168M ± 2%         167M ± 1%    ~     (p=0.072 n=49+45)
ExternalLinkCompiler             179M ± 1%         179M ± 1%    ~     (p=0.147 n=48+50)
LinkWithoutDebugCompiler         136M ± 1%         136M ± 1%    ~     (p=0.789 n=47+49)
[Geo mean]                      79.2M             79.1M       -0.12%

name                      old text-bytes    new text-bytes    delta
HelloSize                       812kB ± 0%        811kB ± 0%  -0.06%  (p=0.000 n=50+50)

name                      old data-bytes    new data-bytes    delta
HelloSize                      13.3kB ± 0%       13.3kB ± 0%    ~     (all equal)

name                      old bss-bytes     new bss-bytes     delta
HelloSize                       206kB ± 0%        206kB ± 0%    ~     (all equal)

name                      old exe-bytes     new exe-bytes     delta
HelloSize                      1.21MB ± 0%       1.21MB ± 0%  -0.03%  (p=0.000 n=50+50)

file      before    after     Δ       %
addr2line 4057421   4056237   -1184   -0.029%
api       4952451   4946715   -5736   -0.116%
asm       4888993   4888185   -808    -0.017%
buildid   2617705   2616441   -1264   -0.048%
cgo       4521849   4520681   -1168   -0.026%
compile   19143451  19141243  -2208   -0.012%
cover     4847391   4837151   -10240  -0.211%
dist      3473877   3472565   -1312   -0.038%
doc       3821496   3820432   -1064   -0.028%
fix       3220587   3220659   +72     +0.002%
link      6587504   6582576   -4928   -0.075%
nm        4000154   3998690   -1464   -0.037%
objdump   4409449   4407625   -1824   -0.041%
pack      2398086   2393110   -4976   -0.207%
pprof     13599060  13606111  +7051   +0.052%
test2json 2645148   2645692   +544    +0.021%
trace     10355281  10355862  +581    +0.006%
vet       6780026   6779666   -360    -0.005%
total     106319929 106289641 -30288  -0.028%

Change-Id: Ia5399286958c187c8664c769bbddf7bc4c1cae99
Reviewed-on: https://go-review.googlesource.com/c/go/+/263600
Run-TryBot: eric fang <eric.fang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: eric fang <eric.fang@arm.com>
Reviewed-by: Keith Randall <khr@golang.org>
2020-10-21 19:08:11 +00:00
Keith Randall
fe2cfb74ba all: drop 387 support
My last 387 CL. So sad ... ... ... ... not!

Fixes #40255

Change-Id: I8d4ddb744b234b8adc735db2f7c3c7b6d8bbdfa4
Reviewed-on: https://go-review.googlesource.com/c/go/+/258957
Trust: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-10-02 00:00:51 +00:00
Keith Randall
5c2c6d3fbf runtime: framepointers are no longer an experiment - hard code them
I think they are no longer experimental status. Might as well promote
them to permanent.

Change-Id: Id1259601b3dd2061dd60df86ee48080bfb575d2f
Reviewed-on: https://go-review.googlesource.com/c/go/+/249857
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2020-08-27 21:15:47 +00:00
Michael Munday
48403b268b cmd/compile: error if register is reused when setting edge state
When setting the edge state in register allocation we should only
be setting each register once. It is not possible for a register
to hold multiple values at once.

This CL converts the runtime error seen in #38195 into an internal
compiler error (ICE). It is better for the compiler to fail than
generate an incorrect program.

The bug reported in #38195 is now exposed as:

./parserc.go:459:11: internal compiler error: 'yaml_parser_parse_node': R5 is already set (v1074/v1241)

[stack trace]

Updates #38195.

Change-Id: Id95842fd850b95494cbd472b6fd5a55513ecacec
Reviewed-on: https://go-review.googlesource.com/c/go/+/228060
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-04-14 19:04:38 +00:00
Michael Munday
382fe3e249 cmd/compile: fix deallocation of live value copies in regalloc
When deallocating the input register to a phi so that the phi
itself could be allocated to that register the code was also
deallocating all copies of that phi input value. Those copies
of the value could still be live and if they were the register
allocator could reuse them incorrectly to hold speculative
copies of other phi inputs. This causes strange bugs.

No test because this is a very obscure scenario that is hard
to replicate but CL 228060 adds an assertion to the compiler
that does trigger when running the std tests on linux/s390x
without this CL applied. Hopefully that assertion will prevent
future regressions.

Fixes #38195.

Change-Id: Id975dadedd731c7bb21933b9ea6b17daaa5c9e1d
Reviewed-on: https://go-review.googlesource.com/c/go/+/228061
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-04-14 19:04:32 +00:00
Dan Scales
1b3a1db19f cmd/compile: fix liveness for open-coded defer args for infinite loops
Once defined, a stack slot holding an open-coded defer arg should always be marked
live, since it may be used at any time if there is a panic. These stack slots are
typically kept live naturally by the open-defer code inlined at each return/exit point.
However, we need to do extra work to make sure that they are kept live if a
function has an infinite loop or a panic exit.

For this fix, only in the case of a function that is using open-coded defers, we
compute the set of blocks (most often empty) that cannot reach a return or a
BlockExit (panic) because of an infinite loop. Then, for each block b which
cannot reach a return or BlockExit or is a BlockExit block, we mark each defer arg
slot as live, as long as the definition of the defer arg slot dominates block b.

For this change, had to export (*Func).sdom (-> Sdom) and SparseTree.isAncestorEq
(-> IsAncestorEq)

Updates #35277

Change-Id: I7b53c9bd38ba384a3794386dd0eb94e4cbde4eb1
Reviewed-on: https://go-review.googlesource.com/c/go/+/204802
Run-TryBot: Dan Scales <danscales@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-11-05 17:19:16 +00:00
Brad Fitzpatrick
07b4abd62e all: remove the nacl port (part 2, amd64p32 + toolchain)
This is part two if the nacl removal. Part 1 was CL 199499.

This CL removes amd64p32 support, which might be useful in the future
if we implement the x32 ABI. It also removes the nacl bits in the
toolchain, and some remaining nacl bits.

Updates #30439

Change-Id: I2475d5bb066d1b474e00e40d95b520e7c2e286e1
Reviewed-on: https://go-review.googlesource.com/c/go/+/200077
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-10-09 22:34:34 +00:00
Keith Randall
72dc9ab191 cmd/compile: reuse dead register before reusing register holding constant
For commuting ops, check whether the second argument is dead before
checking if the first argument is rematerializeable. Reusing the register
holding a dead value is always best.

Fixes #33580

Change-Id: I7372cfc03d514e6774d2d9cc727a3e6bf6ce2657
Reviewed-on: https://go-review.googlesource.com/c/go/+/199559
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2019-10-07 15:16:26 +00:00
Michael Munday
9c2e7e8bed cmd/compile: allow multiple SSA block control values
Control values are used to choose which successor of a block is
jumped to. Typically a control value takes the form of a 'flags'
value that represents the result of a comparison. Some
architectures however use a variable in a register as a control
value.

Up until now we have managed with a single control value per block.
However some architectures (e.g. s390x and riscv64) have combined
compare-and-branch instructions that take two variables in registers
as parameters. To generate these instructions we need to support 2
control values per block.

This CL allows up to 2 control values to be used in a block in
order to support the addition of compare-and-branch instructions.
I have implemented s390x compare-and-branch instructions in a
different CL.

Passes toolstash-check -all.

Results of compilebench:

name                      old time/op       new time/op       delta
Template                        208ms ± 1%        209ms ± 1%    ~     (p=0.289 n=20+20)
Unicode                        83.7ms ± 1%       83.3ms ± 3%  -0.49%  (p=0.017 n=18+18)
GoTypes                         748ms ± 1%        748ms ± 0%    ~     (p=0.460 n=20+18)
Compiler                        3.47s ± 1%        3.48s ± 1%    ~     (p=0.070 n=19+18)
SSA                             11.5s ± 1%        11.7s ± 1%  +1.64%  (p=0.000 n=19+18)
Flate                           130ms ± 1%        130ms ± 1%    ~     (p=0.588 n=19+20)
GoParser                        160ms ± 1%        161ms ± 1%    ~     (p=0.211 n=20+20)
Reflect                         465ms ± 1%        467ms ± 1%  +0.42%  (p=0.007 n=20+20)
Tar                             184ms ± 1%        185ms ± 2%    ~     (p=0.087 n=18+20)
XML                             253ms ± 1%        253ms ± 1%    ~     (p=0.377 n=20+18)
LinkCompiler                    769ms ± 2%        774ms ± 2%    ~     (p=0.070 n=19+19)
ExternalLinkCompiler            3.59s ±11%        3.68s ± 6%    ~     (p=0.072 n=20+20)
LinkWithoutDebugCompiler        446ms ± 5%        454ms ± 3%  +1.79%  (p=0.002 n=19+20)
StdCmd                          26.0s ± 2%        26.0s ± 2%    ~     (p=0.799 n=20+20)

name                      old user-time/op  new user-time/op  delta
Template                        238ms ± 5%        240ms ± 5%    ~     (p=0.142 n=20+20)
Unicode                         105ms ±11%        106ms ±10%    ~     (p=0.512 n=20+20)
GoTypes                         876ms ± 2%        873ms ± 4%    ~     (p=0.647 n=20+19)
Compiler                        4.17s ± 2%        4.19s ± 1%    ~     (p=0.093 n=20+18)
SSA                             13.9s ± 1%        14.1s ± 1%  +1.45%  (p=0.000 n=18+18)
Flate                           145ms ±13%        146ms ± 5%    ~     (p=0.851 n=20+18)
GoParser                        185ms ± 5%        188ms ± 7%    ~     (p=0.174 n=20+20)
Reflect                         534ms ± 3%        538ms ± 2%    ~     (p=0.105 n=20+18)
Tar                             215ms ± 4%        211ms ± 9%    ~     (p=0.079 n=19+20)
XML                             295ms ± 6%        295ms ± 5%    ~     (p=0.968 n=20+20)
LinkCompiler                    832ms ± 4%        837ms ± 7%    ~     (p=0.707 n=17+20)
ExternalLinkCompiler            1.58s ± 8%        1.60s ± 4%    ~     (p=0.296 n=20+19)
LinkWithoutDebugCompiler        478ms ±12%        489ms ±10%    ~     (p=0.429 n=20+20)

name                      old object-bytes  new object-bytes  delta
Template                        559kB ± 0%        559kB ± 0%    ~     (all equal)
Unicode                         216kB ± 0%        216kB ± 0%    ~     (all equal)
GoTypes                        2.03MB ± 0%       2.03MB ± 0%    ~     (all equal)
Compiler                       8.07MB ± 0%       8.07MB ± 0%  -0.06%  (p=0.000 n=20+20)
SSA                            27.1MB ± 0%       27.3MB ± 0%  +0.89%  (p=0.000 n=20+20)
Flate                           343kB ± 0%        343kB ± 0%    ~     (all equal)
GoParser                        441kB ± 0%        441kB ± 0%    ~     (all equal)
Reflect                        1.36MB ± 0%       1.36MB ± 0%    ~     (all equal)
Tar                             487kB ± 0%        487kB ± 0%    ~     (all equal)
XML                             632kB ± 0%        632kB ± 0%    ~     (all equal)

name                      old export-bytes  new export-bytes  delta
Template                       18.5kB ± 0%       18.5kB ± 0%    ~     (all equal)
Unicode                        7.92kB ± 0%       7.92kB ± 0%    ~     (all equal)
GoTypes                        35.0kB ± 0%       35.0kB ± 0%    ~     (all equal)
Compiler                        109kB ± 0%        110kB ± 0%  +0.72%  (p=0.000 n=20+20)
SSA                             137kB ± 0%        138kB ± 0%  +0.58%  (p=0.000 n=20+20)
Flate                          4.89kB ± 0%       4.89kB ± 0%    ~     (all equal)
GoParser                       8.49kB ± 0%       8.49kB ± 0%    ~     (all equal)
Reflect                        11.4kB ± 0%       11.4kB ± 0%    ~     (all equal)
Tar                            10.5kB ± 0%       10.5kB ± 0%    ~     (all equal)
XML                            16.7kB ± 0%       16.7kB ± 0%    ~     (all equal)

name                      old text-bytes    new text-bytes    delta
HelloSize                       761kB ± 0%        761kB ± 0%    ~     (all equal)
CmdGoSize                      10.8MB ± 0%       10.8MB ± 0%    ~     (all equal)

name                      old data-bytes    new data-bytes    delta
HelloSize                      10.7kB ± 0%       10.7kB ± 0%    ~     (all equal)
CmdGoSize                       312kB ± 0%        312kB ± 0%    ~     (all equal)

name                      old bss-bytes     new bss-bytes     delta
HelloSize                       122kB ± 0%        122kB ± 0%    ~     (all equal)
CmdGoSize                       146kB ± 0%        146kB ± 0%    ~     (all equal)

name                      old exe-bytes     new exe-bytes     delta
HelloSize                      1.13MB ± 0%       1.13MB ± 0%    ~     (all equal)
CmdGoSize                      15.1MB ± 0%       15.1MB ± 0%    ~     (all equal)

Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f
Reviewed-on: https://go-review.googlesource.com/c/go/+/196557
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Keith Randall <khr@golang.org>
2019-10-02 09:56:36 +00:00
Richard Musiol
1c50fcf853 cmd/compile: add 32 bit float registers/variables on wasm
Before this change, wasm only used float variables with a size of 64 bit
and applied rounding to 32 bit precision where necessary. This change
adds proper 32 bit float variables.

Reduces the size of pkg/js_wasm by 254 bytes.

Change-Id: Ieabe846a8cb283d66def3cdf11e2523b3b31f345
Reviewed-on: https://go-review.googlesource.com/c/go/+/195117
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-09-19 20:26:22 +00:00
Ainar Garipov
0efbd10157 all: fix typos
Use the following (suboptimal) script to obtain a list of possible
typos:

  #!/usr/bin/env sh

  set -x

  git ls-files |\
    grep -e '\.\(c\|cc\|go\)$' |\
    xargs -n 1\
    awk\
    '/\/\// { gsub(/.*\/\//, ""); print; } /\/\*/, /\*\// { gsub(/.*\/\*/, ""); gsub(/\*\/.*/, ""); }' |\
    hunspell -d en_US -l |\
    grep '^[[:upper:]]\{0,1\}[[:lower:]]\{1,\}$' |\
    grep -v -e '^.\{1,4\}$' -e '^.\{16,\}$' |\
    sort -f |\
    uniq -c |\
    awk '$1 == 1 { print $2; }'

Then, go through the results manually and fix the most obvious typos in
the non-vendored code.

Change-Id: I3cb5830a176850e1a0584b8a40b47bde7b260eae
Reviewed-on: https://go-review.googlesource.com/c/go/+/193848
Reviewed-by: Robert Griesemer <gri@golang.org>
2019-09-08 17:28:20 +00:00
Keith Randall
8a317ebc0f cmd/compile: don't eliminate all registers when restricting to desired ones
We shouldn't mask to desired registers if we haven't masked out all the
forbidden registers yet.  In this path we haven't masked out the nospill
registers yet. If the resulting mask contains only nospill registers, then
allocReg fails.

This can only happen on resultNotInArgs-marked instructions, which exist
only on the ARM64, MIPS, MIPS64, and PPC64 ports.

Maybe there's a better way to handle resultNotInArgs instructions.
But for 1.13, this is a low-risk fix.

Fixes #33355

Change-Id: I1082f78f798d1371bde65c58cc265540480e4fa4
Reviewed-on: https://go-review.googlesource.com/c/go/+/188178
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2019-08-01 02:15:18 +00:00
Josh Bleecher Snyder
4ae31dc8c5 cmd/compile: re-use regalloc's []valState
Updates #27739: reduces package ssa's allocated space by 3.77%.

maxrss is harder to measure, but using best-of-three-runs
as reported by /usr/bin/time -l, I see ~2% reduction in maxrss.

We still have a long way to go, though; the new maxrss is still 1.1gb.

name        old alloc/op      new alloc/op      delta
Template         38.8MB ± 0%       37.7MB ± 0%  -2.77%  (p=0.008 n=5+5)
Unicode          28.2MB ± 0%       28.1MB ± 0%  -0.20%  (p=0.008 n=5+5)
GoTypes           131MB ± 0%        127MB ± 0%  -2.94%  (p=0.008 n=5+5)
Compiler          606MB ± 0%        587MB ± 0%  -3.21%  (p=0.008 n=5+5)
SSA              2.14GB ± 0%       2.06GB ± 0%  -3.77%  (p=0.008 n=5+5)
Flate            24.0MB ± 0%       23.3MB ± 0%  -3.00%  (p=0.008 n=5+5)
GoParser         28.8MB ± 0%       28.1MB ± 0%  -2.61%  (p=0.008 n=5+5)
Reflect          83.8MB ± 0%       81.5MB ± 0%  -2.71%  (p=0.008 n=5+5)
Tar              36.4MB ± 0%       35.4MB ± 0%  -2.73%  (p=0.008 n=5+5)
XML              47.9MB ± 0%       46.7MB ± 0%  -2.49%  (p=0.008 n=5+5)
[Geo mean]       84.6MB            82.4MB       -2.65%

name        old allocs/op     new allocs/op     delta
Template           379k ± 0%         379k ± 0%  -0.05%  (p=0.008 n=5+5)
Unicode            340k ± 0%         340k ± 0%    ~     (p=0.151 n=5+5)
GoTypes           1.36M ± 0%        1.36M ± 0%  -0.06%  (p=0.008 n=5+5)
Compiler          5.49M ± 0%        5.48M ± 0%  -0.03%  (p=0.008 n=5+5)
SSA               17.5M ± 0%        17.5M ± 0%  -0.03%  (p=0.008 n=5+5)
Flate              235k ± 0%         235k ± 0%  -0.04%  (p=0.008 n=5+5)
GoParser           302k ± 0%         302k ± 0%  -0.04%  (p=0.008 n=5+5)
Reflect            976k ± 0%         975k ± 0%  -0.10%  (p=0.008 n=5+5)
Tar                352k ± 0%         352k ± 0%  -0.06%  (p=0.008 n=5+5)
XML                436k ± 0%         436k ± 0%  -0.03%  (p=0.008 n=5+5)
[Geo mean]         842k              841k       -0.04%

Change-Id: I0ab6631b5a0bb6303c291dcb0367b586a4e584fb
Reviewed-on: https://go-review.googlesource.com/c/go/+/176221
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-05-10 00:14:40 +00:00
Cherry Zhang
40df9cc606 cmd/compile: make KeepAlive work on stack object
Currently, runtime.KeepAlive applied on a stack object doesn't
actually keeps the stack object alive, and the heap object
referenced from it could be collected. This is because the
address of the stack object is rematerializeable, and we just
ignored KeepAlive on rematerializeable values. This CL fixes it.

Fixes #30476.

Change-Id: Ic1f75ee54ed94ea79bd46a8ddcd9e81d01556d1d
Reviewed-on: https://go-review.googlesource.com/c/164537
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-03-01 15:36:52 +00:00