Commit graph

225 commits

Author SHA1 Message Date
Keith Randall
4e182db5fc Revert "cmd/compile: use generated loops instead of DUFFCOPY on amd64"
This reverts commit ec9e1176c3 (CL 678620).

Reason for revert: causing regalloc to get into an infinite loop

Change-Id: Ie53c58c6126804af6d6883ea4acdcfb632a172bd
Reviewed-on: https://go-review.googlesource.com/c/go/+/695196
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
2025-08-12 16:04:18 -07:00
Keith Randall
ec9e1176c3 cmd/compile: use generated loops instead of DUFFCOPY on amd64
goarch: amd64
cpu: 12th Gen Intel(R) Core(TM) i7-12700
                        │     base      │                 exp                 │
                        │    sec/op     │   sec/op     vs base                │
MemmoveKnownSize112-20     1.764n ±  0%   1.247n ± 0%  -29.31% (p=0.000 n=10)
MemmoveKnownSize128-20     1.891n ±  0%   1.405n ± 1%  -25.72% (p=0.000 n=10)
MemmoveKnownSize192-20     2.521n ±  0%   2.114n ± 3%  -16.16% (p=0.000 n=10)
MemmoveKnownSize248-20     4.028n ±  0%   3.877n ± 1%   -3.75% (p=0.000 n=10)
MemmoveKnownSize256-20     3.272n ±  0%   2.961n ± 2%   -9.53% (p=0.000 n=10)
MemmoveKnownSize512-20     6.733n ±  3%   5.936n ± 4%  -11.83% (p=0.000 n=10)
MemmoveKnownSize1024-20   13.905n ±  5%   9.798n ± 9%  -29.54% (p=0.000 n=10)

Change-Id: Icc01cec0d8b072300d749a5ce76f53b3725b5c65
Reviewed-on: https://go-review.googlesource.com/c/go/+/678620
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Jakub Ciolek <jakub@ciolek.dev>
2025-08-12 09:15:08 -07:00
qiulaidongfeng
4ee0df8c46 cmd: remove dead code
Fixes #74076

Change-Id: Icc67b3d4e342f329584433bd1250c56ae8f5a73d
Reviewed-on: https://go-review.googlesource.com/c/go/+/690635
Reviewed-by: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Commit-Queue: Alan Donovan <adonovan@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Alan Donovan <adonovan@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2025-08-05 10:31:25 -07:00
Keith Randall
eef5f8d930 cmd/compile: enforce that locals are always accessed with SP base register
After CL 678937, we could have a situation where the value of the
stack pointer is in both SP and another register. We need to make sure
that regalloc picks SP when issuing a reference to local variables;
the assembler expects that.

Fixes #74836

Change-Id: I2ac73ece6eb44b4a78c1369f8a69e51ab9748754
Reviewed-on: https://go-review.googlesource.com/c/go/+/692395
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Mark Freeman <mark@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-08-01 08:57:29 -07:00
khr@golang.org
89a0af86b8 cmd/compile: allow ops to specify clobbering input registers
Same as clobbering fixed registers, but which register is clobbered
depends on which register was assigned to the input.

Add code similar to resultInArg0 processing that makes a register
copy before allowing the op to clobber the last available copy of a value.

(Will be used by subsequent CLs in this stack.)

Change-Id: I6bad88b2cb9ac3303d960ff0fb1611727292cfc4
Reviewed-on: https://go-review.googlesource.com/c/go/+/680335
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Mark Freeman <mark@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-07-30 07:37:25 -07:00
Keith Randall
5045fdd8ff cmd/compile: fix containsUnavoidableCall computation
The previous algorithm was incorrect, as it reused the dominatedByCall
slice without resetting it. It also used the depth fields even though
they were not yet calculated.

Also, clean up a lot of the loop detector code that we never use.

Always compute depths. It is cheap.

Update #71868

Not really sure how to test this. As it is just an advisory bit,
nothing goes really wrong when the result is incorrect.

Change-Id: Ic0ae87a4d3576554831252d88b05b058ca68af41
Reviewed-on: https://go-review.googlesource.com/c/go/+/680775
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
2025-07-25 13:52:00 -07:00
khr@golang.org
524946d247 cmd/compile: don't preload registers if destination already scheduled
In regalloc, we allocate some values to registers before loop entry,
so that they don't need to be loaded (from spill locations) during
the loop.

But it is pointless if we've already regalloc'd the loop body.
Whatever restores we needed for the body are already generated.

It's not clear if this code is ever useful. No tests fail if I just
remove it. But at least this change is worthwhile. It doesn't help,
and it actively inserts more restores than we really need (mostly
because the desired register list is approximate - I have seen cases
where the loads implicated here end up being dead because the restores
hit the wrong registers and the edge shuffle pass knows it needs
the restores in different registers).

While we are here, might as well have layoutRegallocOrder return
the standard layout order instead of recomputing it.

Change-Id: Ia624d5121de59b6123492603695de50b272b277f
Reviewed-on: https://go-review.googlesource.com/c/go/+/672735
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
2025-05-19 17:13:21 -07:00
apocelipes
8a8efafa88 cmd/compile: use the builtin clear
To simplify the code a bit.

Change-Id: Ia72f576de59ff161ec389a4992bb635f89783540
GitHub-Last-Rev: eaec8216be
GitHub-Pull-Request: golang/go#73411
Reviewed-on: https://go-review.googlesource.com/c/go/+/666117
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-04-18 04:21:12 -07:00
khr@golang.org
e373771490 cmd/compile: use zero register instead of specialized *zero instructions
This lets us get rid of lots of specialized opcodes for storing zero.
Instead, use regular store opcodes that just happen to use the zero
register as one of their inputs.

Change-Id: I2902a6f9b0831cb598df45189ca6bb57221bef72
Reviewed-on: https://go-review.googlesource.com/c/go/+/633075
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-04-04 15:26:24 -07:00
Mateusz Poliwczak
e9242ee812 cmd/compile: remove references to *gc.Node in docs
ssa.Sym is only implemented by *ir.Name or *obj.LSym.

Change-Id: Ia171db618abd8b438fcc2cf402f40f3fe3ec6833
Reviewed-on: https://go-review.googlesource.com/c/go/+/660995
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2025-03-27 11:25:27 -07:00
khr@golang.org
cc16fb52e6 cmd/compile: ensure we don't reuse temporary register
Before this CL, we could use the same register for both a temporary
register and for moving a value in the output register out of the way.

Fixes #71857

Change-Id: Iefbfd9d4139136174570d8aadf8a0fb391791ea9
Reviewed-on: https://go-review.googlesource.com/c/go/+/651221
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-02-25 10:19:25 -08:00
Keith Randall
a0029e95e5 cmd/compile: regalloc: handle desired registers of 2-output insns
Particularly with 2-word load instructions, this becomes important.
Classic example is:

    func f(p *string) string {
        return *p
    }

We want the two loads to put the return values directly into
the two ABI return registers.

At this point in the stack, cmd/go is 1.1% smaller.

Change-Id: I51fd1710238e81d15aab2bfb816d73c8e7c207b1
Reviewed-on: https://go-review.googlesource.com/c/go/+/631137
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-02-13 14:08:07 -08:00
Keith Randall
0c16278124 cmd: use built-in clear for maps instead of range+delete
Now that we're bootstrapping from a toolchain that has the clear builtin.

Update #64751

Change-Id: Ia86d96c253c9f7c66131cd02048a493047569641
Reviewed-on: https://go-review.googlesource.com/c/go/+/610237
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
2024-09-03 22:29:23 +00:00
Keith Randall
f90f7e90b3 cmd: use built-in min/max instead of bespoke versions
Now that we're bootstrapping from a toolchain that has min/max builtins.

Update #64751

Change-Id: I63eedf3cca00f56f62ca092949cb2dc61db03361
Reviewed-on: https://go-review.googlesource.com/c/go/+/610355
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2024-09-03 22:26:52 +00:00
Keith Randall
36b45bca66 cmd/compile: regalloc: drop values that aren't used until after a call
No point in keeping values in registers when their next use is after
a call, as we'd have to spill/restore them anyway.

cmd/go is 0.1% smaller.

Fixes #59297

Change-Id: I10ee761d0d23229f57de278f734c44d6a8dccd6c
Reviewed-on: https://go-review.googlesource.com/c/go/+/509255
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-08-26 22:29:43 +00:00
Keith Randall
c18ff29295 cmd/compile: make sync/atomic AND/OR operations intrinsic on amd64
Update #61395

Change-Id: I59a950f48efc587dfdffce00e2f4f3ab99d8df00
Reviewed-on: https://go-review.googlesource.com/c/go/+/594738
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Nicolas Hillegeer <aktau@google.com>
2024-07-23 21:29:38 +00:00
WANG Xuerui
08b9640333 cmd/compile: teach regalloc to rightly do nothing on loong64 in case of dynlinking
This is needed before actual support for buildmode=plugin is added.
Should not affect current behavior.

Change-Id: I86371d7e373fd529cb8710850d7b0fbbf1eb52ca
Reviewed-on: https://go-review.googlesource.com/c/go/+/480877
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: WANG Xuerui <git@xen0n.name>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-11-21 17:49:56 +00:00
Matthew Dempsky
5d9e0be159 cmd/compile/internal/ssa: replace Frontend.Auto with Func.NewLocal
Change-Id: I0858568d225daba1c318842dc0c9b5e652dff612
Reviewed-on: https://go-review.googlesource.com/c/go/+/526519
Auto-Submit: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2023-09-08 19:09:14 +00:00
eric fang
e0ceba8139 cmd/compile: enhance tighten pass for memory values
[This is a roll-forward of CL 458755, which was reverted due to make.bash
being broken on GOAMD64=v3. But it turned out that the problem was caused
by wrong bswap/load rewrite rules, and it was fixed in CL 492616.]

This CL enhances the tighten pass. Previously if a value has memory arg,
then the tighten pass won't move it, actually if the memory state is
consistent among definition and use block, we can move the value. This
CL optimizes this case. This is useful for the following situation:
b1:
  x = load(...mem)
  if(...) goto b2 else b3
b2:
  use(x)
b3:
  some_op_not_use_x

For the micro-benchmark mentioned in #56620, the performance improvement
is about 15%.
There's no noticeable performance change in the go1 benchmark.

Fixes #56620

Change-Id: I36ea68bed384986cd3ae81cb9e6efe84bb213adc
Reviewed-on: https://go-review.googlesource.com/c/go/+/492895
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Eric Fang <eric.fang@arm.com>
2023-05-16 01:01:38 +00:00
Daniel Martí
aa6e168480 Revert "cmd/compile: enhance tighten pass for memory values"
This reverts CL 458755.

Reason for revert: broke make.bash on GOAMD64=v3:

/workdir/go/src/crypto/sha1/sha1.go:54:35: internal compiler error: '(*digest).MarshalBinary': func (*digest).MarshalBinary, startMem[b13] has different values, old v206, new v338

goroutine 34 [running]:
runtime/debug.Stack()
	/workdir/go/src/runtime/debug/stack.go:24 +0x9f
bootstrap/cmd/compile/internal/base.FatalfAt({0x13, 0xaa0f1}, {0xc000db4440, 0x40}, {0xc0013b0000, 0x5, 0x5})
	/workdir/go/src/cmd/compile/internal/base/print.go:234 +0x2d1
bootstrap/cmd/compile/internal/base.Fatalf(...)
	/workdir/go/src/cmd/compile/internal/base/print.go:203
bootstrap/cmd/compile/internal/ssagen.(*ssafn).Fatalf(0xc000d90000, {0x13, 0xaa0f1}, {0xcb7b91, 0x3a}, {0xc000d99bc0, 0x4, 0x4})
	/workdir/go/src/cmd/compile/internal/ssagen/ssa.go:7896 +0x1f8
bootstrap/cmd/compile/internal/ssa.(*Func).Fatalf(0xc000d82340, {0xcb7b91, 0x3a}, {0xc000d99bc0, 0x4, 0x4})
	/workdir/go/src/cmd/compile/internal/ssa/func.go:716 +0x342
bootstrap/cmd/compile/internal/ssa.memState(0xc000d82340, {0xc000ec6200, 0x22, 0x40}, {0xc001046000, 0x22, 0x40})
	/workdir/go/src/cmd/compile/internal/ssa/tighten.go:240 +0x6c5
bootstrap/cmd/compile/internal/ssa.tighten(0xc000d82340)
[...]

Change-Id: Ic445fb48fe0f2c60ac67abe259b66594f1419152
Reviewed-on: https://go-review.googlesource.com/c/go/+/492335
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
2023-05-03 21:28:37 +00:00
erifan01
ea8f037996 cmd/compile: enhance tighten pass for memory values
This CL enhances the tighten pass. Previously if a value has memory arg,
then the tighten pass won't move it, actually if the memory state is
consistent among definition and use block, we can move the value. This
CL optimizes this case. This is useful for the following situation:
b1:
  x = load(...mem)
  if(...) goto b2 else b3
b2:
  use(x)
b3:
  some_op_not_use_x

For the micro-benchmark mentioned in #56620, the performance improvement
is about 15%.
There's no noticeable performance change in the go1 benchmark.

Fixes #56620

Change-Id: I9b152754f27231f583a6995fc7cd8472aa7d390c
Reviewed-on: https://go-review.googlesource.com/c/go/+/458755
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
2023-05-03 19:56:09 +00:00
Keith Randall
4237dea5e3 cmd/compile: lower priority of avoiding registers
We avoid allocating registers when we know they may have a fixed use
later (arg/return value, or the CX shift argument to SHRQ, etc.) But
it isn't worth avoiding that register if it requires moving another
register.

A move we may have to do later is not worth a move we definitely have
to do now.

Fixes #59288

Change-Id: Ibbdcbaea9caee0c5f3e0d6956a1a084ba89757a9
Reviewed-on: https://go-review.googlesource.com/c/go/+/479895
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2023-03-28 17:31:36 +00:00
Damien Neil
b98c1b22bd all: gofmt
Change-Id: I926388ee5aeeff11f765cbd4558b66645d1bbc08
Reviewed-on: https://go-review.googlesource.com/c/go/+/477836
Run-TryBot: Damien Neil <dneil@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
2023-03-20 22:54:22 +00:00
Michael Pratt
2c6aaaea64 cmd/compile/internal/ssa: drop overwritten regalloc basic block input requirements
For the following description, consider the following basic block graph:

      b1 ───┐┌──── b2
            ││
            ││
            ▼▼
            b3

For register allocator transitions between basic blocks, there are two
key passes (significant paraphrasing):

First, each basic block is visited in some predetermined visit order.
This is the core visitOrder range loop in regAllocState.regalloc. The
specific ordering heuristics aren't important here, except that the
order guarantees that when visiting a basic block at least one of its
predecessors has already been visited.

Upon visiting a basic block, that block sets its expected starting
register state (regAllocState.startRegs) based on the ending register
state (regAlloc.State.endRegs) of one of its predecessors. (How it
chooses which predecessor to use is not important here.)

From that starting state, registers are assigned for all values in the
block, ultimately resulting in some ending register state.

After all blocks have been visited, the shuffle pass
(regAllocState.shuffle) ensures that for each edge, endRegs of the
predecessor == startRegs of the successor. That is, it makes sure that
the startRegs assumptions actually hold true for each edge. It does this
by adding moves to the end of the predecessor block to place values in
the expected register for the successor block. These may be moves from
other registers, or from memory if the value is spilled.

Now on to the actual problem:

Assume that b1 places some value v1 into register R10, and thus ends
with endRegs containing R10 = v1.

When b3 is visited, it selects b1 as its model predecessor and sets
startRegs with R10 = v1.

b2 does not have v1 in R10, so later in the shuffle pass, we will add a
move of v1 into R10 to the end of b2 to ensure it is available for b3.

This is all perfectly fine and exactly how things should work.

Now suppose that b3 does not use v1. It does need to use some other
value v2, which is not currently in a register. When assigning v2 to a
register, it finds all registers are already in use and it needs to dump
a value. Ultimately, it decides to dump v1 from R10 and replace it with
v2.

This is fine, but it has downstream effects on shuffle in b2. b3's
startRegs still state that R10 = v1, so b2 will add a move to R10 even
though b3 will unconditionally overwrite it. i.e., the move at the end
of b2 is completely useless and can result in code like:

// end of b2
MOV n(SP), R10 // R10 = v1 <-- useless
// start of b3
MOV m(SP), R10 // R10 = v2

This is precisely what happened in #58298.

This CL addresses this problem by dropping registers from startRegs if
they are never used in the basic block prior to getting dumped. This
allows the shuffle pass to avoid placing those useless values into the
register.

There is a significant limitation to this CL, which is that it only
impacts the immediate predecessors of an overwriting block. We can
discuss this by zooming out a bit on the previous graph:

b4 ───┐┌──── b5
      ││
      ││
      ▼▼
      b1 ───┐┌──── b2
            ││
            ││
            ▼▼
            b3

Here we have the same graph, except we can see the two predecessors of
b1.

Now suppose that rather than b1 assigning R10 = v1 as above, the
assignment is done in b4. b1 has startRegs R10 = v1, doesn't use the
value at all, and simply passes it through to endRegs R10 = v1.

Now the shuffle pass will require both b2 and b5 to add a move to
assigned R10 = v1, because that is specified in their successor
startRegs.

With this CL, b3 drops R10 = v1 from startRegs, but there is no
backwards propagation, so b1 still has R10 = v1 in startRegs, and b5
still needs to add a useless move.

Extending this CL with such propagation may significantly increase the
number of useless moves we can remove, though it will add complexity to
maintenance and could potentially impact build performance depending on
how efficiently we could implement the propagation (something I haven't
considered carefully).

As-is, this optimization does not impact much code. In bent .text size
geomean is -0.02%. In the container/heap test binary, 18 of ~2500
functions are impacted by this CL. Bent and sweet do not show a
noticeable performance impact one way or another, however #58298 does
show a case where this can have impact if the useless instructions end
up in the hot path of a tight loop.

For #58298.

Change-Id: I2fcef37c955159d068fa0725f995a1848add8a5f
Reviewed-on: https://go-review.googlesource.com/c/go/+/471158
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
2023-03-17 19:01:54 +00:00
Keith Randall
4a0e84a1be cmd/compile: improve register overwrite decision for resultInArg0 ops
When we're compiling a resultInArg0 op, we need to clobber the
register containing the input value. So we first make a register copy
of the input value. We can then clobber either of the two registers
the value is in and still have the original input value in a register
for future uses.

Before this CL, we always clobbered the original, not the copy.
But that's not always the right decision - if the original is already
in a specific register that it needs to be in later (typically, a
return value register), clobber the copy instead.

This optimization can remove a mov instruction. It saves 1376 bytes
of instructions in cmd/go.

Redo of CL 460656, reverted at CL 463475, with a fix for s390x.

The new code just ensures that the copied value is in a register
which is a valid input register for the instruction.

Change-Id: Id570b8a60a6d2da9090de80a90b6bb0266e9e38a
Reviewed-on: https://go-review.googlesource.com/c/go/+/463221
Auto-Submit: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-01-26 00:07:40 +00:00
Keith Randall
2bf0f54bbd Revert "cmd/compile: improve register overwrite decision for resultInArg0 ops"
This reverts CL 460656
Reason for revert: This breaks s390x.

Change-Id: I8fada14fabc90593b8033ed11188c04963d2da75
Reviewed-on: https://go-review.googlesource.com/c/go/+/463475
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
2023-01-25 19:25:48 +00:00
Keith Randall
3c7884a21b cmd/compile: improve register overwrite decision for resultInArg0 ops
When we're compiling a resultInArg0 op, we need to clobber the
register containing the input value. So we first make a register copy
of the input value. We can then clobber either of the two registers
the value is in and still have the original input value in a register
for future uses.

Before this CL, we always clobbered the original, not the copy.
But that's not always the right decision - if the original is already
in a specific register that it needs to be in later (typically, a
return value register), clobber the copy instead.

This optimization can remove a mov instruction. It saves 1376 bytes
of instructions in cmd/go.

Change-Id: I162870c84b9a180da6715bb24c296a902974fed3
Reviewed-on: https://go-review.googlesource.com/c/go/+/460656
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
2023-01-25 18:50:22 +00:00
Keith Randall
9088c691da cmd/compile: ensure temp register mask isn't empty
We need to avoid nospill registers at this point in regalloc.
Make sure that we don't restrict our register set to avoid registers
desired by other instructions, if the resulting set includes only
nospill registers.

Fixes #57846

Change-Id: I05478e4513c484755dc2e8621d73dac868e45a27
Reviewed-on: https://go-review.googlesource.com/c/go/+/461685
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-01-17 18:21:06 +00:00
Keith Randall
5f7abeca5a cmd/compile: teach regalloc about temporary registers
Temporary registers are sometimes needed for an architecture backend
which needs to use several machine instructions to implement a single
SSA instruction.

Mark such instructions so that regalloc can reserve the temporary register
for it. That way we don't have to reserve a fixed register like we do now.

Convert the temp-register-using instructions on amd64 to use this
new mechanism. Other archs can follow as needed.

Change-Id: I1d0c8588afdad5cd18b4398eb5a0f755be5dead7
Reviewed-on: https://go-review.googlesource.com/c/go/+/398556
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
2022-11-17 18:53:13 +00:00
liu-xuewen
e81263c791 cmd/compile: remove issueSpill
Remove the useless issueSpill and continue directly.

Change-Id: I085e566be6f7200235e1bfe1f56a8e959316386a
GitHub-Last-Rev: 84db90cf34
GitHub-Pull-Request: golang/go#56520
Reviewed-on: https://go-review.googlesource.com/c/go/+/447195
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
Auto-Submit: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2022-11-03 15:30:34 +00:00
Keith Randall
68bd383368 cmd/compile: add cache of sizeable objects so they can be reused
We kind of have this mechanism already, just normalizing it and
using it in a bunch of places. Previously a bunch of places cached
slices only for the duration of a single function compilation. Now
we can reuse slices across a whole compiler run.

Use a sync.Pool of powers-of-two sizes. This lets us use not
too much memory, and avoid holding onto memory we're no longer
using when a GC happens.

There's a few different types we need, so generate the code for it.
Generics would be useful here, but we can't use generics in the
compiler because of bootstrapping.

Change-Id: I6cf37e7b7b2e802882aaa723a0b29770511ccd82
Reviewed-on: https://go-review.googlesource.com/c/go/+/444820
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2022-10-31 21:41:20 +00:00
Keith Randall
7ddc45263c cmd/compile: separate out sparsemaps that need position
Make them a separate type, so the normal sparse maps don't
need the extra storage.

Change-Id: I3a0219487c35ea63723499723b0c742e321d97c4
Reviewed-on: https://go-review.googlesource.com/c/go/+/444819
Reviewed-by: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2022-10-31 21:41:06 +00:00
Johan Brandhorst-Satzkorn
85196fc982 cmd/internal/ssa: correct references to _gen folder
The gen folder was renamed to _gen in CL 435472, but references in code
and docs were not updated. This updates the references.

Change-Id: Ibadc0cdcb5bed145c3257b58465a8df370487ae5
Reviewed-on: https://go-review.googlesource.com/c/go/+/444355
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Johan Brandhorst-Satzkorn <johan.brandhorst@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-10-23 17:42:11 +00:00
Keith Randall
bdd0f0f780 cmd/compile: better propagation of desired registers
This fixes two independent problems:

We normally propagate desired registers backwards through opcodes that
are marked resultInArg0. Unfortunately for the desired register
computation, ADDQconst is not marked as resultInArg0. This is because
the amd64 backend can write it out as LEAQ instead if the input and
output registers don't match. For desired register purposes, we want
to treat ADDQconst as resultInArg0, so that we get an ADDQ instead of
a LEAQ if we can.

Desired registers don't currently work for tuple-generating opcodes.
Declare that the desired register applies to the first element of the
tuple, and propagate the desired register back through Select0.

Noticed when fixing #51964

Change-Id: I83346b988882cd58c2d7e7e5b419a2b9a244ab66
Reviewed-on: https://go-review.googlesource.com/c/go/+/396035
Trust: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-03-31 20:19:48 +00:00
Joel Sing
6458b2e8db all: add support for c-archive and c-shared on linux/riscv64
This provides the runtime glue (_rt0_riscv64_linux_lib) for c-archive and c-shared
support, along with enabling both of these buildmodes on linux/riscv64.

Both misc/cgo/testcarchive and misc/cgo/testcshared now pass on this platform.

Fixes #47100

Change-Id: I7ad75b23ae1d592dbac60d15bba557668287711f
Reviewed-on: https://go-review.googlesource.com/c/go/+/334872
Trust: Joel Sing <joel@sing.id.au>
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-11-03 09:23:34 +00:00
Cherry Mui
e35b5b25d9 cmd/compile: fix typo in comment in CL 358435
Change-Id: I0d8128668fc7a80b29aabc58dbc9a2929b889ec9
Reviewed-on: https://go-review.googlesource.com/c/go/+/358614
Trust: Cherry Mui <cherryyz@google.com>
Trust: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
2021-10-25 21:02:43 +00:00
Cherry Mui
9012996a9a cmd/compile: don't clobber LR for tail calls
When doing a tail call the link register is live as the callee
will directly return to the caller (of the function that does the
tail call). Don't allocate or clobber the link register.

Fixes #49032.

Change-Id: I2d60f2354e5b6c14aa285c8983a9786687b90223
Reviewed-on: https://go-review.googlesource.com/c/go/+/358435
Trust: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
2021-10-25 17:46:41 +00:00
Cuong Manh Le
f686f6a963 cmd/compile: remove Value.RemoveArg
It's only used in two places:

 - The one in regalloc.go can be replaced with v.resetArgs()
 - The one in rewrite.go can be open coded

and can cause wrong usage like the bug that CL 358117 fixed.

Change-Id: I125baf237db159d056fe4b1c73072331eea4d06a
Reviewed-on: https://go-review.googlesource.com/c/go/+/357965
Trust: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2021-10-25 02:59:35 +00:00
Joel Sing
dcee007aad cmd/compile: sort regalloc switch by architecture
Also tweak comment for the arm64 case.

Change-Id: I073405bd2acf901dcaaf33a034a84b6a09dd4a83
Reviewed-on: https://go-review.googlesource.com/c/go/+/334869
Trust: Joel Sing <joel@sing.id.au>
Reviewed-by: Meng Zhuo <mzh@golangcn.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-08-21 08:58:07 +00:00
cuiweixie
ea8298e2f5 cmd/compile/internal/ssa: delete unused code
Fixes #46186

Change-Id: Idb0674079f9484593e07cca172dfbb19be0e594d
GitHub-Last-Rev: 615fc53655
GitHub-Pull-Request: golang/go#46185
Reviewed-on: https://go-review.googlesource.com/c/go/+/320111
Reviewed-by: Ben Shi <powerman1st@163.com>
Reviewed-by: Keith Randall <khr@golang.org>
Trust: David Chase <drchase@google.com>
2021-08-16 14:50:20 +00:00
Than McIntosh
56af34f875 cmd/compile: place reg spills after OpArg{Int,Float}Reg ops
Tweak the register allocator to maintain the invariant that
OpArg{Int,Float}Reg values are placed together at the start of the
entry block, before any other non-pseudo-op values. Without this
change, when the register allocator adds spills we can wind up with an
interleaving of OpArg*Reg and stores, which complicates debug location
analysis.

Updates #40724.

Change-Id: Icf30dd814a9e25263ecbea2e48feb840a6e7f2bd
Reviewed-on: https://go-review.googlesource.com/c/go/+/322630
Trust: Than McIntosh <thanm@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-05-27 18:47:37 +00:00
Cherry Zhang
5b328c4a2f cmd/compile: use desired register only if it satisfies register mask
In the register allocator, if possible, we allocate a value to its
desired register (the ideal register for its next use). In some
cases the desired register does not satisfies the value's output
register mask. We should not use the register in this case.

In the following example, v33 is going to be returned as a
function result, so it is allocated to its desired register AX.
However, its Op cannot use AX as output, causing miscompilation.

v33 = CMOVQEQF <int> v24 v28 v29 : AX (~R0[int])
v35 = MakeResult <int,int,mem> v33 v26 v18
Ret v35

Change-Id: Id0f4f27c4b233ee297e83077e3c8494fe193e664
Reviewed-on: https://go-review.googlesource.com/c/go/+/314630
Trust: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
2021-04-28 16:13:40 +00:00
Cherry Zhang
6b8e3e2d06 cmd/compile: reduce redundant register moves for regabi calls
Currently, if we have AX=a and BX=b, and we want to make a call
F(1, a, b), to move arguments into the desired registers it emits

	MOVQ AX, CX
	MOVL $1, AX // AX=1
	MOVQ BX, DX
	MOVQ CX, BX // BX=a
	MOVQ DX, CX // CX=b

This has a few redundant moves.

This is because we process inputs in order. First, allocate 1 to
AX, which kicks out a (in AX) to CX (a free register at the
moment). Then, allocate a to BX, which kicks out b (in BX) to DX.
Finally, put b to CX.

Notice that if we start with allocating CX=b, then BX=a, AX=1,
we will not have redundant moves. This CL reduces redundant moves
by allocating them in different order: First, for inpouts that are
already in place, keep them there. Then allocate free registers.
Then everything else.

                             before       after
cmd/compile binary size     23703888    23609680
            text size        8565899     8533291

(with regabiargs enabled.)

Change-Id: I69e1bdf745f2c90bb791f6d7c45b37384af1e874
Reviewed-on: https://go-review.googlesource.com/c/go/+/311371
Trust: Cherry Zhang <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
2021-04-19 16:21:39 +00:00
Russ Cox
95ed5c3800 internal/buildcfg: move build configuration out of cmd/internal/objabi
The go/build package needs access to this configuration,
so move it into a new package available to the standard library.

Change-Id: I868a94148b52350c76116451f4ad9191246adcff
Reviewed-on: https://go-review.googlesource.com/c/go/+/310731
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Jay Conrod <jayconrod@google.com>
2021-04-16 19:20:53 +00:00
Cherry Zhang
6ae3b70ef2 cmd/compile: add clobberdeadreg mode
When -clobberdeadreg flag is set, the compiler inserts code that
clobbers integer registers at call sites. This may be helpful for
debugging register ABI.

Only implemented on AMD64 for now.

Change-Id: Ia203d3f891c30fd95d0103489056fe01d63a2899
Reviewed-on: https://go-review.googlesource.com/c/go/+/302809
Trust: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2021-03-19 23:21:21 +00:00
Austin Clements
0c93b16d01 cmd: move experiment flags into objabi.Experiment
This moves all remaining GOEXPERIMENT flags into the objabi.Experiment
struct, drops the "_enabled" from their name, and makes them all bool
typed.

We also drop DebugFlags.Fieldtrack because the previous CL shifted the
one test that used it to use GOEXPERIMENT instead.

Change-Id: I3406fe62b1c300bb4caeaffa6ca5ce56a70497fe
Reviewed-on: https://go-review.googlesource.com/c/go/+/302389
Trust: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2021-03-18 21:27:23 +00:00
erifan01
600259b099 cmd/compile: use depth first topological sort algorithm for layout
The current layout algorithm tries to put consecutive blocks together,
so the priority of the successor block is higher than the priority of
the zero indegree block. This algorithm is beneficial for subsequent
register allocation, but will result in more branch instructions.
The depth-first topological sorting algorithm is a well-known layout
algorithm, which has applications in many languages, and it helps to
reduce branch instructions. This CL applies it to the layout pass.
The test results show that it helps to reduce the code size.

This CL also includes the following changes:
1, Removed the primary predecessor mechanism. The new layout algorithm is
  not very friendly to register allocator in some cases, in order to adapt
  to the new layout algorithm, a new primary predecessor selection strategy
  is introduced.
2, Since the new layout implementation may place non-loop blocks between
  loop blocks, some adaptive modifications have also been made to looprotate
  pass.
3, The layout also affects the results of codegen, so this CL also adjusted
  several codegen tests accordingly.

It is inevitable that this CL will cause the code size or performance of a
few functions to decrease, but the number of cases it improves is much larger
than the number of cases it drops.

Statistical data from compilecmp on linux/amd64 is as follow:
name                      old time/op       new time/op       delta
Template                        382ms ± 4%        382ms ± 4%    ~     (p=0.497 n=49+50)
Unicode                         170ms ± 9%        169ms ± 8%    ~     (p=0.344 n=48+50)
GoTypes                         2.01s ± 4%        2.01s ± 4%    ~     (p=0.628 n=50+48)
Compiler                        190ms ±10%        189ms ± 9%    ~     (p=0.734 n=50+50)
SSA                             11.8s ± 2%        11.8s ± 3%    ~     (p=0.877 n=50+50)
Flate                           241ms ± 9%        241ms ± 8%    ~     (p=0.897 n=50+49)
GoParser                        366ms ± 3%        361ms ± 4%  -1.21%  (p=0.004 n=47+50)
Reflect                         835ms ± 3%        838ms ± 3%    ~     (p=0.275 n=50+49)
Tar                             336ms ± 4%        335ms ± 3%    ~     (p=0.454 n=48+48)
XML                             433ms ± 4%        431ms ± 3%    ~     (p=0.071 n=49+48)
LinkCompiler                    706ms ± 4%        705ms ± 4%    ~     (p=0.608 n=50+49)
ExternalLinkCompiler            1.85s ± 3%        1.83s ± 2%  -1.47%  (p=0.000 n=49+48)
LinkWithoutDebugCompiler        437ms ± 5%        437ms ± 6%    ~     (p=0.953 n=49+50)
[Geo mean]                      615ms             613ms       -0.37%

name                      old alloc/op      new alloc/op      delta
Template                       38.7MB ± 1%       38.7MB ± 1%    ~     (p=0.834 n=50+50)
Unicode                        28.1MB ± 0%       28.1MB ± 0%  -0.22%  (p=0.000 n=49+50)
GoTypes                         168MB ± 1%        168MB ± 1%    ~     (p=0.054 n=47+47)
Compiler                       23.0MB ± 1%       23.0MB ± 1%    ~     (p=0.432 n=50+50)
SSA                            1.54GB ± 0%       1.54GB ± 0%  +0.21%  (p=0.000 n=50+50)
Flate                          23.6MB ± 1%       23.6MB ± 1%    ~     (p=0.153 n=43+46)
GoParser                       35.1MB ± 1%       35.1MB ± 2%    ~     (p=0.202 n=50+50)
Reflect                        84.7MB ± 1%       84.7MB ± 1%    ~     (p=0.333 n=48+49)
Tar                            34.5MB ± 1%       34.5MB ± 1%    ~     (p=0.406 n=46+49)
XML                            44.3MB ± 2%       44.2MB ± 3%    ~     (p=0.981 n=50+50)
LinkCompiler                    131MB ± 0%        128MB ± 0%  -2.74%  (p=0.000 n=50+50)
ExternalLinkCompiler            120MB ± 0%        120MB ± 0%  +0.01%  (p=0.007 n=50+50)
LinkWithoutDebugCompiler       77.3MB ± 0%       77.3MB ± 0%  -0.02%  (p=0.000 n=50+50)
[Geo mean]                     69.3MB            69.1MB       -0.22%

file      before    after     Δ        %
addr2line 4104220   4043684   -60536   -1.475%
api       5342502   5249678   -92824   -1.737%
asm       4973785   4858257   -115528  -2.323%
buildid   2667844   2625660   -42184   -1.581%
cgo       4686849   4616313   -70536   -1.505%
compile   23667431  23268406  -399025  -1.686%
cover     4959676   4874108   -85568   -1.725%
dist      3515934   3450422   -65512   -1.863%
doc       3995581   3925469   -70112   -1.755%
fix       3379202   3318522   -60680   -1.796%
link      6743249   6629913   -113336  -1.681%
nm        4047529   3991777   -55752   -1.377%
objdump   4456151   4388151   -68000   -1.526%
pack      2435040   2398072   -36968   -1.518%
pprof     13804080  13565808  -238272  -1.726%
test2json 2690043   2645987   -44056   -1.638%
trace     10418492  10232716  -185776  -1.783%
vet       7258259   7121259   -137000  -1.888%
total     113145867 111204202 -1941665 -1.716%

The situation on linux/arm64 is as follow:
name                      old time/op       new time/op       delta
Template                        280ms ± 1%        282ms ± 1%  +0.75%  (p=0.000 n=46+48)
Unicode                         124ms ± 2%        124ms ± 2%  +0.37%  (p=0.045 n=50+50)
GoTypes                         1.69s ± 1%        1.70s ± 1%  +0.56%  (p=0.000 n=49+50)
Compiler                        122ms ± 1%        123ms ± 1%  +0.93%  (p=0.000 n=50+50)
SSA                             12.6s ± 1%        12.7s ± 0%  +0.72%  (p=0.000 n=50+50)
Flate                           170ms ± 1%        172ms ± 1%  +0.97%  (p=0.000 n=49+49)
GoParser                        262ms ± 1%        263ms ± 1%  +0.39%  (p=0.000 n=49+48)
Reflect                         639ms ± 1%        650ms ± 1%  +1.63%  (p=0.000 n=49+49)
Tar                             243ms ± 1%        245ms ± 1%  +0.82%  (p=0.000 n=50+50)
XML                             324ms ± 1%        327ms ± 1%  +0.72%  (p=0.000 n=50+49)
LinkCompiler                    597ms ± 1%        596ms ± 1%  -0.27%  (p=0.001 n=48+47)
ExternalLinkCompiler            1.90s ± 1%        1.88s ± 1%  -1.00%  (p=0.000 n=50+50)
LinkWithoutDebugCompiler        364ms ± 1%        363ms ± 1%    ~     (p=0.220 n=49+50)
[Geo mean]                      485ms             488ms       +0.49%

name                      old alloc/op      new alloc/op      delta
Template                       38.7MB ± 0%       38.8MB ± 1%    ~     (p=0.093 n=43+49)
Unicode                        28.4MB ± 0%       28.4MB ± 0%  +0.03%  (p=0.000 n=49+45)
GoTypes                         169MB ± 1%        169MB ± 1%  +0.23%  (p=0.010 n=50+50)
Compiler                       23.2MB ± 1%       23.2MB ± 1%  +0.11%  (p=0.000 n=40+44)
SSA                            1.54GB ± 0%       1.55GB ± 0%  +0.45%  (p=0.000 n=47+49)
Flate                          23.8MB ± 2%       23.8MB ± 1%    ~     (p=0.543 n=50+50)
GoParser                       35.3MB ± 1%       35.4MB ± 1%    ~     (p=0.792 n=50+50)
Reflect                        85.2MB ± 1%       85.2MB ± 0%    ~     (p=0.055 n=50+47)
Tar                            34.5MB ± 1%       34.5MB ± 1%  +0.06%  (p=0.015 n=50+50)
XML                            43.8MB ± 2%       43.9MB ± 2%  +0.19%  (p=0.000 n=48+48)
LinkCompiler                    137MB ± 0%        136MB ± 0%  -0.92%  (p=0.000 n=50+50)
ExternalLinkCompiler            127MB ± 0%        127MB ± 0%    ~     (p=0.516 n=50+50)
LinkWithoutDebugCompiler       84.0MB ± 0%       84.0MB ± 0%    ~     (p=0.057 n=50+50)
[Geo mean]                     70.4MB            70.4MB       +0.01%

file      before    after     Δ        %
addr2line 4021557   4002933   -18624   -0.463%
api       5127847   5028503   -99344   -1.937%
asm       5034716   4936836   -97880   -1.944%
buildid   2608118   2594094   -14024   -0.538%
cgo       4488592   4398320   -90272   -2.011%
compile   22501129  22213592  -287537  -1.278%
cover     4742301   4713573   -28728   -0.606%
dist      3388071   3365311   -22760   -0.672%
doc       3802250   3776082   -26168   -0.688%
fix       3306147   3216939   -89208   -2.698%
link      6404483   6363699   -40784   -0.637%
nm        3941026   3921930   -19096   -0.485%
objdump   4383330   4295122   -88208   -2.012%
pack      2404547   2389515   -15032   -0.625%
pprof     12996234  12856818  -139416  -1.073%
test2json 2668500   2586788   -81712   -3.062%
trace     9816276   9609580   -206696  -2.106%
vet       6900682   6787338   -113344  -1.643%
total     108535806 107056973 -1478833 -1.363%

Change-Id: Iaec1cdcaacca8025e9babb0fb8a532fddb70c87d
Reviewed-on: https://go-review.googlesource.com/c/go/+/255239
Reviewed-by: eric fang <eric.fang@arm.com>
Reviewed-by: Keith Randall <khr@golang.org>
Trust: eric fang <eric.fang@arm.com>
2021-03-16 02:44:54 +00:00
Cherry Zhang
c4190fc34d cmd/compile: remove ARMv5 special case in register allocator
The register allocator has a special case that doesn't allocate
LR on ARMv5. This was necessary when softfloat expansion was done
by the assembler. Now softfloat calls are inserted by SSA, so it
works as normal. Remove this special case.

Change-Id: I5502f07597f4d4b675dc16b6b0d7cb47e1e8974b
Reviewed-on: https://go-review.googlesource.com/c/go/+/301792
Trust: Cherry Zhang <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
2021-03-15 18:11:32 +00:00
David Chase
9d88a9e2bf cmd/compile: implement simple register results
at least for ints and strings

includes simple test

For #40724.

Change-Id: Ib8484e5b957b08f961574a67cfd93d3d26551558
Reviewed-on: https://go-review.googlesource.com/c/go/+/295309
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2021-03-04 19:45:11 +00:00
David Chase
4532467c18 cmd/compile: pass register parameters to called function
still needs morestack
still needs results
lots of corner cases also not dealt with.

For #40724.

Change-Id: I03abdf1e8363d75c52969560b427e488a48cd37a
Reviewed-on: https://go-review.googlesource.com/c/go/+/293889
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Jeremy Faller <jeremy@golang.org>
2021-03-04 03:30:55 +00:00