Commit graph

126 commits

Author SHA1 Message Date
khr@golang.org
20d7c57422 cmd/compile: pair loads and stores on arm64
Look for possible paired load/store operations on arm64.
I don't expect this would be a lot faster, but it will save
binary space, and indirectly through the icache at least a bit
of time.

Change-Id: I4dd73b0e6329c4659b7453998f9b75320fcf380b
Reviewed-on: https://go-review.googlesource.com/c/go/+/629256
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2025-02-13 14:07:47 -08:00
Cuong Manh Le
864aa86448 cmd/compile: run checkbce after fuseLate pass
So the bounds check which are eliminated during late fuse pass could be
detected correctly.

Fixes #67329

Change-Id: Id7992fbb8c26e0d43e7db66a0a3a2c0d9ed937a8
Reviewed-on: https://go-review.googlesource.com/c/go/+/598635
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-07-23 23:50:30 +00:00
jeffery
69aa1974f3 cmd/compile: combine phielim and copyelim into a single pass
Change-Id: Id21145b14169d28bac2144a31f6d3d9729f4be1e
GitHub-Last-Rev: 5413f4753e
GitHub-Pull-Request: golang/go#63818
Reviewed-on: https://go-review.googlesource.com/c/go/+/538535
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2024-04-22 14:55:18 +00:00
David Chase
b72bbaebf9 cmd/compile: expand calls cleanup
Convert expand calls into a smaller number of focused
recursive rewrites, and rely on an enhanced version of
"decompose" to clean up afterwards.

Debugging information seems to emerge intact.

Change-Id: Ic46da4207e3a4da5c8e2c47b637b0e35abbe56bb
Reviewed-on: https://go-review.googlesource.com/c/go/+/507295
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-10-06 20:57:33 +00:00
Yi Yang
4ee1d542ed cmd/compile: sparse conditional constant propagation
sparse conditional constant propagation can discover optimization
opportunities that cannot be found by just combining constant folding
and constant propagation and dead code elimination separately.

This is a re-submit of PR#59575, which fix a broken dominance relationship caught by ssacheck

Updates https://github.com/golang/go/issues/59399

Change-Id: I57482dee38f8e80a610aed4f64295e60c38b7a47
GitHub-Last-Rev: 830016f24e
GitHub-Pull-Request: golang/go#60469
Reviewed-on: https://go-review.googlesource.com/c/go/+/498795
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2023-09-12 21:01:50 +00:00
Matthew Dempsky
5d6f835b3e cmd/compile/internal/ssagen: call AllocFrame after ssa.Compile
This indirection is no longer necessary.

Change-Id: Ibb5eb1753febdc17a93ea9c35130e3d2b26c360e
Reviewed-on: https://go-review.googlesource.com/c/go/+/526518
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
2023-09-08 19:05:18 +00:00
Bryan Mills
02d234e34d Revert "cmd/compile: sparse conditional constant propagation"
This reverts CL 483875.

Reason for revert: appears to cause internal compiler errors on the ssacheck builder.

Change-Id: I662418384291470c1962c417797a5890dd9aa7a4
Reviewed-on: https://go-review.googlesource.com/c/go/+/497855
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Bryan Mills <bcmills@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Bryan Mills <bcmills@google.com>
2023-05-24 14:39:34 +00:00
Yi Yang
fa50248ce6 cmd/compile: sparse conditional constant propagation
sparse conditional constant propagation can discover optimization opportunities that cannot be found by just combining constant folding and constant propagation and dead code elimination separately.

Updates #59399

Change-Id: Ia954e906480654a6f0cc065d75b5912f96f36b2e
GitHub-Last-Rev: 90fc02db99
GitHub-Pull-Request: golang/go#59575
Reviewed-on: https://go-review.googlesource.com/c/go/+/483875
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
2023-05-24 02:54:03 +00:00
Keith Randall
cedf5008a8 cmd/compile: introduce separate memory op combining pass
Memory op combining is currently done using arch-specific rewrite rules.
Instead, do them as a arch-independent rewrite pass. This ensures that
all architectures (with unaligned loads & stores) get equal treatment.

This removes a lot of rewrite rules.

The new pass is a bit more comprehensive. It handles things like out-of-order
writes and is careful not to apply partial optimizations that then block
further optimizations.

Change-Id: I780ff3bb052475cd725a923309616882d25b8d9e
Reviewed-on: https://go-review.googlesource.com/c/go/+/478475
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2023-04-21 21:05:46 +00:00
Cherry Mui
40a6e2dae5 cmd/compile: tighten for huge functions in -N mode
Currently, in -N mode we skip the tighten pass. However, for very
large functions, many values live across blocks can cause
pathological behavior in the register allocator, which could use
a huge amount of memory or cause the program to hang. For
functions that large, debugging using a debugger is unlikely to be
very useful (the function is probably generated anyway). So we do
a little optimization to make fewer values live across blocks and
make it easier for the compiler.

Fixes #52180.

Change-Id: I355fe31bb87ea5d0870bb52dd06405dd5d791dab
Reviewed-on: https://go-review.googlesource.com/c/go/+/475339
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2023-03-13 22:04:56 +00:00
Marcel Meyer
b419db6c15 all: fix typos in go file comments
This is the second round to look for spelling mistakes. This time the
manual sifting of the result list was made easier by filtering out
capitalized and camelcase words.

grep -r --include '*.go' -E '^// .*$' . | aspell list | grep -E -x '[A-Za-z]{1}[a-z]*' | sort | uniq

This PR will be imported into Gerrit with the title and first
comment (this text) used to generate the subject and body of
the Gerrit change.

Change-Id: Ie8a2092aaa7e1f051aa90f03dbaf2b9aaf5664a9
GitHub-Last-Rev: fc2bd6e0c5
GitHub-Pull-Request: golang/go#57737
Reviewed-on: https://go-review.googlesource.com/c/go/+/461595
Auto-Submit: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: Robert Griesemer <gri@google.com>
2023-01-20 03:27:26 +00:00
Wayne Zuo
1c783f7c68 cmd/compile: split 3 operand LEA in late lower pass
On newer amd64 cpus 3 operand LEA instructions are slow, CL 114655 split
them to 2 LEA instructions in genssa.

This CL make late lower pass run after addressing modes, and split 3
operand LEA in late lower pass so that we can do common-subexpression
elimination for splited LEAs.

Updates #21735

Change-Id: Ied49139c7abab655e1a14a6fd793bdf9f987d1f1
Reviewed-on: https://go-review.googlesource.com/c/go/+/440035
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Joedian Reid <joedian@golang.org>
2022-10-17 15:11:16 +00:00
eric fang
ddc7d2a80c cmd/compile: add late lower pass for last rules to run
Usually optimization rules have corresponding priorities, some need to
be run first, some run next, and some run last, which produces the best
code. But currently our optimization rules have no priority, this CL
adds a late lower pass that runs those rules that need to be run at last,
such as split unreasonable constant folding. This pass can be seen as
the second round of the lower pass.

For example:
func foo(a, b uint64) uint64 {
        d := a+0x1234568
        d1 := b+0x1234568
        return d&d1
}
The code generated by the master branch:
	0x0004 00004        ADD     $19088744, R0, R2 // movz+movk+add
	0x0010 00016        ADD     $19088744, R1, R1 // movz+movk+add
	0x001c 00028        AND     R1, R2, R0

This is because the current constant folding optimization rules do not
take into account the range of constants, causing the constant to be
loaded repeatedly. This CL splits these unreasonable constants folding
in the late lower pass. With this CL the generated code:
	0x0004 00004        MOVD    $19088744, R2 // movz+movk
	0x000c 00012        ADD     R0, R2, R3
	0x0010 00016        ADD     R1, R2, R1
	0x0014 00020        AND     R1, R3, R0

This CL also adds constant folding optimization for ADDS instruction.

In addition, in order not to introduce the codegen regression, an
optimization rule is added to change the addition of a negative number
into a subtraction of a positive number.

go1 benchmarks:
name                     old time/op    new time/op    delta
BinaryTree17-8              1.22s ± 1%     1.24s ± 0%  +1.56%  (p=0.008 n=5+5)
Fannkuch11-8                1.54s ± 0%     1.53s ± 0%  -0.69%  (p=0.016 n=4+5)
FmtFprintfEmpty-8          14.1ns ± 0%    14.1ns ± 0%    ~     (p=0.079 n=4+5)
FmtFprintfString-8         26.0ns ± 0%    26.1ns ± 0%  +0.23%  (p=0.008 n=5+5)
FmtFprintfInt-8            32.3ns ± 0%    32.9ns ± 1%  +1.72%  (p=0.008 n=5+5)
FmtFprintfIntInt-8         54.5ns ± 0%    55.5ns ± 0%  +1.83%  (p=0.008 n=5+5)
FmtFprintfPrefixedInt-8    61.5ns ± 0%    62.0ns ± 0%  +0.93%  (p=0.008 n=5+5)
FmtFprintfFloat-8          72.0ns ± 0%    73.6ns ± 0%  +2.24%  (p=0.008 n=5+5)
FmtManyArgs-8               221ns ± 0%     224ns ± 0%  +1.22%  (p=0.008 n=5+5)
GobDecode-8                1.91ms ± 0%    1.93ms ± 0%  +0.98%  (p=0.008 n=5+5)
GobEncode-8                1.40ms ± 1%    1.39ms ± 0%  -0.79%  (p=0.032 n=5+5)
Gzip-8                      115ms ± 0%     117ms ± 1%  +1.17%  (p=0.008 n=5+5)
Gunzip-8                   19.4ms ± 1%    19.3ms ± 0%  -0.71%  (p=0.016 n=5+4)
HTTPClientServer-8         27.0µs ± 0%    27.3µs ± 0%  +0.80%  (p=0.008 n=5+5)
JSONEncode-8               3.36ms ± 1%    3.33ms ± 0%    ~     (p=0.056 n=5+5)
JSONDecode-8               17.5ms ± 2%    17.8ms ± 0%  +1.71%  (p=0.016 n=5+4)
Mandelbrot200-8            2.29ms ± 0%    2.29ms ± 0%    ~     (p=0.151 n=5+5)
GoParse-8                  1.35ms ± 1%    1.36ms ± 1%    ~     (p=0.056 n=5+5)
RegexpMatchEasy0_32-8      24.5ns ± 0%    24.5ns ± 0%    ~     (p=0.444 n=4+5)
RegexpMatchEasy0_1K-8       131ns ±11%     118ns ± 6%    ~     (p=0.056 n=5+5)
RegexpMatchEasy1_32-8      22.9ns ± 0%    22.9ns ± 0%    ~     (p=0.905 n=4+5)
RegexpMatchEasy1_1K-8       126ns ± 0%     127ns ± 0%    ~     (p=0.063 n=4+5)
RegexpMatchMedium_32-8      486ns ± 5%     483ns ± 0%    ~     (p=0.381 n=5+4)
RegexpMatchMedium_1K-8     15.4µs ± 1%    15.5µs ± 0%    ~     (p=0.151 n=5+5)
RegexpMatchHard_32-8        687ns ± 0%     686ns ± 0%    ~     (p=0.103 n=5+5)
RegexpMatchHard_1K-8       20.7µs ± 0%    20.7µs ± 1%    ~     (p=0.151 n=5+5)
Revcomp-8                   175ms ± 2%     176ms ± 3%    ~     (p=1.000 n=5+5)
Template-8                 20.4ms ± 6%    20.1ms ± 2%    ~     (p=0.151 n=5+5)
TimeParse-8                 112ns ± 0%     113ns ± 0%  +0.97%  (p=0.016 n=5+4)
TimeFormat-8                156ns ± 0%     145ns ± 0%  -7.14%  (p=0.029 n=4+4)

Change-Id: I3ced26e89041f873ac989586514ccc5ee09f13da
Reviewed-on: https://go-review.googlesource.com/c/go/+/425134
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Eric Fang <eric.fang@arm.com>
2022-10-05 02:40:56 +00:00
cuiweixie
095b6f050f cmd/compile/internal/ssa: use strings.Builder
Change-Id: Ieb15b54d36f18d1fbccbafe5451a4758df797718
Reviewed-on: https://go-review.googlesource.com/c/go/+/428359
Run-TryBot: Robert Griesemer <gri@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Robert Griesemer <gri@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
2022-09-07 01:29:43 +00:00
Russ Cox
19309779ac all: gofmt main repo
[This CL is part of a sequence implementing the proposal #51082.
The design doc is at https://go.dev/s/godocfmt-design.]

Run the updated gofmt, which reformats doc comments,
on the main repository. Vendored files are excluded.

For #51082.

Change-Id: I7332f099b60f716295fb34719c98c04eb1a85407
Reviewed-on: https://go-review.googlesource.com/c/go/+/384268
Reviewed-by: Jonathan Amsterdam <jba@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2022-04-11 16:34:30 +00:00
Russ Cox
690ac4071f all: remove trailing blank doc comment lines
A future change to gofmt will rewrite

	// Doc comment.
	//
	func f()

to

	// Doc comment.
	func f()

Apply that change preemptively to all doc comments.

For #51082.

Change-Id: I4023e16cfb0729b64a8590f071cd92f17343081d
Reviewed-on: https://go-review.googlesource.com/c/go/+/384259
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-04-01 18:18:07 +00:00
Russ Cox
7d87ccc860 all: fix various doc comment formatting nits
A run of lines that are indented with any number of spaces or tabs
format as a <pre> block. This commit fixes various doc comments
that format badly according to that (standard) rule.

For example, consider:

	// - List item.
	//   Second line.
	// - Another item.

Because the - lines are unindented, this is actually two paragraphs
separated by a one-line <pre> block. This CL rewrites it to:

	//  - List item.
	//    Second line.
	//  - Another item.

Today, that will format as a single <pre> block.
In a future release, we hope to format it as a bulleted list.

Various other minor fixes as well, all in preparation for reformatting.

For #51082.

Change-Id: I95cf06040d4186830e571cd50148be3bf8daf189
Reviewed-on: https://go-review.googlesource.com/c/go/+/384257
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-04-01 18:18:01 +00:00
David Chase
af72ddfcd7 cmd/compile: extend dump-to-file to handle "genssa" (asm) case.
Extend the existing dump-to-file to also do assembly output
to make it easier to write debug-information tests that check
for line-numbering in particular orders.

Includes POC test (which is silent w/o -v):
go test  -v -run TestDebugLines cmd/compile/internal/ssa
=== RUN   TestDebugLines
Preserving temporary directory /var/folders/v6/xyzzy/T/debug_lines_test321
About to run (cd /var/folders/v6/xyzzy/T/debug_lines_test321; \
    GOSSADIR=/var/folders/v6/xyzzy/T/debug_lines_test321 \
    /Users/drchase/work/go/bin/go build -o foo.o \
    '-gcflags=-N -l -d=ssa/genssa/dump=sayhi' \
    /Users/drchase/work/go/src/cmd/compile/internal/ssa/testdata/sayhi.go )
Saw stmt# 8 for submatch '8' on dump line #7 = ' v107   00005 (+8)  MOVQ    AX, "".n(SP)'
Saw stmt# 9 for submatch '9' on dump line #9 = ' v87    00007 (+9)  MOVUPS  X15, ""..autotmp_2-32(SP)'
Saw stmt# 10 for submatch '10' on dump line #46 = ' v65     00044 (+10)     MOVUPS  X15, ""..autotmp_2-32(SP)'
Saw stmt# 11 for submatch '11' on dump line #83 = ' v131    00081 (+11)     MOVQ    "".wg+8(SP), AX'
--- PASS: TestDebugLines (4.95s)
PASS
ok  	cmd/compile/internal/ssa	5.685s

Includes a test to ensure that inlining information is printed correctly.

Updates #47880.

Change-Id: I83b596476a88687d71d5b65dbb94641a576d747e
Reviewed-on: https://go-review.googlesource.com/c/go/+/348970
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2021-09-20 19:36:41 +00:00
Than McIntosh
fd09593667 cmd/compile: minor doc enhancements
Add a little more detail to the ssa README relating to GOSSAFUNC.

Update the -d=ssa help section to give a little more detail on what
to expect with applying the /debug=X qualifier to a phase.

Change-Id: I7027735f1f2955dbb5b9be36d9a648e8dc655048
Reviewed-on: https://go-review.googlesource.com/c/go/+/315229
Trust: Than McIntosh <thanm@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2021-04-29 20:46:14 +00:00
Russ Cox
95ed5c3800 internal/buildcfg: move build configuration out of cmd/internal/objabi
The go/build package needs access to this configuration,
so move it into a new package available to the standard library.

Change-Id: I868a94148b52350c76116451f4ad9191246adcff
Reviewed-on: https://go-review.googlesource.com/c/go/+/310731
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Jay Conrod <jayconrod@google.com>
2021-04-16 19:20:53 +00:00
Austin Clements
0c93b16d01 cmd: move experiment flags into objabi.Experiment
This moves all remaining GOEXPERIMENT flags into the objabi.Experiment
struct, drops the "_enabled" from their name, and makes them all bool
typed.

We also drop DebugFlags.Fieldtrack because the previous CL shifted the
one test that used it to use GOEXPERIMENT instead.

Change-Id: I3406fe62b1c300bb4caeaffa6ca5ce56a70497fe
Reviewed-on: https://go-review.googlesource.com/c/go/+/302389
Trust: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2021-03-18 21:27:23 +00:00
David Chase
f7dad5eae4 [dev.regabi] cmd/compile: remove leftover code form late call lowering work
It's no longer conditional.

Change-Id: I697bb0e9ffe9644ec4d2766f7e8be8b82d3b0638
Reviewed-on: https://go-review.googlesource.com/c/go/+/286013
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2021-01-26 18:35:19 +00:00
Michele Di Pede
94b3fd06cb cmd/compile: code cleanup
Change-Id: Ibf68e663f29a5cb3b64a7d923c005c16da647769
Reviewed-on: https://go-review.googlesource.com/c/go/+/266537
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
2020-10-30 19:48:57 +00:00
David Chase
15f01d6ae9 cmd/compile: delay expansion of OpArg until expand_calls
As it says, delay expanpsion of OpArg to the expand_calls phase,
to enable (eventually) interprocedural SSA optimizations, and
(sooner) change to a register ABI.

Includes a round of cleanup to function names and comments,
largely to match the expanded scope of the functions.

This CL removes the per-function dependence on GOSSAHASH,
but the go116lateCallExpansion kill switch remains (and was
tested locally to ensure it worked).

Two functions in expand_calls.go that performed overlapping
things were combined into a single function that is called
twice.

Fixes #42236.
For #40724.

Change-Id: Icbb78947eaa39f17f2c1210d5c2caef20abd6571
Reviewed-on: https://go-review.googlesource.com/c/go/+/262117
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-10-29 03:23:51 +00:00
David Chase
c3fe874f25 cmd/compile: avoid generating CSEs; do all aggregates; maintain debug names
This adds a pass to detect common selection operations,
to avoid generating duplicates.  Duplicate offsets are
also detected.

All aggregate types are now handled; there is some freedom in where
expand_calls is run, though it must run before softfloat.

Debug-name-maintenance is now incremental both in decompose builtin
and in expand_calls; it might be good to push this into all the
decompose passes.

(this is a smash of 5 CLs that rewrote some of the same code several
times to deal with phase-ordering problems, and included an abandoned
attempt.)

For #40724.

Change-Id: I2a0c32f20660bf8b99e2bcecd33545d97d2bd3c6
Reviewed-on: https://go-review.googlesource.com/c/go/+/249458
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-10-23 18:02:16 +00:00
David Chase
f4cbf3477f cmd/compile: allow directory specification for GOSSAFUNC output
This was useful for debugging failures occurring during make.bash.
The added flush also ensures that any hints in the GOSSAFUNC output
are flushed before fatal exit.

The environment variable GOSSADIR specifies where the SSA html debugging
files should be placed.  To avoid collisions, each one is written into
the [package].[functionOrMethod].html, where [package] is the filepath
separator separated package name, function is the function name, and method
is either (*Type).Method, or Type.Method, as appropriate.  Directories
are created as necessary to make this work.

Change-Id: I420927426b618b633bb1ffc51cf0f223b8f6d49c
Reviewed-on: https://go-review.googlesource.com/c/go/+/252338
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-10-01 20:52:34 +00:00
David Chase
ad8447bed9 cmd/compile: fix late call expansion for SSA-able aggregate results and arguments
This change incorporates the decision that it should be possible to
run call expansion relatively late in the optimization chain, so that
(1) calls themselves can be exposed to useful optimizations
(2) the effect of selectors on aggregates is seen at the rewrite,
    so that assignment of parts into registers is less complicated
    (at least I hope it works that way).

That means that selectors feeding into SelectN need to be processed,
and Make* feeding into call parameters need to be processed.

This does however require that call expansion run before decompose
builtins.

This doesn't yet handle rewrites of strings, slices, interfaces,
and complex numbers.

Passes run.bash and race.bash

Change-Id: I71ff23d3c491043beb30e926949970c4f63ef1a4
Reviewed-on: https://go-review.googlesource.com/c/go/+/245133
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-10-01 16:43:20 +00:00
David Chase
44c0586931 cmd/compile: add code to expand calls just before late opt
Still needs to generate the calls that will need lowering.

Change-Id: Ifd4e510193441a5e27c462c1f1d704f07bf6dec3
Reviewed-on: https://go-review.googlesource.com/c/go/+/242359
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-09-18 16:14:40 +00:00
surechen
17553c6e71 cmd/compile: move dumpFileSeq
I noticed that there is a Todo comment here. This variable is only used for filename when dump a function's ssa passes result in details. It is no problem to print a function alone, but may be edited by not only one goroutine if dump multiple functions at the same time. Although it looks only dump one function's ssa passes now. As far as I am concerned this variable can be a member variable of the struct Func. I'm not sure if this change is necessary. Looking forward to your advices, thank you very much.

Change-Id: I35dd7247889e0cc7f19c0b400b597206592dee75
Reviewed-on: https://go-review.googlesource.com/c/go/+/244918
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
2020-08-17 21:04:19 +00:00
Michael Munday
ac743dea8e cmd/compile: always tighten and de-duplicate tuple selectors
The scheduler assumes two special invariants that apply to tuple
selectors (Select0 and Select1 ops):

  1. There is only one tuple selector of each type per generator.
  2. Tuple selectors and generators reside in the same block.

Prior to this CL the assumption was that these invariants would
only be broken by the CSE pass. The CSE pass therefore contained
code to move and de-duplicate selectors to fix these invariants.

However it is also possible to write relatively basic optimization
rules that cause these invariants to be broken. For example:

  (A (Select0 (B))) -> (Select1 (B))

This rule could result in the newly added selector (Select1) being
in a different block to the tuple generator (see issue #38356). It
could also result in duplicate selectors if this rule matches
multiple times for the same tuple generator (see issue #39472).

The CSE pass will 'fix' these invariants. However it will only do
so when optimizations are enabled (since disabling optimizations
disables the CSE pass).

This CL moves the CSE tuple selector fixup code into its own pass
and makes it mandatory even when optimizations are disabled. This
allows tuple selectors to be treated like normal ops for most of
the compilation pipeline until after the new pass has run, at which
point we need to be careful to maintain the invariant again.

Fixes #39472.

Change-Id: Ia3f79e09d9c65ac95f897ce37e967ee1258a080b
Reviewed-on: https://go-review.googlesource.com/c/go/+/237118
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-06-10 14:55:29 +00:00
Bradford Lamson-Scribner
763bd58b19 cmd/compile: restore missing columns in ssa.html
If the final pass(es) are identical during ssa.html generation,
they are persisted in-memory as "pendingPhases" but never get
written as a column in the html. This change flushes those
in-memory phases.

Fixes #38242

Change-Id: Id13477dcbe7b419a818bb457861b2422ba5ef4bc
Reviewed-on: https://go-review.googlesource.com/c/go/+/227182
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-04-05 23:23:03 +00:00
Bradford Lamson-Scribner
6736b2fdb2 cmd/compile: refactor around HTMLWriter removing logger in favor of Func
Replace HTMLWriter's Logger field with a *Func. Implement Fatalf method
for HTMLWriter which gets the Frontend() from the Func and calls down
into it's Fatalf method, passing the msg and args along. Replace
remaining calls to the old Logger with calls to logging methods on
the Func.

Change-Id: I966342ef9997396f3416fb152fa52d60080ebecb
Reviewed-on: https://go-review.googlesource.com/c/go/+/227277
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-04-05 20:54:32 +00:00
Keith Randall
98cb76799c cmd/compile: insert complicated x86 addressing modes as a separate pass
Use a separate compiler pass to introduce complicated x86 addressing
modes.  Loads in the normal architecture rules (for x86 and all other
platforms) can have constant offsets (AuxInt values) and symbols (Aux
values), but no more.

The complex addressing modes (x+y, x+2*y, etc.) are introduced in a
separate pass that combines loads with LEAQx ops.

Organizing rewrites this way simplifies the number of rewrites
required, as there are lots of different rule orderings that have to
be specified to ensure these complex addressing modes are always found
if they are possible.

Update #36468

Change-Id: I5b4bf7b03a1e731d6dfeb9ef19b376175f3b4b44
Reviewed-on: https://go-review.googlesource.com/c/go/+/217097
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2020-03-10 00:13:21 +00:00
Michael Munday
e37cc29863 cmd/compile: optimize integer-in-range checks
This CL incorporates code from CL 201206 by Josh Bleecher Snyder
(thanks Josh).

This CL restores the integer-in-range optimizations in the SSA
backend. The fuse pass is enhanced to detect inequalities that
could be merged and fuse their associated blocks while the generic
rules optimize them into a single unsigned comparison.

For example, the inequality `x >= 0 && x < 10` will now be optimized
to `unsigned(x) < 10`.

Overall has a fairly positive impact on binary sizes.

name                      old time/op       new time/op       delta
Template                        192ms ± 1%        192ms ± 1%    ~     (p=0.757 n=17+18)
Unicode                        76.6ms ± 2%       76.5ms ± 2%    ~     (p=0.603 n=19+19)
GoTypes                         694ms ± 1%        693ms ± 1%    ~     (p=0.569 n=19+20)
Compiler                        3.26s ± 0%        3.27s ± 0%  +0.25%  (p=0.000 n=20+20)
SSA                             7.41s ± 0%        7.49s ± 0%  +1.10%  (p=0.000 n=17+19)
Flate                           120ms ± 1%        120ms ± 1%  +0.38%  (p=0.003 n=19+19)
GoParser                        152ms ± 1%        152ms ± 1%    ~     (p=0.061 n=17+19)
Reflect                         422ms ± 1%        425ms ± 2%  +0.76%  (p=0.001 n=18+20)
Tar                             167ms ± 1%        167ms ± 0%    ~     (p=0.730 n=18+19)
XML                             233ms ± 4%        231ms ± 1%    ~     (p=0.752 n=20+17)
LinkCompiler                    927ms ± 8%        928ms ± 8%    ~     (p=0.857 n=19+20)
ExternalLinkCompiler            1.81s ± 2%        1.81s ± 2%    ~     (p=0.513 n=19+20)
LinkWithoutDebugCompiler        556ms ±10%        583ms ±13%  +4.95%  (p=0.007 n=20+20)
[Geo mean]                      478ms             481ms       +0.52%

name                      old user-time/op  new user-time/op  delta
Template                        270ms ± 5%        269ms ± 7%    ~     (p=0.925 n=20+20)
Unicode                         134ms ± 7%        131ms ±14%    ~     (p=0.593 n=18+20)
GoTypes                         981ms ± 3%        987ms ± 2%  +0.63%  (p=0.049 n=19+18)
Compiler                        4.50s ± 2%        4.50s ± 1%    ~     (p=0.588 n=19+20)
SSA                             10.6s ± 2%        10.6s ± 1%    ~     (p=0.141 n=20+19)
Flate                           164ms ± 8%        165ms ±10%    ~     (p=0.738 n=20+20)
GoParser                        202ms ± 5%        203ms ± 6%    ~     (p=0.820 n=20+20)
Reflect                         587ms ± 6%        597ms ± 3%    ~     (p=0.087 n=20+18)
Tar                             230ms ± 6%        228ms ± 8%    ~     (p=0.569 n=19+20)
XML                             311ms ± 6%        314ms ± 5%    ~     (p=0.369 n=20+20)
LinkCompiler                    878ms ± 8%        887ms ± 7%    ~     (p=0.289 n=20+20)
ExternalLinkCompiler            1.60s ± 7%        1.60s ± 7%    ~     (p=0.820 n=20+20)
LinkWithoutDebugCompiler        498ms ±12%        489ms ±11%    ~     (p=0.398 n=20+20)
[Geo mean]                      611ms             611ms       +0.05%

name                      old alloc/op      new alloc/op      delta
Template                       36.1MB ± 0%       36.0MB ± 0%  -0.32%  (p=0.000 n=20+20)
Unicode                        28.3MB ± 0%       28.3MB ± 0%  -0.03%  (p=0.000 n=19+20)
GoTypes                         121MB ± 0%        121MB ± 0%    ~     (p=0.226 n=16+20)
Compiler                        563MB ± 0%        563MB ± 0%    ~     (p=0.166 n=20+19)
SSA                            1.32GB ± 0%       1.33GB ± 0%  +0.88%  (p=0.000 n=20+19)
Flate                          22.7MB ± 0%       22.7MB ± 0%  -0.02%  (p=0.033 n=19+20)
GoParser                       27.9MB ± 0%       27.9MB ± 0%  -0.02%  (p=0.001 n=20+20)
Reflect                        78.3MB ± 0%       78.2MB ± 0%  -0.01%  (p=0.019 n=20+20)
Tar                            34.0MB ± 0%       34.0MB ± 0%  -0.04%  (p=0.000 n=20+20)
XML                            43.9MB ± 0%       43.9MB ± 0%  -0.07%  (p=0.000 n=20+19)
LinkCompiler                    205MB ± 0%        205MB ± 0%  +0.44%  (p=0.000 n=20+18)
ExternalLinkCompiler            223MB ± 0%        223MB ± 0%  +0.03%  (p=0.000 n=20+20)
LinkWithoutDebugCompiler        139MB ± 0%        142MB ± 0%  +1.75%  (p=0.000 n=20+20)
[Geo mean]                     93.7MB            93.9MB       +0.20%

name                      old allocs/op     new allocs/op     delta
Template                         363k ± 0%         361k ± 0%  -0.58%  (p=0.000 n=20+19)
Unicode                          329k ± 0%         329k ± 0%  -0.06%  (p=0.000 n=19+20)
GoTypes                         1.28M ± 0%        1.28M ± 0%  -0.01%  (p=0.000 n=20+20)
Compiler                        5.40M ± 0%        5.40M ± 0%  -0.01%  (p=0.000 n=20+20)
SSA                             12.7M ± 0%        12.8M ± 0%  +0.80%  (p=0.000 n=20+20)
Flate                            228k ± 0%         228k ± 0%    ~     (p=0.194 n=20+20)
GoParser                         295k ± 0%         295k ± 0%  -0.04%  (p=0.000 n=20+20)
Reflect                          949k ± 0%         949k ± 0%  -0.01%  (p=0.000 n=20+20)
Tar                              337k ± 0%         337k ± 0%  -0.06%  (p=0.000 n=20+20)
XML                              418k ± 0%         417k ± 0%  -0.17%  (p=0.000 n=20+20)
LinkCompiler                     553k ± 0%         554k ± 0%  +0.22%  (p=0.000 n=20+19)
ExternalLinkCompiler            1.52M ± 0%        1.52M ± 0%  +0.27%  (p=0.000 n=20+20)
LinkWithoutDebugCompiler         186k ± 0%         186k ± 0%  +0.06%  (p=0.000 n=20+20)
[Geo mean]                       723k              723k       +0.03%

name                      old text-bytes    new text-bytes    delta
HelloSize                       828kB ± 0%        828kB ± 0%  -0.01%  (p=0.000 n=20+20)

name                      old data-bytes    new data-bytes    delta
HelloSize                      13.4kB ± 0%       13.4kB ± 0%    ~     (all equal)

name                      old bss-bytes     new bss-bytes     delta
HelloSize                       180kB ± 0%        180kB ± 0%    ~     (all equal)

name                      old exe-bytes     new exe-bytes     delta
HelloSize                      1.23MB ± 0%       1.23MB ± 0%  -0.33%  (p=0.000 n=20+20)

file      before    after     Δ       %
addr2line 4320075   4311883   -8192   -0.190%
asm       5191932   5187836   -4096   -0.079%
buildid   2835338   2831242   -4096   -0.144%
compile   20531717  20569099  +37382  +0.182%
cover     5322511   5318415   -4096   -0.077%
dist      3723749   3719653   -4096   -0.110%
doc       4743515   4739419   -4096   -0.086%
fix       3413960   3409864   -4096   -0.120%
link      6690119   6686023   -4096   -0.061%
nm        4269616   4265520   -4096   -0.096%
pprof     14942189  14929901  -12288  -0.082%
trace     11807164  11790780  -16384  -0.139%
vet       8384104   8388200   +4096   +0.049%
go        15339076  15334980  -4096   -0.027%
total     132258257 132226007 -32250  -0.024%

Fixes #30645.

Change-Id: If551ac5996097f3685870d083151b5843170aab0
Reviewed-on: https://go-review.googlesource.com/c/go/+/165998
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-03-03 14:30:26 +00:00
Josh Bleecher Snyder
97a268624c cmd/compile: add -d=ssa/check/seed=SEED
This change adds the option to run the ssa checker with a random seed.
The current system uses a completely fixed seed,
which is good for reproducibility but bad for exploring the state space.

Preserve what we have, but also provide a way for the caller
to provide a seed. The caller can report the seed
alongside any failures.

Change-Id: I2676a8112d8260e6cac86d95d2e8db4d3221aeeb
Reviewed-on: https://go-review.googlesource.com/c/go/+/216418
Reviewed-by: Keith Randall <khr@golang.org>
2020-03-02 22:11:29 +00:00
Giovanni Bajo
19532d04bf cmd/compile: add debugging mode for poset
Add an internal mode to simplify debugging of posets
by checking the integrity after every mutation. Turn
it on within SSA checked builds.

Change-Id: Idaa8277f58e5bce3753702e212cea4d698de30ca
Reviewed-on: https://go-review.googlesource.com/c/go/+/196780
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2019-10-14 21:00:36 +00:00
David Chase
adc4d2cc2d cmd/compile: run deadcode before nilcheck for better statement relocation
Nilcheck would move statements from NilCheck values to others that
turned out were already dead, which leads to lost statements.  Better
to eliminate the dead code first.

One "error" is removed from test/prove.go because the code is
actually dead, and the additional deadcode pass removes it before
prove can run.

Change-Id: If75926ca1acbb59c7ab9c8ef14d60a02a0a94f8b
Reviewed-on: https://go-review.googlesource.com/c/go/+/198479
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Jeremy Faller <jeremy@golang.org>
2019-10-03 21:12:13 +00:00
Josh Bleecher Snyder
52ae04fdfc cmd/compile: improve shortcircuit pass
While working on #30645, I noticed that many instances
in which the walkinrange optimization could apply
were not even being considered.

This was because of extraneous blocks in the CFG,
of the type that shortcircuit normally removes.

The change improves the shortcircuit pass to handle
most of those cases. (There are a few that can only be
reasonably detected later in compilation, after other
optimizations have been run, but not enough to be worth chasing.)

Notable changes:

* Instead of calculating live-across-blocks values, use v.Uses == 1.
  This is cheaper and more straightforward.
  v.Uses did not exist when this pass was initially written.
* Incorporate a fusePlain and loop until stable.
  This is necessary to find many of the instances.
* Allow Copy and Not wrappers around Phi values.
  This significantly increases effectiveness.
* Allow removal of all preds, creating a dead block.
  The previous pass stopped unnecessarily at one pred.
* Use phielimValue during cleanup instead of manually
  setting the op to OpCopy.

The result is marginally faster compilation and smaller code.

name        old time/op       new time/op       delta
Template          213ms ± 2%        212ms ± 2%  -0.63%  (p=0.002 n=49+48)
Unicode          90.0ms ± 2%       89.8ms ± 2%    ~     (p=0.122 n=48+48)
GoTypes           710ms ± 3%        711ms ± 2%    ~     (p=0.433 n=45+49)
Compiler          3.23s ± 2%        3.22s ± 2%    ~     (p=0.124 n=47+49)
SSA               10.0s ± 1%        10.0s ± 1%  -0.43%  (p=0.000 n=48+50)
Flate             135ms ± 3%        135ms ± 2%    ~     (p=0.311 n=49+49)
GoParser          158ms ± 2%        158ms ± 2%    ~     (p=0.757 n=48+48)
Reflect           447ms ± 2%        447ms ± 2%    ~     (p=0.815 n=49+48)
Tar               189ms ± 2%        189ms ± 3%    ~     (p=0.530 n=47+49)
XML               251ms ± 3%        250ms ± 1%  -0.75%  (p=0.002 n=49+48)
[Geo mean]        427ms             426ms       -0.25%

name        old user-time/op  new user-time/op  delta
Template          265ms ± 2%        265ms ± 2%    ~     (p=0.969 n=48+50)
Unicode           119ms ± 6%        119ms ± 6%    ~     (p=0.738 n=50+50)
GoTypes           923ms ± 2%        925ms ± 2%    ~     (p=0.057 n=43+47)
Compiler          4.37s ± 2%        4.37s ± 2%    ~     (p=0.691 n=50+46)
SSA               13.4s ± 1%        13.4s ± 1%    ~     (p=0.282 n=42+49)
Flate             162ms ± 2%        162ms ± 2%    ~     (p=0.774 n=48+50)
GoParser          186ms ± 2%        186ms ± 3%    ~     (p=0.213 n=47+47)
Reflect           572ms ± 2%        573ms ± 3%    ~     (p=0.303 n=50+49)
Tar               240ms ± 3%        240ms ± 2%    ~     (p=0.939 n=46+44)
XML               302ms ± 2%        302ms ± 2%    ~     (p=0.399 n=47+47)
[Geo mean]        540ms             541ms       +0.07%

name        old alloc/op      new alloc/op      delta
Template         36.8MB ± 0%       36.7MB ± 0%  -0.42%  (p=0.008 n=5+5)
Unicode          28.1MB ± 0%       28.1MB ± 0%    ~     (p=0.151 n=5+5)
GoTypes           124MB ± 0%        124MB ± 0%  -0.26%  (p=0.008 n=5+5)
Compiler          571MB ± 0%        566MB ± 0%  -0.84%  (p=0.008 n=5+5)
SSA              1.86GB ± 0%       1.85GB ± 0%  -0.58%  (p=0.008 n=5+5)
Flate            22.8MB ± 0%       22.8MB ± 0%  -0.17%  (p=0.008 n=5+5)
GoParser         27.3MB ± 0%       27.3MB ± 0%  -0.20%  (p=0.008 n=5+5)
Reflect          79.5MB ± 0%       79.3MB ± 0%  -0.20%  (p=0.008 n=5+5)
Tar              34.7MB ± 0%       34.6MB ± 0%  -0.42%  (p=0.008 n=5+5)
XML              45.4MB ± 0%       45.3MB ± 0%  -0.29%  (p=0.008 n=5+5)
[Geo mean]       80.0MB            79.7MB       -0.34%

name        old allocs/op     new allocs/op     delta
Template           378k ± 0%         377k ± 0%  -0.22%  (p=0.008 n=5+5)
Unicode            339k ± 0%         339k ± 0%    ~     (p=0.643 n=5+5)
GoTypes           1.36M ± 0%        1.36M ± 0%  -0.10%  (p=0.008 n=5+5)
Compiler          5.51M ± 0%        5.50M ± 0%  -0.13%  (p=0.008 n=5+5)
SSA               17.5M ± 0%        17.5M ± 0%  -0.14%  (p=0.008 n=5+5)
Flate              234k ± 0%         234k ± 0%  -0.04%  (p=0.008 n=5+5)
GoParser           299k ± 0%         299k ± 0%  -0.05%  (p=0.008 n=5+5)
Reflect            978k ± 0%         979k ± 0%  +0.02%  (p=0.016 n=5+5)
Tar                351k ± 0%         351k ± 0%  -0.04%  (p=0.008 n=5+5)
XML                435k ± 0%         435k ± 0%  -0.11%  (p=0.008 n=5+5)
[Geo mean]         840k              840k       -0.08%

file      before    after     Δ       %
go        14794788  14770212  -24576  -0.166%
addr2line 4203688   4199592   -4096   -0.097%
api       5954056   5941768   -12288  -0.206%
asm       4862704   4846320   -16384  -0.337%
cgo       4778920   4770728   -8192   -0.171%
compile   24001568  23923792  -77776  -0.324%
cover     5198440   5190248   -8192   -0.158%
dist      3595248   3587056   -8192   -0.228%
doc       4618504   4610312   -8192   -0.177%
fix       3337416   3333320   -4096   -0.123%
link      6120408   6116312   -4096   -0.067%
nm        4149064   4140872   -8192   -0.197%
objdump   4555608   4547416   -8192   -0.180%
pprof     14616324  14595844  -20480  -0.140%
test2json 2766328   2762232   -4096   -0.148%
trace     11638844  11622460  -16384  -0.141%
vet       8274936   8258552   -16384  -0.198%
total     132520780 132270972 -249808 -0.189%

Change-Id: Ifcd235a2a6e5f13ed5c93e62523e2ef61321fccf
Reviewed-on: https://go-review.googlesource.com/c/go/+/178197
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-08-27 22:15:13 +00:00
Josh Bleecher Snyder
260e3d0818 cmd/compile: run deadcode before lowered CSE
CSE can make dead values live again.
Running deadcode first avoids that;
it also makes CSE more efficient.

file    before    after     Δ       %       
api     5970616   5966520   -4096   -0.069% 
asm     4867088   4846608   -20480  -0.421% 
compile 23988320  23935072  -53248  -0.222% 
link    6084376   6080280   -4096   -0.067% 
nm      4165736   4161640   -4096   -0.098% 
objdump 4572216   4568120   -4096   -0.090% 
pprof   14452996  14457092  +4096   +0.028% 
trace   11467292  11471388  +4096   +0.036% 
total   132181100 132099180 -81920  -0.062% 

Compiler performance impact is negligible:

name        old alloc/op      new alloc/op      delta
Template         38.8MB ± 0%       38.8MB ± 0%  -0.04%  (p=0.008 n=5+5)
Unicode          28.2MB ± 0%       28.2MB ± 0%    ~     (p=1.000 n=5+5)
GoTypes           131MB ± 0%        131MB ± 0%  -0.14%  (p=0.008 n=5+5)
Compiler          606MB ± 0%        606MB ± 0%  -0.05%  (p=0.008 n=5+5)
SSA              2.14GB ± 0%       2.13GB ± 0%  -0.26%  (p=0.008 n=5+5)
Flate            24.0MB ± 0%       24.0MB ± 0%  -0.18%  (p=0.008 n=5+5)
GoParser         28.8MB ± 0%       28.8MB ± 0%  -0.15%  (p=0.008 n=5+5)
Reflect          83.8MB ± 0%       83.7MB ± 0%  -0.11%  (p=0.008 n=5+5)
Tar              36.4MB ± 0%       36.4MB ± 0%  -0.09%  (p=0.008 n=5+5)
XML              47.9MB ± 0%       47.8MB ± 0%  -0.15%  (p=0.008 n=5+5)
[Geo mean]       84.6MB            84.5MB       -0.12%

name        old allocs/op     new allocs/op     delta
Template           379k ± 0%         380k ± 0%  +0.15%  (p=0.008 n=5+5)
Unicode            340k ± 0%         340k ± 0%    ~     (p=0.738 n=5+5)
GoTypes           1.36M ± 0%        1.36M ± 0%  +0.05%  (p=0.008 n=5+5)
Compiler          5.49M ± 0%        5.49M ± 0%  +0.12%  (p=0.008 n=5+5)
SSA               17.5M ± 0%        17.5M ± 0%  -0.18%  (p=0.008 n=5+5)
Flate              235k ± 0%         235k ± 0%    ~     (p=0.079 n=5+5)
GoParser           302k ± 0%         302k ± 0%    ~     (p=0.310 n=5+5)
Reflect            976k ± 0%         977k ± 0%  +0.08%  (p=0.008 n=5+5)
Tar                352k ± 0%         352k ± 0%  +0.12%  (p=0.008 n=5+5)
XML                436k ± 0%         436k ± 0%  -0.05%  (p=0.008 n=5+5)
[Geo mean]         842k              842k       +0.03%


Change-Id: I53e8faed1859885ca5c4a5d45067a50984f3eff1
Reviewed-on: https://go-review.googlesource.com/c/go/+/175879
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-08-27 21:04:53 +00:00
Josh Bleecher Snyder
0047353c53 cmd/compile: add countRule rewrite rule helper
noteRule is useful when you're trying to debug
a particular rule, or get a general sense for
how often a rule fires overall.

It is less useful if you're trying to figure
out which functions might be useful to benchmark
to ascertain the impact of a newly added rule.

Enter countRule. You use it like noteRule,
except that you get per-function summaries.

Sample output:

 # runtime
(*mspan).sweep: idx1=1
evacuate_faststr: idx1=1
evacuate_fast32: idx1=1
evacuate: idx1=2
evacuate_fast64: idx1=1
sweepone: idx1=1
purgecachedstats: idx1=1
mProf_Free: idx1=1

This suggests that the map benchmarks
might be good to run for this added rule.

Change-Id: Id471c3231f1736165f2020f6979ff01c29677808
Reviewed-on: https://go-review.googlesource.com/c/go/+/167088
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-05-08 22:58:10 +00:00
Josh Bleecher Snyder
49ad7bc67c cmd/compile: note that some rules know the name of the opt pass
Change-Id: I4a70f4a52f84cf50f99939351319504b1c5dff76
Reviewed-on: https://go-review.googlesource.com/c/go/+/175777
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-05-07 17:43:46 +00:00
Josh Bleecher Snyder
c1544ff906 cmd/compile: move phi tighten after critical
The phi tighten pass moves rematerializable phi args
to the immediate predecessor of the phis.
This reduces value lifetimes for regalloc.

However, the critical edge removal pass can introduce
new blocks, which can change what a block's
immediate precedessor is. This can result in tightened
phi args being spilled unnecessarily.

This change moves the phi tighten pass after the
critical edge pass, when the block structure is stable.

This improves the code generated for

func f(s string) bool { return s == "abcde" }

Before this change:

"".f STEXT nosplit size=44 args=0x18 locals=0x0
	0x0000 00000 (x.go:3)	MOVQ	"".s+16(SP), AX
	0x0005 00005 (x.go:3)	CMPQ	AX, $5
	0x0009 00009 (x.go:3)	JNE	40
	0x000b 00011 (x.go:3)	MOVQ	"".s+8(SP), AX
	0x0010 00016 (x.go:3)	CMPL	(AX), $1684234849
	0x0016 00022 (x.go:3)	JNE	36
	0x0018 00024 (x.go:3)	CMPB	4(AX), $101
	0x001c 00028 (x.go:3)	SETEQ	AL
	0x001f 00031 (x.go:3)	MOVB	AL, "".~r1+24(SP)
	0x0023 00035 (x.go:3)	RET
	0x0024 00036 (x.go:3)	XORL	AX, AX
	0x0026 00038 (x.go:3)	JMP	31
	0x0028 00040 (x.go:3)	XORL	AX, AX
	0x002a 00042 (x.go:3)	JMP	31

Observe the duplicated blocks at the end.
After this change:

"".f STEXT nosplit size=40 args=0x18 locals=0x0
	0x0000 00000 (x.go:3)	MOVQ	"".s+16(SP), AX
	0x0005 00005 (x.go:3)	CMPQ	AX, $5
	0x0009 00009 (x.go:3)	JNE	36
	0x000b 00011 (x.go:3)	MOVQ	"".s+8(SP), AX
	0x0010 00016 (x.go:3)	CMPL	(AX), $1684234849
	0x0016 00022 (x.go:3)	JNE	36
	0x0018 00024 (x.go:3)	CMPB	4(AX), $101
	0x001c 00028 (x.go:3)	SETEQ	AL
	0x001f 00031 (x.go:3)	MOVB	AL, "".~r1+24(SP)
	0x0023 00035 (x.go:3)	RET
	0x0024 00036 (x.go:3)	XORL	AX, AX
	0x0026 00038 (x.go:3)	JMP	31

Change-Id: I12c81aa53b89456cb5809aa5396378245f3beda9
Reviewed-on: https://go-review.googlesource.com/c/go/+/172597
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-04-19 15:39:53 +00:00
Daniel Martí
be9c534cde cmd/compile: don't crash on -d=ssa/
I forgot how to pull up the ssa debug options help, so instead of
writing -d=ssa/help, I just wrote -d=ssa/. Much to my amusement, the
compiler just crashed, as shown below. Fix that.

	panic: runtime error: index out of range

	goroutine 1 [running]:
	cmd/compile/internal/ssa.PhaseOption(0x7ffc375d2b70, 0x0, 0xdbff91, 0x5, 0x1, 0x0, 0x0, 0x1, 0x1)
	    /home/mvdan/tip/src/cmd/compile/internal/ssa/compile.go:327 +0x1876
	cmd/compile/internal/gc.Main(0xde7bd8)
	    /home/mvdan/tip/src/cmd/compile/internal/gc/main.go:411 +0x41d0
	main.main()
	    /home/mvdan/tip/src/cmd/compile/main.go:51 +0xab

Change-Id: Ia2ad394382ddf8f4498b16b5cfb49be0317fc1aa
Reviewed-on: https://go-review.googlesource.com/c/154421
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-02-26 17:50:01 +00:00
Keith Randall
2b4f24a2d2 cmd/compile: randomize value order in block for testing
A little bit of compiler stress testing. Randomize the order
of the values in a block before every phase. This randomization
makes sure that we're not implicitly depending on that order.

Currently the random seed is a hash of the function name.
It provides determinism, but sacrifices some coverage.
Other arrangements are possible (env var, ...) but require
more setup.

Fixes #20178

Change-Id: Idae792a23264bd9a3507db6ba49b6d591a608e83
Reviewed-on: https://go-review.googlesource.com/c/33909
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-11-28 17:13:46 +00:00
David Chase
04105ef1da cmd/compile: decompose composite OpArg before decomposeUser
This makes it easier to track names of function arguments
for debugging purposes.

Change-Id: Ic34856fe0b910005e1c7bc051d769d489a4b158e
Reviewed-on: https://go-review.googlesource.com/c/150098
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-11-23 22:59:32 +00:00
Josh Bleecher Snyder
a55f3ee46d cmd/compile: fuse before branchelim
The branchelim pass works better after fuse.
Running fuse before branchelim also increases
the stability of generated code amidst other compiler changes,
which was the original motivation behind this change.

The fuse pass is not cheap enough to run in its entirety
before branchelim, but the most important half of it is.
This change makes it possible to run "plain fuse" independently
and does so before branchelim.

During make.bash, elimIf occurrences increase from 4244 to 4288 (1%),
and elimIfElse occurrences increase from 989 to 1079 (9%).

Toolspeed impact is marginal; plain fuse pays for itself.

name        old time/op       new time/op       delta
Template          189ms ± 2%        189ms ± 2%    ~     (p=0.890 n=45+46)
Unicode          93.2ms ± 5%       93.4ms ± 7%    ~     (p=0.790 n=48+48)
GoTypes           662ms ± 4%        660ms ± 4%    ~     (p=0.186 n=48+49)
Compiler          2.89s ± 4%        2.91s ± 3%  +0.89%  (p=0.050 n=49+44)
SSA               8.23s ± 2%        8.21s ± 1%    ~     (p=0.165 n=46+44)
Flate             123ms ± 4%        123ms ± 3%  +0.58%  (p=0.031 n=47+49)
GoParser          154ms ± 4%        154ms ± 4%    ~     (p=0.492 n=49+48)
Reflect           430ms ± 4%        429ms ± 4%    ~     (p=1.000 n=48+48)
Tar               171ms ± 3%        170ms ± 4%    ~     (p=0.122 n=48+48)
XML               232ms ± 3%        232ms ± 2%    ~     (p=0.850 n=46+49)
[Geo mean]        394ms             394ms       +0.02%

name        old user-time/op  new user-time/op  delta
Template          236ms ± 5%        236ms ± 4%    ~     (p=0.934 n=50+50)
Unicode           132ms ± 7%        130ms ± 9%    ~     (p=0.087 n=50+50)
GoTypes           861ms ± 3%        867ms ± 4%    ~     (p=0.124 n=48+50)
Compiler          3.93s ± 4%        3.94s ± 3%    ~     (p=0.584 n=49+44)
SSA               12.2s ± 2%        12.3s ± 1%    ~     (p=0.610 n=46+45)
Flate             149ms ± 4%        150ms ± 4%    ~     (p=0.194 n=48+49)
GoParser          193ms ± 5%        191ms ± 6%    ~     (p=0.239 n=49+50)
Reflect           553ms ± 5%        556ms ± 5%    ~     (p=0.091 n=49+49)
Tar               218ms ± 5%        218ms ± 5%    ~     (p=0.359 n=49+50)
XML               299ms ± 5%        298ms ± 4%    ~     (p=0.482 n=50+49)
[Geo mean]        516ms             516ms       -0.01%

name        old alloc/op      new alloc/op      delta
Template         36.3MB ± 0%       36.3MB ± 0%  -0.02%  (p=0.000 n=49+49)
Unicode          29.7MB ± 0%       29.7MB ± 0%    ~     (p=0.270 n=50+50)
GoTypes           126MB ± 0%        126MB ± 0%  -0.34%  (p=0.000 n=50+49)
Compiler          534MB ± 0%        531MB ± 0%  -0.50%  (p=0.000 n=50+50)
SSA              1.98GB ± 0%       1.98GB ± 0%  -0.06%  (p=0.000 n=49+49)
Flate            24.6MB ± 0%       24.6MB ± 0%  -0.29%  (p=0.000 n=50+50)
GoParser         29.5MB ± 0%       29.4MB ± 0%  -0.15%  (p=0.000 n=49+50)
Reflect          87.3MB ± 0%       87.2MB ± 0%  -0.13%  (p=0.000 n=49+50)
Tar              35.6MB ± 0%       35.5MB ± 0%  -0.17%  (p=0.000 n=50+50)
XML              48.2MB ± 0%       48.0MB ± 0%  -0.30%  (p=0.000 n=48+50)
[Geo mean]       83.1MB            82.9MB       -0.20%

name        old allocs/op     new allocs/op     delta
Template           352k ± 0%         352k ± 0%  -0.01%  (p=0.004 n=49+49)
Unicode            341k ± 0%         341k ± 0%    ~     (p=0.341 n=48+50)
GoTypes           1.28M ± 0%        1.28M ± 0%  -0.03%  (p=0.000 n=50+49)
Compiler          4.96M ± 0%        4.96M ± 0%  -0.05%  (p=0.000 n=50+49)
SSA               15.5M ± 0%        15.5M ± 0%  -0.01%  (p=0.000 n=50+49)
Flate              233k ± 0%         233k ± 0%  +0.01%  (p=0.032 n=49+49)
GoParser           294k ± 0%         294k ± 0%    ~     (p=0.052 n=46+48)
Reflect           1.04M ± 0%        1.04M ± 0%    ~     (p=0.171 n=50+47)
Tar                343k ± 0%         343k ± 0%  -0.03%  (p=0.000 n=50+50)
XML                429k ± 0%         429k ± 0%  -0.04%  (p=0.000 n=50+50)
[Geo mean]         812k              812k       -0.02%

Object files grow slightly; branchelim often increases binary size, at least on amd64.

name        old object-bytes  new object-bytes  delta
Template          509kB ± 0%        509kB ± 0%  -0.01%  (p=0.008 n=5+5)
Unicode           224kB ± 0%        224kB ± 0%    ~     (all equal)
GoTypes          1.84MB ± 0%       1.84MB ± 0%  +0.00%  (p=0.008 n=5+5)
Compiler         6.71MB ± 0%       6.71MB ± 0%  +0.01%  (p=0.008 n=5+5)
SSA              21.2MB ± 0%       21.2MB ± 0%  +0.01%  (p=0.008 n=5+5)
Flate             324kB ± 0%        324kB ± 0%  -0.00%  (p=0.008 n=5+5)
GoParser          404kB ± 0%        404kB ± 0%  -0.02%  (p=0.008 n=5+5)
Reflect          1.40MB ± 0%       1.40MB ± 0%  +0.09%  (p=0.008 n=5+5)
Tar               452kB ± 0%        452kB ± 0%  +0.06%  (p=0.008 n=5+5)
XML               596kB ± 0%        596kB ± 0%  +0.00%  (p=0.008 n=5+5)
[Geo mean]       1.04MB            1.04MB       +0.01%

Change-Id: I535c711b85380ff657fc0f022bebd9cb14ddd07f
Reviewed-on: https://go-review.googlesource.com/c/129378
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-10-15 16:51:08 +00:00
Yury Smolsky
c35069d642 cmd/compile: clean the output of GOSSAFUNC
Since we print almost everything to ssa.html in the GOSSAFUNC mode,
there is a need to stop spamming stdout when user just wants to see
ssa.html.

This changes cleans output of the GOSSAFUNC debug mode.
To enable the dump of the debug data to stdout, one must
put suffix + after the function name like that:

GOSSAFUNC=Foo+

Otherwise gc will not print the IR and ASM to stdout after each phase.
AST IR is still sent to stdout because it is not included
into ssa.html. It will be fixed in a separate change.

The change adds printing out the full path to the ssa.html file.

Updates #25942

Change-Id: I711e145e05f0443c7df5459ca528dced273a62ee
Reviewed-on: https://go-review.googlesource.com/126603
Run-TryBot: Yury Smolsky <yury@smolsky.by>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-08-23 05:11:17 +00:00
Cherry Zhang
3f54e8537a cmd/compile: run generic deadcode in -N mode
Late opt pass may generate dead stores, which messes up store
chain calculation in later passes. Run generic deadcode even
in -N mode to remove them.

Fixes #26163.

Change-Id: I8276101717bb978d5980e6c7998f53fd8d0ae10f
Reviewed-on: https://go-review.googlesource.com/121856
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2018-07-02 20:53:23 +00:00
Yury Smolsky
5eb98b3c30 cmd/compile: use expandable columns in ssa.html
Display just a few columns in ssa.html, other
columns can be expanded by clicking on collapsed column.

Use sans serif font for the text, slightly smaller font size
for non program text.

Fixes #25286

Change-Id: I1094695135401602d90b97b69e42f6dda05871a2
Reviewed-on: https://go-review.googlesource.com/117275
Run-TryBot: Yury Smolsky <yury@smolsky.by>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-06-13 21:02:54 +00:00
David Chase
c2c1822b12 cmd/compile: assign and preserve statement boundaries.
A new pass run after ssa building (before any other
optimization) identifies the "first" ssa node for each
statement. Other "noise" nodes are tagged as being never
appropriate for a statement boundary (e.g., VarKill, VarDef,
Phi).

Rewrite, deadcode, cse, and nilcheck are modified to move
the statement boundaries forward whenever possible if a
boundary-tagged ssa value is removed; never-boundary nodes
are ignored in this search (some operations involving
constants are also tagged as never-boundary and also ignored
because they are likely to be moved or removed during
optimization).

Code generation treats all nodes except those explicitly
marked as statement boundaries as "not statement" nodes,
and floats statement boundaries to the beginning of each
same-line run of instructions found within a basic block.

Line number html conversion was modified to make statement
boundary nodes a bit more obvious by prepending a "+".

The code in fuse.go that glued together the value slices
of two blocks produced a result that depended on the
former capacities (not lengths) of the two slices.  This
causes differences in the 386 bootstrap, and also can
sometimes put values into an order that does a worse job
of preserving statement boundaries when values are removed.

Portions of two delve tests that had caught problems were
incorporated into ssa/debug_test.go.  There are some
opportunities to do better with optimized code, but the
next-ing is not lying or overly jumpy.

Over 4 CLs, compilebench geomean measured binary size
increase of 3.5% and compile user time increase of 3.8%
(this is after optimization to reuse a sparse map instead
of creating multiple maps.)

This CL worsens the optimized-debugging experience with
Delve; we need to work with the delve team so that
they can use the is_stmt marks that we're emitting now.

The reference output changes from time to time depending
on other changes in the compiler, sometimes better,
sometimes worse.

This CL now includes a test ensuring that 99+% of the lines
in the Go command itself (a handy optimized binary) include
is_stmt markers.

Change-Id: I359c94e06843f1eb41f9da437bd614885aa9644a
Reviewed-on: https://go-review.googlesource.com/102435
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-05-14 14:09:49 +00:00