Fixes a performance regression due to CL 418554.
Fixes#56440
Change-Id: I6ff152e9b83084756363f49ee6b0844a7a284880
Reviewed-on: https://go-review.googlesource.com/c/go/+/445875
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
The gen folder was renamed to _gen in CL 435472, but references in code
and docs were not updated. This updates the references.
Change-Id: Ibadc0cdcb5bed145c3257b58465a8df370487ae5
Reviewed-on: https://go-review.googlesource.com/c/go/+/444355
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Johan Brandhorst-Satzkorn <johan.brandhorst@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
The type of the source and destination of a memmove call isn't
always accurate. It will always be a pointer (or an unsafe.Pointer), but
the base type might not be accurate. This comes about because multiple
copies of a pointer with different base types are coalesced into a single value.
In the failing example, the IData selector of the input argument is a
*[32]byte in one branch of the type switch, and a *[]byte in the other branch.
During the expand_calls pass both IDatas become just copies of the input
register. Those copies are deduped and an arbitrary one wins (in this case,
*[]byte is the unfortunate winner).
Generally an op v can rely on v.Type during rewrite rules. But relying
on v.Args[i].Type is discouraged.
Fixes#55122
Change-Id: I348fd9accf2058a87cd191eec01d39cda612f120
Reviewed-on: https://go-review.googlesource.com/c/go/+/431496
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
Rotating by c, then by d, is the same as rotating by c+d.
Change-Id: I36df82261460ff80f7c6d39bcdf0e840cef1c91a
Reviewed-on: https://go-review.googlesource.com/c/go/+/424894
Reviewed-by: Wayne Zuo <wdvxdr@golangcn.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Ruinan Sun <Ruinan.Sun@arm.com>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Detect rotate instructions while still in architecture-independent form.
It's easier to do here, and we don't need to repeat it in each
architecture file.
Change-Id: I9396954b3f3b3bfb96c160d064a02002309935bb
Reviewed-on: https://go-review.googlesource.com/c/go/+/421195
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Eric Fang <eric.fang@arm.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Joedian Reid <joedian@golang.org>
Reviewed-by: Ruinan Sun <Ruinan.Sun@arm.com>
Run-TryBot: Keith Randall <khr@golang.org>
We don't need a multiply when the element type is size 0 or 1.
The panic functions don't return, so we don't need any post-call
code (register restores, etc.).
Change-Id: I0dcea5df56d29d7be26554ddca966b3903c672e5
Reviewed-on: https://go-review.googlesource.com/c/go/+/419754
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
The x + (-y) => x - y rule is hitted 75 times while building stage 3 and tools
and make the linux-amd64 go binary 0.007% smaller.
It transform:
NEG AX
ADD BX, AX
Into:
SUB BX, AX
Which is 2X faster (assuming this assembly in a vacum).
The x ^ (-1) => ^x rule is not hitted in the toolchain.
It transforms:
XOR $-1, AX
Into:
NOT AX
Which is more compact as it doesn't encode the immediate.
Cache usage aside, this does not affect performance
(assuming this assembly in a vacum).
On my ryzen 3600, with some surrouding code, this randomly might be 2X faster,
I guess this has to do with loading the immediate into a temporary register.
Combined to an other rule that already exists it also rewrite manual two's
complement negation from:
XOR $-1, AX
INC AX
Into:
NEG AX
Which is 2X faster.
The other rules just eliminates similar trivial cases and help constants
folding.
This should generalise to other architectures.
Change-Id: Ia1e51b340622e7ed88e5d856f3b1aa424aa039de
GitHub-Last-Rev: ce35ff2efd
GitHub-Pull-Request: golang/go#52395
Reviewed-on: https://go-review.googlesource.com/c/go/+/400714
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
This CL adds late expanded memequal(x, const, sz) inlining for 2, 4, 8
bytes size. This PoC is using the same method as CL 248404.
This optimization fires about 100 times in Go compiler (1675 occurrences
reduced to 1574, so -6%).
Also, added unit-tests to codegen/comparisions.go file.
Updates #37275
Change-Id: Ia52808d573cb706d1da8166c5746ede26f46c5da
Reviewed-on: https://go-review.googlesource.com/c/go/+/328291
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Trust: David Chase <drchase@google.com>
The most common cases:
len(s) > 0
len(s) < 1
and they can be simplified to:
len(s) != 0
len(s) == 0
Fixes#48054
Change-Id: I16e5b0cffcfab62a4acc2a09977a6cd3543dd000
Reviewed-on: https://go-review.googlesource.com/c/go/+/346050
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
In external linking mode the external linker may insert
trampolines, which use R12 as a scratch register. So a call could
potentially clobber R12 if the target is laid out too far. Mark
R12 clobbered.
Also, we will use R12 for trampolines in the Go linker as well.
CL 310731 updated the generated rewrite files so imports are
grouped, but the generator was not updated to do so. Grouped
imports are nice. But as those are generated files, for
simplicity and my laziness, just regenerate with the current
generator (which makes imports not grouped).
Change-Id: Iddb741ff7314a291ade5fbffc7d315f555808409
Reviewed-on: https://go-review.googlesource.com/c/go/+/314453
Trust: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
The go/build package needs access to this configuration,
so move it into a new package available to the standard library.
Change-Id: I868a94148b52350c76116451f4ad9191246adcff
Reviewed-on: https://go-review.googlesource.com/c/go/+/310731
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Jay Conrod <jayconrod@google.com>
The rule that inlines memmove expects SSA ops that calls memmove
with arguments in memory. This CL adds a version that matches
it with arguments in registers, so the optimization works for
both situations.
Change-Id: Ideb64f65b7521481ab2ca7c9975a6cf7b70d5966
Reviewed-on: https://go-review.googlesource.com/c/go/+/309332
Trust: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: David Chase <drchase@google.com>
If I change a rule in ARM64.rules to use the variable name "b" in a
conflicting way, rulegen would previously not complain, and the compiler
would later give a confusing error:
$ go run *.go && go build cmd/compile/internal/ssa
# cmd/compile/internal/ssa
../rewriteARM64.go:24236:10: b.NewValue0 undefined (type int64 has no field or method NewValue0)
Make rulegen complain early about those cases. Sometimes they might
happen to be harmless, but in general they can easily cause confusion or
unintended effect due to shadowing.
After the change, with the same conflicting rule:
$ go run *.go && go build cmd/compile/internal/ssa
2021/03/22 11:31:49 rule ARM64.rules:495 uses the reserved name b
exit status 1
Note that 24 existing rules were using reserved names. It seems like the
shadowing was harmless, as it wasn't causing typechecking issues nor did
it seem to cause unintended behavior when the rule rewrite code ran.
The bool values "b" were renamed "t", since that seems to have a
precedent in other rules and in the fmt package.
Sequential values like "a b c" were renamed to "x y z", since "b" is
reserved.
Finally, "typ" was renamed to "_typ", since there doesn't seem to be an
obviously better answer.
Passes all three of:
$ GOARCH=amd64 go build -toolexec 'toolstash -cmp' -a std
$ GOARCH=arm64 go build -toolexec 'toolstash -cmp' -a std
$ GOARCH=mips64 go build -toolexec 'toolstash -cmp' -a std
Fixes#45154.
Change-Id: I1cce194dc7b477886a9c218c17973e996bcedccf
Reviewed-on: https://go-review.googlesource.com/c/go/+/303549
Trust: Daniel Martí <mvdan@mvdan.cc>
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Add generic rule to rewrite the single-precision square root expression
with one single-precision instruction. The optimization will reduce two
times of precision converting between double-precision and single-precision.
On arm64 flatform.
previous:
FCVTSD F0, F0
FSQRTD F0, F0
FCVTDS F0, F0
optimized:
FSQRTS S0, S0
And this patch adds the test case to check the correctness.
This patch refers to CL 241877, contributed by Alice Xu
(dianhong.xu@arm.com)
Change-Id: I6de5d02281c693017ac4bd4c10963dd55989bd7e
Reviewed-on: https://go-review.googlesource.com/c/go/+/276873
Trust: fannie zhang <Fannie.Zhang@arm.com>
Run-TryBot: fannie zhang <Fannie.Zhang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
StaticLECall (multiple value in +mem, multiple value result +mem) ->
StaticCall (multiple ergister value in +mem,
multiple register-sized-value result +mem) ->
ARCH CallStatic (multiple ergister value in +mem,
multiple register-sized-value result +mem)
But the architecture-dependent stuff is indifferent to whether
it is mem->mem or (mem)->(mem) until Prog generation.
Deal with OpSelectN -> Prog in ssagen/ssa.go, others, as they
appear.
For #40724.
Change-Id: I1d0436f6371054f1881862641d8e7e418e4a6a16
Reviewed-on: https://go-review.googlesource.com/c/go/+/293391
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Current optimization: When we copy a->b and then b->c, we might as well
copy a->c instead of b->c (then b might be dead and go away).
*Except* if a is a volatile location (might be clobbered by a call).
In that case, we really do want to copy a immediately, because there
might be a call before we can do the a->c copy.
User calls can't happen in between, because the rule matches up the
memory states. But calls inserted for memory barriers, particularly
runtime.typedmemmove, can.
(I guess we could introduce a register-calling-convention version
of runtime.typedmemmove, but that seems a bigger change than this one.)
Fixes#43570
Change-Id: Ifa518bb1a6f3a8dd46c352d4fd54ea9713b3eb1a
Reviewed-on: https://go-review.googlesource.com/c/go/+/282492
Trust: Keith Randall <khr@golang.org>
Trust: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Proof of concept; also an actual optimization that fires 180 times
in the Go source base.
Change-Id: I5cb87474be764264cde6e4cbcb471ef109306f08
Reviewed-on: https://go-review.googlesource.com/c/go/+/248404
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Rulegen was not run again between patchsets 2 and 3 of CL 264683, so
the rewritegeneric.go file is out of sync. Run rulegen.
Change-Id: I67d8d267c4f666943ed9bf8c33aac2ac6013336a
Reviewed-on: https://go-review.googlesource.com/c/go/+/266080
Trust: Alberto Donizetti <alb.donizetti@gmail.com>
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Rémy Oudompheng <remyoudompheng@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
This is done by decomposing the number to be divided in 32-bit
components and using the 32-bit magic multiply. For the lowering to be
effective the constant must fit in 16 bits.
On ARM the expression n / 5 compiles to 25 instructions.
Benchmark for GOARCH=arm (Cortex-A53)
name old time/op new time/op delta
DivconstU64/3-6 1.19µs ± 0% 0.03µs ± 1% -97.40% (p=0.000 n=9+9)
DivconstU64/5-6 1.18µs ± 1% 0.03µs ± 1% -97.38% (p=0.000 n=10+8)
DivconstU64/37-6 1.13µs ± 1% 0.04µs ± 1% -96.51% (p=0.000 n=10+8)
DivconstU64/1234567-6 852ns ± 0% 901ns ± 1% +5.73% (p=0.000 n=8+9)
Benchmark for GOARCH=386 (Haswell)
name old time/op new time/op delta
DivconstU64/3-4 18.0ns ± 2% 5.6ns ± 1% -69.06% (p=0.000 n=10+10)
DivconstU64/5-4 17.8ns ± 1% 5.5ns ± 1% -68.87% (p=0.000 n=9+10)
DivconstU64/37-4 17.8ns ± 1% 7.3ns ± 0% -58.90% (p=0.000 n=10+10)
DivconstU64/1234567-4 17.5ns ± 1% 16.0ns ± 0% -8.55% (p=0.000 n=10+9)
Change-Id: I38a19b4d59093ec021ef2e5241364a3dad4eae73
Reviewed-on: https://go-review.googlesource.com/c/go/+/264683
Run-TryBot: Emmanuel Odeke <emmanuel@orijtech.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Emmanuel Odeke <emmanuel@orijtech.com>
Repeats existing patterns for old calls, so that these will apply
during the optimization phases that precede call expansion.
Change-Id: I1ca0a78c159aa1a51004db217edde4ecc772b646
Reviewed-on: https://go-review.googlesource.com/c/go/+/248190
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Includes a few tweaks to Value.copyOf(a) (make it a no-op for
a self-copy) and new pattern hack "___" (3 underscores) is
like ellipsis, except the replacement doesn't need to have
matching ellipsis/underscores.
Moved the arg-length check in generated pattern-matching code
BEFORE the args are probed, because not all instances of
variable length OpFoo will have all the args mentioned in
some rule for OpFoo, and when that happens, the compiler
panics without the early check.
Change-Id: I66de40672b3794a6427890ff96c805a488d783f4
Reviewed-on: https://go-review.googlesource.com/c/go/+/247537
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Extend use of AuxCall
Change-Id: I68b6d9bad09506532e1415fd70d44cf6c15b4b93
Reviewed-on: https://go-review.googlesource.com/c/go/+/239081
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
This is prerequisite to moving call expansion later into SSA,
and probably a good idea anyway. Passes tests.
This is the first minimal CL that does a 1-for-1 substitution
of *ssa.AuxCall for *obj.LSym. Next step (next CL) is to make
this change for all calls so that additional information can
be stored in AuxCall.
Change-Id: Ia3a7715648fd9fb1a176850767a726e6f5b959eb
Reviewed-on: https://go-review.googlesource.com/c/go/+/237680
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
With this patch, opt pass can expose more obvious constant-folding
opportunites.
Example:
func test(i int) int {return (i+8)-(i+4)}
The previous version:
MOVD "".i(FP), R0
ADD $8, R0, R1
ADD $4, R0, R0
SUB R0, R1, R0
MOVD R0, "".~r1+8(FP)
RET (R30)
The optimized version:
MOVD $4, R0
MOVD R0, "".~r1+8(FP)
RET (R30)
This patch removes some existing reassociation rules, such as "x+(z-C)",
because the current generic rewrite rules will canonicalize "x-const"
to "x+(-const)", making "x+(z-C)" equal to "x+(z+(-C))".
This patch also adds test cases.
Change-Id: I857108ba0b5fcc18a879eeab38e2551bc4277797
Reviewed-on: https://go-review.googlesource.com/c/go/+/237137
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
There are some architecture-independent rules in #21439, since an
unsigned integer >= 0 is always true and < 0 is always false. This CL
adds these optimizations to generic rules.
Updates #21439
Change-Id: Iec7e3040b761ecb1e60908f764815fdd9bc62495
Reviewed-on: https://go-review.googlesource.com/c/go/+/246617
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This triggers in 131 functions in std+cmd.
In those functions, it often helps considerably
(2-10% text size reduction).
Noticed while working on #38554.
Change-Id: Id0dbb8e7cb21d469ec08ec3d5be9beb9e8291e9c
Reviewed-on: https://go-review.googlesource.com/c/go/+/229707
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
We set up static symbols during walk that
we later make copies of to initialize local variables.
It is difficult to ascertain at that time exactly
when copying a symbol is profitable vs locally
initializing an autotmp.
During SSA, we are much better placed to optimize.
This change recognizes when we are copying from a
global readonly all-zero symbol and replaces it with
direct zeroing.
This often allows the all-zero symbol to be
deadcode eliminated at link time.
This is not ideal--it makes for large object files,
and longer link times--but it is the cleanest fix I could find.
This makes the final binary for the program in #38554
shrink from >500mb to ~2.2mb.
It also shrinks the standard binaries:
file before after Δ %
addr2line 4412496 4404304 -8192 -0.186%
buildid 2893816 2889720 -4096 -0.142%
cgo 4841048 4832856 -8192 -0.169%
compile 19926480 19922432 -4048 -0.020%
cover 5281816 5277720 -4096 -0.078%
link 6734648 6730552 -4096 -0.061%
nm 4366240 4358048 -8192 -0.188%
objdump 4755968 4747776 -8192 -0.172%
pprof 14653060 14612100 -40960 -0.280%
trace 11805940 11777268 -28672 -0.243%
vet 7185560 7181416 -4144 -0.058%
total 113588440 113465560 -122880 -0.108%
And not just by removing unnecessary symbols;
the program text shrinks a bit as well.
Fixes#38554
Change-Id: I8381ae6084ae145a5e0cd9410c451e52c0dc51c8
Reviewed-on: https://go-review.googlesource.com/c/go/+/229704
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
Keep track of all expressions encountered while
generating a rewrite result, and re-use them whenever possible.
Named expressions may still be used for clarity when desired.
Change-Id: I640dca108763eb8baeff8f9a4169300af3445b82
Reviewed-on: https://go-review.googlesource.com/c/go/+/229800
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
This is followup of CL 228860, which rewrite shift rules to use typed
aux. That CL introduced nlz* functions, to refactor left shift rules.
While at it, we realize there's a bug in old rules with both right/left
shift rules, but only fix for left shift rules only.
This CL fixes the bug for right shift rules.
Passes toolstash-check.
Change-Id: Id8f2158b1b66c9e87f3fdeaa7ae3e35dc0666f8b
Reviewed-on: https://go-review.googlesource.com/c/go/+/229137
Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
This optimization works on any integer with exactly one bit set.
This is identical to being a power of two, except in the
most negative number. Use oneBit instead.
The rule now triggers in a few more places in std+cmd,
in packages encoding/asn1, crypto/elliptic, and
vendor/golang.org/x/crypto/cryptobyte.
This change obviates the need for CL 222479
by doing this optimization consistently in the compiler.
Change-Id: I983c6235290fdc634fda5e11b10f1f8ce041272f
Reviewed-on: https://go-review.googlesource.com/c/go/+/229124
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Surprisingly many rules needed no modification.
Use wrapper functions for aux like we did for auxint.
Simplifies things a bit.
Change-Id: I2e852e77f1585dcb306a976ab9335f1ac5b4a770
Reviewed-on: https://go-review.googlesource.com/c/go/+/227961
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Munday <mike.munday@ibm.com>