Combine (AND m (SRWconst x)) or (SRWconst (AND m x)) when mask m is
and the shift value produce constant which can be encoded into an
RLWINM instruction.
Combine (CLRLSLDI (SRWconst x)) if the combining of the underling rotate
masks produces a constant which can be encoded into RLWINM.
Likewise for (SLDconst (SRWconst x)) and (CLRLSDI (RLWINM x)).
Combine rotate word + and operations which can be encoded as a single
RLWINM/RLWNM instruction.
The most notable performance improvements arise from the crypto
benchmarks below (GOARCH=power8 on a ppc64le/linux):
pkg:golang.org/x/crypto/blowfish goos:linux goarch:ppc64le
ExpandKeyWithSalt 52.2µs ± 0% 47.5µs ± 0% -8.88%
ExpandKey 44.4µs ± 0% 40.3µs ± 0% -9.15%
pkg:golang.org/x/crypto/ssh/internal/bcrypt_pbkdf goos:linux goarch:ppc64le
Key 57.6ms ± 0% 52.3ms ± 0% -9.13%
pkg:golang.org/x/crypto/bcrypt goos:linux goarch:ppc64le
Equal 90.9ms ± 0% 82.6ms ± 0% -9.13%
DefaultCost 91.0ms ± 0% 82.7ms ± 0% -9.12%
Change-Id: I59a0ca29face38f4ab46e37124c32906f216c4ce
Reviewed-on: https://go-review.googlesource.com/c/go/+/260798
Run-TryBot: Carlos Eduardo Seo <carlos.seo@linaro.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.com>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
Repeats existing patterns for old calls, so that these will apply
during the optimization phases that precede call expansion.
Change-Id: I1ca0a78c159aa1a51004db217edde4ecc772b646
Reviewed-on: https://go-review.googlesource.com/c/go/+/248190
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
A recent change to improve shifts was generating some
invalid cases when the rule was based on an AND. The
extended mnemonics CLRLSLDI and CLRLSLWI only allow
certain values for the operands and in the mask case
those values were not being checked properly. This
adds a check to those rules to verify that the
'b' and 'n' values used when an AND was part of the rule
have correct values.
There was a bug in some diag messages in asm9. The
message expected 3 values but only provided 2. Those are
corrected here also.
The test/codegen/shift.go was updated to add a few more
cases to check for the case mentioned here.
Some of the comments that mention the order of operands
in these extended mnemonics were wrong and those have been
corrected.
Fixes#41683.
Change-Id: If5bb860acaa5051b9e0cd80784b2868b85898c31
Reviewed-on: https://go-review.googlesource.com/c/go/+/258138
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
Reviewed-by: Carlos Eduardo Seo <carlos.seo@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
Includes a few tweaks to Value.copyOf(a) (make it a no-op for
a self-copy) and new pattern hack "___" (3 underscores) is
like ellipsis, except the replacement doesn't need to have
matching ellipsis/underscores.
Moved the arg-length check in generated pattern-matching code
BEFORE the args are probed, because not all instances of
variable length OpFoo will have all the args mentioned in
some rule for OpFoo, and when that happens, the compiler
panics without the early check.
Change-Id: I66de40672b3794a6427890ff96c805a488d783f4
Reviewed-on: https://go-review.googlesource.com/c/go/+/247537
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
This change adds rules to find pairs of instructions that can
be combined into a single shifts. These instruction sequences
are common in array addressing within loops. Improvements can
be seen in many crypto packages and the hash packages.
These are based on the extended mnemonics found in the ISA
sections C.8.1 and C.8.2.
Some rules in PPC64.rules were moved because the ordering prevented
some matching.
The following results were generated on power9.
hash/crc32:
CRC32/poly=Koopman/size=40/align=0 195ns ± 0% 163ns ± 0% -16.41%
CRC32/poly=Koopman/size=40/align=1 200ns ± 0% 163ns ± 0% -18.50%
CRC32/poly=Koopman/size=512/align=0 1.98µs ± 0% 1.67µs ± 0% -15.46%
CRC32/poly=Koopman/size=512/align=1 1.98µs ± 0% 1.69µs ± 0% -14.80%
CRC32/poly=Koopman/size=1kB/align=0 3.90µs ± 0% 3.31µs ± 0% -15.27%
CRC32/poly=Koopman/size=1kB/align=1 3.85µs ± 0% 3.31µs ± 0% -14.15%
CRC32/poly=Koopman/size=4kB/align=0 15.3µs ± 0% 13.1µs ± 0% -14.22%
CRC32/poly=Koopman/size=4kB/align=1 15.4µs ± 0% 13.1µs ± 0% -14.79%
CRC32/poly=Koopman/size=32kB/align=0 137µs ± 0% 105µs ± 0% -23.56%
CRC32/poly=Koopman/size=32kB/align=1 137µs ± 0% 105µs ± 0% -23.53%
crypto/rc4:
RC4_128 733ns ± 0% 650ns ± 0% -11.32% (p=1.000 n=1+1)
RC4_1K 5.80µs ± 0% 5.17µs ± 0% -10.89% (p=1.000 n=1+1)
RC4_8K 45.7µs ± 0% 40.8µs ± 0% -10.73% (p=1.000 n=1+1)
crypto/sha1:
Hash8Bytes 635ns ± 0% 613ns ± 0% -3.46% (p=1.000 n=1+1)
Hash320Bytes 2.30µs ± 0% 2.18µs ± 0% -5.38% (p=1.000 n=1+1)
Hash1K 5.88µs ± 0% 5.38µs ± 0% -8.62% (p=1.000 n=1+1)
Hash8K 42.0µs ± 0% 37.9µs ± 0% -9.75% (p=1.000 n=1+1)
There are other improvements found in golang.org/x/crypto which are all in the
range of 5-15%.
Change-Id: I193471fbcf674151ffe2edab212799d9b08dfb8c
Reviewed-on: https://go-review.googlesource.com/c/go/+/252097
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
Extend use of AuxCall
Change-Id: I68b6d9bad09506532e1415fd70d44cf6c15b4b93
Reviewed-on: https://go-review.googlesource.com/c/go/+/239081
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
This is prerequisite to moving call expansion later into SSA,
and probably a good idea anyway. Passes tests.
This is the first minimal CL that does a 1-for-1 substitution
of *ssa.AuxCall for *obj.LSym. Next step (next CL) is to make
this change for all calls so that additional information can
be stored in AuxCall.
Change-Id: Ia3a7715648fd9fb1a176850767a726e6f5b959eb
Reviewed-on: https://go-review.googlesource.com/c/go/+/237680
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
The current implementation stores the comparison pseudo-ops of arm64
conditional instructions (CSEL/CSEL0) in Aux, this patch modifies it
and stores it in AuxInt, which can avoid the allocation.
Change-Id: I0b69e51f63acd84c6878c6a59ccf6417501a8cfc
Reviewed-on: https://go-review.googlesource.com/c/go/+/252517
Run-TryBot: fannie zhang <Fannie.Zhang@arm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
This helps remove uses that aren't needed any more.
That in turn helps other rules with Uses==1 conditions fire.
Update #39918
Change-Id: I68635b675472f1d59e59604e4d34b949a0016533
Reviewed-on: https://go-review.googlesource.com/c/go/+/249463
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
This patch adds the ARM6464Bitfield auxInt to auxIntType() and
returns its Go type as "arm64Bitfield" type, which is defined
as int16 type.
And the Go type of SymOff auxInt is int32, but some functions
(such as min(), areAdjacentOffsets() and read16/32/64(),etc.)
use SymOff as an input parameter and treat its type as int64,
this patch adds the type conversion for these rules.
Passes toolstash-check -all.
Change-Id: Ib234b48d0a97ef244dd37878e06b5825316dd782
Reviewed-on: https://go-review.googlesource.com/c/go/+/234378
Reviewed-by: Keith Randall <khr@golang.org>
Add a new conversion function to convert aux type to Op type.
Passes toolstash-check -all.
Change-Id: I25d649a5296f6f178d64320dfc5d291e0a597e24
Reviewed-on: https://go-review.googlesource.com/c/go/+/233739
Reviewed-by: Keith Randall <khr@golang.org>
There is an optimization rule that removes calls to racefuncenter and
racefuncexit, if there are no other race calls in the function. The
rule removes the call to racefuncenter, but it does *not* remove the
store of its argument to the outargs section of the frame. If the
outargs section is now size 0 (because the calls to racefuncenter/exit
were the only calls), then that argument store clobbers the frame
pointer instead.
The fix is to remove the argument store when removing the call to
racefuncenter. (Racefuncexit doesn't have an argument.)
Change-Id: I183ec4d92bbb4920200e1be27b7b8f66b89a2a0a
Reviewed-on: https://go-review.googlesource.com/c/go/+/248262
Reviewed-by: Robert Griesemer <gri@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Fixes the *noov opcodes so they handle a constant argument properly.
Most of the infrastructure for this CL is in CL 238077 (the arm32 one).
Fixes#39505
Change-Id: Id424a4e18964b848f05aa42f4d78e5f2e2cdf43b
Reviewed-on: https://go-review.googlesource.com/c/go/+/237999
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Encode the flag results in an auxint field instead of having
one opcode per flag state. This helps us handle the new *noov
branches in a unified manner.
This is only for arm, arm64 is in a subsequent CL.
We could extend to other architectures as well, athough it would
only be cleanup, no behavioral change.
Update #39505
Change-Id: Ia46cea596faad540d1496c5915ab1274571543f0
Reviewed-on: https://go-review.googlesource.com/c/go/+/238077
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
This CL changes the arm64 TBZ/TBNZ block from using Aux to using
a (typed) AuxInt. The corresponding rules have also been changed
to be typed.
Passes
GOARCH=arm64 gotip build -toolexec 'toolstash -cmp' -a std
Change-Id: I98d0cd2a791948f1db13259c17fb1b9b2807a043
Reviewed-on: https://go-review.googlesource.com/c/go/+/230839
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
We set up static symbols during walk that
we later make copies of to initialize local variables.
It is difficult to ascertain at that time exactly
when copying a symbol is profitable vs locally
initializing an autotmp.
During SSA, we are much better placed to optimize.
This change recognizes when we are copying from a
global readonly all-zero symbol and replaces it with
direct zeroing.
This often allows the all-zero symbol to be
deadcode eliminated at link time.
This is not ideal--it makes for large object files,
and longer link times--but it is the cleanest fix I could find.
This makes the final binary for the program in #38554
shrink from >500mb to ~2.2mb.
It also shrinks the standard binaries:
file before after Δ %
addr2line 4412496 4404304 -8192 -0.186%
buildid 2893816 2889720 -4096 -0.142%
cgo 4841048 4832856 -8192 -0.169%
compile 19926480 19922432 -4048 -0.020%
cover 5281816 5277720 -4096 -0.078%
link 6734648 6730552 -4096 -0.061%
nm 4366240 4358048 -8192 -0.188%
objdump 4755968 4747776 -8192 -0.172%
pprof 14653060 14612100 -40960 -0.280%
trace 11805940 11777268 -28672 -0.243%
vet 7185560 7181416 -4144 -0.058%
total 113588440 113465560 -122880 -0.108%
And not just by removing unnecessary symbols;
the program text shrinks a bit as well.
Fixes#38554
Change-Id: I8381ae6084ae145a5e0cd9410c451e52c0dc51c8
Reviewed-on: https://go-review.googlesource.com/c/go/+/229704
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
Use nlzX variants instead. While at it, also remove tests involve
nlz/nlo/nto/log2, since when we are calling directly "math/bits"
functions.
Passes toolstash-check.
Change-Id: I83899741a29e05bc2c19d73652961ac795001781
Reviewed-on: https://go-review.googlesource.com/c/go/+/229138
Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Convert first section of 386 optimization rules to the typed aux form.
Adds addOffset{32,64} functions that returns ValAndOffs and a
ValAndOff.canAdd32 function that takes an int32.
Passes
GOARCH=386 gotip build -toolexec 'toolstash -cmp' -a std
Change-Id: I69d2a8ace6936d5e8ba6ba047183002bf07dd5be
Reviewed-on: https://go-review.googlesource.com/c/go/+/228825
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
This is the second attempt. The first attempt was CL 229127,
which got rolled back by CL 229177, because it caused
an infinite loop during compilation on some platforms.
I didn't notice that the trybots hadn't completed when I submitted; mea culpa.
The bug was that we were checking x&(x-1)==0, which is also true of 0,
which does not have exactly one bit set.
This caused an infinite rewrite rule loop.
Updates #38547
file before after Δ %
compile 19678112 19669808 -8304 -0.042%
total 113143160 113134856 -8304 -0.007%
Change-Id: I417a4f806e1ba61277e31bab2e57dd3f1ac7e835
Reviewed-on: https://go-review.googlesource.com/c/go/+/229197
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Munday <mike.munday@ibm.com>
This reverts commit 066c47ca5f.
Reason for revert: This appears to have broken a bunch of builders.
Change-Id: I68b4decf3c1892766e195d8eb018844cdff69443
Reviewed-on: https://go-review.googlesource.com/c/go/+/229177
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
This optimization works on any integer with exactly one bit set.
This is identical to being a power of two, except in the
most negative number. Use oneBit instead.
The rule now triggers in a few more places in std+cmd,
in packages encoding/asn1, crypto/elliptic, and
vendor/golang.org/x/crypto/cryptobyte.
This change obviates the need for CL 222479
by doing this optimization consistently in the compiler.
Change-Id: I983c6235290fdc634fda5e11b10f1f8ce041272f
Reviewed-on: https://go-review.googlesource.com/c/go/+/229124
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
This reverts commit 98c32670fd454939794504225dca1d4ec55045d5.
Rolling-forward with trivial format-string fix
cmd/compile: adjust RISCV64 rewrite rules to use typed aux fields
Also add a typed version of mergeSym to rewrite.go to assist with a few
rules that used mergeSym in the untyped-form.
Remove a few extra int32 overflow checks that no longer make sense, as
adding two int8s or int16s should never overflow an int32.
Passes toolstash-check -all.
Original review: https://go-review.googlesource.com/c/go/+/228882
Change-Id: Ib63db4ee1687446f0f3d9f11575a40dd85cbce55
Reviewed-on: https://go-review.googlesource.com/c/go/+/229126
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
Also add a typed version of mergeSym to rewrite.go to assist with a few
rules that used mergeSym in the untyped-form.
Remove a few extra int32 overflow checks that no longer make sense, as
adding two int8s or int16s should never overflow an int32.
Passes toolstash-check -all.
Change-Id: I72ddd2b0d9001faa87ad0ab54f500057164661b7
Reviewed-on: https://go-review.googlesource.com/c/go/+/228882
Reviewed-by: Keith Randall <khr@golang.org>
This first pass makes the rules using the condition code mask
(CCMask) and rotate parameters (RotateParams) aux values strongly
typed. This required adding strongly typed aux handling to the
block rulegen.
More CLs like this to follow, but this is probably the most
complex.
Passes toolstash-check -all.
Change-Id: Ie513b07d527f0c1b398d7748331442dcb5f7b17d
Reviewed-on: https://go-review.googlesource.com/c/go/+/228518
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
If -d=ssa/PASS/debug=N is specified (N >= 2) for a rewrite pass
(e.g. lower), when a Value (or Block) is rewritten, print the
Value (or Block) before and after.
For #31915.
Updates #19013.
Change-Id: I80eadd44302ae736bc7daed0ef68529ab7a16776
Reviewed-on: https://go-review.googlesource.com/c/go/+/176718
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Surprisingly many rules needed no modification.
Use wrapper functions for aux like we did for auxint.
Simplifies things a bit.
Change-Id: I2e852e77f1585dcb306a976ab9335f1ac5b4a770
Reviewed-on: https://go-review.googlesource.com/c/go/+/227961
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Move a lot of the constant folding rules to use strongly
typed AuxInt fields.
We need more than a cast to convert AuxInt to, e.g., float32.
Make conversion functions for converting back and forth.
Change-Id: Ia3d95ee3583ee2179a10938e20210a7617358c88
Reviewed-on: https://go-review.googlesource.com/c/go/+/227866
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
For 1.15, unless someone really wants it in 1.14.
A performance-sensitive user thought this would be useful,
though "large" was not well-defined. If 128 is large,
there are 139 static instances of "large" copies in the compiler
itself.
Includes test.
Change-Id: I81f20c62da59d37072429f3a22c1809e6fb2946d
Reviewed-on: https://go-review.googlesource.com/c/go/+/205066
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
zeroUpper32Bits wasn't checking for shift-extension ops. This would not
check shifts that were marking as bounded by prove (normally, shifts
are wrapped in a sequence that ends with an ANDL, and zeroUpper32Bits
would see the ANDL).
This produces no changes on generated output right now, but will be
important once CL196679 lands because many shifts will be marked
as bounded, and lower will stop generating the masking code sequence
around them.
Change-Id: Iaea94acc5b60bb9a5021c9fb7e4a1e2e5244435e
Reviewed-on: https://go-review.googlesource.com/c/go/+/226338
Reviewed-by: Keith Randall <khr@golang.org>
Make sure we don't use the rewrite ptr + (c + x) -> c + (ptr + x), as
that may create an ephemeral out-of-bounds pointer.
I have not seen an actual bug caused by this yet, but we've seen
them in the 386 port so I'm fixing this issue for amd64 as well.
The load-combining rules needed to be reworked somewhat to still
work without the above broken rule.
Update #37881
Change-Id: I8046d170e89e2035195f261535e34ca7d8aca68a
Reviewed-on: https://go-review.googlesource.com/c/go/+/226437
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Retrying CL 222782, with a fix that will hopefully stop the random crashing.
The issue with the previous CL is that it does pointer arithmetic
in a way that may briefly generate an out-of-bounds pointer. If an
interrupt happens to occur in that state, the referenced object may
be collected incorrectly.
Suppose there was code that did s[x+c]. The previous CL had a rule
to the effect of ptr + (x + c) -> c + (ptr + x). But ptr+x is not
guaranteed to point to the same object as ptr. In contrast,
ptr+(x+c) is guaranteed to point to the same object as ptr, because
we would have already checked that x+c is in bounds.
For example, strconv.trim used to have this code:
MOVZX -0x1(BX)(DX*1), BP
CMPL $0x30, AL
After CL 222782, it had this code:
LEAL 0(BX)(DX*1), BP
CMPB $0x30, -0x1(BP)
An interrupt between those last two instructions could see BP pointing
outside the backing store of the slice involved.
It's really hard to actually demonstrate a bug. First, you need to
have an interrupt occur at exactly the right time. Then, there must
be no other pointers to the object in question. Since the interrupted
frame will be scanned conservatively, there can't even be a dead
pointer in another register or on the stack. (In the example above,
a bug can't happen because BX still holds the original pointer.)
Then, the object in question needs to be collected (or at least
scanned?) before the interrupted code continues.
This CL needs to handle load combining somewhat differently than CL 222782
because of the new restriction on arithmetic. That's the only real
difference (other than removing the bad rules) from that old CL.
This bug is also present in the amd64 rewrite rules, and we haven't
seen any crashing as a result. I will fix up that code similarly to
this one in a separate CL.
Update #37881
Change-Id: I5f0d584d9bef4696bfe89a61ef0a27c8d507329f
Reviewed-on: https://go-review.googlesource.com/c/go/+/225798
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Use a separate compiler pass to introduce complicated x86 addressing
modes. Loads in the normal architecture rules (for x86 and all other
platforms) can have constant offsets (AuxInt values) and symbols (Aux
values), but no more.
The complex addressing modes (x+y, x+2*y, etc.) are introduced in a
separate pass that combines loads with LEAQx ops.
Organizing rewrites this way simplifies the number of rewrites
required, as there are lots of different rule orderings that have to
be specified to ensure these complex addressing modes are always found
if they are possible.
Update #36468
Change-Id: I5b4bf7b03a1e731d6dfeb9ef19b376175f3b4b44
Reviewed-on: https://go-review.googlesource.com/c/go/+/217097
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Trying this CL again, with a fixed test that allows platforms
to disagree on the exact behavior of converting NaNs.
We store 32-bit floating point constants in a 64-bit field, by
converting that 32-bit float to 64-bit float to store it, and convert
it back to use it.
That works for *almost* all floating-point constants. The exception is
signaling NaNs. The round trip described above means we can't represent
a 32-bit signaling NaN, because conversions strip the signaling bit.
To fix this issue, just forbid NaNs as floating-point constants in SSA
form. This shouldn't affect any real-world code, as people seldom
constant-propagate NaNs (except in test code).
Additionally, NaNs are somewhat underspecified (which of the many NaNs
do you get when dividing 0/0?), so when cross-compiling there's a
danger of using the compiler machine's NaN regime for some math, and
the target machine's NaN regime for other math. Better to use the
target machine's NaN regime always.
Update #36400
Change-Id: Idf203b688a15abceabbd66ba290d4e9f63619ecb
Reviewed-on: https://go-review.googlesource.com/c/go/+/221790
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
The SSA backend has rules to read the contents of readonly Lsyms.
However, this rule was failing to trigger for many readonly Lsyms.
This is because the readonly attribute that was set on the Node.Name
was not propagated to its Lsym until the dump globals phase, after SSA runs.
To work around this phase ordering problem, introduce Node.SetReadonly,
which sets Node.Name.Readonly and also configures the Lsym
enough that SSA can use it.
This change also fixes a latent problem in the rewrite rule function,
namely that reads past the end of lsym.P were treated as entirely zero,
instead of merely requiring padding with trailing zeros.
This change also adds an amd64 rule needed to fully optimize
the results of this change. It would be better not to need this,
but the zero extension that should handle this for us
gets optimized away too soon (see #36897 for a similar problem).
I have not investigated whether other platforms also need new
rules to take full advantage of the new optimizations.
Compiled code for (interface{})(true) on amd64 goes from:
LEAQ type.bool(SB), AX
MOVBLZX ""..stmp_0(SB), BX
LEAQ runtime.staticbytes(SB), CX
ADDQ CX, BX
to
LEAQ type.bool(SB), AX
LEAQ runtime.staticbytes+1(SB), BX
Prior to this change, the readonly symbol rewrite rules
fired a total of 884 times during make.bash.
Afterwards they fire 1807 times.
file before after Δ %
cgo 4827832 4823736 -4096 -0.085%
compile 24907768 24895656 -12112 -0.049%
fix 3376952 3368760 -8192 -0.243%
pprof 14751700 14747604 -4096 -0.028%
total 120343528 120315032 -28496 -0.024%
Change-Id: I59ea52138276c37840f69e30fb109fd376d579ec
Reviewed-on: https://go-review.googlesource.com/c/go/+/220499
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>