The caller's PC is always available in the frame. We can just
load it when needed, no need to spill.
Change-Id: I9c0a525903e574bb4eec9fe53cbeb8c64321166a
Reviewed-on: https://go-review.googlesource.com/70710
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Add a compiler intrinsic for getcallersp. So we are able to get
rid of the argument (not done in this CL).
Change-Id: Ic38fda1c694f918328659ab44654198fb116668d
Reviewed-on: https://go-review.googlesource.com/69350
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: David Chase <drchase@google.com>
Combine setcc and store of result into setcc that writes directly to memory.
Triggers 200+ times in go tool.
Fixes#21630
Change-Id: Iafa22607426f4120140c88fae4b9aecb46e0bba8
Reviewed-on: https://go-review.googlesource.com/67950
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
First step towards removing the mandatory argument for
getcallerpc, which solves certain problems for the runtime.
This might also slightly improve performance.
Intrinsic enabled on 386, amd64, amd64p32,
runtime asm implementation removed on those architectures.
Now-superfluous argument remains in getcallerpc signature
(for a future CL; non-386/amd64 asm funcs ignore it).
Added getcallerpc to the "not a real function" test
in dcl.go, that story is a little odd with respect to
unexported functions but that is not this CL.
Fixes#17327.
Change-Id: I5df1ad91f27ee9ac1f0dd88fa48f1329d6306c3e
Reviewed-on: https://go-review.googlesource.com/31851
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Rematerializable ops can be inserted after the flagalloc phase,
they must therefore not clobber flags. This CL adds a check to
ensure this doesn't happen and fixes the instances where it
does currently.
amd64: ADDQconst and ADDLconst were recently changed to be
rematerializable in CL 54393 (only in tip, not 1.9). That change
has been reverted.
s390x: MOVDaddr could clobber flags when using dynamic linking due
to a ADD with immediate instruction. Change the code generation to
use LA/LAY instead.
Fixes#21080.
Change-Id: Ia85c882afa2a820a309e93775354b3169ec6d034
Reviewed-on: https://go-review.googlesource.com/63030
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
Currently we only use 1 and 4 as a scale for indexed 4-byte load.
In code generated in #20711 we can use indexed load with scale=8,
to improve performance:
name old time/op new time/op delta
GM-6 108µs ± 0% 95µs ± 0% -12.06% (p=0.000 n=10+10)
So add new ops and combine loadidx1(shift 3..).. into loadidx8,
same for stores.
Change-Id: I5ed1c250ac40960e20606580cf9de221e75b72f1
Reviewed-on: https://go-review.googlesource.com/46134
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
The assembler barfs on large offsets. Make sure that all the
instructions that need to have their offsets in an int32
1) check on any rule that computes offsets for such instructions
2) change their aux fields so the check builder checks it.
The assembler also silently misassembled offsets between 1<<31
and 1<<32. Add a check in the assembler to barf on those as well.
Fixes#21655
Change-Id: Iebf24bf10f9f37b3ea819ceb7d588251c0f46d7d
Reviewed-on: https://go-review.googlesource.com/59630
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
We can rematerialize only ops that have SP or SB as their only argument.
There are some ADDQconst(SP) that can be rematerialized, but are spilled/filled instead,
so mark addconst as rematerializeable. This shaves ~1kb from go tool.
Change-Id: Ib4cf4fe5f2ec9d3d7e5f0f77f1193eba66ca2f08
Reviewed-on: https://go-review.googlesource.com/54393
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Implement int reg <-> fp reg moves on amd64.
If we see a load to int reg followed by an int->fp move, then we can just
load to the fp reg instead. Same for stores.
math.Abs is now:
MOVQ "".x+8(SP), AX
SHLQ $1, AX
SHRQ $1, AX
MOVQ AX, "".~r1+16(SP)
math.Copysign is now:
MOVQ "".x+8(SP), AX
SHLQ $1, AX
SHRQ $1, AX
MOVQ "".y+16(SP), CX
SHRQ $63, CX
SHLQ $63, CX
ORQ CX, AX
MOVQ AX, "".~r2+24(SP)
math.Float64bits is now:
MOVSD "".x+8(SP), X0
MOVSD X0, "".~r1+16(SP)
(it would be nicer to use a non-SSE reg for this, nothing is perfect)
And due to the fix for #21440, the inlined version of these improve as well.
name old time/op new time/op delta
Abs 1.38ns ± 5% 0.89ns ±10% -35.54% (p=0.000 n=10+10)
Copysign 1.56ns ± 7% 1.35ns ± 6% -13.77% (p=0.000 n=9+10)
Fixes#13095
Change-Id: Ibd7f2792412a6668608780b0688a77062e1f1499
Reviewed-on: https://go-review.googlesource.com/58732
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
This prevents unnecessary reg-reg moves during pointer arithmetic.
This change reduces the size of the full hello world binary by 0.4%.
Updates #21572
Change-Id: Ia0427021e5c94545a0dbd83a6801815806e5b12d
Reviewed-on: https://go-review.googlesource.com/58371
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
We can add a constant to loaction in memory with 1 instruction,
as opposed to load+add+store, so add a new op and relevent ssa rules.
Triggers in e. g. encoding/json isValidNumber:
NumberIsValid-6 36.4ns ± 0% 35.2ns ± 1% -3.32% (p=0.000 n=6+10)
Shaves ~2.5 kb from go tool.
Change-Id: I7ba576676c2522432360f77b290cecb9574a93c3
Reviewed-on: https://go-review.googlesource.com/54431
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
ADDSDmem comment said f32 (likely a copy-paste mistake).
Also swap ADDSSmem and ADDSDmem positions in the list to uniform the
list order.
Fixes#21225
Change-Id: I26bb116900c1cf4c4e6faeef613d7318c9c85b98
Reviewed-on: https://go-review.googlesource.com/52071
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
Reviewed-by: Keith Randall <khr@golang.org>
Enhance the one-live-memory-at-a-time check to run during many
more phases of the SSA backend. Also make it work in an interblock
fashion.
Change types.IsMemory to return true for tuples containing a memory type.
Fix trim pass to build the merged phi correctly. Doesn't affect
code but allows the check to pass after trim runs.
Switch the AddTuple* ops to take the memory-containing tuple argument second.
Update #20335
Change-Id: I5b03ef3606b75a9e4f765276bb8b183cdc172b43
Reviewed-on: https://go-review.googlesource.com/43495
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
The experiment "clobberdead" clobbers all pointer fields that the
compiler thinks are dead, just before and after every safepoint.
Useful for debugging the generation of live pointer bitmaps.
Helped find the following issues:
Update #15936
Update #16026
Update #16095
Update #18860
Change-Id: Id1d12f86845e3d93bae903d968b1eac61fc461f9
Reviewed-on: https://go-review.googlesource.com/23924
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Popcount instructions on amd64 are not guaranteed to be
present, so we must guard their call. Rewrite rules can't
generate control flow at the moment, so the intrinsifier
needs to generate that code.
name old time/op new time/op delta
OnesCount-8 2.47ns ± 5% 1.04ns ± 2% -57.70% (p=0.000 n=10+10)
OnesCount16-8 1.05ns ± 1% 0.78ns ± 0% -25.56% (p=0.000 n=9+8)
OnesCount32-8 1.63ns ± 5% 1.04ns ± 2% -35.96% (p=0.000 n=10+10)
OnesCount64-8 2.45ns ± 0% 1.04ns ± 1% -57.55% (p=0.000 n=6+10)
Update #18616
Change-Id: I4aff2cc9aa93787898d7b22055fe272a7cf95673
Reviewed-on: https://go-review.googlesource.com/38320
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
Note that this is a redo of an undo of the original buggy CL 38666.
We have lots of rewrite rules that vary only in the fact that
we have 2 versions for the 2 different orderings of various
commuting ops. For example:
(ADDL x (MOVLconst [c])) -> (ADDLconst [c] x)
(ADDL (MOVLconst [c]) x) -> (ADDLconst [c] x)
It can get unwieldly quickly, especially when there is more than
one commuting op in a rule.
Our existing "fix" for this problem is to have rules that
canonicalize the operations first. For example:
(Eq64 x (Const64 <t> [c])) && x.Op != OpConst64 -> (Eq64 (Const64 <t> [c]) x)
Subsequent rules can then assume if there is a constant arg to Eq64,
it will be the first one. This fix kinda works, but it is fragile and
only works when we remember to include the required extra rules.
The fundamental problem is that the rule matcher doesn't
know anything about commuting ops. This CL fixes that fact.
We already have information about which ops commute. (The register
allocator takes advantage of commutivity.) The rule generator now
automatically generates multiple rules for a single source rule when
there are commutative ops in the rule. We can now drop all of our
almost-duplicate source-level rules and the canonicalization rules.
I have some CLs in progress that will be a lot less verbose when
the rule generator handles commutivity for me.
I had to reorganize the load-combining rules a bit. The 8-way OR rules
generated 128 different reorderings, which was causing the generator
to put too much code in the rewrite*.go files (the big ones were going
from 25K lines to 132K lines). Instead I reorganized the rules to
combine pairs of loads at a time. The generated rule files are now
actually a bit (5%) smaller.
Make.bash times are ~unchanged.
Compiler benchmarks are not observably different. Probably because
we don't spend much compiler time in rule matching anyway.
I've also done a pass over all of our ops adding commutative markings
for ops which hadn't had them previously.
Fixes#18292
Change-Id: Ic1c0e43fbf579539f459971625f69690c9ab8805
Reviewed-on: https://go-review.googlesource.com/38801
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
This reverts commit 041ecb697f.
Reason for revert: Not working on S390x and some 386 archs.
I have a guess why the S390x is failing. No clue on the 386 yet.
Revert until I can figure it out.
Change-Id: I64f1ce78fa6d1037ebe7ee2a8a8107cb4c1db70c
Reviewed-on: https://go-review.googlesource.com/38790
Reviewed-by: Keith Randall <khr@golang.org>
We have lots of rewrite rules that vary only in the fact that
we have 2 versions for the 2 different orderings of various
commuting ops. For example:
(ADDL x (MOVLconst [c])) -> (ADDLconst [c] x)
(ADDL (MOVLconst [c]) x) -> (ADDLconst [c] x)
It can get unwieldly quickly, especially when there is more than
one commuting op in a rule.
Our existing "fix" for this problem is to have rules that
canonicalize the operations first. For example:
(Eq64 x (Const64 <t> [c])) && x.Op != OpConst64 -> (Eq64 (Const64 <t> [c]) x)
Subsequent rules can then assume if there is a constant arg to Eq64,
it will be the first one. This fix kinda works, but it is fragile and
only works when we remember to include the required extra rules.
The fundamental problem is that the rule matcher doesn't
know anything about commuting ops. This CL fixes that fact.
We already have information about which ops commute. (The register
allocator takes advantage of commutivity.) The rule generator now
automatically generates multiple rules for a single source rule when
there are commutative ops in the rule. We can now drop all of our
almost-duplicate source-level rules and the canonicalization rules.
I have some CLs in progress that will be a lot less verbose when
the rule generator handles commutivity for me.
I had to reorganize the load-combining rules a bit. The 8-way OR rules
generated 128 different reorderings, which was causing the generator
to put too much code in the rewrite*.go files (the big ones were going
from 25K lines to 132K lines). Instead I reorganized the rules to
combine pairs of loads at a time. The generated rule files are now
actually a bit (5%) smaller.
[Note to reviewers: check these carefully. Most of the other rule
changes are trivial.]
Make.bash times are ~unchanged.
Compiler benchmarks are not observably different. Probably because
we don't spend much compiler time in rule matching anyway.
I've also done a pass over all of our ops adding commutative markings
for ops which hadn't had them previously.
Fixes#18292
Change-Id: I999b1307272e91965b66754576019dedcbe7527a
Reviewed-on: https://go-review.googlesource.com/38666
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
name old time/op new time/op delta
LeadingZeros-4 2.00ns ± 0% 1.34ns ± 1% -33.02% (p=0.000 n=8+10)
LeadingZeros16-4 1.62ns ± 0% 1.57ns ± 0% -3.09% (p=0.001 n=8+9)
LeadingZeros32-4 2.14ns ± 0% 1.48ns ± 0% -30.84% (p=0.002 n=8+10)
LeadingZeros64-4 2.06ns ± 1% 1.33ns ± 0% -35.08% (p=0.000 n=8+8)
8-bit args is a special case - the Go code is really fast because
it is just a single table lookup. So I've disabled that for now.
Intrinsics were actually slower:
LeadingZeros8-4 1.22ns ± 3% 1.58ns ± 1% +29.56% (p=0.000 n=10+10)
Update #18616
Change-Id: Ia9c289b9ba59c583ea64060470315fd637e814cf
Reviewed-on: https://go-review.googlesource.com/38311
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
Changes to ${GOARCH}Ops.go files were mechanically produced using
github.com/mdempsky/ssa-symops, a one-off tool that inserts
"SymEffect: X" elements by pattern matching against the Op names.
Change-Id: Ibf3e481ffd588647f2a31662d72114b740ccbfcf
Reviewed-on: https://go-review.googlesource.com/38084
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
There's no need to use @block rules, as canMergeLoad makes sure that
the load and op are already in the same block.
With no @block needed, we also don't need to set the type explicitly.
It can just be inherited from the op being rewritten.
Noticed while working on #19284.
Change-Id: Ied8bcc8058260118ff7e166093112e29107bcb7e
Reviewed-on: https://go-review.googlesource.com/37585
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Added a flag to generic and various architectures' atomic
operations that are judged to have observable side effects
and thus cannot be dead-code-eliminated.
Test requires GOMAXPROCS > 1 without preemption in loop.
Fixes#19182.
Change-Id: Id2230031abd2cca0bbb32fd68fc8a58fb912070f
Reviewed-on: https://go-review.googlesource.com/37333
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
XCHG needs to allow the stack pointer as an argument because we have a
rewrite that incorporates the address of a local variable into the
instruction.
Fixes#19184
Change-Id: Ic438e6e1946332cdce3864d15abecd41b911b2a9
Reviewed-on: https://go-review.googlesource.com/37253
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
On AMD64 Most operation can have one operand in memory.
Combine load and dependand operation into one new operation,
where possible. I've seen no significant performance changes on go1,
but this allows to remove ~1.8kb code from go tool. And in math package
I see e. g.:
Remainder-6 70.0ns ± 0% 64.6ns ± 0% -7.76% (p=0.000 n=9+1
Change-Id: I88b8602b1d55da8ba548a34eb7da4b25d59a297e
Reviewed-on: https://go-review.googlesource.com/36793
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Remove rotate generation from walk. Remove OLROT and ssa.Lrot* opcodes.
Generate rotates during SSA lowering for architectures that have them.
This CL will allow rotates to be generated in more situations,
like when the shift values are determined to be constant
only after some analysis.
Fixes#18254
Change-Id: I8d6d684ff5ce2511aceaddfda98b908007851079
Reviewed-on: https://go-review.googlesource.com/34232
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
We save and restore the link register in non-leaf functions because
it is clobbered by CALLs. It is therefore available for general
purpose use.
Only enabled on s390x currently. The RC4 benchmarks in particular
benefit from the extra register:
name old speed new speed delta
RC4_128 243MB/s ± 2% 341MB/s ± 2% +40.46% (p=0.008 n=5+5)
RC4_1K 267MB/s ± 0% 359MB/s ± 1% +34.32% (p=0.008 n=5+5)
RC4_8K 271MB/s ± 0% 362MB/s ± 0% +33.61% (p=0.008 n=5+5)
Change-Id: Id23bff95e771da9425353da2f32668b8e34ba09f
Reviewed-on: https://go-review.googlesource.com/30597
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Mark nil check operations as faulting if their arg is zero.
This lets the late nilcheck pass remove duplicates.
Fixes#17242.
Change-Id: I4c9938d8a5a1e43edd85b4a66f0b34004860bcd9
Reviewed-on: https://go-review.googlesource.com/29952
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Get rid of BlockCheck. Josh goaded me into it, and I went
down a rabbithole making it happen.
NilCheck now panics if the pointer is nil and returns void, as before.
BlockCheck is gone, and NilCheck is no longer a Control value for
any block. It just exists (and deadcode knows not to throw it away).
I rewrote the nilcheckelim pass to handle this case. In particular,
there can now be multiple NilCheck ops per block.
I moved all of the arch-dependent nil check elimination done as
part of ssaGenValue into its own proper pass, so we don't have to
duplicate that code for every architecture.
Making the arch-dependent nil check its own pass means I needed
to add a bunch of flags to the opcode table so I could write
the code without arch-dependent ops everywhere.
Change-Id: I419f891ac9b0de313033ff09115c374163416a9f
Reviewed-on: https://go-review.googlesource.com/29120
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
No need for it, we can treat calls as (mostly) normal values
that take a memory and return a memory.
Lowers the number of basic blocks needed to represent a function.
"go test -c net/http" uses 27% fewer basic blocks.
Probably doesn't affect generated code much, but should help
various passes whose running time and/or space depends on
the number of basic blocks.
Fixes#15631
Change-Id: I0bf21e123f835e2cfa382753955a4f8bce03dfa6
Reviewed-on: https://go-review.googlesource.com/28950
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
We used to reserve X15 to implement the 3-operand floating-point
sub/div ops with the 2-operand sub/div that 386/amd64 gives us.
Now that resultInArg0 is implemented, we no longer need to
reserve X15 (X7 on 386).
Fixes#15584
Change-Id: I978e6c0a35236e89641bfc027538cede66004e82
Reviewed-on: https://go-review.googlesource.com/28272
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Atomic add rules were depending on CSE to combine duplicate atomic ops.
With -N, CSE doesn't run.
Redo the rules for atomic add so there's only one atomic op.
Introduce an add-to-first-part-of-tuple pseudo-ops to make the atomic add result correct.
Change-Id: Ib132247051abe5f80fefad6c197db8df8ee06427
Reviewed-on: https://go-review.googlesource.com/27991
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Atomic swap, add/and/or, compare and swap.
Also works on amd64p32.
Change-Id: Idf2d8f3e1255f71deba759e6e75e293afe4ab2ba
Reviewed-on: https://go-review.googlesource.com/27813
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Inline atomic reads and writes on amd64. There's no reason
to pay the overhead of a call for these.
To keep atomic loads from being reordered, we make them
return a <value,memory> tuple.
Change the meaning of resultInArg0 for tuple-generating ops
to mean the first part of the result tuple, not the second.
This means we can always put the store part of the tuple last,
matching how arguments are laid out. This requires reordering
the outputs of add32carry and sub32carry and their descendents
in various architectures.
benchmark old ns/op new ns/op delta
BenchmarkAtomicLoad64-8 2.09 0.26 -87.56%
BenchmarkAtomicStore64-8 7.54 5.72 -24.14%
TBD (in a different CL): Cas, Or8, ...
Change-Id: I713ea88e7da3026c44ea5bdb56ed094b20bc5207
Reviewed-on: https://go-review.googlesource.com/27641
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Now that we have ops that can return 2 results, have BSF return a result
and flags. We can then get rid of the redundant comparison and use CMOV
instead of CMOVconst ops.
Get rid of a bunch of the ops we don't use. Ctz{8,16}, plus all the Clzs,
and CMOVNEs. I don't think we'll ever use them, and they would be easy
to add back if needed.
Change-Id: I8858a1d017903474ea7e4002fc76a6a86e7bd487
Reviewed-on: https://go-review.googlesource.com/27630
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Only need to zero-extend to 32 bits and we get the top
32 bits zeroed for free.
Only the WQ change actually generates different code.
The assembler did this optimization for us in the other two cases.
But we might as well do it during SSA so -S output more closely
matches the actual generated instructions.
Change-Id: I3e4ac50dc4da124014d4e31c86e9fc539d94f7fd
Reviewed-on: https://go-review.googlesource.com/23711
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
It's not a new backend, just a PtrSize==4 modification
of the existing AMD64 backend.
Change-Id: Icc63521a5cf4ebb379f7430ef3f070894c09afda
Reviewed-on: https://go-review.googlesource.com/25586
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reg allocator skips flag-typed values. Flag allocator uses the type
and whether the op has "clobberFlags" set.
Tested on AMD64, ARM, ARM64, 386. Passed 'toolstash -cmp' on AMD64.
PPC64 is coded blindly.
Change-Id: Ib1cc27efecef6a1bb27f7d7ed035a582660d244f
Reviewed-on: https://go-review.googlesource.com/25480
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
SSA compiler on AMD64 may spill Duff-adjusted address as scalar. If
the object is on stack and the stack moves, the spilled address become
invalid.
Making the spill pointer-typed does not work. The Duff-adjusted address
points to the memory before the area to be zeroed and may be invalid.
This may cause stack scanning code panic.
Fix it by doing Duff-adjustment in genValue, so the intermediate value
is not seen by the reg allocator, and will not be spilled.
Add a test to cover both cases. As it depends on allocation, it may
be not always triggered.
Fixes#16515.
Change-Id: Ia81d60204782de7405b7046165ad063384ede0db
Reviewed-on: https://go-review.googlesource.com/25309
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>