Helpful for indexed loads and stores when the stride is not equal to
the size being loaded/stored.
Update #7927
Change-Id: I8714dd4c7b18a96a611bf5647ee21f753d723945
Reviewed-on: https://go-review.googlesource.com/21346
Run-TryBot: Todd Neal <todd@tneal.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
Previously if we were only using the low bits of AuxInt,
the high bits were ignored and could be junk. This CL
changes that behavior to define the high bits to be the
sign-extended version of the low bits for all cases.
There are 2 main benefits:
- Deterministic representation. This helps with CSE.
(Const8 [0x1]) and (Const8 [0x101]) used to be the same "value"
but CSE couldn't see them as such.
- Testability. We can check that all ops leave AuxInt in a state
consistent with the new rule. In the old scheme, it was hard
to check whether a rule correctly used only the low-order bits.
Side benefits:
- ==0 and !=0 tests are easier.
Drawbacks:
- This differs from the runtime representation in registers,
where it is important that we allow upper bits to be undefined
(so we're not sign/zero-extending all the time).
- Ops that treat AuxInt as unsigned (shifts, mostly) need to be
a bit more careful.
Change-Id: I9a685ff27e36dc03287c9ab1cecd6c0b4045c819
Reviewed-on: https://go-review.googlesource.com/21256
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
The previous rules to combine indexed loads produced addresses like:
From: obj.Addr{
Type: TYPE_MEM,
Reg: REG_CX,
Name: NAME_AUTO,
Offset: 121,
...
}
which are erroneous because NAME_AUTO implies a base register of
REG_SP, and cmd/internal/obj/x86 makes many assumptions to this
effect. Note that previously we were also producing an extra "ADDQ
SP, CX" instruction, so indexing off of SP was already handled.
The approach taken by this CL to address the problem is to instead
produce addresses like:
From: obj.Addr{
Type: TYPE_MEM,
Reg: REG_SP,
Name: NAME_AUTO,
Offset: 121,
Index: REG_CX,
Scale: 1,
}
and to omit the "ADDQ SP, CX" instruction.
Downside to this approach is it requires adding a lot of new
MOV[WLQ]loadidx1 instructions that nearly duplicate functionality of
the existing MOV[WLQ]loadidx[248] instructions, but with a different
Scale.
Fixes#15001.
Change-Id: Iad9a1a41e5e2552f8d22e3ba975e4ea0862dffd2
Reviewed-on: https://go-review.googlesource.com/21245
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
One intrinsic was needed to help get the very best
performance out of a future GC; as long as that one was
being added, I also added Bswap since that is sometimes
a handy thing to have. I had intended to fill out the
bit-scan intrinsic family, but the mismatch between the
"scan forward" instruction and "count leading zeroes"
was large enough to cause me to leave it out -- it poses
a dilemma that I'd rather dodge right now.
These intrinsics are not exposed for general use.
That's a separate issue requiring an API proposal change
( https://github.com/golang/proposal )
All intrinsics are tested, both that they are substituted
on the appropriate architecture, and that they produce the
expected result.
Change-Id: I5848037cfd97de4f75bdc33bdd89bba00af4a8ee
Reviewed-on: https://go-review.googlesource.com/20564
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
No need to have both ops when they do the same thing.
Just declare MOVBload to zero extend and we can get rid
of MOVBQZXload. Same for W and L.
Kind of a followon cleanup for https://go-review.googlesource.com/c/19506/
Should enable an easier fix for #14920
Change-Id: I7cfac909a8ba387f433a6ae75c050740ebb34d42
Reviewed-on: https://go-review.googlesource.com/21004
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Parts of the SSA compiler in package gc contain amd64-specific code,
most notably Prog generation. Move this code into package amd64, so that
other architectures can be added more easily.
In package gc, this change is just moving code. There are no functional
changes or even any larger structural changes beyond changing function
names (mostly for export).
In the cmd/compile/internal/ssa/gen tool, more information is included
in arch to remove the AMD64-specific behavior in the main portion of the
tool. The generated opGen.go is identical.
Change-Id: I8eb37c6e6df6de1b65fa7dab6f3bc32c29daf643
Reviewed-on: https://go-review.googlesource.com/20609
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Split the auxFloat type into 32/64 bit versions and perform checking for
exactly representable float32 values. Perform const folding on
float32/64. Comment out some const negation rules that the frontend
already performs.
Change-Id: Ib3f8d59fa8b30e50fe0267786cfb3c50a06169d2
Reviewed-on: https://go-review.googlesource.com/20568
Run-TryBot: Todd Neal <todd@tneal.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
x86 has a lot of instructions that require the output to be in the same
register as one of the inputs. When allocating the output register,
allocate the same register as the input if it is available.
Improves the performance of golang.org/x/crypto/sha3 by
10% (from 6% slower than 1.6 to 4% faster).
Fixes#14745
Change-Id: I4d81785240c9368e4dc75107b45c959d200df8e6
Reviewed-on: https://go-review.googlesource.com/20488
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
This only deals with the loads themselves. The bounds checks
are a separate issue. Also doesn't handle stores, those are
harder because we need to make sure intermediate memory states
aren't observed (which is hard to do with rewrite rules).
Use one byte shorter instructions for zero-extending loads.
Update #14267
Change-Id: I40af25ab5208488151ba7db32bf96081878fa7d9
Reviewed-on: https://go-review.googlesource.com/20218
Reviewed-by: Alexandru Moșoi <alexandru@mosoi.ro>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Do some easy TODOs.
Move a bunch of other TODOs into bugs.
Change-Id: Iaba9dad6221a2af11b3cbcc512875f4a85842873
Reviewed-on: https://go-review.googlesource.com/20114
Run-TryBot: Todd Neal <todd@tneal.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
The tree's pretty inconsistent about single space vs double space
after a period in documentation. Make it consistently a single space,
per earlier decisions. This means contributors won't be confused by
misleading precedence.
This CL doesn't use go/doc to parse. It only addresses // comments.
It was generated with:
$ perl -i -npe 's,^(\s*// .+[a-z]\.) +([A-Z]),$1 $2,' $(git grep -l -E '^\s*//(.+\.) +([A-Z])')
$ go test go/doc -update
Change-Id: Iccdb99c37c797ef1f804a94b22ba5ee4b500c4f7
Reviewed-on: https://go-review.googlesource.com/20022
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dave Day <djd@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Add the max arg length to opcodes and use it in zcse. Doesn't affect
speed, but allows better checking in checkFunc and removes the need
to keep a list of zero arg opcodes up to date.
Change-Id: I157c6587154604119720ec6228b767b6e52bb5c7
Reviewed-on: https://go-review.googlesource.com/19994
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Todd Neal <todd@tneal.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The frontend does this for 32 bits and below, but SSA needs
to do it for 64 bits. The algorithms are all copied from
cgen.go:cgen_div.
Speeds up TimeFormat substantially: ~40% slower to ~10% slower.
Change-Id: I023ea2eb6040df98ccd9105e15ca6ea695610a7a
Reviewed-on: https://go-review.googlesource.com/19302
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
Mostly indexed loads. A few more LEA cases.
Change-Id: Idc1d447ed0dd6e906cd48e70307a95e77f61cf5f
Reviewed-on: https://go-review.googlesource.com/19172
Reviewed-by: Todd Neal <todd@tneal.org>
Run-TryBot: Keith Randall <khr@golang.org>
Add the aux type to opcodes.
Add rematerializeable as a flag.
Change-Id: I906e19281498f3ee51bb136299bf26e13a54b2ec
Reviewed-on: https://go-review.googlesource.com/19088
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
Small optimizations I noticed while looking at Giovanni's test cases.
More shifts by constants.
Indexed stores for smaller types.
Fold LEA into loads/stores.
More extending loads.
CMP $0 of AND -> TEST
Fix order of TEST ops.
Giovanni's test cases at https://gist.github.com/rasky/62fba94e3a20d1b05b2a
Change-Id: I7077bc0b5319bf05767eeb39f401f4bb4b39f635
Reviewed-on: https://go-review.googlesource.com/19086
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
Reviewed-by: David Chase <drchase@google.com>
Rename StoreConst to ValAndOff so we can use it for other ops.
Make ValAndOff print nicely.
Add some notes & checks related to my aborted attempt to
implement combined CMP+load ops.
Change-Id: I2f901d12d42bc5a82879af0334806aa184a97e27
Reviewed-on: https://go-review.googlesource.com/18947
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: David Chase <drchase@google.com>
The x86 backend automatically rewrites MOV $0, AX to
XOR AX, AX. That rewrite isn't ok when the flags register
is live across the MOV. Keep track of which moves care
about preserving flags, then disable this rewrite for them.
On x86, Prog.Mark was being used to hold the length of the
instruction. We already store that in Prog.Isize, so no
need to store it in Prog.Mark also. This frees up Prog.Mark
to hold a bitmask on x86 just like all the other architectures.
Update #12405
Change-Id: Ibad8a8f41fc6222bec1e4904221887d3cc3ca029
Reviewed-on: https://go-review.googlesource.com/18861
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
Add new constant-flags opcodes. These can be generated from
comparisons that we know the result of, like x&31 < 32.
Constant-fold the constant-flags opcodes into all flag users.
Reorder some CMPxconst args so they read in the comparison direction.
Reorg deadcode removal a bit - it needs to remove the OpCopy ops it
generates when strength-reducing Phi ops. So it needs to splice out all
the dead blocks and do a copy elimination before it computes live
values.
Change-Id: Ie922602033592ad8212efe4345394973d3b94d9f
Reviewed-on: https://go-review.googlesource.com/18267
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
With the separate flagalloc pass, it should be fine to
allow CSE of control values. The worst that can happen
is that the comparison gets un-CSEd by flagalloc.
Fix bug in flagalloc where flag restores were getting
clobbered by rematerialization during register allocation.
Change-Id: If476cf98b69973e8f1a8eb29441136dd12fab8ad
Reviewed-on: https://go-review.googlesource.com/17760
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
Make sure that when a pointer value is live across a function
call, we save it as a pointer. (And similarly a uintptr
live across a function call should not be saved as a pointer.)
Add a nasty test case.
This is probably what is preventing the merge from master
to dev.ssa. Signs point to something like this bug happening
in mallocgc.
Change-Id: Ib23fa1251b8d1c50d82c6a448cb4a4fc28219029
Reviewed-on: https://go-review.googlesource.com/16830
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Introduce opcodes that store a constant value.
AuxInt now needs to hold both the value to be stored and the
constant offset at which to store it. Introduce a StoreConst
type to help encode/decode these parts to/from an AuxInt.
Change-Id: I1631883abe035cff4b16368683e1eb3d2ccb674d
Reviewed-on: https://go-review.googlesource.com/16170
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Replace REP MOVSB with all the copying techniques used by the
old compiler. Copy in chunks, DUFFCOPY, etc.
Introduces MOVO opcodes and an Int128 type to move around
16 bytes at a time.
Change-Id: I1e73e68ca1d8b3dd58bb4af2f4c9e5d9bf13a502
Reviewed-on: https://go-review.googlesource.com/16174
Reviewed-by: Todd Neal <todd@tneal.org>
Run-TryBot: Keith Randall <khr@golang.org>
Use faulting loads instead of test/jeq to do nil checks.
Fold nil checks into a following load/store if possible.
Makes binaries about 2% smaller.
Change-Id: I54af0f0a93c853f37e34e0ce7e3f01dd2ac87f64
Reviewed-on: https://go-review.googlesource.com/16287
Reviewed-by: David Chase <drchase@google.com>
getg reads from memory, so it should really have a
memory arg. It is critical in functions which call setg
to make sure getg gets ordered correctly with setg.
Change-Id: Ief4875421f741fc49c07b0e1f065ce2535232341
Reviewed-on: https://go-review.googlesource.com/16100
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
Cleaned up first-block-in-function code.
Added cases for |PHEAP for PPARAM and PAUTO.
Made PPARAMOUT act more like PAUTO for purposes
of address generation and vardef placement.
Added cases for OCLOSUREVAR and Ops for getting closure
pointer. Closure ops are scheduled at top of entry block
to capture DX.
Wrote test that seems to show proper behavior for addressed
parameters, locals, and returns.
Change-Id: Iee93ebf9e3d9f74cfb4d1c1da8038eb278d8a857
Reviewed-on: https://go-review.googlesource.com/14650
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
There's no need for special ops for panicindex and panicslice.
Just use regular runtime calls.
Change-Id: I71b9b73f4f1ebce1220fdc1e7b7f65cfcf4b7bae
Reviewed-on: https://go-review.googlesource.com/14726
Reviewed-by: David Chase <drchase@google.com>
OCALLINTER, as well as ODEFER/OPROC with OCALLMETH/OCALLINTER.
Move all the call logic to its own routine, a lot of the
code is shared.
Change-Id: Ieac59596165e434cc6d1d7b5e46b78957e9c5ed3
Reviewed-on: https://go-review.googlesource.com/14464
Reviewed-by: Todd Neal <todd@tneal.org>
Reviewed-by: David Chase <drchase@google.com>
TODO: for now, just function calls. Do method and interface calls.
Change-Id: Ib262dfa31cb753996cde899beaad4dc2e66705ac
Reviewed-on: https://go-review.googlesource.com/14035
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Modified to use the truncating conversion.
Fixes reflect.
Change-Id: I47bf3200abc2d2c662939a2a2351e2ff84168f4a
Reviewed-on: https://go-review.googlesource.com/14167
Reviewed-by: David Chase <drchase@google.com>
It does a XOR internally and clobbers flags.
Change-Id: Id6ef9219c4e6c3a2b5fc79c8d52bcfa30c148617
Reviewed-on: https://go-review.googlesource.com/14165
Reviewed-by: Keith Randall <khr@golang.org>
They do an AND or an OR internally, so they clobber flags.
Fixes#12441
Change-Id: I6c843bd268496bc13fc7e3c561d76619e961e8ad
Reviewed-on: https://go-review.googlesource.com/14180
Reviewed-by: Todd Neal <todd@tneal.org>
liblink rewrites MOV $0, reg into XOR reg, reg. Make MOVxconst clobber
flags so we don't generate invalid code in the unlikely case that it
matters. In testing, this change leads to no additional regenerated
flags due to a scheduling fix in CL14042.
Change-Id: I7bc1cfee94ef83beb2f97c31ec6a97e19872fb89
Reviewed-on: https://go-review.googlesource.com/14043
Reviewed-by: Keith Randall <khr@golang.org>
Still to do:
details, more testing corner cases. (e.g. negative zero)
Includes small cleanups for previous CL.
Note: complex division is currently done in the runtime,
so the division code here is apparently not yet necessary
and also not tested. Seems likely better to open code
division and expose the widening/narrowing to optimization.
Complex64 multiplication and division is done in wide
format to avoid cancellation errors; for division, this
also happens to be compatible with pre-SSA practice
(which uses a single complex128 division function).
It would-be-nice to widen for complex128 multiplication
intermediates as well, but that is trickier to implement
without a handy wider-precision format.
Change-Id: I595a4300f68868fb7641852a54674c6b2b78855e
Reviewed-on: https://go-review.googlesource.com/14028
Reviewed-by: Keith Randall <khr@golang.org>
Specifying types in rewrites for all subexpressions gets verbose
quickly. Allow opcodes to specify a default type which is used when
none is supplied explicitly.
Provide default types for a few easy opcodes. There are probably more
we can do, but this is a good start.
Change-Id: Iedc2a1a423cc3e2d4472640433982f9aa76a9f18
Reviewed-on: https://go-review.googlesource.com/14128
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Basic ops, no particular optimization in the pattern
matching yet (e.g. x!=x for Nan detection, x cmp constant,
etc.)
Change-Id: I0043564081d6dc0eede876c4a9eb3c33cbd1521c
Reviewed-on: https://go-review.googlesource.com/13704
Reviewed-by: Keith Randall <khr@golang.org>
MOVXload and MOVXstore opcodes have both an auxint offset
and an aux offset (a symbol name, like a local or arg or global).
Make sure we keep those values during rewrites.
Change-Id: Ic9fd61bf295b5d1457784c281079a4fb38f7ad3b
Reviewed-on: https://go-review.googlesource.com/13849
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
This further reduces the number of flags spills
during make.bash by about 50%.
Note that GetG is implemented by one or two MOVs,
which is why it does not clobber flags.
Change-Id: I6fede8c027b7dc340e00d1e15df1b87bf2b2d9ec
Reviewed-on: https://go-review.googlesource.com/13843
Reviewed-by: Keith Randall <khr@golang.org>
This reduces the number of flags spilled during
make.bash by > 90%.
I am working (slowly) on the rest.
Change-Id: I3c08ae228c33e2f726f615962996f0350c8d592b
Reviewed-on: https://go-review.googlesource.com/13813
Reviewed-by: David Chase <drchase@google.com>
Implement index check panics (and slice check panics, for when
we need those).
Clean up nil check. Now that the new regalloc is in we can use
the register we just tested as the address 0 destination.
Remove jumps after panic calls, they are unreachable.
Change-Id: Ifee6e510cdea49cc7c7056887e4f06c67488d491
Reviewed-on: https://go-review.googlesource.com/13687
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>