Commit graph

112 commits

Author SHA1 Message Date
Cherry Zhang
fddc004537 cmd/compile: remove nil check for Zero/Move on 386, AMD64, S390X
Fixes #18003.

Change-Id: Iadcc5c424c64badecfb5fdbd4dbd9197df56182c
Reviewed-on: https://go-review.googlesource.com/33421
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-02-02 21:28:38 +00:00
Keith Randall
01c8719f8b cmd/compile: move rotate instruction generation to SSA
Remove rotate generation from walk.  Remove OLROT and ssa.Lrot* opcodes.
Generate rotates during SSA lowering for architectures that have them.

This CL will allow rotates to be generated in more situations,
like when the shift values are determined to be constant
only after some analysis.

Fixes #18254

Change-Id: I8d6d684ff5ce2511aceaddfda98b908007851079
Reviewed-on: https://go-review.googlesource.com/34232
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-02-02 17:57:15 +00:00
Michael Munday
15817e409b cmd/compile: make link register allocatable in non-leaf functions
We save and restore the link register in non-leaf functions because
it is clobbered by CALLs. It is therefore available for general
purpose use.

Only enabled on s390x currently. The RC4 benchmarks in particular
benefit from the extra register:

name     old speed     new speed     delta
RC4_128  243MB/s ± 2%  341MB/s ± 2%  +40.46%  (p=0.008 n=5+5)
RC4_1K   267MB/s ± 0%  359MB/s ± 1%  +34.32%  (p=0.008 n=5+5)
RC4_8K   271MB/s ± 0%  362MB/s ± 0%  +33.61%  (p=0.008 n=5+5)

Change-Id: Id23bff95e771da9425353da2f32668b8e34ba09f
Reviewed-on: https://go-review.googlesource.com/30597
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-10-11 18:52:35 +00:00
Cherry Zhang
2756d56c89 cmd/compile: intrinsify math/big.mulWW, divWW on AMD64
Change-Id: I59f7afa7a5803d19f8b21fe70fc85ef997bb3a85
Reviewed-on: https://go-review.googlesource.com/30542
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-10-11 16:07:46 +00:00
Keith Randall
98938189a1 cmd/compile: remove duplicate nilchecks
Mark nil check operations as faulting if their arg is zero.
This lets the late nilcheck pass remove duplicates.

Fixes #17242.

Change-Id: I4c9938d8a5a1e43edd85b4a66f0b34004860bcd9
Reviewed-on: https://go-review.googlesource.com/29952
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2016-09-27 23:54:01 +00:00
Keith Randall
3134ab3c2d cmd/compile: redo nil checks
Get rid of BlockCheck. Josh goaded me into it, and I went
down a rabbithole making it happen.

NilCheck now panics if the pointer is nil and returns void, as before.
BlockCheck is gone, and NilCheck is no longer a Control value for
any block. It just exists (and deadcode knows not to throw it away).

I rewrote the nilcheckelim pass to handle this case.  In particular,
there can now be multiple NilCheck ops per block.

I moved all of the arch-dependent nil check elimination done as
part of ssaGenValue into its own proper pass, so we don't have to
duplicate that code for every architecture.

Making the arch-dependent nil check its own pass means I needed
to add a bunch of flags to the opcode table so I could write
the code without arch-dependent ops everywhere.

Change-Id: I419f891ac9b0de313033ff09115c374163416a9f
Reviewed-on: https://go-review.googlesource.com/29120
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-09-15 02:42:13 +00:00
Keith Randall
c345a3913f cmd/compile: get rid of BlockCall
No need for it, we can treat calls as (mostly) normal values
that take a memory and return a memory.

Lowers the number of basic blocks needed to represent a function.
"go test -c net/http" uses 27% fewer basic blocks.
Probably doesn't affect generated code much, but should help
various passes whose running time and/or space depends on
the number of basic blocks.

Fixes #15631

Change-Id: I0bf21e123f835e2cfa382753955a4f8bce03dfa6
Reviewed-on: https://go-review.googlesource.com/28950
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2016-09-12 23:27:02 +00:00
Cherry Zhang
f1ef5a06d2 cmd/compile: mark some AMD64 atomic ops as clobberFlags
Fixes #16985.

Change-Id: I5954db28f7b70dd3ac7768e471d5df871a5b20f9
Reviewed-on: https://go-review.googlesource.com/28510
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-09-06 14:26:18 +00:00
Keith Randall
cc0248aea5 cmd/compile: don't reserve X15 for float sub/div any more
We used to reserve X15 to implement the 3-operand floating-point
sub/div ops with the 2-operand sub/div that 386/amd64 gives us.

Now that resultInArg0 is implemented, we no longer need to
reserve X15 (X7 on 386).

Fixes #15584

Change-Id: I978e6c0a35236e89641bfc027538cede66004e82
Reviewed-on: https://go-review.googlesource.com/28272
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-08-31 20:35:49 +00:00
Keith Randall
0c6c3d1de7 cmd/compile: fix noopt build
Atomic add rules were depending on CSE to combine duplicate atomic ops.
With -N, CSE doesn't run.

Redo the rules for atomic add so there's only one atomic op.
Introduce an add-to-first-part-of-tuple pseudo-ops to make the atomic add result correct.

Change-Id: Ib132247051abe5f80fefad6c197db8df8ee06427
Reviewed-on: https://go-review.googlesource.com/27991
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2016-08-28 18:54:52 +00:00
Keith Randall
84aac622a4 cmd/compile: intrinsify the rest of runtime/internal/atomic for amd64
Atomic swap, add/and/or, compare and swap.

Also works on amd64p32.

Change-Id: Idf2d8f3e1255f71deba759e6e75e293afe4ab2ba
Reviewed-on: https://go-review.googlesource.com/27813
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2016-08-28 16:31:08 +00:00
Keith Randall
320ddcf834 cmd/compile: inline atomics from runtime/internal/atomic on amd64
Inline atomic reads and writes on amd64.  There's no reason
to pay the overhead of a call for these.

To keep atomic loads from being reordered, we make them
return a <value,memory> tuple.

Change the meaning of resultInArg0 for tuple-generating ops
to mean the first part of the result tuple, not the second.
This means we can always put the store part of the tuple last,
matching how arguments are laid out.  This requires reordering
the outputs of add32carry and sub32carry and their descendents
in various architectures.

benchmark                    old ns/op     new ns/op     delta
BenchmarkAtomicLoad64-8      2.09          0.26          -87.56%
BenchmarkAtomicStore64-8     7.54          5.72          -24.14%

TBD (in a different CL): Cas, Or8, ...

Change-Id: I713ea88e7da3026c44ea5bdb56ed094b20bc5207
Reviewed-on: https://go-review.googlesource.com/27641
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2016-08-25 20:09:04 +00:00
Keith Randall
3e270ab80b cmd/compile: clean up ctz ops
Now that we have ops that can return 2 results, have BSF return a result
and flags.  We can then get rid of the redundant comparison and use CMOV
instead of CMOVconst ops.

Get rid of a bunch of the ops we don't use.  Ctz{8,16}, plus all the Clzs,
and CMOVNEs.  I don't think we'll ever use them, and they would be easy
to add back if needed.

Change-Id: I8858a1d017903474ea7e4002fc76a6a86e7bd487
Reviewed-on: https://go-review.googlesource.com/27630
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-08-23 23:45:12 +00:00
Keith Randall
5ae8230769 cmd/compile: use shorter versions of zero-extend ops
Only need to zero-extend to 32 bits and we get the top
32 bits zeroed for free.

Only the WQ change actually generates different code.
The assembler did this optimization for us in the other two cases.
But we might as well do it during SSA so -S output more closely
matches the actual generated instructions.

Change-Id: I3e4ac50dc4da124014d4e31c86e9fc539d94f7fd
Reviewed-on: https://go-review.googlesource.com/23711
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2016-08-16 21:32:21 +00:00
Keith Randall
69a755b602 [dev.ssa] cmd/compile: port SSA backend to amd64p32
It's not a new backend, just a PtrSize==4 modification
of the existing AMD64 backend.

Change-Id: Icc63521a5cf4ebb379f7430ef3f070894c09afda
Reviewed-on: https://go-review.googlesource.com/25586
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-08-09 15:48:26 +00:00
Cherry Zhang
0484052358 [dev.ssa] cmd/compile: remove flags from regMask
Reg allocator skips flag-typed values. Flag allocator uses the type
and whether the op has "clobberFlags" set.

Tested on AMD64, ARM, ARM64, 386. Passed 'toolstash -cmp' on AMD64.
PPC64 is coded blindly.

Change-Id: Ib1cc27efecef6a1bb27f7d7ed035a582660d244f
Reviewed-on: https://go-review.googlesource.com/25480
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-08-07 03:08:03 +00:00
Keith Randall
d2286ea284 [dev.ssa] Merge remote-tracking branch 'origin/master' into mergebranch
Semi-regular merge from tip into dev.ssa.

Change-Id: Iadb60e594ef65a99c0e1404b14205fa67c32a9e9
2016-08-04 10:08:20 -07:00
Cherry Zhang
111d590f86 cmd/compile: fix possible spill of invalid pointer with DUFFZERO on AMD64
SSA compiler on AMD64 may spill Duff-adjusted address as scalar. If
the object is on stack and the stack moves, the spilled address become
invalid.

Making the spill pointer-typed does not work. The Duff-adjusted address
points to the memory before the area to be zeroed and may be invalid.
This may cause stack scanning code panic.

Fix it by doing Duff-adjustment in genValue, so the intermediate value
is not seen by the reg allocator, and will not be spilled.

Add a test to cover both cases. As it depends on allocation, it may
be not always triggered.

Fixes #16515.

Change-Id: Ia81d60204782de7405b7046165ad063384ede0db
Reviewed-on: https://go-review.googlesource.com/25309
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-07-29 01:09:55 +00:00
Keith Randall
1b0404c4ca [dev.ssa] cmd/compile: fix verbose typing of DIV
Use Cherry's awesome pair type constructor.

Change-Id: I282156a570ee4dd3548bd82fbf15b8d8eb5bedf6
Reviewed-on: https://go-review.googlesource.com/25009
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2016-07-18 21:13:15 +00:00
Keith Randall
aee8d8b9dd [dev.ssa] cmd/compile: implement more 64-bit ops on 386
add/sub/mul, plus constant input variants.

Change-Id: I1c8006727c4fdf73558da0e646e7d1fa130ed773
Reviewed-on: https://go-review.googlesource.com/25006
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2016-07-18 19:52:28 +00:00
Keith Randall
cf92e3845f [dev.ssa] cmd/compile: use 2-result divide op
We now allow Values to have 2 outputs.  Use that ability for amd64.
This allows x,y := a/b,a%b to use just a single divide instruction.

Update #6815

Change-Id: Id70bcd20188a2dd8445e631a11d11f60991921e4
Reviewed-on: https://go-review.googlesource.com/25004
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
2016-07-18 19:41:05 +00:00
Cherry Zhang
90883091ff [dev.ssa] cmd/compile: clean up hardcoded regmasks in ssa/regalloc.go
Auto-generate register masks and load them through Config.

Passed toolstash -cmp on AMD64.

Tests phi_ssa.go and regalloc_ssa.go in cmd/compile/internal/gc/testdata
passed on ARM.

Updates #15365.

Change-Id: I393924d68067f2dbb13dab82e569fb452c986593
Reviewed-on: https://go-review.googlesource.com/23292
Reviewed-by: David Chase <drchase@google.com>
2016-06-02 13:01:44 +00:00
Cherry Zhang
ccaed50c7b [dev.ssa] cmd/compile: handle boolean values for SSA on ARM
Fix hardcoded flag register mask in ssa/flagalloc.go by auto-generating
the mask.

Also fix a mistake (in previous CL) about conditional branches.

Progress on SSA backend for ARM. Still not complete. Now "container/ring"
package compiles and tests passed.

Updates #15365.

Change-Id: Id7c8805c30dbb8107baedb485ed0f71f59ed6ea8
Reviewed-on: https://go-review.googlesource.com/23093
Reviewed-by: Keith Randall <khr@golang.org>
2016-05-19 02:48:36 +00:00
Keith Randall
e4355aeedf cmd/compile: more sanity checks on rewrite rules
Make sure ops have the right number of args, set
aux and auxint only if allowed, etc.

Normalize error reporting format.

Change-Id: Ie545fcc5990c8c7d62d40d9a0a55885f941eb645
Reviewed-on: https://go-review.googlesource.com/22320
Reviewed-by: David Chase <drchase@google.com>
2016-04-26 18:01:55 +00:00
Keith Randall
9e3c68f1e0 cmd/compile: get rid of most byte and word insns for amd64
Now that we're using 32-bit ops for 8/16-bit logical operations
(to avoid partial register stalls), there's really no need to
keep track of the 8/16-bit ops at all.  Convert everything we
can to 32-bit ops.

This CL is the obvious stuff.  I might think a bit more about
whether we can get rid of weirder stuff like HMULWU.

The only downside to this CL is that we lose some information
about constants.  If we had source like:
  var a byte = ...
  a += 128
  a += 128
We will convert that to a += 256, when we could get rid of the
add altogether.  This seems like a fairly unusual scenario and
I'm happy with forgoing that optimization.

Change-Id: Ia7c1e5203d0d110807da69ed646535194a3efba1
Reviewed-on: https://go-review.googlesource.com/22382
Reviewed-by: Todd Neal <todd@tneal.org>
2016-04-23 16:30:27 +00:00
Keith Randall
0004f34cef cmd/compile: regalloc enforces 2-address instructions
Instead of being a hint, resultInArg0 is now enforced by regalloc.
This allows us to delete all the code from amd64/ssa.go which
deals with converting from a semantically three-address instruction
into some copies plus a two-address instruction.

Change-Id: Id4f39a80be4b678718bfd42a229f9094ab6ecd7c
Reviewed-on: https://go-review.googlesource.com/21816
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2016-04-10 23:20:38 +00:00
Dave Cheney
7208a2cd78 cmd/compile/internal/ssa: hide gen packge from ./make.bash
Fixes #15122

Change-Id: Ie2c802d78aea731e25bf4b193b3c2e4c884e0573
Reviewed-on: https://go-review.googlesource.com/21524
Run-TryBot: Dave Cheney <dave@cheney.net>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-05 05:53:15 +00:00
Keith Randall
af517da2f9 cmd/compile: Add more idx1 load/store instructions
Helpful for indexed loads and stores when the stride is not equal to
the size being loaded/stored.

Update #7927

Change-Id: I8714dd4c7b18a96a611bf5647ee21f753d723945
Reviewed-on: https://go-review.googlesource.com/21346
Run-TryBot: Todd Neal <todd@tneal.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
2016-03-31 17:30:40 +00:00
Keith Randall
7fc5621991 cmd/compile: define high bits of AuxInt
Previously if we were only using the low bits of AuxInt,
the high bits were ignored and could be junk.  This CL
changes that behavior to define the high bits to be the
sign-extended version of the low bits for all cases.

There are 2 main benefits:
- Deterministic representation.  This helps with CSE.
  (Const8 [0x1]) and (Const8 [0x101]) used to be the same "value"
  but CSE couldn't see them as such.
- Testability.  We can check that all ops leave AuxInt in a state
  consistent with the new rule.  In the old scheme, it was hard
  to check whether a rule correctly used only the low-order bits.
Side benefits:
- ==0 and !=0 tests are easier.

Drawbacks:
- This differs from the runtime representation in registers,
  where it is important that we allow upper bits to be undefined
  (so we're not sign/zero-extending all the time).
- Ops that treat AuxInt as unsigned (shifts, mostly) need to be
  a bit more careful.

Change-Id: I9a685ff27e36dc03287c9ab1cecd6c0b4045c819
Reviewed-on: https://go-review.googlesource.com/21256
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2016-03-30 04:48:28 +00:00
Matthew Dempsky
da19a0cff4 cmd/compile: fix plan9-amd64 build
The previous rules to combine indexed loads produced addresses like:

    From: obj.Addr{
        Type:   TYPE_MEM,
        Reg:    REG_CX,
        Name:   NAME_AUTO,
        Offset: 121,
        ...
    }

which are erroneous because NAME_AUTO implies a base register of
REG_SP, and cmd/internal/obj/x86 makes many assumptions to this
effect.  Note that previously we were also producing an extra "ADDQ
SP, CX" instruction, so indexing off of SP was already handled.

The approach taken by this CL to address the problem is to instead
produce addresses like:

    From: obj.Addr{
        Type:   TYPE_MEM,
        Reg:    REG_SP,
        Name:   NAME_AUTO,
        Offset: 121,
        Index:  REG_CX,
        Scale:  1,
    }

and to omit the "ADDQ SP, CX" instruction.

Downside to this approach is it requires adding a lot of new
MOV[WLQ]loadidx1 instructions that nearly duplicate functionality of
the existing MOV[WLQ]loadidx[248] instructions, but with a different
Scale.

Fixes #15001.

Change-Id: Iad9a1a41e5e2552f8d22e3ba975e4ea0862dffd2
Reviewed-on: https://go-review.googlesource.com/21245
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-03-29 03:22:06 +00:00
David Chase
8eec2bbfbc cmd/compile: added some intrinsics to SSA back end
One intrinsic was needed to help get the very best
performance out of a future GC; as long as that one was
being added, I also added Bswap since that is sometimes
a handy thing to have.  I had intended to fill out the
bit-scan intrinsic family, but the mismatch between the
"scan forward" instruction and "count leading zeroes"
was large enough to cause me to leave it out -- it poses
a dilemma that I'd rather dodge right now.

These intrinsics are not exposed for general use.
That's a separate issue requiring an API proposal change
( https://github.com/golang/proposal )

All intrinsics are tested, both that they are substituted
on the appropriate architecture, and that they produce the
expected result.

Change-Id: I5848037cfd97de4f75bdc33bdd89bba00af4a8ee
Reviewed-on: https://go-review.googlesource.com/20564
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-28 16:29:59 +00:00
Keith Randall
68e86e6dfa cmd/compile: MOVBload and MOVBQZXload are the same op
No need to have both ops when they do the same thing.
Just declare MOVBload to zero extend and we can get rid
of MOVBQZXload.  Same for W and L.

Kind of a followon cleanup for https://go-review.googlesource.com/c/19506/
Should enable an easier fix for #14920

Change-Id: I7cfac909a8ba387f433a6ae75c050740ebb34d42
Reviewed-on: https://go-review.googlesource.com/21004
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-03-23 00:28:01 +00:00
Keith Randall
7177cb9fa4 cmd/compile: remove dots from register names
They are kind of useless and are cluttering up
https://go-review.googlesource.com/c/21000/

Change-Id: Iafdec75ada11c7ebdc40540d251fdc514bb00d3d
Reviewed-on: https://go-review.googlesource.com/21001
Reviewed-by: Minux Ma <minux@golang.org>
2016-03-22 17:30:30 +00:00
Michael Pratt
a4e31d42ee cmd/compile: remove amd64 code from package gc and the core gen tool
Parts of the SSA compiler in package gc contain amd64-specific code,
most notably Prog generation. Move this code into package amd64, so that
other architectures can be added more easily.

In package gc, this change is just moving code. There are no functional
changes or even any larger structural changes beyond changing function
names (mostly for export).

In the cmd/compile/internal/ssa/gen tool, more information is included
in arch to remove the AMD64-specific behavior in the main portion of the
tool. The generated opGen.go is identical.

Change-Id: I8eb37c6e6df6de1b65fa7dab6f3bc32c29daf643
Reviewed-on: https://go-review.googlesource.com/20609
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-14 16:59:03 +00:00
Todd Neal
f6ceed2cab cmd/compile: const folding for float32/64
Split the auxFloat type into 32/64 bit versions and perform checking for
exactly representable float32 values.  Perform const folding on
float32/64.  Comment out some const negation rules that the frontend
already performs.

Change-Id: Ib3f8d59fa8b30e50fe0267786cfb3c50a06169d2
Reviewed-on: https://go-review.googlesource.com/20568
Run-TryBot: Todd Neal <todd@tneal.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-03-13 13:32:41 +00:00
Keith Randall
369f4f5de5 cmd/compile: regalloc of two address instructions
x86 has a lot of instructions that require the output to be in the same
register as one of the inputs.  When allocating the output register,
allocate the same register as the input if it is available.

Improves the performance of golang.org/x/crypto/sha3 by
10% (from 6% slower than 1.6 to 4% faster).

Fixes #14745

Change-Id: I4d81785240c9368e4dc75107b45c959d200df8e6
Reviewed-on: https://go-review.googlesource.com/20488
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2016-03-11 04:13:07 +00:00
Keith Randall
12e60452e9 cmd/compile: Combine smaller loads into a larger load
This only deals with the loads themselves.  The bounds checks
are a separate issue.  Also doesn't handle stores, those are
harder because we need to make sure intermediate memory states
aren't observed (which is hard to do with rewrite rules).

Use one byte shorter instructions for zero-extending loads.

Update #14267

Change-Id: I40af25ab5208488151ba7db32bf96081878fa7d9
Reviewed-on: https://go-review.googlesource.com/20218
Reviewed-by: Alexandru Moșoi <alexandru@mosoi.ro>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-06 22:52:22 +00:00
Keith Randall
62ac107a34 cmd/compile: some SSA cleanup
Do some easy TODOs.
Move a bunch of other TODOs into bugs.

Change-Id: Iaba9dad6221a2af11b3cbcc512875f4a85842873
Reviewed-on: https://go-review.googlesource.com/20114
Run-TryBot: Todd Neal <todd@tneal.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
2016-03-02 03:17:46 +00:00
Brad Fitzpatrick
5fea2ccc77 all: single space after period.
The tree's pretty inconsistent about single space vs double space
after a period in documentation. Make it consistently a single space,
per earlier decisions. This means contributors won't be confused by
misleading precedence.

This CL doesn't use go/doc to parse. It only addresses // comments.
It was generated with:

$ perl -i -npe 's,^(\s*// .+[a-z]\.)  +([A-Z]),$1 $2,' $(git grep -l -E '^\s*//(.+\.)  +([A-Z])')
$ go test go/doc -update

Change-Id: Iccdb99c37c797ef1f804a94b22ba5ee4b500c4f7
Reviewed-on: https://go-review.googlesource.com/20022
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dave Day <djd@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-02 00:13:47 +00:00
Ilya Tocar
e96b232993 [dev.ssa] cmd/compile: promote byte/word operation
Writing to low 8/16 bits of register creates false dependency
Generate 32-bit operations when possible.

Change-Id: I8eb6c1c43a66424eec6baa91a660bceb6b80d1d3
Reviewed-on: https://go-review.googlesource.com/19506
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-01 15:54:52 +00:00
Todd Neal
4e95dfed01 [dev.ssa] cmd/compile: add max arg length to opcodes
Add the max arg length to opcodes and use it in zcse.  Doesn't affect
speed, but allows better checking in checkFunc and removes the need
to keep a list of zero arg opcodes up to date.

Change-Id: I157c6587154604119720ec6228b767b6e52bb5c7
Reviewed-on: https://go-review.googlesource.com/19994
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Todd Neal <todd@tneal.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-28 14:50:19 +00:00
Keith Randall
a3055af45e [dev.ssa] cmd/compile: strength-reduce 64-bit constant divides
The frontend does this for 32 bits and below, but SSA needs
to do it for 64 bits.  The algorithms are all copied from
cgen.go:cgen_div.

Speeds up TimeFormat substantially: ~40% slower to ~10% slower.

Change-Id: I023ea2eb6040df98ccd9105e15ca6ea695610a7a
Reviewed-on: https://go-review.googlesource.com/19302
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
2016-02-06 16:52:57 +00:00
Keith Randall
a6fb514bf8 [dev.ssa] cmd/compile: add store constant indexed operations
Change-Id: Ifb8eba1929c79ee7a8cae2191613c55a3b8f74e5
Reviewed-on: https://go-review.googlesource.com/19236
Reviewed-by: Todd Neal <todd@tneal.org>
2016-02-05 01:53:13 +00:00
Keith Randall
9278a04a8f [dev.ssa] cmd/compile: more combining of ops into instructions
Mostly indexed loads.  A few more LEA cases.

Change-Id: Idc1d447ed0dd6e906cd48e70307a95e77f61cf5f
Reviewed-on: https://go-review.googlesource.com/19172
Reviewed-by: Todd Neal <todd@tneal.org>
Run-TryBot: Keith Randall <khr@golang.org>
2016-02-04 22:30:29 +00:00
Keith Randall
16b1fce921 [dev.ssa] cmd/compile: add aux typing, flags to ops
Add the aux type to opcodes.
Add rematerializeable as a flag.

Change-Id: I906e19281498f3ee51bb136299bf26e13a54b2ec
Reviewed-on: https://go-review.googlesource.com/19088
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
2016-02-02 02:55:13 +00:00
Keith Randall
1cc5789df9 [dev.ssa] cmd/compile: lots of small rewrite optimizations
Small optimizations I noticed while looking at Giovanni's test cases.

More shifts by constants.
Indexed stores for smaller types.
Fold LEA into loads/stores.
More extending loads.
CMP $0 of AND -> TEST

Fix order of TEST ops.

Giovanni's test cases at https://gist.github.com/rasky/62fba94e3a20d1b05b2a

Change-Id: I7077bc0b5319bf05767eeb39f401f4bb4b39f635
Reviewed-on: https://go-review.googlesource.com/19086
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
Reviewed-by: David Chase <drchase@google.com>
2016-01-31 05:30:13 +00:00
Keith Randall
f94e0745b3 [dev.ssa] cmd/compile: prepare for some load+op combining
Rename StoreConst to ValAndOff so we can use it for other ops.
Make ValAndOff print nicely.

Add some notes & checks related to my aborted attempt to
implement combined CMP+load ops.

Change-Id: I2f901d12d42bc5a82879af0334806aa184a97e27
Reviewed-on: https://go-review.googlesource.com/18947
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: David Chase <drchase@google.com>
2016-01-29 20:22:09 +00:00
Keith Randall
7b773946c0 [dev.ssa] cmd/compile: disable xor clearing when flags must be preserved
The x86 backend automatically rewrites MOV $0, AX to
XOR AX, AX.  That rewrite isn't ok when the flags register
is live across the MOV.  Keep track of which moves care
about preserving flags, then disable this rewrite for them.

On x86, Prog.Mark was being used to hold the length of the
instruction.  We already store that in Prog.Isize, so no
need to store it in Prog.Mark also.  This frees up Prog.Mark
to hold a bitmask on x86 just like all the other architectures.

Update #12405

Change-Id: Ibad8a8f41fc6222bec1e4904221887d3cc3ca029
Reviewed-on: https://go-review.googlesource.com/18861
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
2016-01-26 17:40:22 +00:00
Keith Randall
3425295e91 [dev.ssa] cmd/compile: clean up comparisons
Add new constant-flags opcodes.  These can be generated from
comparisons that we know the result of, like x&31 < 32.

Constant-fold the constant-flags opcodes into all flag users.

Reorder some CMPxconst args so they read in the comparison direction.

Reorg deadcode removal a bit - it needs to remove the OpCopy ops it
generates when strength-reducing Phi ops.  So it needs to splice out all
the dead blocks and do a copy elimination before it computes live
values.

Change-Id: Ie922602033592ad8212efe4345394973d3b94d9f
Reviewed-on: https://go-review.googlesource.com/18267
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-01-13 18:42:00 +00:00
Keith Randall
4989337192 [dev.ssa] cmd/compile: allow control values to be CSEd
With the separate flagalloc pass, it should be fine to
allow CSE of control values.  The worst that can happen
is that the comparison gets un-CSEd by flagalloc.

Fix bug in flagalloc where flag restores were getting
clobbered by rematerialization during register allocation.

Change-Id: If476cf98b69973e8f1a8eb29441136dd12fab8ad
Reviewed-on: https://go-review.googlesource.com/17760
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
2015-12-12 06:41:05 +00:00