Commit graph

146 commits

Author SHA1 Message Date
Josh Bleecher Snyder
eaa198f3d1 cmd/compile: stop generating block successor vars in rewrite rules
They are left over from the days before
we had BlockKindFirst and swapSuccessors.

Change-Id: I9259d53ac2821ca4d5de5dd520ca4b78f52ecad4
Reviewed-on: https://go-review.googlesource.com/41206
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-04-21 04:11:51 +00:00
Matthew Dempsky
1e3570ac86 cmd/internal/objabi: extract shared functionality from obj
Now only cmd/asm and cmd/compile depend on cmd/internal/obj. Changing
the assembler backends no longer requires reinstalling cmd/link or
cmd/addr2line.

There's also now one canonical definition of the object file format in
cmd/internal/objabi/doc.go, with a warning to update all three
implementations.

objabi is still something of a grab bag of unrelated code (e.g., flag
and environment variable handling probably belong in a separate "tool"
package), but this is still progress.

Fixes #15165.
Fixes #20026.

Change-Id: Ic4b92fac7d0d35438e0d20c9579aad4085c5534c
Reviewed-on: https://go-review.googlesource.com/40972
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-04-19 00:00:09 +00:00
Keith Randall
53f8a6aeb0 cmd/compile: automatically handle commuting ops in rewrite rules
Note that this is a redo of an undo of the original buggy CL 38666.

We have lots of rewrite rules that vary only in the fact that
we have 2 versions for the 2 different orderings of various
commuting ops. For example:

(ADDL x (MOVLconst [c])) -> (ADDLconst [c] x)
(ADDL (MOVLconst [c]) x) -> (ADDLconst [c] x)

It can get unwieldly quickly, especially when there is more than
one commuting op in a rule.

Our existing "fix" for this problem is to have rules that
canonicalize the operations first. For example:

(Eq64 x (Const64 <t> [c])) && x.Op != OpConst64 -> (Eq64 (Const64 <t> [c]) x)

Subsequent rules can then assume if there is a constant arg to Eq64,
it will be the first one. This fix kinda works, but it is fragile and
only works when we remember to include the required extra rules.

The fundamental problem is that the rule matcher doesn't
know anything about commuting ops. This CL fixes that fact.

We already have information about which ops commute. (The register
allocator takes advantage of commutivity.)  The rule generator now
automatically generates multiple rules for a single source rule when
there are commutative ops in the rule. We can now drop all of our
almost-duplicate source-level rules and the canonicalization rules.

I have some CLs in progress that will be a lot less verbose when
the rule generator handles commutivity for me.

I had to reorganize the load-combining rules a bit. The 8-way OR rules
generated 128 different reorderings, which was causing the generator
to put too much code in the rewrite*.go files (the big ones were going
from 25K lines to 132K lines). Instead I reorganized the rules to
combine pairs of loads at a time. The generated rule files are now
actually a bit (5%) smaller.

Make.bash times are ~unchanged.

Compiler benchmarks are not observably different. Probably because
we don't spend much compiler time in rule matching anyway.

I've also done a pass over all of our ops adding commutative markings
for ops which hadn't had them previously.

Fixes #18292

Change-Id: Ic1c0e43fbf579539f459971625f69690c9ab8805
Reviewed-on: https://go-review.googlesource.com/38801
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-04-03 22:03:43 +00:00
Keith Randall
63a72fd447 cmd/compile: strength-reduce floating point
x*2 -> x+x
x/c, c power of 2 -> x*(1/c)

Fixes #19827

Change-Id: I74c9f0b5b49b2ed26c0990314c7d1d5f9631b6f1
Reviewed-on: https://go-review.googlesource.com/39295
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-04-03 21:27:03 +00:00
Brad Fitzpatrick
69fe9ea43e cmd/compile/internal/ssa: use recently agreed upon generated code header
Updates #13560

Change-Id: I9bc08ca5cf0627e653d55f748ebb83be8b69ea3b
Reviewed-on: https://go-review.googlesource.com/39296
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-04-03 18:04:41 +00:00
Ben Shi
8577f81a10 cmd/compile/internal: Optimization with RBIT and REV
By checking GOARM in ssa/gen/ARM.rules, each intermediate operator
can be implemented via different instruction serials.

It is up to the user to choose between compitability and efficiency.

The Bswap32(x) is optimized to REV(x) when GOARM >= 6.
The CTZ(x) is optimized to CLZ(RBIT x) when GOARM == 7.

Change-Id: Ie9ee645fa39333fa79ad84ed4d1cefac30422814
Reviewed-on: https://go-review.googlesource.com/35610
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-03-31 15:10:24 +00:00
Keith Randall
68da265c8e Revert "cmd/compile: automatically handle commuting ops in rewrite rules"
This reverts commit 041ecb697f.

Reason for revert: Not working on S390x and some 386 archs.
I have a guess why the S390x is failing.  No clue on the 386 yet.
Revert until I can figure it out.

Change-Id: I64f1ce78fa6d1037ebe7ee2a8a8107cb4c1db70c
Reviewed-on: https://go-review.googlesource.com/38790
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-29 18:06:44 +00:00
Keith Randall
041ecb697f cmd/compile: automatically handle commuting ops in rewrite rules
We have lots of rewrite rules that vary only in the fact that
we have 2 versions for the 2 different orderings of various
commuting ops. For example:

(ADDL x (MOVLconst [c])) -> (ADDLconst [c] x)
(ADDL (MOVLconst [c]) x) -> (ADDLconst [c] x)

It can get unwieldly quickly, especially when there is more than
one commuting op in a rule.

Our existing "fix" for this problem is to have rules that
canonicalize the operations first. For example:

(Eq64 x (Const64 <t> [c])) && x.Op != OpConst64 -> (Eq64 (Const64 <t> [c]) x)

Subsequent rules can then assume if there is a constant arg to Eq64,
it will be the first one. This fix kinda works, but it is fragile and
only works when we remember to include the required extra rules.

The fundamental problem is that the rule matcher doesn't
know anything about commuting ops. This CL fixes that fact.

We already have information about which ops commute. (The register
allocator takes advantage of commutivity.)  The rule generator now
automatically generates multiple rules for a single source rule when
there are commutative ops in the rule. We can now drop all of our
almost-duplicate source-level rules and the canonicalization rules.

I have some CLs in progress that will be a lot less verbose when
the rule generator handles commutivity for me.

I had to reorganize the load-combining rules a bit. The 8-way OR rules
generated 128 different reorderings, which was causing the generator
to put too much code in the rewrite*.go files (the big ones were going
from 25K lines to 132K lines). Instead I reorganized the rules to
combine pairs of loads at a time. The generated rule files are now
actually a bit (5%) smaller.
[Note to reviewers: check these carefully. Most of the other rule
changes are trivial.]

Make.bash times are ~unchanged.

Compiler benchmarks are not observably different. Probably because
we don't spend much compiler time in rule matching anyway.

I've also done a pass over all of our ops adding commutative markings
for ops which hadn't had them previously.

Fixes #18292

Change-Id: I999b1307272e91965b66754576019dedcbe7527a
Reviewed-on: https://go-review.googlesource.com/38666
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-03-29 16:22:09 +00:00
philhofer
d68bb16b1e cmd/compile/internal/ssa: recognize constant pointer comparison
Teach the backend to recognize that the address of a symbol
is equal with itself, and that the addresses of two different
symbols are different.

Some examples of where this rule hits in the standard library:

 - inlined uses of (*time.Time).setLoc (e.g. time.UTC)
 - inlined uses of bufio.NewReader (via type assertion)

Change-Id: I23dcb068c2ec333655c1292917bec13bbd908c24
Reviewed-on: https://go-review.googlesource.com/38338
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-20 21:04:44 +00:00
Josh Bleecher Snyder
aea3aff669 cmd/compile: separate ssa.Frontend and ssa.TypeSource
Prior to this CL, the ssa.Frontend field was responsible
for providing types to the backend during compilation.
However, the types needed by the backend are few and static.
It makes more sense to use a struct for them
and to hang that struct off the ssa.Config,
which is the correct home for readonly data.
Now that Types is a struct, we can clean up the names a bit as well.

This has the added benefit of allowing early construction
of all types needed by the backend.
This will be useful for concurrent backend compilation.

Passes toolstash-check -all. No compiler performance change.

Updates #15756

Change-Id: I021658c8cf2836d6a22bbc20cc828ac38c7da08a
Reviewed-on: https://go-review.googlesource.com/38336
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-19 00:21:23 +00:00
Josh Bleecher Snyder
2cdb7f118a cmd/compile: move Frontend field from ssa.Config to ssa.Func
Suggested by mdempsky in CL 38232.
This allows us to use the Frontend field
to associate frontend state and information
with a function.
See the following CL in the series for examples.

This is a giant CL, but it is almost entirely routine refactoring.

The ssa test API is starting to feel a bit unwieldy.
I will clean it up separately, once the dust has settled.

Passes toolstash -cmp.

Updates #15756

Change-Id: I71c573bd96ff7251935fce1391b06b1f133c3caf
Reviewed-on: https://go-review.googlesource.com/38327
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-17 23:18:57 +00:00
Josh Bleecher Snyder
193510f2f6 cmd/compile: evaluate config as needed in rewrite rules
Prior to this CL, config was an explicit argument
to the SSA rewrite rules, and rules that needed
a Frontend got at it via config.
An upcoming CL moves Frontend from Config to Func,
so rules can no longer reach Frontend via Config.
Passing a Frontend as an argument to the rewrite rules
causes a 2-3% regression in compile times.
This CL takes a different approach:
It treats the variable names "config" and "fe"
as special and calculates them as needed.
The "as needed part" is also important to performance:
If they are calculated eagerly, the nilchecks themselves
cause a regression.

This introduces a little bit of magic into the rewrite
generator. However, from the perspective of the rules,
the config variable was already more or less magic.
And it makes the upcoming changes much clearer.

Passes toolstash -cmp.

Change-Id: I173f2bcc124cba43d53138bfa3775e21316a9107
Reviewed-on: https://go-review.googlesource.com/38326
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-17 22:35:29 +00:00
Cherry Zhang
c8f38b3398 cmd/compile: use type information in Aux for Store size
Remove size AuxInt in Store, and alignment in Move/Zero. We still
pass size AuxInt to Move/Zero, as it is used for partial Move/Zero
lowering (e.g. cmd/compile/internal/ssa/gen/386.rules:288).
SizeAndAlign is gone.

Passes "toolstash -cmp" on std.

Change-Id: I1ca34652b65dd30de886940e789fcf41d521475d
Reviewed-on: https://go-review.googlesource.com/38150
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-16 14:25:04 +00:00
Cherry Zhang
211c8c9f1a cmd/compile: pass types on SSA Store/Move/Zero ops
For SSA Store/Move/Zero ops, attach the type of the value being
stored to the op as the Aux field. This type will be used for
write barrier insertion (in a followup CL). Since SSA passes
do not accurately propagate types of values (because of type
casting), we can't simply use type of the store's arguments
for write barrier insertion.

Passes "toolstash -cmp" on std.

Updates #17583.

Change-Id: I051d5e5c482931640d1d7d879b2a6bb91f2e0056
Reviewed-on: https://go-review.googlesource.com/36838
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-16 14:22:53 +00:00
philhofer
295307ae78 cmd/compile: de-virtualize interface calls
With this change, code like

    h := sha1.New()
    h.Write(buf)
    sum := h.Sum()

gets compiled into static calls rather than
interface calls, because the compiler is able
to prove that 'h' is really a *sha1.digest.

The InterCall re-write rule hits a few dozen times
during make.bash, and hundreds of times during all.bash.

The most common pattern identified by the compiler
is a constructor like

    func New() Interface { return &impl{...} }

where the constructor gets inlined into the caller,
and the result is used immediately. Examples include
{sha1,md5,crc32,crc64,...}.New, base64.NewEncoder,
base64.NewDecoder, errors.New, net.Pipe, and so on.

Some existing benchmarks that change on darwin/amd64:

Crc64/ISO4KB-8        2.67µs ± 1%    2.66µs ± 0%  -0.36%  (p=0.015 n=10+10)
Crc64/ISO1KB-8         694ns ± 0%     690ns ± 1%  -0.59%  (p=0.001 n=10+10)
Adler32KB-8            473ns ± 1%     471ns ± 0%  -0.39%  (p=0.010 n=10+9)

On architectures like amd64, the reduction in code size
appears to contribute more to benchmark improvements than just
removing the indirect call, since that branch gets predicted
accurately when called in a loop.

Updates #19361

Change-Id: I57d4dc21ef40a05ec0fbd55a9bb0eb74cdc67a3d
Reviewed-on: https://go-review.googlesource.com/38139
Run-TryBot: Philip Hofer <phofer@umich.edu>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-03-14 18:49:23 +00:00
David Chase
b59a405656 Revert "cmd/compile: de-virtualize interface calls"
This reverts commit 4e0c7c3f61.

Reason for revert: The presence-of-optimization test program is fragile, breaks under noopt, and might break if the Go libraries are tweaked.  It needs to be (re)written without reference to other packages.

Change-Id: I3aaf1ab006a1a255f961a978e9c984341740e3c7
Reviewed-on: https://go-review.googlesource.com/38097
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-13 21:15:32 +00:00
Philip Hofer
4e0c7c3f61 cmd/compile: de-virtualize interface calls
With this change, code like

    h := sha1.New()
    h.Write(buf)
    sum := h.Sum()

gets compiled into static calls rather than
interface calls, because the compiler is able
to prove that 'h' is really a *sha1.digest.

The InterCall re-write rule hits a few dozen times
during make.bash, and hundreds of times during all.bash.

The most common pattern identified by the compiler
is a constructor like

    func New() Interface { return &impl{...} }

where the constructor gets inlined into the caller,
and the result is used immediately. Examples include
{sha1,md5,crc32,crc64,...}.New, base64.NewEncoder,
base64.NewDecoder, errors.New, net.Pipe, and so on.

Some existing benchmarks that change on darwin/amd64:

Crc64/ISO4KB-8        2.67µs ± 1%    2.66µs ± 0%  -0.36%  (p=0.015 n=10+10)
Crc64/ISO1KB-8         694ns ± 0%     690ns ± 1%  -0.59%  (p=0.001 n=10+10)
Adler32KB-8            473ns ± 1%     471ns ± 0%  -0.39%  (p=0.010 n=10+9)

On architectures like amd64, the reduction in code size
appears to contribute more to benchmark improvements than just
removing the indirect call, since that branch gets predicted
accurately when called in a loop.

Updates #19361

Change-Id: Ia9d30afdd5f6b4d38d38b14b88f308acae8ce7ed
Reviewed-on: https://go-review.googlesource.com/37751
Run-TryBot: Philip Hofer <phofer@umich.edu>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-13 18:24:57 +00:00
Josh Bleecher Snyder
4b3a55ec02 cmd/compile: add 32 bit (AddPtr (Const)) rule
This triggers about 50k times during 32 bit make.bash.

Change-Id: Ia0c2b1a8246b92173b4b0d94a4037626f76b6e73
Reviewed-on: https://go-review.googlesource.com/37998
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-03-10 17:24:06 +00:00
Michael Munday
945180fe2a cmd/compile: fix OffPtr type in 2-field struct Store rule
The type of the OffPtr for the first field was incorrect. It should
have been a pointer to the field type, rather than the field
type itself.

Fixes #19475.

Change-Id: I3960b404da0f4bee759331126cce6140d2ce1df7
Reviewed-on: https://go-review.googlesource.com/37869
Run-TryBot: Michael Munday <munday@ca.ibm.com>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-09 19:09:56 +00:00
philhofer
a6bd42f263 cmd/compile: emit OffPtr for first field in SSA'd structs
Given

  (Store [c] (OffPtr <T1> [0] (Addr <T> _)) _
    (Store [c] (Addr <T> _) _ _))

dead store elimination doesn't eliminate the inner
Store, because it addresses a type of a different width
than the first store.

When decomposing StructMake operations, always generate
an OffPtr to address struct fields so that dead stores to
the first field of the struct can be optimized away.

benchmarks affected on darwin/amd64:
HTTPClientServer-8        73.2µs ± 1%    72.7µs ± 1%  -0.69%  (p=0.022 n=9+10)
TimeParse-8                304ns ± 1%     300ns ± 0%  -1.61%  (p=0.000 n=9+9)
RegexpMatchEasy1_32-8     80.1ns ± 0%    79.5ns ± 1%  -0.84%  (p=0.000 n=8+9)
GobDecode-8               6.78ms ± 0%    6.81ms ± 1%  +0.46%  (p=0.000 n=9+10)
Gunzip-8                  36.1ms ± 1%    36.2ms ± 0%  +0.37%  (p=0.019 n=10+10)
JSONEncode-8              15.6ms ± 0%    15.7ms ± 0%  +0.69%  (p=0.000 n=9+10)

Change-Id: Ia80d73fd047f9400c616ca64fdee4f438a0e7f21
Reviewed-on: https://go-review.googlesource.com/37769
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-06 23:47:59 +00:00
Giovanni Bajo
4fc45ae879 cmd/compile: improve generic rules for BCE based on AND operations.
Match more patterns generated by the compiler where the index for
a bound check is bounded through a AND operation, with different
register sizes.

These rules trigger a dozen of times in a bootstrap.

Change-Id: Ic9fff16f21d08580f19a366c3ee1a372e58357d1
Reviewed-on: https://go-review.googlesource.com/37442
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-04 16:13:07 +00:00
Cherry Zhang
c8eaeb8cba cmd/compile: remove zeroing after newobject
The Zero op right after newobject has been removed. But this rule
does not cover Store of constant zero (for SSA-able types). Add
rules to cover Store op as well.

Updates #19027.

Change-Id: I5d2b62eeca0aa9ce8dc7205b264b779de01c660b
Reviewed-on: https://go-review.googlesource.com/36836
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-03 20:36:54 +00:00
Cherry Zhang
9b480521d8 cmd/compile: fix optimization of Zero newobject on amd64p32
On amd64p32, PtrSize and RegSize don't agree, and function return
value is aligned with RegSize. Fix this rule. Other architectures
are not affected, where PtrSize and RegSize are the same.

Change-Id: If187d3dfde3dc3b931b8e97db5eeff49a781551b
Reviewed-on: https://go-review.googlesource.com/37720
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-03 20:36:31 +00:00
philhofer
379567aad1 cmd/compile/ssa: more aggressive constant folding
Add rewrite rules that canonicalize the location
of constants in expressions, and fold conststants
that appear in operations that can be trivially
reassociated.

After this change, the compiler constant-folds
expressions like "4 + x - 1" and "4 & x & 1"

Benchmarks affected on darwin/amd64:

name                     old time/op    new time/op    delta
FmtFprintfInt-8            82.1ns ± 1%    81.7ns ± 1%  -0.46%  (p=0.023 n=8+9)
FmtFprintfIntInt-8          122ns ± 2%     120ns ± 2%  -1.48%  (p=0.047 n=10+10)
FmtManyArgs-8               493ns ± 0%     486ns ± 1%  -1.37%  (p=0.000 n=8+10)
Gzip-8                      230ms ± 0%     229ms ± 1%  -0.46%  (p=0.001 n=10+9)
HTTPClientServer-8         74.5µs ± 1%    73.7µs ± 1%  -1.11%  (p=0.000 n=10+10)
JSONDecode-8               51.7ms ± 0%    51.9ms ± 1%  +0.42%  (p=0.017 n=10+9)
RegexpMatchEasy0_32-8      82.6ns ± 1%    81.7ns ± 0%  -1.02%  (p=0.000 n=9+8)
RegexpMatchMedium_32-8      121ns ± 1%     120ns ± 1%  -1.48%  (p=0.001 n=10+10)
Revcomp-8                   426ms ± 1%     400ms ± 1%  -6.16%  (p=0.000 n=10+10)
TimeFormat-8                330ns ± 1%     327ns ± 0%  -0.82%  (p=0.000 n=10+10)

name                     old speed      new speed      delta
Gzip-8                   84.4MB/s ± 0%  84.8MB/s ± 1%  +0.47%  (p=0.001 n=10+9)
JSONDecode-8             37.6MB/s ± 0%  37.4MB/s ± 1%  -0.42%  (p=0.016 n=10+9)
RegexpMatchEasy0_32-8     387MB/s ± 1%   392MB/s ± 0%  +1.06%  (p=0.000 n=9+8)
RegexpMatchMedium_32-8   8.21MB/s ± 1%  8.34MB/s ± 1%  +1.58%  (p=0.000 n=10+9)
Revcomp-8                 597MB/s ± 1%   636MB/s ± 1%  +6.57%  (p=0.000 n=10+10)

Change-Id: Ie37ff91605b76a984a8400dfd1e34f50bf61c864
Reviewed-on: https://go-review.googlesource.com/37290
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-28 20:25:33 +00:00
Michael Munday
bd8a39b67a cmd/compile: emit fused multiply-{add,subtract} instructions on s390x
Explcitly block fused multiply-add pattern matching when a cast is used
after the multiplication, for example:

    - (a * b) + c        // can emit fused multiply-add
    - float64(a * b) + c // cannot emit fused multiply-add

float{32,64} and complex{64,128} casts of matching types are now kept
as OCONV operations rather than being replaced with OCONVNOP operations
because they now imply a rounding operation (and therefore aren't a
no-op anymore).

Operations (for example, multiplication) on complex types may utilize
fused multiply-add and -subtract instructions internally. There is no
way to disable this behavior at the moment.

Improves the performance of the floating point implementation of
poly1305:

name         old speed     new speed     delta
64           246MB/s ± 0%  275MB/s ± 0%  +11.48%   (p=0.000 n=10+8)
1K           312MB/s ± 0%  357MB/s ± 0%  +14.41%  (p=0.000 n=10+10)
64Unaligned  246MB/s ± 0%  274MB/s ± 0%  +11.43%  (p=0.000 n=10+10)
1KUnaligned  312MB/s ± 0%  357MB/s ± 0%  +14.39%   (p=0.000 n=10+8)

Updates #17895.

Change-Id: Ia771d275bb9150d1a598f8cc773444663de5ce16
Reviewed-on: https://go-review.googlesource.com/36963
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-02-28 15:34:20 +00:00
Josh Bleecher Snyder
417f49a363 cmd/compile: fold (NegNN (ConstNN ...))
Fix up and enable a few rules.
They trigger a handful of times in std,
despite the frontend handling.

Change-Id: I83378c057cbbc95a4f2b58cd8c36aec0e9dc547f
Reviewed-on: https://go-review.googlesource.com/37227
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-02-28 04:43:35 +00:00
Michael Munday
72a071c1da cmd/compile: rewrite pairs of shifts to extensions
Replaces pairs of shifts with sign/zero extension where possible.

For example:
(uint64(x) << 32) >> 32 -> uint64(uint32(x))

Reduces the execution time of the following code by ~4.5% on s390x:

for i := 0; i < N; i++ {
        x += (uint64(i)<<32)>>32
}

Change-Id: Idb2d56f27e80a2e1366bc995922ad3fd958c51a7
Reviewed-on: https://go-review.googlesource.com/37292
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-02-22 21:31:03 +00:00
Keith Randall
a9292b833b cmd/compile: fix 32-bit unsigned division on 64-bit machines
The type of an intermediate multiply was wrong.  When that
intermediate multiply was spilled, the top 32 bits were lost.

Fixes #19153

Change-Id: Ib29350a4351efa405935b7f7ee3c112668e64108
Reviewed-on: https://go-review.googlesource.com/37212
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-02-17 22:21:04 +00:00
Keith Randall
708ba22a0c cmd/compile: move constant divide strength reduction to SSA rules
Currently the conversion from constant divides to multiplies is mostly
done during the walk pass.  This is suboptimal because SSA can
determine that the value being divided by is constant more often
(e.g. after inlining).

Change-Id: If1a9b993edd71be37396b9167f77da271966f85f
Reviewed-on: https://go-review.googlesource.com/37015
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-02-17 06:16:44 +00:00
Josh Bleecher Snyder
47d2a4dafa cmd/compile: remove walkmul
Replace with generic rewrite rules.

Change-Id: I3ee32076cfd9db5801f1a7bdbb73a994255884a9
Reviewed-on: https://go-review.googlesource.com/36323
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
2017-02-06 07:21:14 +00:00
Josh Bleecher Snyder
12c58bbf81 cmd/compile: optimize (ZeroExt (Const [c]))
These rules trigger 116 times while running make.bash.
And at least for the sample code at
https://github.com/golang/go/issues/18906#issuecomment-277174241
they are providing optimizations not already present
in amd64.

Updates #18906

Change-Id: I410a480f566f5ab176fc573fb5ac74f9cffec225
Reviewed-on: https://go-review.googlesource.com/36217
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-03 06:52:46 +00:00
Cherry Zhang
6ad2d6aa92 cmd/compile: simplify IsNonNil ConstNil
Change-Id: I9ed5a2065cef06708e319b16c801da2eff42004e
Reviewed-on: https://go-review.googlesource.com/35497
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-02-02 21:28:50 +00:00
Robert Griesemer
cfd17f51c8 [dev.inline] cmd/compile/internal/ssa: rename various fields from Line to Pos
This is a mostly mechanical rename followed by manual fixes where necessary.

Change-Id: Ie5c670b133db978f15dc03e50dc2da0c80fc8842
Reviewed-on: https://go-review.googlesource.com/34137
Reviewed-by: David Lazar <lazard@golang.org>
2016-12-08 21:36:52 +00:00
Robert Griesemer
82d0caea2c [dev.inline] cmd/internal/src: make Pos implementation abstract
Adjust cmd/compile accordingly.

This will make it easier to replace the underlying implementation.

Change-Id: I33645850bb18c839b24785b6222a9e028617addb
Reviewed-on: https://go-review.googlesource.com/34133
Reviewed-by: David Lazar <lazard@golang.org>
2016-12-08 21:31:28 +00:00
Keith Randall
688995d1e9 cmd/compile: do more type conversion inline
The code to do the conversion is smaller than the
call to the runtime.
The 1-result asserts need to call panic if they fail, but that
code is out of line.

The only conversions left in the runtime are those which
might allocate and those which might need to generate an itab.

Given the following types:
  type E interface{}
  type I interface { foo() }
  type I2 iterface { foo(); bar() }
  type Big [10]int
  func (b Big) foo() { ... }

This CL inlines the following conversions:

was assertE2T
  var e E = ...
  b := i.(Big)
was assertE2T2
  var e E = ...
  b, ok := i.(Big)
was assertI2T
  var i I = ...
  b := i.(Big)
was assertI2T2
  var i I = ...
  b, ok := i.(Big)
was assertI2E
  var i I = ...
  e := i.(E)
was assertI2E2
  var i I = ...
  e, ok := i.(E)

These are the remaining runtime calls:

convT2E:
  var b Big = ...
  var e E = b
convT2I:
  var b Big = ...
  var i I = b
convI2I:
  var i2 I2 = ...
  var i I = i2
assertE2I:
  var e E = ...
  i := e.(I)
assertE2I2:
  var e E = ...
  i, ok := e.(I)
assertI2I:
  var i I = ...
  i2 := i.(I2)
assertI2I2:
  var i I = ...
  i2, ok := i.(I2)

Fixes #17405
Fixes #8422

Change-Id: Ida2367bf8ce3cd2c6bb599a1814f1d275afabe21
Reviewed-on: https://go-review.googlesource.com/32313
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-11-02 21:33:03 +00:00
Keith Randall
741445068f cmd/compile: make [0]T and [1]T SSAable types
We used to have to keep on-stack copies of these types.
Now they can be registerized.

[0]T is kind of trivial but might as well handle it.

This change enables another change I'm working on to improve how x.(T)
expressions are handled (#17405).  This CL helps because now all
types that are direct interface types are registerizeable (e.g. [1]*byte).

No higher-degree arrays for now because non-constant indexes are hard.

Update #17405

Change-Id: I2399940965d17b3969ae66f6fe447a8cefdd6edd
Reviewed-on: https://go-review.googlesource.com/32416
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-10-31 19:44:19 +00:00
Keith Randall
ac74225dcc cmd/compile: remove redundant extension after shift
var x uint64
uint8(x >> 56)

We don't need to generate any code for the uint8().

Update #15090

Change-Id: Ie1ca4e32022dccf7f7bc42d531a285521fb67872
Reviewed-on: https://go-review.googlesource.com/28191
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-10-27 21:21:59 +00:00
Keith Randall
deb4177cf0 cmd/compile: use masks instead of branches for slicing
When we do

  var x []byte = ...
  y := x[i:]

We can't just use y.ptr = x.ptr + i, as the new pointer may point to the
next object in memory after the backing array.
We used to fix this by doing:

  y.cap = x.cap - i
  delta := i
  if y.cap == 0 {
    delta = 0
  }
  y.ptr = x.ptr + delta

That generates a branch in what is otherwise straight-line code.

Better to do:

  y.cap = x.cap - i
  mask := (y.cap - 1) >> 63 // -1 if y.cap==0, 0 otherwise
  y.ptr = x.ptr + i &^ mask

It's about the same number of instructions (~4, depending on what
parts are constant, and the target architecture), but it is all
inline. It plays nicely with CSE, and the mask can be computed
in parallel with the index (in cases where a multiply is required).

It is a minor win in both speed and space.

Change-Id: Ied60465a0b8abb683c02208402e5bb7ac0e8370f
Reviewed-on: https://go-review.googlesource.com/32022
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2016-10-27 20:22:49 +00:00
David Chase
0f29942489 cmd/compile: Repurpose old sliceopt.go for prove phase.
Adapt old test for prove's bounds check elimination.
Added missing rule to generic rules that lead to differences
between 32 and 64 bit platforms on sliceopt test.
Added debugging to prove.go that was helpful-to-necessary to
discover that missing rule.
Lowered debugging level on prove.go from 3 to 1; no idea
why it was previously 3.

Change-Id: I09de206aeb2fced9f2796efe2bfd4a59927eda0c
Reviewed-on: https://go-review.googlesource.com/23290
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-10-20 23:50:19 +00:00
Cherry Zhang
4d07d3e29c cmd/compile: re-enable nilcheck removal for newobject
Also add compiler debug ouput and add a test.

Fixes #15390.

Change-Id: Iceba1414c29bcc213b87837387bf8ded1f3157f1
Reviewed-on: https://go-review.googlesource.com/30011
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-09-28 19:41:49 +00:00
Keith Randall
3134ab3c2d cmd/compile: redo nil checks
Get rid of BlockCheck. Josh goaded me into it, and I went
down a rabbithole making it happen.

NilCheck now panics if the pointer is nil and returns void, as before.
BlockCheck is gone, and NilCheck is no longer a Control value for
any block. It just exists (and deadcode knows not to throw it away).

I rewrote the nilcheckelim pass to handle this case.  In particular,
there can now be multiple NilCheck ops per block.

I moved all of the arch-dependent nil check elimination done as
part of ssaGenValue into its own proper pass, so we don't have to
duplicate that code for every architecture.

Making the arch-dependent nil check its own pass means I needed
to add a bunch of flags to the opcode table so I could write
the code without arch-dependent ops everywhere.

Change-Id: I419f891ac9b0de313033ff09115c374163416a9f
Reviewed-on: https://go-review.googlesource.com/29120
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-09-15 02:42:13 +00:00
Martin Möhrmann
00459f05e0 cmd/compile: fold negation into comparison operators
This allows for example AMD64 ssa to generate
(SETNE x) instead of (XORLconst [1] SETE).

make.bash trigger count on AMD64:
691 generic.rules:225
  1 generic.rules:226
  4 generic.rules:228
  1 generic.rules:229
  8 generic.rules:231
  6 generic.rules:238
  2 generic.rules:257

Change-Id: I5b9827b2df63c8532675079e5a6026aa47bfd8dc
Reviewed-on: https://go-review.googlesource.com/28232
Run-TryBot: Martin Möhrmann <martisch@uos.de>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2016-08-31 20:24:03 +00:00
Cherry Zhang
b2e0e9688a cmd/compile: remove Zero and NilCheck for newobject
Recognize runtime.newobject and don't Zero or NilCheck it.

Fixes #15914 (?)
Updates #15390.

TBD: add test

Change-Id: Ia3bfa5c2ddbe2c27c92d9f68534a713b5ce95934
Reviewed-on: https://go-review.googlesource.com/27930
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-08-30 23:10:43 +00:00
Josh Bleecher Snyder
5c3edc46a6 cmd/compile: add Trunc-of-Ext simplifications
This is a follow-up to the discussion
in CL 27853.

During make.bash, trigger count:

24 rewrite generic.rules:57
22 rewrite generic.rules:69
10 rewrite generic.rules:54
10 rewrite generic.rules:58
10 rewrite generic.rules:67
 7 rewrite generic.rules:66
 4 rewrite generic.rules:59
 3 rewrite generic.rules:50
 3 rewrite generic.rules:51
 3 rewrite generic.rules:52
 1 rewrite generic.rules:64

Change-Id: Id96cb6a707a4a564831f763c2d4d0e180c94bbef
Reviewed-on: https://go-review.googlesource.com/28088
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Martin Möhrmann <martisch@uos.de>
2016-08-30 20:53:36 +00:00
Kevin Burke
8e24a98abe cmd/compile: precompute constant square roots
If a program wants to evaluate math.Sqrt for any constant value
(for example, math.Sqrt(3)), we can replace that expression with
its evaluation (1.7320508075688772) at compile time, instead of
generating a SQRT assembly command or equivalent.

Adds tests that math.Sqrt generates the correct values. I also
compiled a short program and verified that the Sqrt expression was
replaced by a constant value in the "after opt" step.

Adds a short doc to the top of generic.rules explaining what the file
does and how other files interact with it.

Fixes #15543.

Change-Id: I6b6e63ac61cec50763a09ba581024adeee03d4fa
Reviewed-on: https://go-review.googlesource.com/27457
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
2016-08-23 17:30:55 +00:00
Josh Bleecher Snyder
6a1153acb4 [dev.ssa] cmd/compile: refactor out rulegen value parsing
Previously, genMatch0 and genResult0 contained
lots of duplication: locating the op, parsing
the value, validation, etc.
Parsing and validation was mixed in with code gen.

Extract a helper, parseValue. It is responsible
for parsing the value, locating the op, and doing
shared validation.

As a bonus (and possibly as my original motivation),
make op selection pay attention to the number
of args present.
This allows arch-specific ops to share a name
with generic ops as long as there is no ambiguity.
It also detects and reports unresolved ambiguity,
unlike before, where it would simply always
pick the generic op, with no warning.

Also use parseValue when generating the top-level
op dispatch, to ensure its opinion about ops
matches genMatch0 and genResult0.

The order of statements in the generated code used
to depend on the exact rule. It is now somewhat
independent of the rule. That is the source
of some of the generated code changes in this CL.
See rewritedec64 and rewritegeneric for examples.
It is a one-time change.

The op dispatch switch and functions used to be
sorted by opname without architecture. The sort
now includes the architecture, leading to further
generated code changes.
See rewriteARM and rewriteAMD64 for examples.
Again, it is a one-time change.

There are no functional changes.

Change-Id: I22c989183ad5651741ebdc0566349c5fd6c6b23c
Reviewed-on: https://go-review.googlesource.com/24649
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2016-08-03 22:51:51 +00:00
Cherry Zhang
7d70f84f54 [dev.ssa] cmd/compile: add floating point optimizations in SSA for ARM
Add some simplification rules for floating point ops.

cmd/internal/obj/arm supports instructions that compare FP register
to 0, but runtime softfloat simulator does not. This CL adds these
instructions to softfloat simulator as well.

Updates #15365.

Change-Id: I29405b2bfcb4c8cf106cb7a1a811409fec91b170
Reviewed-on: https://go-review.googlesource.com/24790
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-07-16 03:13:22 +00:00
Josh Bleecher Snyder
f0bab31660 [dev.ssa] cmd/compile: add some constant folding optimizations
These were helpful for some autogenerated code
I'm working with.

Change-Id: I7b89c69552ca99bf560a14bfbcd6bd238595ddf6
Reviewed-on: https://go-review.googlesource.com/24742
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-07-06 16:06:57 +00:00
Cherry Zhang
f55317828b [dev.ssa] cmd/compile: ensure alignment for Zero and Move in SSA for ARM
Encode the size and the alignment into AuxInt of Zero and Move ops.
On AMD64, we simply don't look at the alignment. On ARM and PPC64, we
only generate aligned stores.

Updates #15365.

Change-Id: Ifdcc205c364f67c4516b9adebfe7d50d223b6863
Reviewed-on: https://go-review.googlesource.com/24511
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-07-02 22:22:12 +00:00
Keith Randall
1739657513 cmd/compile: shift tests, fix triple-shift rules
Add a bunch of tests for shifts.

Fix triple-shift rules to always take constant shifts as 64 bits.
(Earlier rules always promote shift amounts to 64 bits.)
Add overflow checks.

Increases generic rule coverage to 91%

Change-Id: I6b42d368d19d36ac482dbb8e0d4f67e30ad7145d
Reviewed-on: https://go-review.googlesource.com/23555
Reviewed-by: Todd Neal <todd@tneal.org>
2016-05-29 20:36:21 +00:00