Commit graph

11 commits

Author SHA1 Message Date
Paul E. Murphy
41c71d48a1 cmd/compile/internal: add RLDICR opcode for PPC64
This is encoded similarly to RLDICL, but can clear the least
significant bits.

Likewise, update the auxint encoding of RLDICL to match those
used by the rotate and mask word ssa opcodes for easier usage
within lowering rules. The RLDICL ssa opcode is not used yet.

Change-Id: I42486dd95714a3e8e2f19ab237a6cf3af520c905
Reviewed-on: https://go-review.googlesource.com/c/go/+/515575
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2023-08-14 20:31:17 +00:00
Keith Randall
cedf5008a8 cmd/compile: introduce separate memory op combining pass
Memory op combining is currently done using arch-specific rewrite rules.
Instead, do them as a arch-independent rewrite pass. This ensures that
all architectures (with unaligned loads & stores) get equal treatment.

This removes a lot of rewrite rules.

The new pass is a bit more comprehensive. It handles things like out-of-order
writes and is careful not to apply partial optimizations that then block
further optimizations.

Change-Id: I780ff3bb052475cd725a923309616882d25b8d9e
Reviewed-on: https://go-review.googlesource.com/c/go/+/478475
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2023-04-21 21:05:46 +00:00
Keith Randall
21d82e6ac8 cmd/compile: batch write barrier calls
Have the write barrier call return a pointer to a buffer into which
the generated code records pointers that need write barrier treatment.

Change-Id: I7871764298e0aa1513de417010c8d46b296b199e
Reviewed-on: https://go-review.googlesource.com/c/go/+/447781
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Bypass: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-02-24 00:21:13 +00:00
Paul E. Murphy
f9da938614 cmd/compile: remove unused ISELB PPC64 ssa opcode
The usage of ISELB has been removed as part of changes
made to support Power10 SETBC instructions.

Change-Id: I2fce4370f48c1eeee65d411dfd1bea4201f45b45
Reviewed-on: https://go-review.googlesource.com/c/go/+/465575
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Archana Ravindar <aravind5@in.ibm.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2023-02-07 17:17:35 +00:00
Archana R
a432d89137 cmd/compile: add rules to emit SETBC/R instructions on power10
This CL adds rules that replaces instances of ISEL that produce
a boolean result based on a condition register by SETBC/SETBCR
operations. On Power10 these are convereted to SETBC/SETBCR
instructions that use one register instead of 3 registers
conventionally used by ISEL and hence reduces register pressure.
On loops written specifically to exercise such instances of ISEL
extensively, a performance improvement of 2.5% is seen on Power10.
Also added verification tests to verify correct generation of
SETBC/SETBCR instructions on Power10.

Change-Id: Ib719897f09d893de40324440a43052dca026e8fa
Reviewed-on: https://go-review.googlesource.com/c/go/+/449795
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Archana Ravindar <aravind5@in.ibm.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-02-06 12:49:53 +00:00
Archana R
cd1fc87156 cmd/compile: intrinsify math/bits/ReverseBytes{16|32|64} for ppc64/power10
This change intrinsifies ReverseBytes{16|32|64} by generating the
corresponding new instructions in Power10: brh, brd and brw and
adds a verification test for the same.
On Power 9 and 8, the .go code performs optimally as it is.

Performance improvement seen on Power10:
ReverseBytes32  1.38ns ± 0%  1.18ns ± 0%  -14.2
ReverseBytes64  1.52ns ± 0%  1.11ns ± 0%  -26.87
ReverseBytes16  1.41ns ± 1%  1.18ns ± 0%  -16.47

Change-Id: I88f127f3ab9ba24a772becc21ad90acfba324b37
Reviewed-on: https://go-review.googlesource.com/c/go/+/446675
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2023-02-03 19:01:06 +00:00
Keith Randall
45dc81d856 cmd/compile: add memory argument to GetCallerSP
We need to make sure that when we get the stack pointer, we get it
at the right time.

V = GetCallerSP
Call()
W = GetCallerSP

If Call causes a stack growth, then we will be in a situation
where V != W. So it matters when GetCallerSP operations get scheduled.
Add a memory argument to GetCallerSP so it can't be reordered with
things like calls.

Change-Id: I6cc801134c38e358c5a1ec0c09d38379a16a4184
Reviewed-on: https://go-review.googlesource.com/c/go/+/453515
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Martin Möhrmann <martin@golang.org>
Reviewed-by: Robert Griesemer <gri@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-01-19 22:43:22 +00:00
Keith Randall
888047c310 cmd/compile: fix conditional move rule on PPC64
Similar to CL 456556 but for ppc64 instead of arm64.

Change docs about how booleans are stored in registers for ppc64.
We now don't promise to keep the upper bits zeroed; they might be junk.

To test, I changed the boolean generation instructions (MOVBZload* and ISEL*
with boolean type) to OR in 0x100 to the result. all.bash still passed,
so I think nothing else is depending on the upper bits of booleans.

Update #57184

Change-Id: Ie66f8934a0dafa34d0a8c2a37324868d959a852c
Reviewed-on: https://go-review.googlesource.com/c/go/+/456437
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: KAMPANAT THUMWONG (KONG PC) <1992kongpc.kth@gmail.com>
Run-TryBot: Archana Ravindar <aravind5@in.ibm.com>
2022-12-11 16:34:38 +00:00
Paul E. Murphy
dc6b7c86df cmd/compile: merge zero constant ISEL in PPC64 lateLower pass
Add a new SSA opcode ISELZ, similar to ISELB to represent a select
of value or 0. Then, merge candidate ISEL opcodes inside the late
lower pass.

This avoids complicating rules within the the lower pass.

Change-Id: I3b14c94b763863aadc834b0e910a85870c131313
Reviewed-on: https://go-review.googlesource.com/c/go/+/442596
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Joedian Reid <joedian@golang.org>
2022-11-14 19:44:47 +00:00
Lynn Boger
d0b0b10b5c cmd/compile: leverage cc ops in more cases on ppc64x
This updates some rules to use ops with CC variations to
set the condition code when the result of the operation is
zero. This allows the following compare with zero to be
removed since the equivalent condition code has already been
set.

In addition, a previous rule change to use ANDCCconst was modified
to allow any constant value, not just 1 in some cases.

Improvements in the reflect package benchmarks:

DeepEqual/int8-4                    23.9ns ± 1%    23.1ns ± 1%   -3.57%  (p=0.029 n=4+4)
DeepEqual/[]int8-4                   109ns ± 2%     102ns ± 1%   -6.67%  (p=0.029 n=4+4)
DeepEqual/int16-4                   23.8ns ± 1%    22.8ns ± 0%   -3.97%  (p=0.029 n=4+4)
DeepEqual/[]int16-4                  108ns ± 1%     102ns ± 0%   -6.25%  (p=0.029 n=4+4)
DeepEqual/int32-4                   24.9ns ± 3%    23.6ns ± 0%   -5.09%  (p=0.029 n=4+4)
DeepEqual/[]int32-4                  109ns ± 1%     103ns ± 0%   -5.64%  (p=0.029 n=4+4)
DeepEqual/int64-4                   25.5ns ± 1%    23.7ns ± 0%   -7.03%  (p=0.029 n=4+4)
DeepEqual/[]int64-4                  109ns ± 1%     102ns ± 0%   -6.73%  (p=0.029 n=4+4)
DeepEqual/int-4                     23.2ns ± 1%    22.7ns ± 0%   -2.05%  (p=0.029 n=4+4)
DeepEqual/[]int-4                    109ns ± 3%     101ns ± 0%   -7.18%  (p=0.029 n=4+4)
DeepEqual/uint8-4                   23.9ns ± 1%    23.5ns ± 0%   -1.69%  (p=0.029 n=4+4)
DeepEqual/[]uint8-4                 89.1ns ± 0%    85.6ns ± 1%   -3.95%  (p=0.029 n=4+4)
DeepEqual/uint16-4                  24.0ns ± 1%    23.8ns ± 0%   -0.76%  (p=0.343 n=4+4)
DeepEqual/[]uint16-4                 111ns ± 0%     106ns ± 4%   -4.74%  (p=0.029 n=4+4)
DeepEqual/uint32-4                  23.5ns ± 1%    23.0ns ± 0%   -2.15%  (p=0.029 n=4+4)
DeepEqual/[]uint32-4                 110ns ± 1%     104ns ± 0%   -5.66%  (p=0.029 n=4+4)
DeepEqual/uint64-4                  24.6ns ± 1%    24.3ns ± 0%   -1.10%  (p=0.143 n=4+4)
DeepEqual/[]uint64-4                 111ns ± 0%     105ns ± 1%   -5.16%  (p=0.029 n=4+4)
DeepEqual/uint-4                    23.6ns ± 0%    23.0ns ± 0%   -2.70%  (p=0.029 n=4+4)
DeepEqual/[]uint-4                   109ns ± 0%     103ns ± 1%   -5.74%  (p=0.029 n=4+4)
DeepEqual/uintptr-4                 25.1ns ± 1%    24.8ns ± 2%   -1.11%  (p=0.171 n=4+4)
DeepEqual/[]uintptr-4                111ns ± 0%     106ns ± 1%   -4.45%  (p=0.029 n=4+4)
DeepEqual/float32-4                 22.5ns ± 0%    22.2ns ± 0%   -1.29%  (p=0.029 n=4+4)
DeepEqual/[]float32-4                105ns ± 0%     101ns ± 1%   -3.75%  (p=0.029 n=4+4)
DeepEqual/float64-4                 22.7ns ± 2%    22.1ns ± 0%   -2.52%  (p=0.029 n=4+4)
DeepEqual/[]float64-4                105ns ± 1%     103ns ± 1%   -2.77%  (p=0.029 n=4+4)
DeepEqual/complex64-4               22.9ns ± 0%    22.8ns ± 0%   -0.48%  (p=0.029 n=4+4)
DeepEqual/[]complex64-4              107ns ± 0%     101ns ± 0%   -5.48%  (p=0.029 n=4+4)
DeepEqual/complex128-4              23.2ns ± 1%    22.6ns ± 0%   -2.34%  (p=0.029 n=4+4)
DeepEqual/[]complex128-4             107ns ± 0%     101ns ± 0%   -5.60%  (p=0.029 n=4+4)
DeepEqual/bool-4                    22.0ns ± 1%    21.7ns ± 0%   -1.44%  (p=0.029 n=4+4)
DeepEqual/[]bool-4                   106ns ± 1%     100ns ± 0%   -5.42%  (p=0.029 n=4+4)
DeepEqual/string-4                  26.7ns ± 1%    24.7ns ± 0%   -7.47%  (p=0.029 n=4+4)
DeepEqual/[]string-4                 112ns ± 0%     107ns ± 0%   -4.21%  (p=0.029 n=4+4)
DeepEqual/[]uint8#01-4              89.4ns ± 1%    85.5ns ± 1%   -4.44%  (p=0.029 n=4+4)
DeepEqual/[][]uint8-4                177ns ± 0%     173ns ± 1%   -2.22%  (p=0.029 n=4+4)
DeepEqual/[6]uint8-4                 137ns ± 1%     137ns ± 0%   -0.56%  (p=0.057 n=4+4)
DeepEqual/[][6]uint8-4               232ns ± 0%     230ns ± 1%   -1.09%  (p=0.029 n=4+4)

Change-Id: I275624e21dc4d70001032be48897f1504cbfdd1c
Reviewed-on: https://go-review.googlesource.com/c/go/+/427634
Reviewed-by: Paul Murphy <murp@ibm.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Archana Ravindar <aravind5@in.ibm.com>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-10-07 13:39:30 +00:00
Russ Cox
164406ad93 cmd/compile: rename gen and builtin to _gen and _builtin
These two directories are full of //go:build ignore files.
We can ignore them more easily by putting an underscore
at the start of the name. That also works around a bug
in Go 1.17 that was not fixed until Go 1.17.3.

Change-Id: Ia5389b65c79b1e6d08e4fef374d335d776d44ead
Reviewed-on: https://go-review.googlesource.com/c/go/+/435472
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2022-10-04 19:35:46 +00:00
Renamed from src/cmd/compile/internal/ssa/gen/PPC64Ops.go (Browse further)