This combines several CLs into a single patch of "glue"
for the generated SIMD extensions.
This glue includes GOEXPERIMENT checks that disable
the creation of user-visible "simd" types and
that disable the registration of "simd" intrinsics.
The simd type checks were changed to work for either
package "simd" or "internal/simd" so that moving that
package won't be quite so fragile.
cmd/compile, internal/simd: glue for adding SIMD extensions to Go
cmd/compile: theft of Cherry's sample SIMD compilation
Change-Id: Id44e2f4bafe74032c26de576a8691b6f7d977e01
Reviewed-on: https://go-review.googlesource.com/c/go/+/675598
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
This enables publicationBarrier to be used as an intrinsic on mipsx.
Change-Id: Ic199f34b84b3058bcfab79aac8f2399ff21a97ce
Reviewed-on: https://go-review.googlesource.com/c/go/+/674856
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
This enables publicationBarrier to be used as an intrinsic on mips64x.
Change-Id: I4030ea65086c37ee1dcc1675d0d5d40ef8683851
Reviewed-on: https://go-review.googlesource.com/c/go/+/674855
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
This CL enables intrinsic support to emit the following prefetch
instructions for loong64 platform:
1.Prefetch - prefetches data from memory address to cache;
2.PrefetchStreamed - prefetches data from memory address, with a
hint that this data is being streamed.
Benchmarks picked from go/test/bench/garbage
Parameters tested with:
GOMAXPROCS=8
tree2 -heapsize=1000000000 -cpus=8
tree -n=18
parser
peano
Benchmarks Loongson-3A6000-HV @ 2500.00MHz:
| bench.old | bench.new |
| sec/op | sec/op vs base |
Tree2-8 1238.2µ ± 24% 999.9µ ± 453% ~ (p=0.089 n=10)
Tree-8 277.4m ± 1% 275.5m ± 1% ~ (p=0.063 n=10)
Parser-8 3.564 ± 0% 3.509 ± 1% -1.56% (p=0.000 n=10)
Peano-8 39.12m ± 2% 38.85m ± 2% ~ (p=0.353 n=10)
geomean 83.19m 78.28m -5.90%
Change-Id: I59e9aa4f609a106d4f70706e6d6d1fe6738ab72a
Reviewed-on: https://go-review.googlesource.com/c/go/+/671876
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn>
Reviewed-by: Cherry Mui <cherryyz@google.com>
For riscv64/rva22u64 and above, we can intrinsify math/bits.OnesCount
using the CPOP/CPOPW machine instructions. Since the native Go
implementation of OnesCount is relatively expensive, it is also
worth emitting a check for Zbb support when compiled for rva20u64.
On a Banana Pi F3, with GORISCV64=rva22u64:
│ oc.1 │ oc.2 │
│ sec/op │ sec/op vs base │
OnesCount-8 16.930n ± 0% 4.389n ± 0% -74.08% (p=0.000 n=10)
OnesCount8-8 5.642n ± 0% 5.016n ± 0% -11.10% (p=0.000 n=10)
OnesCount16-8 9.404n ± 0% 5.015n ± 0% -46.67% (p=0.000 n=10)
OnesCount32-8 13.165n ± 0% 4.388n ± 0% -66.67% (p=0.000 n=10)
OnesCount64-8 16.300n ± 0% 4.388n ± 0% -73.08% (p=0.000 n=10)
geomean 11.40n 4.629n -59.40%
On a Banana Pi F3, compiled with GORISCV64=rva20u64 and with Zbb
detection enabled:
│ oc.3 │ oc.4 │
│ sec/op │ sec/op vs base │
OnesCount-8 16.930n ± 0% 5.643n ± 0% -66.67% (p=0.000 n=10)
OnesCount8-8 5.642n ± 0% 5.642n ± 0% ~ (p=0.447 n=10)
OnesCount16-8 10.030n ± 0% 6.896n ± 0% -31.25% (p=0.000 n=10)
OnesCount32-8 13.170n ± 0% 5.642n ± 0% -57.16% (p=0.000 n=10)
OnesCount64-8 16.300n ± 0% 5.642n ± 0% -65.39% (p=0.000 n=10)
geomean 11.55n 5.873n -49.16%
On a Banana Pi F3, compiled with GORISCV64=rva20u64 but with Zbb
detection disabled:
│ oc.3 │ oc.5 │
│ sec/op │ sec/op vs base │
OnesCount-8 16.93n ± 0% 29.47n ± 0% +74.07% (p=0.000 n=10)
OnesCount8-8 5.642n ± 0% 5.643n ± 0% ~ (p=0.191 n=10)
OnesCount16-8 10.03n ± 0% 15.05n ± 0% +50.05% (p=0.000 n=10)
OnesCount32-8 13.17n ± 0% 18.18n ± 0% +38.04% (p=0.000 n=10)
OnesCount64-8 16.30n ± 0% 21.94n ± 0% +34.60% (p=0.000 n=10)
geomean 11.55n 15.84n +37.16%
For hardware without Zbb, this adds ~5ns overhead, while for hardware
with Zbb we achieve a performance gain up of up to 11ns. It is worth
noting that OnesCount8 is cheap enough that it is preferable to stick
with the generic version in this case.
Change-Id: Id657e40e0dd1b1ab8cc0fe0f8a68df4c9f2d7da5
Reviewed-on: https://go-review.googlesource.com/c/go/+/660856
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Currently, only amd64 has an intrinsic for math/bits.OnesCount, which
generates the same code as math/bits.OnesCount64. Replace this with
an alias that maps math/bits.OnesCount to math/bits.OnesCount64 on
64 bit platforms.
Change-Id: Ifa12a2173a201aacd52c3c22b9a948be6e314405
Reviewed-on: https://go-review.googlesource.com/c/go/+/659215
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Rather than using a specific intrinsic for math/bits.Len, use a pair of
aliases instead. This requires less code and automatically adapts when
platforms have a math/bits.Len32 or math/bits.Len64 intrinsic.
Change-Id: I28b300172daaee26ef82a7530d9e96123663f541
Reviewed-on: https://go-review.googlesource.com/c/go/+/656995
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
Every OS uses FMA now.
Change-Id: Ia7ffa77c52c45aefca611ddc54e9dfffb27a48da
Reviewed-on: https://go-review.googlesource.com/c/go/+/655877
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Decompose Ctz16 and Ctz8 within the SSA rules for LOONG64, MIPS, PPC64
and S390X, rather than having a custom intrinsic. Note that for PPC64 this
actually allows the existing Ctz16 and Ctz8 rules to be used.
Change-Id: I27a5e978f852b9d75396d2a80f5d7dfcb5ef7dd4
Reviewed-on: https://go-review.googlesource.com/c/go/+/651816
Reviewed-by: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Decompose BitLen16 and BitLen8 within the SSA rules for architectures that
support BitLen32 or BitLen64, rather than having a custom intrinsic.
Change-Id: Ie4188ce69d1021e63cec27a8e7418efb0714812b
Reviewed-on: https://go-review.googlesource.com/c/go/+/651817
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Use similar SIMD operations to the ones used in Abseil. We still
using 8-slot groups (even though the XMM registers could handle 16-slot
groups) to keep the implementation simpler (no changes to the memory
layout of maps).
Still, the implementations of matchH2 and matchEmpty are shorter than
the portable version using standard arithmetic operations. They also
return a packed bitset, which avoids the need to shift in bitset.first.
That said, the packed bitset is a downside in cognitive complexity, as
we have to think about two different possible representations. This
doesn't leak out of the API, but we do need to intrinsify bitset to
switch to a compatible implementation.
The compiler's intrinsics don't support intrinsifying methods, so the
implementations move to free functions.
This makes operations between 0-3% faster on my machine. e.g.,
MapGetHit/impl=runtimeMap/t=Int64/len=6-12 12.34n ± 1% 11.42n ± 1% -7.46% (p=0.000 n=25)
MapGetHit/impl=runtimeMap/t=Int64/len=12-12 15.14n ± 2% 14.88n ± 1% -1.72% (p=0.009 n=25)
MapGetHit/impl=runtimeMap/t=Int64/len=18-12 15.04n ± 6% 14.66n ± 2% -2.53% (p=0.000 n=25)
MapGetHit/impl=runtimeMap/t=Int64/len=24-12 15.80n ± 1% 15.48n ± 3% ~ (p=0.444 n=25)
MapGetHit/impl=runtimeMap/t=Int64/len=30-12 15.55n ± 4% 14.77n ± 3% -5.02% (p=0.004 n=25)
MapGetHit/impl=runtimeMap/t=Int64/len=64-12 15.26n ± 1% 15.05n ± 1% ~ (p=0.055 n=25)
MapGetHit/impl=runtimeMap/t=Int64/len=128-12 15.34n ± 1% 15.02n ± 2% -2.09% (p=0.000 n=25)
MapGetHit/impl=runtimeMap/t=Int64/len=256-12 15.42n ± 1% 15.15n ± 1% -1.75% (p=0.001 n=25)
MapGetHit/impl=runtimeMap/t=Int64/len=512-12 15.48n ± 1% 15.18n ± 1% -1.94% (p=0.000 n=25)
MapGetHit/impl=runtimeMap/t=Int64/len=1024-12 17.38n ± 1% 17.05n ± 1% -1.90% (p=0.000 n=25)
MapGetHit/impl=runtimeMap/t=Int64/len=2048-12 17.96n ± 0% 17.59n ± 1% -2.06% (p=0.000 n=25)
MapGetHit/impl=runtimeMap/t=Int64/len=4096-12 18.36n ± 1% 18.18n ± 1% -0.98% (p=0.013 n=25)
MapGetHit/impl=runtimeMap/t=Int64/len=8192-12 18.75n ± 0% 18.31n ± 1% -2.35% (p=0.000 n=25)
MapGetHit/impl=runtimeMap/t=Int64/len=65536-12 26.25n ± 0% 25.95n ± 1% -1.14% (p=0.000 n=25)
MapGetHit/impl=runtimeMap/t=Int64/len=262144-12 44.24n ± 1% 44.06n ± 1% ~ (p=0.181 n=25)
MapGetHit/impl=runtimeMap/t=Int64/len=1048576-12 85.02n ± 0% 85.35n ± 0% +0.39% (p=0.032 n=25)
MapGetHit/impl=runtimeMap/t=Int64/len=4194304-12 98.87n ± 1% 98.85n ± 1% ~ (p=0.799 n=25)
For #54766.
Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64_power10,gotip-linux-amd64-goamd64v3
Change-Id: Ic1b852f02744404122cb3672900fd95f4625905e
Reviewed-on: https://go-review.googlesource.com/c/go/+/626277
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Keith Randall <khr@google.com>
In Loongson's new microstructure LA664 (Loongson-3A6000) and later, the atomic
instruction AMSWAP[DB]{B,H} [1] is supported. Therefore, the implementation of
the atomic operation exchange can be selected according to the CPUCFG flag LAM_BH:
AMSWAPDBB(full barrier) instruction is used on new microstructures, and traditional
LL-SC is used on LA464 (Loongson-3A5000) and older microstructures. This can
significantly improve the performance of Go programs on new microstructures.
Because Xchg8 implemented using traditional LL-SC uses too many temporary
registers, it is not suitable for intrinsics.
goos: linux
goarch: loong64
pkg: internal/runtime/atomic
cpu: Loongson-3A6000 @ 2500.00MHz
BenchmarkXchg8 100000000 10.41 ns/op
BenchmarkXchg8-2 100000000 10.41 ns/op
BenchmarkXchg8-4 100000000 10.41 ns/op
BenchmarkXchg8Parallel 96647592 12.41 ns/op
BenchmarkXchg8Parallel-2 58376136 20.60 ns/op
BenchmarkXchg8Parallel-4 78458899 17.97 ns/op
goos: linux
goarch: loong64
pkg: internal/runtime/atomic
cpu: Loongson-3A5000-HV @ 2500.00MHz
BenchmarkXchg8 38323825 31.23 ns/op
BenchmarkXchg8-2 38368219 31.23 ns/op
BenchmarkXchg8-4 37154156 31.26 ns/op
BenchmarkXchg8Parallel 37908301 31.63 ns/op
BenchmarkXchg8Parallel-2 30413440 39.42 ns/op
BenchmarkXchg8Parallel-4 30737626 39.03 ns/op
For #69735
[1]: https://loongson.github.io/LoongArch-Documentation/LoongArch-ELF-ABI-EN.html
Change-Id: I02ba68f66a2210b6902344fdc9975eb62de728ab
Reviewed-on: https://go-review.googlesource.com/c/go/+/623058
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
For the SubFromLen64 codegen test case to work as intended, we need
to fold c-(-(x-d)) into x+(c-d).
Still, some instances of LeadingZeros are not optimized into single
CLZ instructions right now (actually, the LeadingZeros micro-benchmarks
are currently still compiled with redundant adds/subs of 64, due to
interference of loop optimizations before lowering), but perf numbers
indicate it's not that bad after all.
Micro-benchmark results on Loongson 3A5000 and 3A6000:
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
LeadingZeros 3.660n ± 0% 1.348n ± 0% -63.17% (p=0.000 n=20)
LeadingZeros8 1.777n ± 0% 1.767n ± 0% -0.56% (p=0.000 n=20)
LeadingZeros16 2.816n ± 0% 1.770n ± 0% -37.14% (p=0.000 n=20)
LeadingZeros32 5.293n ± 1% 1.683n ± 0% -68.21% (p=0.000 n=20)
LeadingZeros64 3.622n ± 0% 1.349n ± 0% -62.76% (p=0.000 n=20)
geomean 3.229n 1.571n -51.35%
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
LeadingZeros 2.410n ± 0% 1.103n ± 1% -54.23% (p=0.000 n=20)
LeadingZeros8 1.236n ± 0% 1.501n ± 0% +21.44% (p=0.000 n=20)
LeadingZeros16 2.106n ± 0% 1.501n ± 0% -28.73% (p=0.000 n=20)
LeadingZeros32 2.860n ± 0% 1.324n ± 0% -53.72% (p=0.000 n=20)
LeadingZeros64 2.6135n ± 0% 0.9509n ± 0% -63.62% (p=0.000 n=20)
geomean 2.159n 1.256n -41.81%
Updates #59120
This patch is a copy of CL 483356.
Co-authored-by: WANG Xuerui <git@xen0n.name>
Change-Id: Iee81a17f7da06d77a427e73dfcc016f2b15ae556
Reviewed-on: https://go-review.googlesource.com/c/go/+/624575
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Compared with the version generated by dec64.rules based on Ctz32,
the number of assembly instructions is reduced by half.
SwissMap uses TrailingZeros64 to find the first match in its control
group and may benefit from this CL on 386 architectures.
goos: linux
goarch: 386
cpu: 13th Gen Intel(R) Core(TM) i7-13700H
│ old.txt │ new.txt │
│ sec/op │ sec/op vs base │
TrailingZeros64-20 0.8828n ± 1% 0.6299n ± 1% -28.65% (p=0.000 n=20)
Change-Id: Iba08a3f4e13efd3349715dfb7fcd5fd470286cd3
Reviewed-on: https://go-review.googlesource.com/c/go/+/624376
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
This is minor extension of the existing support for 32 and
64 bit types.
For #69735
Change-Id: I6828ec223951d2b692e077dc507b000ac23c32a1
Reviewed-on: https://go-review.googlesource.com/c/go/+/617496
Reviewed-by: Rhys Hiltner <rhys.hiltner@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
For #68578
Change-Id: Ia9580579bfc4709945bfcf6ec3803d5d11812187
Reviewed-on: https://go-review.googlesource.com/c/go/+/606901
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Moving these intrinsics to a base package enables other internal/runtime
packages to use them.
There is no immediate need for getclosureptr outside of runtime, but it
is moved for consistency with the other intrinsics.
For #54766.
Change-Id: Ia68b16a938c8cb84cb222469db28e3a83861be5d
Reviewed-on: https://go-review.googlesource.com/c/go/+/613262
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Moving these intrinsics to a base package enables other internal/runtime
packages to use them.
For #54766.
Change-Id: I45a530422207dd94b5ad4eee51216c9410a84040
Reviewed-on: https://go-review.googlesource.com/c/go/+/613261
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Moving these intrinsics to a base package enables other internal/runtime
packages to use them.
For #54766.
Change-Id: I0b3eded3bb45af53e3eb5bab93e3792e6a8beb46
Reviewed-on: https://go-review.googlesource.com/c/go/+/613260
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Add a check to ensure that intrinsics are not being overwritten.
Remove two S390X intrinsics that are being replaced by aliases and
are therefore ineffective.
Change-Id: I4187a169c14ca75c45a67f41a1d626d76b82bb72
Reviewed-on: https://go-review.googlesource.com/c/go/+/605479
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Create an intrinsicBuilders type that has functions for adding and
looking up intrinsics. This makes the implementation more self contained,
readable and testable. Additionally, pass an *intrinsicBuildConfig to
initIntrinsics to improve testability without needing to modify package
level variables.
Change-Id: I0ee0a19c192dd6da9f1c5f1c29b98a3ad8161fe2
Reviewed-on: https://go-review.googlesource.com/c/go/+/605478
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Joel Sing <joel@sing.id.au>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
This can be done efficiently with few instructions.
This also adds MULHDUCC for further codegen improvement.
Change-Id: I06320ba4383a679341b911a237a360ef07b19168
Reviewed-on: https://go-review.googlesource.com/c/go/+/605975
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Archana Ravindar <aravinda@redhat.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
The intrinsic handling code is a good thousand lines in the fairly
large ssa.go file. This code is already reasonably self-contained - factor
it out into a separate file so that future changes are easier to manage
(and it becomes easier to add/change intrinsics for an architecture).
Change-Id: I3c18d3d1bb6332f1817d902150e736373bf1ac81
Reviewed-on: https://go-review.googlesource.com/c/go/+/605477
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>