ARM64's IfElse behavior is reversed from other platforms. Reverse
it. Internally, its bitSelect is also the reverse of Wasm's
BitSelect. Reverse the ARM64 one to match.
Make Masked and IfElse tests portable.
Change-Id: Icd2dbcb3383b2be642fd6fc7115ef1cbef0f9b78
Reviewed-on: https://go-review.googlesource.com/c/go/+/793361
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: golang-scoped@luci-project-accounts.iam.gserviceaccount.com <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
CL 789060 fixed a case in the backing store analysis
for range over slice statements.
This CL makes a corresponding update to the off-by-default
GOEXPERIMENT=runtimefreegc codegen tests.
While here, we slightly tweak the wording and regexp in the
equivalent default test code (for when GOEXPERIMENT=runtimefreegc
is disabled).
Updates #79909
Fixes #79972
Change-Id: Ic6dfe04fee711b2b71a0edccb115477ad01dc5d0
Reviewed-on: https://go-review.googlesource.com/c/go/+/789980
LUCI-TryBot-Result: golang-scoped@luci-project-accounts.iam.gserviceaccount.com <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: t hepudds <thepudds1460@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
The range over slice statement keeps a pointer to the backing store of
the slice, making it from exclusive to nonexclusive at that point. Thus
we need to mark it as transition there.
Fixes #79909
Change-Id: I7292b5644ac658fa3a6ccd9fa949b454d2f3d770
Reviewed-on: https://go-review.googlesource.com/c/go/+/789060
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: golang-scoped@luci-project-accounts.iam.gserviceaccount.com <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Semantically the nil check happens first, so we want the position of
the nil check.
In CL 659317 I added the don't-merge-with-store logic. Turns out that
was not right, it was just a way to work around the problem that I
have just fixed in the previous CL in this stack.
Fixes #79762
Change-Id: Id84d89d1843cc07b6f880f68d881c510d742c5aa
Reviewed-on: https://go-review.googlesource.com/c/go/+/785440
Reviewed-by: Mark Freeman <markfreeman@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: golang-scoped@luci-project-accounts.iam.gserviceaccount.com <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Route HiToLo through Float64x2.SetElem/GetElem instead of Uint64x2
to avoid a round-trip through a GP register.
Update simd_arm64.go codegen test for current API.
This is a cherry-pick of CL 787302.
Updates #79899
Change-Id: I3d98bd137474a5188509e5ee365c0d9af386e32c
Reviewed-on: https://go-review.googlesource.com/c/go/+/787303
Reviewed-by: Arseny Samoylov <samoylov.arseny@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: golang-scoped@luci-project-accounts.iam.gserviceaccount.com <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
The multi-register shift rewrites were flawed.
When bits is zero mod 64, SHRD and SHLD leave the destination unchanged,
so the result is lo rather than lo | hi.
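A minimal sketch of the mismatch, assuming the rewrite matched a funnel-shift expression whose two shift counts are both reduced mod 64. The names goFunnelRight and shrdModel are hypothetical, and shrdModel is only a software model of the zero-count behavior described above, not compiler code:

```go
package main

import "fmt"

// goFunnelRight is the kind of Go expression assumed to have been matched:
// both shift counts are reduced mod 64, so bits == 0 yields lo | hi.
func goFunnelRight(lo, hi uint64, bits uint) uint64 {
	return lo>>(bits&63) | hi<<(-bits&63)
}

// shrdModel models SHRD: the count is taken mod 64, and a zero count
// leaves the destination (lo) unchanged.
func shrdModel(lo, hi uint64, bits uint) uint64 {
	c := bits & 63
	if c == 0 {
		return lo
	}
	return lo>>c | hi<<(64-c)
}

func main() {
	lo, hi := uint64(0x0f), uint64(0xf0)
	for _, bits := range []uint{0, 4, 64} {
		fmt.Printf("bits=%d go=%#x shrd=%#x\n",
			bits, goFunnelRight(lo, hi, bits), shrdModel(lo, hi, bits))
	}
}
```

The two functions agree for counts that are non-zero mod 64 and disagree for 0 and 64, which is the case the rewrite could not rule out.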
We don't have enough information at hand to make better decisions here.
It'd take a lot of machinery to propagate non-zero-ness from prove,
and constant-only would have limited usefulness.
Conveniently, every occurrence in std guards against this.
This was introduced by me (eep) in CL 297050, and extended in CL 399061.
I re-measured on a recent-vintage amd64 machine, and the fused instructions
are no faster. SHRD/SHLD are pretty constrained (resultInArg0,
count in CX, clobbers flags).
Packages math and edwards25519:
                    │  with-fold   │            without-fold             │
                    │    sec/op    │    sec/op     vs base               │
FMA-64                0.7609n ± 0%   0.7591n ± 1%       ~ (p=0.436 n=10)
ScalarBaseMult-64     9.981µ ± 1%    9.827µ ± 0%   -1.54% (p=0.000 n=10)
ScalarMult-64         33.01µ ± 0%    32.90µ ± 0%   -0.33% (p=0.000 n=10)
geomean               630.5n         626.1n        -0.70%
Clean up the ops as well, since nothing now generates them.
Change-Id: I37423aa558d7f626e81ee7db807b43de1747be1b
Reviewed-on: https://go-review.googlesource.com/c/go/+/785801
Reviewed-by: Mark Freeman <markfreeman@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: golang-scoped@luci-project-accounts.iam.gserviceaccount.com <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Add polynomial (carryless) multiply long using VPMULL/VPMULL2.
In this CL Uint64x2→Uint64x2 (2D→1Q).
GetHi folding produces VPMULL2 without extra instructions.
Also adds clmul_arm64.go with CarrylessMultiply{Even,Odd,OddEven,EvenOdd}
helper methods matching the amd64 API.
Also adds a feature check for ARM64.PMULL().
Directly based on CL 784020 which includes the 8-bit CLMUL.
Original author: alexander.musman@gmail.com
Change-Id: I6c554398f97c5c827bad92b271b8d03fd8adbd49
Reviewed-on: https://go-review.googlesource.com/c/go/+/785240
LUCI-TryBot-Result: golang-scoped@luci-project-accounts.iam.gserviceaccount.com <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Add hardware-backed comparison intrinsics (Equal, Greater, GreaterEqual)
with derived comparisons (Less, LessEqual, NotEqual), mask types with
Masked/IfElse methods using VBIT and VBIF, and Neg/Abs intrinsics for
all element types. VBIF complements VBIT so that IfElse with an inverted
mask (e.g. from NotEqual) folds away the VNOT.
ARM64 NEON uses bitwise mask representation (all-0 or all-1 per lane).
Comparisons use CMEQ/CMGT/CMHI/CMGE/CMHS and FCMEQ/FCMGT/FCMGE.
Neg/Abs intrinsics are implemented with VNEG/VFNEG and VABS/VFABS.
Here is a small runnable example:
```
package main

import (
	"fmt"
	"simd/archsimd"
)

func main() {
	a := archsimd.LoadFloat32x4([]float32{1.0, -2.0, 3.0, -4.0})
	b := archsimd.LoadFloat32x4([]float32{10.0, 20.0, 30.0, 40.0})
	neg := a.Less(archsimd.Float32x4{})
	result := a.IfElse(neg, b)
	// Expected output: {0,1,0,1} {1,20,3,40} {0,-2,0,-4}
	fmt.Println(neg.String(), result.String(), a.Masked(neg).String())
}
```
Change-Id: I353c34bbcfc7bff25f0c094b3dd13d5ecfb9af53
Reviewed-on: https://go-review.googlesource.com/c/go/+/776560
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: golang-scoped@luci-project-accounts.iam.gserviceaccount.com <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
SetHi is emulated as VMOV Vn.D[0], Vd.D[1] and folded as destination of
narrow instruction into its variant that writes into upper half only.
GetHi is emulated as VMOV Vn.D[1], Dd and folded as a source of
long instruction into its variant reading upper half only.
Narrow and long instructions that these methods fold with will be added in follow-up CLs.
Simple example:
```
package main

import (
	"fmt"
	"simd/archsimd"
)

func main() {
	x := archsimd.LoadUint32x4Array(&[4]uint32{1, 2, 0xFF, 0xFF})
	y := archsimd.LoadUint32x4Array(&[4]uint32{10, 20, 0, 0})
	s := x.SetHi(y)
	g := s.GetHi()
	fmt.Printf("%v.SetHi(%v) = %v\n", x, y, s) // {1,2,255,255}.SetHi({10,20,0,0}) = {1,2,10,20}
	fmt.Printf("%v.GetHi() = %v\n", s, g)      // {1,2,10,20}.GetHi() = {10,20,0,0}
}
```
Change-Id: Iaf2a6eca15c2be7800eaf72f066227666c7c0d95
Reviewed-on: https://go-review.googlesource.com/c/go/+/773721
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: golang-scoped@luci-project-accounts.iam.gserviceaccount.com <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Add element-wise vector shift operations for ARM64 NEON,
supporting all integer element widths (B/H/S/D) for both signed
and unsigned types.
This adds:
- Shift (SSHL/USHL): per-element shift by signed amount from a second vector
- ShiftSaturated (SQSHL/UQSHL): saturating per-element shift
- ShiftLeftConst/ShiftRightConst (VSHL/VSSHR/VUSHR): shift by compile-time constant
- ShiftLeftSaturatedConst (VSQSHL/VUQSHL): saturating left shift by constant
- ShiftAllLeft/ShiftAllRight: shift all lanes by a scalar uint64
Lowering uses new case-based specialLower rules for const-shift
(immediate encoding) and ShiftAll (broadcast + VSSHL/VUSHL
with CSEL clamping for out-of-range amounts).
Test helpers are generated via tmplgen into
arm64_shift_helpers_test.go (20 type-specialized helpers for
ShiftConst, ShiftAll, and mixed-type Shift).
Example demonstrating Shift, ShiftLeftSaturatedConst, and ShiftAllRight:
```
package main

import (
	"fmt"
	"simd/archsimd"
)

func main() {
	a := archsimd.LoadInt16x8([]int16{1, -1, 200, -200, 2049, -2049, 100, -100})
	amt := archsimd.LoadInt16x8([]int16{2, -1, 3, 1, -2, 4, 256, -3})
	fmt.Printf("%s\n%s\n%s\n",
		a.Shift(amt).String(),                 // {4,-1,1600,-400,512,32752,100,-13}
		a.ShiftLeftSaturatedConst(4).String(), // {16,-16,3200,-3200,32767,-32768,1600,-1600}
		a.ShiftAllRight(2).String())           // {0,-1,50,-50,512,-513,25,-25}
}
```
Change-Id: Ife4aac499d8732f613325828c0ac16fdb7bedf0c
Reviewed-on: https://go-review.googlesource.com/c/go/+/767262
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: golang-scoped@luci-project-accounts.iam.gserviceaccount.com <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Implement parseArgsMatchRule to support custom argument patterns
in SSA lowering rules (currently arm64 only). Use it to fold
Broadcast1To16 of a constant into VMOVI.16B. The other arrangements
(H8,S4,D2) are currently pending assembler VMOVI support.
Change-Id: Id36d7e032a940f8261bda10281235e2b818700a3
Reviewed-on: https://go-review.googlesource.com/c/go/+/767261
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
LUCI-TryBot-Result: golang-scoped@luci-project-accounts.iam.gserviceaccount.com <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Set simdRegMask=fp so the register allocator sees available registers.
On ARM64 floats occupy the lower part of NEON vectors, which in turn
occupy the lower 128 bits of SVE vectors.
Change-Id: I091c59b28b0be8011ac8889c21364eac40218fed
Reviewed-on: https://go-review.googlesource.com/c/go/+/780740
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: golang-scoped@luci-project-accounts.iam.gserviceaccount.com <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
This allows us to properly generate the ops and merge/load rules for
various SIMD instructions that can use memory operands.
Fixes #78159
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-simd,gotip-linux-amd64_avx512-simd
Change-Id: Idec450c931c41bb903d4cc5b9b9ee8f610ee8796
Reviewed-on: https://go-review.googlesource.com/c/go/+/779521
LUCI-TryBot-Result: golang-scoped@luci-project-accounts.iam.gserviceaccount.com <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
This extends a pattern we already match for Add* to
- Sub
- Sub (with swapped arguments)
- Xor
- Com
- Neg
- Mul
This more or less equates to constant folding and is particularly hard to
benchmark objectively for the same reasons.
It is 1 or 3 (for mul) cycles faster in a microbenchmark.
However it may require constants that are harder to materialize.
We currently do not consider these drawbacks in generic.rules.
I didn't originally think the o.Uses == 1 check was required; however,
certain arches like PPC64 are able to merge the CMP into the operation
in limited conditions, which are broken by this CL.
Also if o.Uses == 1 we aren't removing a user, we could extend the
liveness of o's argument, without removing o, increasing register pressure.
The latency gains should be invisible on branches, maybe not if used by
CondSelect or CvtBoolToUint8, but don't bother with these unproven
dice rolls.
Change-Id: I4fe6b5149576d2549e1157e5cc891af9edb79d55
Reviewed-on: https://go-review.googlesource.com/c/go/+/750181
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
LUCI-TryBot-Result: golang-scoped@luci-project-accounts.iam.gserviceaccount.com <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Fixes #76056
Fixes #76060
If we modify the issue's fieldReduceOnce2 function to:
	// fieldReduceOnce reduces a value a < 2q.
	func fieldReduceOnce2(a uint32) fieldElement {
		x, b := bits.Sub(uint(a), uint(q), 0)
		return fieldElement(subtle.ConstantTimeSelect(int(b), int(a), int(x)))
	}
We get the wanted assembly*:
MOVL AX, CX
MOVL AX, DX
SUBQ $8380417, CX
CMOVQCS DX, CX
MOVQ CX, AX ; not ideal code size but handled by the register renaming unit
RET
Changes made to fieldReduceOnce2:
- fixed a bug where a and x arguments to subtle.ConstantTimeSelect were swapped.
we should use a when the sub underflows and x otherwise.
- use bits.Sub, which is intrinsified, rather than bits.Sub32.
*we use CMOVQCS + MOVQ because the CMOV randomly gets generated backward;
I believe this would be fixed if we teach regalloc to commute CMOV
(by swapping the two register args and inverting the condition).
Change-Id: I01eca545d3c5c8a1c1f5a107e0089f715359dfc6
Reviewed-on: https://go-review.googlesource.com/c/go/+/778141
LUCI-TryBot-Result: golang-scoped@luci-project-accounts.iam.gserviceaccount.com <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Assuming the CPU recognizes SBB RX, RX as a dependency break,
this is a no-op; however, SET is much more canonical and easier
to match.
Updates #76056
Change-Id: Icc590dbcc76a8ed2fca7b167cfb66a2d33d4d2d5
Reviewed-on: https://go-review.googlesource.com/c/go/+/778140
LUCI-TryBot-Result: golang-scoped@luci-project-accounts.iam.gserviceaccount.com <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
In the sizespecializedmalloc goexperiment, we specialized the tiny
function per tiny size, so there was a different allocation function per
size from 1-15. This created a lot of functions for a code path that was
not executed that often. From the microbenchmarks, comparing the
consolidated tiny function in this CL with the per-size functions, the
specialized functions could be up to 20% faster, but for 8 byte
allocations, which are almost certainly the most common, the per-size
function was slower.
Look at the change description of CL 766980 for the results of those
microbenchmarks. The CL also contains the code used to run the
benchmark.
Since we've noticed significant icache pressure from all the functions,
the tiny functions aren't used as much as the other ones, and the
benefits seem to be mixed, consolidate the 15 functions into a single
function.
This cuts the size of the mallocgc* functions by about 20%.
For #79286
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64_c2s16-perf_vs_parent-sizespecializedmalloc,gotip-linux-amd64_c3h88-perf_vs_parent-sizespecializedmalloc,gotip-linux-arm64_c4ah72-perf_vs_parent-sizespecializedmalloc,gotip-linux-arm64_c4as16-perf_vs_parent-sizespecializedmalloc,gotip-linux-arm64_c4as16-perf_vs_parent,gotip-linux-arm64_c4ah72-perf_vs_parent,gotip-linux-amd64_c3h88-perf_vs_parent,gotip-linux-amd64_c2s16-perf_vs_parent
Change-Id: I824f65727a858158c14d2edd6fea1e846a6a6964
Reviewed-on: https://go-review.googlesource.com/c/go/+/772540
LUCI-TryBot-Result: golang-scoped@luci-project-accounts.iam.gserviceaccount.com <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Matloob <matloob@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
The canonical way to multiply by 2 is x<<1, this is what other
generic rules expect.
It is slower than x+x, but arch-specific rules can turn x<<1 back into x+x.
Canonicalizing this way avoids adding special cases to every rule that
optimizes shifts so that it also recognizes x+x as x<<1.
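As a quick sanity check that the two forms are interchangeable, here is a small sketch (the helper names double and shiftOne are hypothetical) that compares them exhaustively for uint8, including the values that wrap around:

```go
package main

import "fmt"

// double computes 2*x as an addition.
func double(x uint8) uint8 { return x + x }

// shiftOne computes 2*x as the canonical shift form.
func shiftOne(x uint8) uint8 { return x << 1 }

func main() {
	for i := 0; i < 256; i++ {
		x := uint8(i)
		if double(x) != shiftOne(x) {
			fmt.Println("mismatch at", x)
			return
		}
	}
	fmt.Println("x+x == x<<1 for every uint8 value")
}
```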
Change-Id: I249c60cd2643db2e2a3503f3934211f80fb2912a
Reviewed-on: https://go-review.googlesource.com/c/go/+/774060
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: golang-scoped@luci-project-accounts.iam.gserviceaccount.com <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
This CL contains API renamings:
Loads:
Adjust these names to be "scalable first"
LoadT => LoadTArray
LoadTSlice => LoadT
LoadTSlicePart(s []E) T => LoadTPart(s []E) T (note: the next CL will further refine it to return the elements loaded)
LoadTMasked - Let's drop this for now. Passing an array defeats the main purpose of suppressing faults. Passing a slice would require extra bounds-checking work. It's not clear how to translate this into Go.
Stores:
T.Store => T.StoreArray (not necessary, but gives symmetry and compile-time bounds checking)
T.StoreSlice => T.Store
T.StoreSlicePart => T.StorePart
T.StoreMasked => T.StoreArrayMasked
We may want a slice version of masked store, but we'll leave it out for now. It requires bounds checking. Mostly this will be served by StorePart.
For #78979.
Change-Id: I16dbc269b4566380c19e769892ea55d849024e53
Reviewed-on: https://go-review.googlesource.com/c/go/+/775600
LUCI-TryBot-Result: golang-scoped@luci-project-accounts.iam.gserviceaccount.com <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Currently, a closure in a function is usually named after the
outer function, usually in the form of pkg.outer.funcN. When the
containing function is inlined, we attach the inlined caller's
name to the closure name, so this may become things like
callerpkg.caller.pkg.outer.funcN. With multiple levels of
inlining, this name can get pretty long and cluttered.
This CL changes the compiler to use the simple, pre-inlining name
for closures. That is, the closure is always named pkg.outer.funcN
where outer is the containing function in the source code. This
name is not changed during inlining. With inlining, there may be
multiple copies of the closure, all with the same name. They are
likely to be compiled identically, although technically it is
possible for the compiler to optimize them differently based on
the context. So we'll use a content hash to distinguish and
deduplicate them.
With the content-addressable symbol mechanism, the linker is
capable of handling multiple symbols with the same name, and uses
the content hash to distinguish and deduplicate them. A
complication is that the compiler is not able to handle multiple
symbols with the same name when compiling a package. So we give
them temporarily unique suffixes during the compilation (based
on the inline call stack), and trim the suffix in the object file
and DWARF generation. So their linker symbols remain simple.
One caveat is nested closure (i.e. a closure within a closure).
Previously, a nested closure is named as topLevelFunc.funcN.M where
topLevelFunc.funcN is the outer closure. When the outer closure is
inlined, and the inlined caller is not a closure, it is named as
caller.topLevelFunc.funcN.funcM (note the extra "func"). This is
arguably a bug in the current code, as it decides whether to
include the "func" word based on whether the physical containing
function is a closure or not, not the source-level function. This
CL removes the "caller" part from the name, but does not address
the extra "func" word. So when the outer closure is inlined, the
inner closure will be named topLevelFunc.funcN.funcM, which
differs from the original topLevelFunc.funcN.M. This is not too
bad in that the name won't get too long, and still match the
source.
Fixes #60324.
Change-Id: Ia69c35a8f9b1a3b2c27db1a0959c1316be8b1f81
Reviewed-on: https://go-review.googlesource.com/c/go/+/770200
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Commit-Queue: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Bypass: Cherry Mui <cherryyz@google.com>
Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com>
ScoreInductionInc was introduced in 19f05770b0.
The goal was to keep the i++ in-place in a register.
Placing ScoreInductionInc later than ScoreFlags further improves
the generated code, mainly for code involving carry chains.
For example, the math/big.addVW_ref inner loop was:
LEAQ 1(CX), R8
ADDQ DX, R9
MOVQ R9, (AX)(CX*8)
SBBQ R9, R9
NEGQ R9
MOVQ R8, CX
After this commit:
ADDQ DX, R9
MOVQ R9, (AX)(CX*8)
SBBQ R9, R9
NEGQ R9
INCQ CX
This is almost uniformly an improvement, across GOARCHes.
There are a few functions where this perturbs regalloc and causes
a little bit of movement, but they are rare and appear to be the
usual uninteresting regalloc change noise.
Change-Id: I883a92e4511136f478cf49471ba8b628434393dc
Reviewed-on: https://go-review.googlesource.com/c/go/+/773660
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: golang-scoped@luci-project-accounts.iam.gserviceaccount.com <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Moves that read from read-only memory can't be reading the results
of a previous store. These are often generated by constant struct literals.
Moves whose results aren't needed, because that memory is immediately
overwritten, can be removed.
Saves a few bytes of generated code (~<0.1%).
Change-Id: I8dab6d1b9c066d6b623eae8b8fe31a51dd3de006
Reviewed-on: https://go-review.googlesource.com/c/go/+/771780
LUCI-TryBot-Result: golang-scoped@luci-project-accounts.iam.gserviceaccount.com <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Jakub Ciolek <jakub@ciolek.dev>
Reviewed-by: David Chase <drchase@google.com>
When a generic function converts a shape-typed value to an interface
and then type-asserts or type-switches on it, some cases can never
match because the asserted concrete type has a different shape than
the source. For example:
	func foo[S string | []byte](x S) {
		switch any(x).(type) {
		case string: // possible only when S has shape string
		case []byte: // possible only when S has shape []uint8
		}
	}
Since instantiated generic funcs work on shapes, all instantiations
contain the code for all cases even if they will never be hit.
Detect OCONVIFACE of a shape type followed by a concrete type
assertion, and compare the shapes. If they are incompatible, the
assertion can never succeed for that instantiation.
This applies to both type switch cases (which are skipped entirely)
and comma-ok type assertions (which are replaced with zero, false).
The analysis also tracks through intermediate variables using a
pre-walk pass with ReassignOracle, so patterns like
	iface := any(x)
	v, ok := iface.(string)
are handled as well.
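For reference, a runnable sketch of both patterns. The function names describe and isString are hypothetical; the program only shows the source-level behavior, which is the same with or without the optimization:

```go
package main

import "fmt"

// describe type-switches on a shape-typed value converted to an interface.
// In the string instantiation the []byte case can never match, and
// vice versa.
func describe[S string | []byte](x S) string {
	switch any(x).(type) {
	case string:
		return "string"
	case []byte:
		return "[]byte"
	}
	return "other"
}

// isString uses a comma-ok assertion through an intermediate variable.
func isString[S string | []byte](x S) bool {
	iface := any(x)
	_, ok := iface.(string)
	return ok
}

func main() {
	fmt.Println(describe("hi"), describe([]byte("hi")))
	fmt.Println(isString("hi"), isString([]byte("hi")))
}
```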
Updates #57072
Change-Id: I837f6089b9e431f856a528463075fd10abe464dc
Reviewed-on: https://go-review.googlesource.com/c/go/+/767640
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: golang-scoped@luci-project-accounts.iam.gserviceaccount.com <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Switch cases that end in a fallthrough, and the case that follows it,
can't be optimized to a lookup table. Others should still be eligible
for optimization.
Change-Id: Iebdde2ab590f2be89ba08a2dc3326553c5a4083c
Reviewed-on: https://go-review.googlesource.com/c/go/+/764440
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: golang-scoped@luci-project-accounts.iam.gserviceaccount.com <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
This pass performs bitwise constant folding.
Its main goal is to optimize bitfields like those generated by defer.
You might have 3 defers in a function and the middle one is always taken,
previously we couldn't remove the branch for it, this pass is able to do so.
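The idea can be sketched outside the compiler as follows. This is an illustrative model only, not the pass itself: the knownBits type and its methods are hypothetical, and an 8-bit value stands in for the defer bitfield. Each value tracks which bits are known to be 0 and which are known to be 1, so a test of a bit that was unconditionally set folds to a constant even when other bits are unknown:

```go
package main

import "fmt"

// knownBits records which bits of a uint8 are known to be 0 or known to be 1.
type knownBits struct {
	zeros uint8 // bits known to be 0
	ones  uint8 // bits known to be 1
}

// constant returns the fully known value c.
func constant(c uint8) knownBits { return knownBits{zeros: ^c, ones: c} }

// unknown returns a value about which nothing is known.
func unknown() knownBits { return knownBits{} }

// or models a | b: a result bit is 1 if known 1 in either input,
// and 0 only if known 0 in both.
func (a knownBits) or(b knownBits) knownBits {
	return knownBits{zeros: a.zeros & b.zeros, ones: a.ones | b.ones}
}

// and models a & b: a result bit is 0 if known 0 in either input,
// and 1 only if known 1 in both.
func (a knownBits) and(b knownBits) knownBits {
	return knownBits{zeros: a.zeros | b.zeros, ones: a.ones & b.ones}
}

// isConst reports the value if every bit is known.
func (a knownBits) isConst() (uint8, bool) {
	return a.ones, a.zeros|a.ones == 0xff
}

func main() {
	// The first defer may or may not have set bit 0; the middle defer
	// always sets bit 1.
	bits := unknown().or(constant(1 << 1))
	v, ok := bits.and(constant(1 << 1)).isConst()
	fmt.Println(v, ok) // the test "bits & 2" is a known constant
}
```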
This fires on 93 unique lines of code when building std.
My first thought was to implement this as part of the limits code.
However, letting limits tighten knownBits and vice versa
means the code complexity between the two is multiplicative.
Thus I have avoided this, someone might change it in the future
but I don't have a good usecase now and this simple pass is sufficient.
I have tried multiple places for the pass,
we need it before any opt (here late opt) since we need the generic rules
to optimize any user of a constant folded value.
We also want one run of known bits after prove since prove removing some
never / always taken branches allows known bits to do a better job.
This yields real optimizations when you have a defer inside an always
taken branch.
I've thought prove might do a better job if some branches were removed by
running an early known bits first.
However after trying it, this never helped.
I am sure you can build an example where this becomes true, but at least
in the code I've looked at it didn't help.
Thus I decided against running known bits twice (before and after prove).
Fixes #78633
Change-Id: I90a46875cc11d5d26367f00ac83c29fed433cb6d
Reviewed-on: https://go-review.googlesource.com/c/go/+/765560
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: golang-scoped@luci-project-accounts.iam.gserviceaccount.com <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
The SSA generic rewrite rules implement DeMorgan's laws but are
missing the closely related boolean absorption laws:
x & (x | y) == x
x | (x & y) == x
These are fundamental boolean algebra identities (see
https://en.wikipedia.org/wiki/Absorption_law) that hold for all
bit patterns, all widths, signed and unsigned. Both GCC and LLVM
recognize and optimize these patterns at -O2.
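A short exhaustive check of both identities over all uint8 pairs (the helper names absorbAnd and absorbOr are hypothetical):

```go
package main

import "fmt"

// absorbAnd computes x & (x | y), which should always equal x.
func absorbAnd(x, y uint8) uint8 { return x & (x | y) }

// absorbOr computes x | (x & y), which should always equal x.
func absorbOr(x, y uint8) uint8 { return x | (x & y) }

func main() {
	ok := true
	for i := 0; i < 256; i++ {
		for j := 0; j < 256; j++ {
			x, y := uint8(i), uint8(j)
			if absorbAnd(x, y) != x || absorbOr(x, y) != x {
				ok = false
			}
		}
	}
	fmt.Println("absorption holds for all uint8 pairs:", ok)
}
```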
Add two generic rules covering all four widths (8, 16, 32, 64).
Commutativity of AND/OR is handled automatically by the rule
engine, so all argument orderings are matched.
The rules eliminate two redundant ALU instructions per occurrence
and fire on real code (defer bit-manipulation patterns in runtime,
testing, go/parser, and third-party packages).
Fixes #78632
Change-Id: Ib59e839081302ad1635e823309d8aec768c25dcf
GitHub-Last-Rev: 23f8296ece
GitHub-Pull-Request: golang/go#78634
Reviewed-on: https://go-review.googlesource.com/c/go/+/765580
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
Fixes #78558
I've also added tests to make sure PPC still generates ISEL when
the constant isn't 1.
This is to make sure we aren't generating a sequence that wouldn't
work right now.
But it does not mean we couldn't try to optimize other constants
on PPC64 if a fast sequence exists; for example like arm64's
inline register shifts.
Change-Id: Ic241d593149b7a11533948f5d4c52db357cc134f
Reviewed-on: https://go-review.googlesource.com/c/go/+/763340
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com>
Reviewed-by: Paul Murphy <paumurph@redhat.com>
Lookup tables for switch statements can be generalized to also support
bools, strings, floats, and complex numbers.
Change-Id: Ic3ece41fe2009050fbf08ba6f06ea8a567407974
Reviewed-on: https://go-review.googlesource.com/c/go/+/763320
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
- fix a bug where it wouldn't recognize 1<<63 as a power of two
- remove the IsSigned check; there is no such thing as a signed Mul
If the rule works for signed numbers it works for unsigned ones too.
Even if the intermediary steps make no sense, it ends up wrapping
the right way around in the end.
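Both points can be illustrated with a small sketch. The helpers are hypothetical and are not the compiler's own predicates; the signed check is just one plausible way such a bug can arise. As an int64, the bit pattern 1<<63 is the minimum value, so a check that requires c > 0 rejects it, while multiplying by it still wraps to the same result as shifting left by 63:

```go
package main

import "fmt"

// minInt64 is the int64 whose bit pattern is 1<<63.
const minInt64 = -1 << 63

// isPow2Signed is a naive check that requires a positive value.
func isPow2Signed(c int64) bool { return c > 0 && c&(c-1) == 0 }

// isPow2Bits looks only at the bit pattern.
func isPow2Bits(c int64) bool {
	u := uint64(c)
	return u != 0 && u&(u-1) == 0
}

// mulByTop multiplies by the constant; shiftBy63 is the strength-reduced form.
func mulByTop(x int64) int64  { return x * minInt64 }
func shiftBy63(x int64) int64 { return x << 63 }

func main() {
	fmt.Println(isPow2Signed(minInt64), isPow2Bits(minInt64))
	for _, x := range []int64{0, 1, 3, -5} {
		fmt.Println(x, mulByTop(x) == shiftBy63(x))
	}
}
```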
Change-Id: I86182762aec5eff784e2d9bc49ee028825fb9ea0
Reviewed-on: https://go-review.googlesource.com/c/go/+/760843
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
On amd64, along with:
	if b { x += 1 } => x += b
we can also implement constants 2, 4 and 8:
	if b { x += 2 } => x += b * 2
This compiles to a displacement LEA.
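At the source level the equivalence looks like the sketch below. The helper b2i is a hypothetical stand-in for the compiler's internal bool-to-integer conversion, since Go has no direct bool arithmetic:

```go
package main

import "fmt"

// b2i converts a bool to 0 or 1.
func b2i(b bool) int {
	if b {
		return 1
	}
	return 0
}

// branchy is the original conditional add.
func branchy(x int, b bool) int {
	if b {
		x += 2
	}
	return x
}

// branchless is the rewritten form: x += b * 2.
func branchless(x int, b bool) int { return x + b2i(b)*2 }

func main() {
	for _, b := range []bool{false, true} {
		fmt.Println(b, branchy(40, b), branchless(40, b))
	}
}
```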
Change-Id: Ib00fcc5059acb0ebb346e056c4a656f164cc63df
Reviewed-on: https://go-review.googlesource.com/c/go/+/760841
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Logical ops on uint8/uint16 (AND/OR/XOR) with constants sometimes
materialized the mask via MOVD (often as a negative immediate), even
when the value fit in the UI-immediate range. This prevented the backend
from selecting andi. / ori / xori forms.
This CL makes the following change:
UI-immediate truncation is performed only at the use-site of
logical-immediate ops, and only when the constant does not fit in the
8- or 16-bit unsigned domain (m != uint8(m) / m != uint16(m)).
This avoids negative-mask materialization and enables correct emission of
UI-form logical instructions. Arithmetic SI-immediate instructions (addi, subfic, etc.) and other
use-patterns are unchanged.
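A rough sketch of the 8-bit condition, assuming the constant is held in a sign-extended 64-bit field (the helper names fitsUint8 and and8 are hypothetical, and this is plain Go rather than a rewrite rule). A mask such as 0xF0 may be stored as -16; truncating it at the use site yields the same result for an 8-bit operand while fitting the unsigned immediate form:

```go
package main

import "fmt"

// fitsUint8 reports whether m is already a valid unsigned 8-bit value,
// i.e. the condition m == uint8(m) from the description above.
func fitsUint8(m int64) bool { return m == int64(uint8(m)) }

// and8 applies the mask to an 8-bit value after truncating the mask.
func and8(x uint8, m int64) uint8 { return x & uint8(m) }

func main() {
	negMask := int64(-16) // sign-extended form of the uint8 mask 0xF0
	fmt.Println(fitsUint8(negMask), fitsUint8(0xF0))
	fmt.Println(and8(0xAB, negMask) == and8(0xAB, 0xF0))
}
```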
Codegen tests are added to ensure the expected andi./ori/xori
patterns appear and that MOVD is not emitted for valid 8/16-bit masks.
Change-Id: I9fcdf4498c4e984c7587814fb9019a75865c4a0d
Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64_power10,gotip-linux-ppc64_power8,gotip-linux-ppc64le_power8,gotip-linux-ppc64le_power9,gotip-linux-ppc64le_power10
Reviewed-on: https://go-review.googlesource.com/c/go/+/704015
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Paul Murphy <paumurph@redhat.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Mark Freeman <markfreeman@google.com>
Switch statements containing integer constant cases whose bodies just
return a constant should be optimizable to a simpler and faster table
lookup instead of a jump table.
That is, a switch like this:
	switch x {
	case 0: return 10
	case 1: return 20
	case 2: return 30
	case 3: return 40
	default: return -1
	}
Could be optimized to this:
	var table = [4]int{10, 20, 30, 40}
	if uint(x) < 4 { return table[x] }
	return -1
The resulting code is smaller and faster, especially on platforms where
jump tables are not supported.
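The two forms above can be compared directly in a runnable sketch (the function names viaSwitch and viaTable are hypothetical). Note that the unsigned comparison also rejects negative x, so a single bounds check covers both ends of the range:

```go
package main

import "fmt"

// viaSwitch is the source-level switch.
func viaSwitch(x int) int {
	switch x {
	case 0:
		return 10
	case 1:
		return 20
	case 2:
		return 30
	case 3:
		return 40
	default:
		return -1
	}
}

var table = [4]int{10, 20, 30, 40}

// viaTable is the hand-written equivalent of the lookup-table form.
func viaTable(x int) int {
	if uint(x) < 4 {
		return table[x]
	}
	return -1
}

func main() {
	for x := -2; x <= 5; x++ {
		fmt.Println(x, viaSwitch(x), viaTable(x))
	}
}
```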
goos: windows
goarch: arm64
pkg: cmd/compile/internal/test
                               │ .\old.txt  │             .\new.txt             │
                               │   sec/op   │   sec/op    vs base               │
SwitchLookup8Predictable-12      2.708n ± 6%  2.249n ± 5%  -16.97% (p=0.000 n=10)
SwitchLookup8Unpredictable-12    8.758n ± 7%  3.272n ± 4%  -62.65% (p=0.000 n=10)
SwitchLookup32Predictable-12     2.672n ± 5%  2.373n ± 6%  -11.21% (p=0.000 n=10)
SwitchLookup32Unpredictable-12   9.372n ± 7%  3.385n ± 6%  -63.89% (p=0.000 n=10)
geomean                          4.937n       2.772n       -43.84%
Fixes #78203
Change-Id: I74fa3d77ef618412951b2e5c3cb6ebc760ce4ff1
Reviewed-on: https://go-review.googlesource.com/c/go/+/756340
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
If the bool comes from a local operation this is foldable into the comparison.
	if a == b {
	} else {
		x++
	}
becomes:
	x += !(a == b)
becomes:
	x += a != b
If the bool is passed in or loaded rather than being computed locally,
this adds an extra XOR ^1 to invert it.
At worst that makes the cost equal to compute + CMP + CMOV, which is a
tie on modern CPUs that can execute CMOV on all integer ALUs and a win
on cheaper or older ones that can't.
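At the source level, the rewrite above corresponds roughly to the following Go sketch. Go has no implicit bool-to-int conversion, so the sketch uses a hypothetical helper b2i; the names branchy and branchless are also made up for illustration, and the actual transformation happens inside the compiler.

```go
package main

import "fmt"

// b2i converts a bool to 0 or 1. Hypothetical helper standing in for the
// bool-to-int conversion the compiler performs internally.
func b2i(b bool) int {
	if b {
		return 1
	}
	return 0
}

// branchy increments x only in the else branch, as in the original form.
func branchy(x, a, b int) int {
	if a == b {
	} else {
		x++
	}
	return x
}

// branchless adds the inverted comparison result directly:
// x += !(a == b), which is the same as x += a != b.
func branchless(x, a, b int) int {
	return x + b2i(a != b)
}

func main() {
	fmt.Println(branchy(5, 1, 1), branchless(5, 1, 1)) // operands equal
	fmt.Println(branchy(5, 1, 2), branchless(5, 1, 2)) // operands differ
}
```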
Change-Id: Idd2566c7a3826ec432ebfbba7b3898aa0db4b812
Reviewed-on: https://go-review.googlesource.com/c/go/+/760922
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
After CL 760780, commas aren't allowed.
But some CLs that were already in flight don't know that.
Change-Id: I31f586c87def4a9746dc2c055923fce8bad6647e
Reviewed-on: https://go-review.googlesource.com/c/go/+/761620
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Change-Id: I081da8c79f0264118e079af21ff58c511ae37e6c
Reviewed-on: https://go-review.googlesource.com/c/go/+/760682
Reviewed-by: Junyang Shao <shaojunyang@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@google.com>
Replace \s with a space in backtick-quoted strings
Replace \\s with a space in double-quoted strings
Change-Id: I0c8b249bb12c2c8ca69e683e4bc6f27544fd6094
Reviewed-on: https://go-review.googlesource.com/c/go/+/760680
Auto-Submit: Keith Randall <khr@google.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Paul Murphy <paumurph@redhat.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
A bunch of tests had broken yet undetected syntax errors
in their assembly output regexps. Things like mismatched quotes,
using ^ instead of - for negation, etc.
In addition, since CL 716060, using commas as separators between
regexps doesn't work: every regexp after the comma is silently
dropped.
Fix all these things, and add a test to make sure that we're not
silently dropping regexps on the floor.
After this CL I will do some cleanup to align with CL 716060, like
replacing commas and \s with spaces (which was the point of that CL,
but wasn't consistently rewritten everywhere).
Change-Id: I54f226120a311ead0c6c62eaf5d152ceed106034
Reviewed-on: https://go-review.googlesource.com/c/go/+/760521
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Paul Murphy <paumurph@redhat.com>
Auto-Submit: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Similar to CL 685676 but for XOR.
Change-Id: Ib5ffd4c13348f176a808b3218fdbbafc2c42794f
Reviewed-on: https://go-review.googlesource.com/c/go/+/760921
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Similar to CL 685676 but for OR.
Change-Id: I0ddfd457ed9e8888462306138a251ac48ad42084
Reviewed-on: https://go-review.googlesource.com/c/go/+/760920
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Junyang Shao <shaojunyang@google.com>