This includes code generated by simdgen CL 689955;
it is here because of git-facilitated pilot error
(the generated file should have been in the next CL,
but that CL is related to this one, so, oh well).
Change-Id: Ibfea3f1cd93ca9cd12970edf15a013471677a6ba
Reviewed-on: https://go-review.googlesource.com/c/go/+/689936
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
This is "glue" changes and hand work for the AVX2
masked loads/stores. Does not include generated
function/method declarations or intrinsic registration.
Change-Id: Ic95f90b117d0c471f174407ce3f729f1f517b23c
Reviewed-on: https://go-review.googlesource.com/c/go/+/689295
Reviewed-by: Junyang Shao <shaojunyang@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
This CL is partially generated by CL 689775.
Change-Id: I0c36fd2a44706c88db1a1d5ea4a6d0b9f891d85f
Reviewed-on: https://go-review.googlesource.com/c/go/+/689795
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
This CL is partially generated by CL 688855.
Change-Id: I68d5fbad9445a3d2cf671822be1c0b82e7290396
Reviewed-on: https://go-review.googlesource.com/c/go/+/688875
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
This combines several CLs into a single patch of "glue"
for the generated SIMD extensions.
This glue includes GOEXPERIMENT checks that disable
the creation of user-visible "simd" types and
that disable the registration of "simd" intrinsics.
The simd type checks were changed to work for either
package "simd" or "internal/simd" so that moving that
package won't be quite so fragile.
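As a minimal sketch of the shape of that glue (the flag and helper
names here are illustrative assumptions, not the actual cmd/compile
code, which consults the GOEXPERIMENT configuration):
```
package main

// simdEnabled stands in for the compiler's GOEXPERIMENT check
// (an assumption; the real gate lives inside cmd/compile).
var simdEnabled = false

// isSIMDPackage accepts either import path so that moving the
// package between "simd" and "internal/simd" stays cheap.
func isSIMDPackage(path string) bool {
	return path == "simd" || path == "internal/simd"
}

func main() {
	if !simdEnabled {
		// With the experiment off, user-visible simd types are not
		// created and simd intrinsics are not registered.
		println("simd types/intrinsics disabled")
		return
	}
	println("simd package recognized:", isSIMDPackage("internal/simd"))
}
```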
cmd/compile, internal/simd: glue for adding SIMD extensions to Go
cmd/compile: theft of Cherry's sample SIMD compilation
Change-Id: Id44e2f4bafe74032c26de576a8691b6f7d977e01
Reviewed-on: https://go-review.googlesource.com/c/go/+/675598
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
This lets us get rid of lots of specialized opcodes for storing zero.
Instead, use regular store opcodes that just happen to use the zero
register as one of their inputs.
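For illustration, the kind of code affected; on targets with a
hardwired zero register this can now compile to an ordinary store
whose value input is that register (a sketch, not a claim about any
particular port):
```
package main

// *p = 0 can be a regular store that reads the zero register,
// instead of a dedicated "store zero" opcode.
//go:noinline
func zero(p *int64) {
	*p = 0
}

func main() {
	x := int64(42)
	zero(&x)
	println(x) // 0
}
```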
Change-Id: I2902a6f9b0831cb598df45189ca6bb57221bef72
Reviewed-on: https://go-review.googlesource.com/c/go/+/633075
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
ssa.Sym is only implemented by *ir.Name or *obj.LSym.
Change-Id: Ia171db618abd8b438fcc2cf402f40f3fe3ec6833
Reviewed-on: https://go-review.googlesource.com/c/go/+/660995
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
This removes the old conditional-on-register-value
handshake from the deferproc/deferprocstack logic.
The "line" for the recovery-exit frame itself (not the defers
that it runs) is the closing brace of the function.
This reduces code size slightly (e.g., the go command is 0.2% smaller).
Sample output showing the effect of this change, and the sort of
code required to observe the effect:
```
package main

import "os"

func main() {
	g(len(os.Args) - 1) // stack[0]
}

var gi int
var pi *int = &gi

//go:noinline
func g(i int) {
	switch i {
	case 0:
		defer func() {
			println("g0", i)
			q() // stack[2] if i == 0
		}()
		for j := *pi; j < 1; j++ {
			defer func() {
				println("recover0", recover().(string))
			}()
		}
	default:
		for j := *pi; j < 1; j++ {
			defer func() {
				println("g1", i)
				q() // stack[2] if i == 1
			}()
		}
		defer func() {
			println("recover1", recover().(string))
		}()
	}
	p()
} // stack[1] (deferreturn)

//go:noinline
func p() {
	panic("p()")
}

//go:noinline
func q() {
	panic("q()") // stack[3]
}

/* Sample output for "./foo foo":
recover1 p()
g1 1
panic: q()

goroutine 1 [running]:
main.q()
	.../main.go:46 +0x2c
main.g.func3()
	.../main.go:29 +0x48
main.g(0x1?)
	.../main.go:37 +0x68
main.main()
	.../main.go:6 +0x28
*/
```
Change-Id: Ie39ea62ecc244213500380ea06d44024cadc2317
Reviewed-on: https://go-review.googlesource.com/c/go/+/650795
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Change-Id: I07e7c8eaa5bd4bac0d576b2f2f4cd3f81b0b77a4
Reviewed-on: https://go-review.googlesource.com/c/go/+/630055
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Commit-Queue: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Compared with the version generated by dec64.rules based on Ctz32,
the number of assembly instructions is reduced by half.
SwissMap uses TrailingZeros64 to find the first match in its control
group and may benefit from this CL on the 386 architecture.
goos: linux
goarch: 386
cpu: 13th Gen Intel(R) Core(TM) i7-13700H
│ old.txt │ new.txt │
│ sec/op │ sec/op vs base │
TrailingZeros64-20 0.8828n ± 1% 0.6299n ± 1% -28.65% (p=0.000 n=20)
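For reference, the operation being intrinsified, in the shape a
control-word scan uses it (illustrative usage, not the runtime's code):
```
package main

import "math/bits"

func main() {
	// On 386 a uint64 is split into two 32-bit halves; intrinsifying
	// TrailingZeros64 directly beats the generic Ctz32-based lowering
	// from dec64.rules.
	var ctrl uint64 = 0b1010_0000
	println(bits.TrailingZeros64(ctrl)) // 5: index of the lowest set bit
}
```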
Change-Id: Iba08a3f4e13efd3349715dfb7fcd5fd470286cd3
Reviewed-on: https://go-review.googlesource.com/c/go/+/624376
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Change-Id: I7ff869e21e67cf6a193f7a92bf7b05f047ee005c
GitHub-Last-Rev: bf01f58249
GitHub-Pull-Request: golang/go#69957
Reviewed-on: https://go-review.googlesource.com/c/go/+/620778
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
For #68578
Change-Id: Ia9580579bfc4709945bfcf6ec3803d5d11812187
Reviewed-on: https://go-review.googlesource.com/c/go/+/606901
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
The SSA backend currently only handles structs with up to 4 fields.
Thus, there are different operations corresponding to the number of
fields of the struct.
This CL generalizes these with just one OpStructMake, allowing struct
types with an arbitrary number of fields.
However, ssa.MaxStruct is still kept as-is, and a future CL will
increase this value to optimize large structs.
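As a hedged illustration of what the generalized op covers: a struct
with more than four fields built by value, which would not fit the
fixed OpStructMake0..OpStructMake4 forms (whether it is actually
SSA'd still depends on ssa.MaxStruct):
```
package main

type fields5 struct {
	a, b, c, d, e int
}

//go:noinline
func make5(x int) fields5 {
	// A single generalized OpStructMake can represent building a
	// struct with any number of fields.
	return fields5{x, x + 1, x + 2, x + 3, x + 4}
}

func main() {
	f := make5(1)
	println(f.a + f.e) // 1 + 5 = 6
}
```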
Updates #24416
Change-Id: I192ffbea881186693584476b5639394e79be45c5
Reviewed-on: https://go-review.googlesource.com/c/go/+/611075
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
Moving these intrinsics to a base package enables other internal/runtime
packages to use them.
For #54766.
Change-Id: I45a530422207dd94b5ad4eee51216c9410a84040
Reviewed-on: https://go-review.googlesource.com/c/go/+/613261
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Moving these intrinsics to a base package enables other internal/runtime
packages to use them.
For #54766.
Change-Id: I0b3eded3bb45af53e3eb5bab93e3792e6a8beb46
Reviewed-on: https://go-review.googlesource.com/c/go/+/613260
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
When GORISCV64 enables rva22u64, make use of integer MIN/MINU/MAX/MAXU
instructions in compiler rewrite rules.
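For illustration, integer min/max of the kind those rules can lower
(a sketch using the Go 1.21 min/max builtins; instruction selection
details are the backend's):
```
package main

// With GORISCV64=rva22u64, integer min/max like this can be lowered
// to MIN/MINU/MAX/MAXU instead of a compare-and-branch sequence.
//go:noinline
func clamp(v, lo, hi int64) int64 {
	return min(max(v, lo), hi)
}

func main() {
	println(clamp(42, 0, 10)) // 10
}
```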
Change-Id: I4e7c514516acad03f2869d4c8936f06582cf7ea9
Reviewed-on: https://go-review.googlesource.com/c/go/+/559660
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
For atomic AND and OR operations on memory, we currently have two
views of the op. One just does the operation on the memory and returns
just a memory. The other does the operation on the memory and returns
the old value (before having the logical operation done to it) and
memory.
These two are typed differently, and there's currently some confusion in
our rules about which is which. Use different names for the two
different flavors so we don't get them confused.
Update #61395
Change-Id: I07b4542db672b2cee98169ac42b67db73c482093
Reviewed-on: https://go-review.googlesource.com/c/go/+/594976
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Nicolas Hillegeer <aktau@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
The atomic And/Or operators were added in CL 528797, but the
compiler does not intrinsify them; this CL does so for arm64.
Also, for the existing atomicAnd/Or operations, the updated
value is not used, but until now we needed a register to
temporarily hold it. Now that we have v.RegTmp, the new value
is not needed anymore; this CL changes that.
The other change is that the existing operations don't use their
result, but now we need the old value (not the new value) as
the result.
This CL also aliases all of the And/Or operations into the
sync/atomic package.
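For reference, the user-level shape of the aliased operations (the
And/Or methods added to sync/atomic in Go 1.23); note that the old
value is what is returned, matching the old-value flavor of the op:
```
package main

import "sync/atomic"

func main() {
	var flags atomic.Uint32
	flags.Store(0b1011)

	// And returns the old value; on arm64 this can now be
	// intrinsified rather than compiled as a call.
	old := flags.And(0b0011)
	println(old, flags.Load()) // 11 3
}
```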
Performance on an ARMv8.1 machine:
old.txt new.txt
sec/op sec/op vs base
And32-160 8.716n ± 0% 4.771n ± 1% -45.26% (p=0.000 n=10)
And32Parallel-160 30.58n ± 2% 26.45n ± 4% -13.49% (p=0.000 n=10)
And64-160 8.750n ± 1% 4.754n ± 0% -45.67% (p=0.000 n=10)
And64Parallel-160 29.40n ± 3% 25.55n ± 5% -13.11% (p=0.000 n=10)
Or32-160 8.847n ± 1% 4.754n ± 1% -46.26% (p=0.000 n=10)
Or32Parallel-160 30.75n ± 3% 26.10n ± 4% -15.14% (p=0.000 n=10)
Or64-160 8.825n ± 1% 4.766n ± 0% -46.00% (p=0.000 n=10)
Or64Parallel-160 30.52n ± 5% 25.89n ± 6% -15.17% (p=0.000 n=10)
For #61395
Change-Id: Ib1d1ac83f7f67dcf67f74d003fadb0f80932b826
Reviewed-on: https://go-review.googlesource.com/c/go/+/584715
Auto-Submit: Austin Clements <austin@google.com>
TryBot-Bypass: Austin Clements <austin@google.com>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Fannie Zhang <Fannie.Zhang@arm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Have nil checks return a pointer that is known non-nil. Users of
that pointer can use the result, ensuring that they are ordered
after the nil check itself.
The order dependence goes away after scheduling, when we've fixed
an order. At that point we move uses back to the original pointer
so it doesn't change regalloc any.
This prevents pointer arithmetic on nil from being spilled to the
stack and then observed by a stack scan.
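A hedged example of the code shape this protects; the derived pointer
&p.f is arithmetic on a possibly-nil pointer, and using the nil
check's result keeps it ordered after the check (illustrative only):
```
package main

type T struct {
	pad [64]byte
	f   int64
}

//go:noinline
func load(p *T) int64 {
	// &p.f is p plus a constant offset. If that arithmetic were done
	// on a nil p, spilled, and then seen by a stack scan, we'd have a
	// bad pointer on the stack; ordering uses after the nil check
	// prevents that.
	return p.f
}

func main() {
	println(load(&T{f: 7}))
}
```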
Fixes #63657
Change-Id: I1a5fa4f2e6d9000d672792b4f90dfc1b7b67f6ea
Reviewed-on: https://go-review.googlesource.com/c/go/+/537775
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Sparse conditional constant propagation can discover optimization
opportunities that cannot be found by just combining constant folding,
constant propagation, and dead code elimination separately.
This is a re-submit of PR #59575, which fixes a broken dominance relationship caught by ssacheck.
Updates https://github.com/golang/go/issues/59399
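A classic case SCCP handles that the separate passes miss: a phi whose
arguments are the same constant (a sketch; the compiler behavior
described in the comments is the general SCCP technique):
```
package main

//go:noinline
func f(b bool) int {
	x := 0
	if b {
		x = 0 // both paths assign the same constant
	}
	// x is a phi of (0, 0): plain constant propagation sees two
	// definitions and gives up, while SCCP evaluates the phi over the
	// constant lattice and folds x, and then x+1, to constants.
	return x + 1
}

func main() {
	println(f(true)) // 1
}
```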
Change-Id: I57482dee38f8e80a610aed4f64295e60c38b7a47
GitHub-Last-Rev: 830016f24e
GitHub-Pull-Request: golang/go#60469
Reviewed-on: https://go-review.googlesource.com/c/go/+/498795
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Sparse conditional constant propagation can discover optimization opportunities that cannot be found by just combining constant folding, constant propagation, and dead code elimination separately.
Updates #59399
Change-Id: Ia954e906480654a6f0cc065d75b5912f96f36b2e
GitHub-Last-Rev: 90fc02db99
GitHub-Pull-Request: golang/go#59575
Reviewed-on: https://go-review.googlesource.com/c/go/+/483875
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
Have the write barrier call return a pointer to a buffer into which
the generated code records pointers that need write barrier treatment.
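A minimal sketch of that calling convention, with assumed names (not
the real runtime API): the barrier call yields buffer space, and the
caller stores the pointers into it:
```
package main

// wbBuf models the write-barrier buffer; all names here are
// illustrative assumptions, not runtime internals.
type wbBuf struct {
	slots []uintptr
	next  int
}

// writeBarrier plays the role of the generated call: it returns space
// for n pointer slots, flushing to the GC first if the buffer is full.
func (b *wbBuf) writeBarrier(n int) []uintptr {
	if b.next+n > len(b.slots) {
		b.flush()
	}
	s := b.slots[b.next : b.next+n]
	b.next += n
	return s
}

func (b *wbBuf) flush() { b.next = 0 } // the real runtime hands these to the GC

func main() {
	b := &wbBuf{slots: make([]uintptr, 8)}
	rec := b.writeBarrier(2)
	rec[0], rec[1] = 0xdead, 0xbeef // record old and new pointer values
	println(b.next) // 2
}
```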
Change-Id: I7871764298e0aa1513de417010c8d46b296b199e
Reviewed-on: https://go-review.googlesource.com/c/go/+/447781
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Bypass: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Instead of keeping track of in which blocks write barriers complete,
introduce a new op that marks the exact memory state where the
write barrier completes.
For future use. This allows us to move some of the write barrier code
to between the start of the merging block and the WBend marker.
Change-Id: If3809b260292667d91bf0ee18d7b4d0eb1e929f0
Reviewed-on: https://go-review.googlesource.com/c/go/+/447777
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
This change intrinsifies ReverseBytes{16|32|64} by generating the
corresponding new Power10 instructions (brh, brw, and brd) and
adds a verification test for the same.
On Power9 and Power8, the .go code performs optimally as it is.
Performance improvement seen on Power10:
ReverseBytes32 1.38ns ± 0% 1.18ns ± 0% -14.2%
ReverseBytes64 1.52ns ± 0% 1.11ns ± 0% -26.87%
ReverseBytes16 1.41ns ± 1% 1.18ns ± 0% -16.47%
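For reference, the intrinsified calls (illustrative usage; on Power10
each can lower to a single byte-reverse instruction):
```
package main

import "math/bits"

func main() {
	println(bits.ReverseBytes16(0x1234) == 0x3412)
	println(bits.ReverseBytes32(0x12345678) == 0x78563412)
	println(bits.ReverseBytes64(0x0102030405060708) == 0x0807060504030201)
}
```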
Change-Id: I88f127f3ab9ba24a772becc21ad90acfba324b37
Reviewed-on: https://go-review.googlesource.com/c/go/+/446675
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
We need to make sure that when we get the stack pointer, we get it
at the right time.
	V = GetCallerSP
	Call()
	W = GetCallerSP
If Call causes a stack growth, then we will be in a situation
where V != W. So it matters when GetCallerSP operations get scheduled.
Add a memory argument to GetCallerSP so it can't be reordered with
things like calls.
Change-Id: I6cc801134c38e358c5a1ec0c09d38379a16a4184
Reviewed-on: https://go-review.googlesource.com/c/go/+/453515
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Martin Möhrmann <martin@golang.org>
Reviewed-by: Robert Griesemer <gri@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
The SPanchored opcode is identical to SP, except that it takes a memory
argument so that it (and more importantly, anything that uses it)
must be scheduled at or after that memory argument.
This opcode ensures that a LEAQ of a variable gets scheduled after the
corresponding VARDEF for that variable.
This may lead to less CSE of LEAQ operations. The effect is very small.
The go binary is only 80 bytes bigger after this CL. Usually LEAQs get
folded into load/store operations, so the effect is only for pointerful
types, large enough to need a duffzero, and have their address passed
somewhere. Even then, usually the CSEd LEAQs will be un-CSEd because
the two uses are on different sides of a function call and the LEAQ
ends up being rematerialized at the second use anyway.
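A hedged sketch of the code pattern described: a type big enough to be
zeroed with duffzero whose address is passed somewhere, so the LEAQ of
the variable must be scheduled after its VARDEF (illustrative of the
scheduling constraint only):
```
package main

// big is pointerful and large enough to be zeroed with duffzero
// on amd64.
type big struct {
	p [32]*int
}

//go:noinline
func sink(b *big) {}

func main() {
	var b big // zeroing uses duffzero
	sink(&b)  // taking the address emits a LEAQ, anchored after VARDEF
}
```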
Change-Id: Ib893562cd05369b91dd563b48fb83f5250950293
Reviewed-on: https://go-review.googlesource.com/c/go/+/452916
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
Reviewed-by: Martin Möhrmann <martin@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
These two directories are full of //go:build ignore files.
We can ignore them more easily by putting an underscore
at the start of the name. That also works around a bug
in Go 1.17 that was not fixed until Go 1.17.3.
Change-Id: Ia5389b65c79b1e6d08e4fef374d335d776d44ead
Reviewed-on: https://go-review.googlesource.com/c/go/+/435472
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Renamed from src/cmd/compile/internal/ssa/gen/genericOps.go