Logical ops on uint8/uint16 (AND/OR/XOR) with constants sometimes
materialized the mask via MOVD (often as a negative immediate), even
when the value fit in the UI-immediate range. This prevented the backend
from selecting andi. / ori / xori forms.
This CL makes:
UI-immediate truncation is performed only at the use-site of
logical-immediate ops, and only when the constant does not fit in the
8- or 16-bit unsigned domain (m != uint8(m) / m != uint16(m)).
This avoids negative-mask materialization and enables correct emission of
UI-form logical instructions. Arithmetic SI-immediate instructions (addi, subfic, etc.) and other
use-patterns are unchanged.
Codegen tests are added to ensure the expected andi./ori/xori
patterns appear and that MOVD is not emitted for valid 8/16-bit masks.
Change-Id: I9fcdf4498c4e984c7587814fb9019a75865c4a0d
Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64_power10,gotip-linux-ppc64_power8,gotip-linux-ppc64le_power8,gotip-linux-ppc64le_power9,gotip-linux-ppc64le_power10
Reviewed-on: https://go-review.googlesource.com/c/go/+/704015
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Paul Murphy <paumurph@redhat.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Mark Freeman <markfreeman@google.com>
Switch statement containing integer constant cases and case bodies just
returning a constant should be optimizable to a simpler and faster table
lookup instead of a jump table.
That is, a switch like this:
switch x {
case 0: return 10
case 1: return 20
case 2: return 30
case 3: return 40
default: return -1
}
Could be optimized to this:
var table = [4]int{10, 20, 30, 40}
if uint(x) < 4 { return table[x] }
return -1
The resulting code is smaller and faster, especially on platforms where
jump tables are not supported.
goos: windows
goarch: arm64
pkg: cmd/compile/internal/test
│ .\old.txt │ .\new.txt │
│ sec/op │ sec/op vs base │
SwitchLookup8Predictable-12 2.708n ± 6% 2.249n ± 5% -16.97% (p=0.000 n=10)
SwitchLookup8Unpredictable-12 8.758n ± 7% 3.272n ± 4% -62.65% (p=0.000 n=10)
SwitchLookup32Predictable-12 2.672n ± 5% 2.373n ± 6% -11.21% (p=0.000 n=10)
SwitchLookup32Unpredictable-12 9.372n ± 7% 3.385n ± 6% -63.89% (p=0.000 n=10)
geomean 4.937n 2.772n -43.84%
Fixes#78203
Change-Id: I74fa3d77ef618412951b2e5c3cb6ebc760ce4ff1
Reviewed-on: https://go-review.googlesource.com/c/go/+/756340
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
If the bool comes from a local operation this is foldable into the comparison.
if a == b {
} else {
x++
}
becomes:
x += !(a == b)
becomes:
x += a != b
If the bool is passed in or loaded rather than being locally computed
this adds an extra XOR ^1 to invert it.
But at worst it should make the math equal to the compute + CMP + CMOV
which is a tie on modern CPUs which can execute CMOV on all int ALUs
and a win on the cheaper or older ones which can't.
Change-Id: Idd2566c7a3826ec432ebfbba7b3898aa0db4b812
Reviewed-on: https://go-review.googlesource.com/c/go/+/760922
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
After CL 760780, commas aren't allowed.
But some CLs that were already in flight don't know that.
Change-Id: I31f586c87def4a9746dc2c055923fce8bad6647e
Reviewed-on: https://go-review.googlesource.com/c/go/+/761620
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Change-Id: I081da8c79f0264118e079af21ff58c511ae37e6c
Reviewed-on: https://go-review.googlesource.com/c/go/+/760682
Reviewed-by: Junyang Shao <shaojunyang@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@google.com>
Replace \s with a space in backtick-quoted strings
Replace \\s with a space in double-quoted strings
Change-Id: I0c8b249bb12c2c8ca69e683e4bc6f27544fd6094
Reviewed-on: https://go-review.googlesource.com/c/go/+/760680
Auto-Submit: Keith Randall <khr@google.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Paul Murphy <paumurph@redhat.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
A bunch of tests had broken yet undetected syntax errors
in their assembly output regexps. Things like mismatched quotes,
using ^ instead of - for negation, etc.
In addition, since CL 716060 using commas as separators between
regexps doesn't work, and ends up just silently dropping every
regexp after the comma.
Fix all these things, and add a test to make sure that we're not
silently dropping regexps on the floor.
After this CL I will do some cleanup to align with CL 716060, like
replacing commas and \s with spaces (which was the point of that CL,
but wasn't consistently rewritten everywhere).
Change-Id: I54f226120a311ead0c6c62eaf5d152ceed106034
Reviewed-on: https://go-review.googlesource.com/c/go/+/760521
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Paul Murphy <paumurph@redhat.com>
Auto-Submit: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Similar to CL 685676 but for XOR.
Change-Id: Ib5ffd4c13348f176a808b3218fdbbafc2c42794f
Reviewed-on: https://go-review.googlesource.com/c/go/+/760921
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Similar to CL 685676 but for OR.
Change-Id: I0ddfd457ed9e8888462306138a251ac48ad42084
Reviewed-on: https://go-review.googlesource.com/c/go/+/760920
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Theses test cases search for an AND that gets
optimized away after CL 760307.
Should help to fix riscv64 (coudn't check as
builders don't appear on https://build.golang.org/ )
and loong64 CI on master.
Change-Id: I57e4e5ab7d3003f239355137472585e46493d8dc
Reviewed-on: https://go-review.googlesource.com/c/go/+/760640
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
The prove pass removes superfluous bit masking. This was meant to
test the edge cases of the ppc64 folding rules which are exactly
the cases the prove pass now removes.
Fixes#78403
Change-Id: I45eeac58e01b42e19b8a06bb0d7af96c616ccbff
Reviewed-on: https://go-review.googlesource.com/c/go/+/760307
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Paul Murphy <paumurph@redhat.com>
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
On ppc64/ppc64le, rewrite (x + x) << c to x << (c+1) for constant shifts. This removes an ADD, shortens the dependency chain, and reduces code size.
Add rules for both 64-bit (SLDconst) and 32-bit (SLWconst), and extend
test/codegen/shift.go with ppc64x checks to assert a single SLD/SLW and
forbid ADD. Aligns ppc64 with other architectures that already assert
similar codegen in shift.go.
Change-Id: Ie564afbb029a5bd48887b82b0c455ca1dddd5508
Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64_power10,gotip-linux-ppc64_power8,gotip-linux-ppc64le_power8,gotip-linux-ppc64le_power9,gotip-linux-ppc64le_power10
Reviewed-on: https://go-review.googlesource.com/c/go/+/712000
Reviewed-by: Archana Ravindar <aravinda@redhat.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Look into the following block(s) for a load that can be paired with
the load we're trying to pair up.
This particularly helps the generated equality functions. Instead of doing
MOVD x(R0), R2
MOVD x(R1), R3
CMP R2, R3
BNE noteq
MOVD x+8(R0), R2
MOVD x+8(R1), R3
CMP R2, R3
BNE noteq
we do
LDP x(R0), (R2, R4)
LDP x(R1), (R3, R5)
CMP R2, R3
BNE noteq
CMP R4, R5
BNE noteq
Removes 5296 bytes of code from cmd/go.
Change-Id: I6368686892ac944783c8b07ed7252126d1ef4031
Reviewed-on: https://go-review.googlesource.com/c/go/+/740741
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Absorb unnecessary conversion between float32 and float64
if both src and dst are 32 bit.
Updates #75463
Change-Id: Ia71941223b5cca3fea66b559da7b8f916e63feaf
Reviewed-on: https://go-review.googlesource.com/c/go/+/733621
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Julian Zhu <jz531210@gmail.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Fixes#77720
Add a generic SSA rewrite that forwards `Load` from a `Move` destination
back to the `Move` source when it is provably safe, so field reads like
`s.h.Value().typ` don’t force a full struct copy.
- Add `Load <- Move` rewrite in `generic.rules` with safety guard:
non-volatile source
- Tweak `fixedbugs/issue22200*` so that it can still trigger the "stack frame too large" error.
- Regenerate `rewritegeneric.go`.
- Add `test/codegen/moveload.go` to assert no `MOVUPS` and direct `MOVBLZX`
in both direct and inlined forms.
Benchmark results (linux/amd64, i7-14700KF):
$ go test cmd/compile/internal/test -run='^$' -bench='MoveLoad' -count=20
Before:
BenchmarkMoveLoadTypViaValue-20 ~76.9 ns/op
BenchmarkMoveLoadTypViaPtr-20 ~1.97 ns/op
After:
BenchmarkMoveLoadTypViaValue-20 ~1.894 ns/op
BenchmarkMoveLoadTypViaPtr-20 ~1.905 ns/op
The rewrite removes the redundant struct copy in
`s.h.Value().typ`, bringing it in line with the direct pointer form.
Change-Id: Iddf2263e390030ba013e0642a695b87c75f899da
Reviewed-on: https://go-review.googlesource.com/c/go/+/748200
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Mark Freeman <markfreeman@google.com>
Negation on a condition can be eliminated.
Change-Id: I94fab5f019cbaebb2ca589e1d8796a9cb72f3894
Reviewed-on: https://go-review.googlesource.com/c/go/+/748401
Reviewed-by: Xueqi Luo <1824368278@qq.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Julian Zhu <jz531210@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Before this CL, the simdgen contains a sign check to selectively enable
such rules for deduplication purposes. This left out `VPSRL` as it's
only available in unsigned form. This CL fixes that.
It looks like the previous documentation fix to SHA instruction might
not had run go generate, so this CL also contains the generated code for
that fix.
There is also a weird phantom import in
cmd/compile/internal/ssa/issue77582_test.go
This CL also fixes that
The trybot didn't complain?
Change-Id: Ibbf9f789c1a67af1474f0285ab376bc07f17667e
Reviewed-on: https://go-review.googlesource.com/c/go/+/748501
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Not a fix because there are other architectures
still to be done.
Updates #75463.
Change-Id: Ia5233c2b6c5f4439e269950efdd851e72e8e7ff6
Reviewed-on: https://go-review.googlesource.com/c/go/+/730160
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Not a fix because there are other architectures
still to be done.
Updates #75463.
Change-Id: Ifca03975023e4e5d0ffa98d1f877314a1a291be0
Reviewed-on: https://go-review.googlesource.com/c/go/+/729161
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Not a fix because there are other architectures
still to be done.
Updates #75463.
Change-Id: I3d7754ce4a26af0f5c4ef0be1254d164e68f8442
Reviewed-on: https://go-review.googlesource.com/c/go/+/729160
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
The subtle.ConstantTimeSelect function is now intrinsified on a number of
architectures. Add test coverage for code generation.
Change-Id: Iff97510838b39ec2137d64a62d2f516c94710c68
Reviewed-on: https://go-review.googlesource.com/c/go/+/748400
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Xueqi Luo <1824368278@qq.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Russ Cox <rsc@golang.org>
Ensure that loading the value of one is included and specify the actual
constants for various instructions. Avoid potential ambiguity with
instructions (e.g. use "AND " to avoid matching "ANDI").
Change-Id: I42707bd68fab9e12d984670e64a02ee93667f77c
Reviewed-on: https://go-review.googlesource.com/c/go/+/748920
Reviewed-by: Julian Zhu <julian.oerv@isrc.iscas.ac.cn>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
When creating a dynamically-sized slice, the compiler attempts to use a
stack-allocated buffer if the slice does not escape and its buffer size
is ≤ 32 bytes.
In this case, the SSA will contain a (OpPhi (OpSliceMake) (OpSliceMake))
value: one OpSliceMake uses the stack-allocated buffer, and the other
uses the heap-allocated buffer. The len and cap arguments for these two
OpSliceMake values are expected to be identical.
This CL enables the prove pass to recognize this scenario and handle
OpSliceLen and OpSliceCap as intended.
Fixes#77375
Change-Id: Id77a2473caf66d366f5c94108aa6cb6d3df5b887
Reviewed-on: https://go-review.googlesource.com/c/go/+/740840
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
This change introduces a new SSA pass that converts conditionals
with logical AND into CMP/CCMP instruction sequences on ARM64.
Fixes#71268
Change-Id: I3eba9c05b88ed6eb70350d30f6e805e6a4dddbf1
Reviewed-on: https://go-review.googlesource.com/c/go/+/698099
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
This commit adds two test functions, bitsOptXor1 and bitsOptXor2,
to verify that the compiler correctly optimizes certain bitwise
expression patterns in future CLs.
Change-Id: Idf5bd1ff8653f8fa218604d857639e063546d8e7
Reviewed-on: https://go-review.googlesource.com/c/go/+/736540
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Fixes#76449
This saves a single byte for the REX prefix per OpCopy it triggers on.
Change-Id: I1eab364d07354555ba2f23ffd2f9c522d4a04bd0
Reviewed-on: https://go-review.googlesource.com/c/go/+/731640
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
The bug was first introduced when the compiler is still written in C,
with CL 2254041. The static array was laid out with the wrong context,
causing a stack pointer will be stored in global object.
Fixes#61730Fixes#77193
Change-Id: I22c8393314d251beb53db537043a63714c84f36a
Reviewed-on: https://go-review.googlesource.com/c/go/+/737821
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
This makes it easier to identify which functions are used for memory
allocation by looking for functions that start with mallocgc. The Size
suffix is added so that the isSpecializedMalloc function in
cmd/compile/internal/ssa can distinguish between the generated functions
and the mallocgcTiny function called by mallocgc, similar to the SC
suffixes for the mallocgcSmallNoScanSC* and mallocgcSmallScanNoHeaderSC*
functons.
Change-Id: I6ad7f15617bf6f18ae5d1bfa2a0b94e86a6a6964
Reviewed-on: https://go-review.googlesource.com/c/go/+/735780
Reviewed-by: Michael Matloob <matloob@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Also more consistently include commas after constants to increase
accuracy (i.e. "1," cannot inadvertantly match "10")
Change-Id: I480a73859d2e83354b8e9f94bc73c6563976d0e3
Reviewed-on: https://go-review.googlesource.com/c/go/+/733460
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
IsNaN's underlying instruction, VCMPPS (or VCMPPD), takes two
inputs, and computes either of them is NaN. Optimize the Or
pattern to generate two-operand form.
This implements the optimization mentioned in CL 733660.
Change-Id: I13943b377ee384864c913eed320763f333a03e41
Reviewed-on: https://go-review.googlesource.com/c/go/+/733680
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Fix a pattern in test/codegen/comparisons.go to use whitespace instead
of a tab. The test needs a whitespace to properly fail if the regalloc
change from CL686655 would be missing (in that case we would have a
spill immediately after call to memequal, which is supposed to be
captured by this pattern).
Change-Id: I5b6fb5e861b9327c7f071660592b8ffa265e0030
Reviewed-on: https://go-review.googlesource.com/c/go/+/727620
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Use Go idiomatic function names, use a common prefix, attempt to
maintain some consistency, avoid naming functions based upon
machine specific instructions and combine a duplicate test that
likely exists due to this confusion.
Change-Id: I996e9efd7497821edef94c1997d4a310d9d79a71
Reviewed-on: https://go-review.googlesource.com/c/go/+/732200
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Joel Sing <joel@sing.id.au>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Change-Id: Iba4d3ded15d578e97a978780069e70a51a5e944b
Reviewed-on: https://go-review.googlesource.com/c/go/+/732180
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Nicholas Husin <husin@google.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Joel Sing <joel@sing.id.au>
Merge List:
+ 2025-12-08 a33bbf1988 weak: fix weak pointer test to correctly iterate over weak pointers after GC
+ 2025-12-08 a88a96330f cmd/cgo: use doc link for cgo.Handle
+ 2025-12-08 276cc4d3db cmd/link: fix AIX builds after recent linker changes
+ 2025-12-08 f2d96272cb runtime/trace: update TestSubscribers to dump traces
+ 2025-12-08 4837bcc92c internal/trace: skip tests for alloc/free experiment by default
+ 2025-12-08 b5f6816cea cmd/link: generate DWARF for moduledata
+ 2025-12-08 44a39c9dac runtime: only run TestNotInGoMetricCallback when building with cgo
+ 2025-12-08 3a6a034cd6 runtime: disable TestNotInGoMetricCallback on FreeBSD + race
+ 2025-12-08 4122d3e9ea runtime: use atomic C types with atomic C functions
+ 2025-12-08 34397865b1 runtime: deflake TestProfBufWakeup
+ 2025-12-08 d4972f6295 runtime: mark getfp as nosplit
+ 2025-12-05 0d0d5c9a82 test/codegen: test negation with add/sub on riscv64
+ 2025-12-05 2e509e61ef cmd/go: convert some more tests to script tests
+ 2025-12-05 c270e71835 cmd/go/internal/vet: skip -fix on pkgs from vendor or non-main mod
+ 2025-12-05 745349712e runtime: don't count nGsyscallNoP for extra Ms in C
+ 2025-12-05 f3d572d96a cmd/go: fix race applying fixes in fix and vet -fix modes
+ 2025-12-05 76345533f7 runtime: expand Pinner documentation
+ 2025-12-05 b133524c0f cmd/go/testdata/script: skip vet_cache in short mode
+ 2025-12-05 96e142ba2b runtime: skip TestArenaCollision if we run out of hints
+ 2025-12-05 fe4952f116 runtime: relax threadsSlack in TestReadMetricsSched
+ 2025-12-05 8947f092a8 runtime: skip mayMoreStackMove in goroutine leak tests
+ 2025-12-05 44cb82449e runtime/race: set missing argument frame for ppc64x atomic And/Or wrappers
+ 2025-12-05 435e61c801 runtime: reject any goroutine leak test failure that failed to execute
+ 2025-12-05 54e5540014 runtime: print output in case of segfault in goroutine leak tests
+ 2025-12-05 9616c33295 runtime: don't specify GOEXPERIMENT=greenteagc in goroutine leak tests
+ 2025-12-05 2244bd7eeb crypto/subtle: add speculation barrier after DIT
+ 2025-12-05 f84f8d86be cmd/compile: fix mis-infer bounds in slice len/cap calculations
+ 2025-12-05 a70addd3b3 all: fix some comment issues
+ 2025-12-05 93b49f773d internal/runtime/maps: clarify probeSeq doc comment
+ 2025-12-04 91267f0a70 all: update vendored x/tools
+ 2025-12-04 a753a9ed54 cmd/internal/fuzztest: move fuzz tests out of cmd/go test suite
+ 2025-12-04 1681c3b67f crypto: use rand.IsDefaultReader instead of comparing to boring.RandReader
+ 2025-12-04 7b67b68a0d cmd/compile: use isUnsignedPowerOfTwo rather than isPowerOfTwo for unsigneds
+ 2025-12-03 2b62144069 all: REVERSE MERGE dev.simd (9ac524a) into master
Change-Id: Ia0cdf06cdde89b6a4db30ed15ed8e0bcbac6ae30
Also removes a few leftover TODOs and scraps of commented-out code
from simd development.
Updated etetest.sh to make it behave whether amd64 implies the
experiment, or not.
Fixes#76473.
Change-Id: I6d9792214d7f514cb90c21b101dbf7d07c1d0e55
Reviewed-on: https://go-review.googlesource.com/c/go/+/728220
TryBot-Bypass: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Change-Id: Ic0eca86d3c93707ebd7c716e774ebda55af4f196
Reviewed-on: https://go-review.googlesource.com/c/go/+/703755
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Julian Zhu <jz531210@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
These were broken by CL 721206, which changes Rsh to RshU for
positive inputs.
Change-Id: I9e38c3c428fb8aeb70cf51e7e76f4711c864f027
Reviewed-on: https://go-review.googlesource.com/c/go/+/723340
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
This CL is part of a set of CLs that attempt to reduce how much work the
GC must do. See the design in https://go.dev/design/74299-runtime-freegc
This CL updates the compiler to examine append calls to prove
whether or not the slice is aliased.
If proven unaliased, the compiler automatically inserts a call to a new
runtime function introduced with this CL, runtime.growsliceNoAlias,
which frees the old backing memory immediately after slice growth is
complete and the old storage is logically dead.
Two append benchmarks below show promising results, executing up to
~2x faster and up to factor of ~3 memory reduction with this CL.
The approach works with multiple append calls for the same slice,
including inside loops, and the final slice memory can be escaping,
such as in a classic pattern of returning a slice from a function
after the slice is built. (The final slice memory is never freed with
this CL, though we have other work that tackles that.)
An example target for this CL is we automatically free the
intermediate memory for the appends in the loop in this function:
func f1(input []int) []int {
var s []int
for _, x := range input {
s = append(s, g(x)) // s cannot be aliased here
if h(x) {
s = append(s, x) // s cannot be aliased here
}
}
return s // slice escapes at end
}
In this case, the compiler and the runtime collaborate so that
the heap allocated backing memory for s is automatically freed after
a successful grow. (For the first grow, there is nothing to free,
but for the second and subsequent growths, the old heap memory is
freed automatically.)
The new runtime.growsliceNoAlias is primarily implemented
by calling runtime.freegc, which we introduced in CL 673695.
The high-level approach here is we step through the IR starting
from a slice declaration and look for any operations that either
alias the slice or might do so, and treat any IR construct we
don't specifically handle as a potential alias (and therefore
conservatively fall back to treating the slice as aliased when
encountering something not understood).
For loops, some additional care is required. We arrange the analysis
so that an alias in the body of a loop causes all the appends in that
same loop body to be marked aliased, even if the aliasing occurs after
the append in the IR:
func f2() {
var s []int
for i := range 10 {
s = append(s, i) // aliased due to next line
alias = s
}
}
For nested loops, we analyse the nesting appropriately so that
for example this append is still proven as non-aliased in the
inner loop even though it aliased for the outer loop:
func f3() {
for range 10 {
var s []int
for i := range 10 {
s = append(s, i) // append using non-aliased slice
}
alias = s
}
}
A good starting point is the beginning of the test/escape_alias.go file,
which starts with ~10 introductory examples with brief comments that
attempt to illustrate the high-level approach.
For more details, see the new .../internal/escape/alias.go file,
especially the (*aliasAnalysis).analyze method.
In the first benchmark, an append in a loop builds up a slice from
nothing, where the slice elements are each 64 bytes. In the table below,
'count' is the number of appends. With 1 append, there is no opportunity
for this CL to free memory. Once there are 2 appends, the growth from
1 element to 2 elements means the compiler-inserted growsliceNoAlias
frees the 1-element array, and we see a ~33% reduction in memory use
and a small reported speed improvement.
As the number of appends increases for example to 5, we are at
a ~20% speed improvement and ~45% memory reduction, and so on until
we reach ~40% faster and ~50% less memory allocated at the end of
the table.
There can be variation in the reported numbers based on -randlayout, so
this table is for 30 different values of -randlayout with a total
n=150. (Even so, there is still some variation, so we probably should
not read too much into small changes.) This is with GOAMD64=v3 on
a VM that gcc reports is cascadelake.
goos: linux
goarch: amd64
pkg: runtime
cpu: Intel(R) Xeon(R) CPU @ 2.80GHz
│ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │
│ sec/op │ sec/op vs base │
Append64Bytes/count=1-4 31.09n ± 2% 31.69n ± 1% +1.95% (n=150)
Append64Bytes/count=2-4 73.31n ± 1% 70.27n ± 0% -4.15% (n=150)
Append64Bytes/count=3-4 142.7n ± 1% 124.6n ± 1% -12.68% (n=150)
Append64Bytes/count=4-4 149.6n ± 1% 127.7n ± 0% -14.64% (n=150)
Append64Bytes/count=5-4 277.1n ± 1% 213.6n ± 0% -22.90% (n=150)
Append64Bytes/count=6-4 280.7n ± 1% 216.5n ± 1% -22.87% (n=150)
Append64Bytes/count=10-4 544.3n ± 1% 386.6n ± 0% -28.97% (n=150)
Append64Bytes/count=20-4 1058.5n ± 1% 715.6n ± 1% -32.39% (n=150)
Append64Bytes/count=50-4 2.121µ ± 1% 1.404µ ± 1% -33.83% (n=150)
Append64Bytes/count=100-4 4.152µ ± 1% 2.736µ ± 1% -34.11% (n=150)
Append64Bytes/count=200-4 7.753µ ± 1% 4.882µ ± 1% -37.03% (n=150)
Append64Bytes/count=400-4 15.163µ ± 2% 9.273µ ± 1% -38.84% (n=150)
geomean 601.8n 455.0n -24.39%
│ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │
│ B/op │ B/op vs base │
Append64Bytes/count=1-4 64.00 ± 0% 64.00 ± 0% ~ (n=150)
Append64Bytes/count=2-4 192.0 ± 0% 128.0 ± 0% -33.33% (n=150)
Append64Bytes/count=3-4 448.0 ± 0% 256.0 ± 0% -42.86% (n=150)
Append64Bytes/count=4-4 448.0 ± 0% 256.0 ± 0% -42.86% (n=150)
Append64Bytes/count=5-4 960.0 ± 0% 512.0 ± 0% -46.67% (n=150)
Append64Bytes/count=6-4 960.0 ± 0% 512.0 ± 0% -46.67% (n=150)
Append64Bytes/count=10-4 1.938Ki ± 0% 1.000Ki ± 0% -48.39% (n=150)
Append64Bytes/count=20-4 3.938Ki ± 0% 2.001Ki ± 0% -49.18% (n=150)
Append64Bytes/count=50-4 7.938Ki ± 0% 4.005Ki ± 0% -49.54% (n=150)
Append64Bytes/count=100-4 15.938Ki ± 0% 8.021Ki ± 0% -49.67% (n=150)
Append64Bytes/count=200-4 31.94Ki ± 0% 16.08Ki ± 0% -49.64% (n=150)
Append64Bytes/count=400-4 63.94Ki ± 0% 32.33Ki ± 0% -49.44% (n=150)
geomean 1.991Ki 1.124Ki -43.54%
│ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │
│ allocs/op │ allocs/op vs base │
Append64Bytes/count=1-4 1.000 ± 0% 1.000 ± 0% ~ (n=150)
Append64Bytes/count=2-4 2.000 ± 0% 1.000 ± 0% -50.00% (n=150)
Append64Bytes/count=3-4 3.000 ± 0% 1.000 ± 0% -66.67% (n=150)
Append64Bytes/count=4-4 3.000 ± 0% 1.000 ± 0% -66.67% (n=150)
Append64Bytes/count=5-4 4.000 ± 0% 1.000 ± 0% -75.00% (n=150)
Append64Bytes/count=6-4 4.000 ± 0% 1.000 ± 0% -75.00% (n=150)
Append64Bytes/count=10-4 5.000 ± 0% 1.000 ± 0% -80.00% (n=150)
Append64Bytes/count=20-4 6.000 ± 0% 1.000 ± 0% -83.33% (n=150)
Append64Bytes/count=50-4 7.000 ± 0% 1.000 ± 0% -85.71% (n=150)
Append64Bytes/count=100-4 8.000 ± 0% 1.000 ± 0% -87.50% (n=150)
Append64Bytes/count=200-4 9.000 ± 0% 1.000 ± 0% -88.89% (n=150)
Append64Bytes/count=400-4 10.000 ± 0% 1.000 ± 0% -90.00% (n=150)
geomean 4.331 1.000 -76.91%
The second benchmark is similar, but instead uses an 8-byte integer
for the slice element. The first 4 appends in the loop never call into
the runtime thanks to the excellent CL 664299 introduced by Keith in
Go 1.25 that allows some <= 32 byte dynamically-sized slices to be on
the stack, so this CL is neutral for <= 32 bytes. Once the 5th append
occurs at count=5, a grow happens via the runtime and heap allocates
as normal, but freegc does not yet have anything to free, so we see
a small ~1.4ns penalty reported there. But once the second growth
happens, the older heap memory is now automatically freed by freegc,
so we start to see some benefit in memory reductions and speed
improvements, starting at a tiny speed improvement (close to a wash,
or maybe noise) by the second growth before count=10, and building up to
~2x faster with ~68% fewer allocated bytes reported.
goos: linux
goarch: amd64
pkg: runtime
cpu: Intel(R) Xeon(R) CPU @ 2.80GHz
│ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │
│ sec/op │ sec/op vs base │
AppendInt/count=1-4 2.978n ± 0% 2.969n ± 0% -0.30% (p=0.000 n=150)
AppendInt/count=4-4 4.292n ± 3% 4.163n ± 3% ~ (p=0.528 n=150)
AppendInt/count=5-4 33.50n ± 0% 34.93n ± 0% +4.25% (p=0.000 n=150)
AppendInt/count=10-4 76.21n ± 1% 75.67n ± 0% -0.72% (p=0.000 n=150)
AppendInt/count=20-4 150.6n ± 1% 133.0n ± 0% -11.65% (n=150)
AppendInt/count=50-4 284.1n ± 1% 225.6n ± 0% -20.59% (n=150)
AppendInt/count=100-4 544.2n ± 1% 392.4n ± 1% -27.89% (n=150)
AppendInt/count=200-4 1051.5n ± 1% 702.3n ± 0% -33.21% (n=150)
AppendInt/count=400-4 2.041µ ± 1% 1.312µ ± 1% -35.70% (n=150)
AppendInt/count=1000-4 5.224µ ± 2% 2.851µ ± 1% -45.43% (n=150)
AppendInt/count=2000-4 11.770µ ± 1% 6.010µ ± 1% -48.94% (n=150)
AppendInt/count=3000-4 17.747µ ± 2% 8.264µ ± 1% -53.44% (n=150)
geomean 331.8n 246.4n -25.72%
│ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │
│ B/op │ B/op vs base │
AppendInt/count=1-4 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=150)
AppendInt/count=4-4 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=150)
AppendInt/count=5-4 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=150)
AppendInt/count=10-4 192.0 ± 0% 128.0 ± 0% -33.33% (n=150)
AppendInt/count=20-4 448.0 ± 0% 256.0 ± 0% -42.86% (n=150)
AppendInt/count=50-4 960.0 ± 0% 512.0 ± 0% -46.67% (n=150)
AppendInt/count=100-4 1.938Ki ± 0% 1.000Ki ± 0% -48.39% (n=150)
AppendInt/count=200-4 3.938Ki ± 0% 2.001Ki ± 0% -49.18% (n=150)
AppendInt/count=400-4 7.938Ki ± 0% 4.005Ki ± 0% -49.54% (n=150)
AppendInt/count=1000-4 24.56Ki ± 0% 10.05Ki ± 0% -59.07% (n=150)
AppendInt/count=2000-4 58.56Ki ± 0% 20.31Ki ± 0% -65.32% (n=150)
AppendInt/count=3000-4 85.19Ki ± 0% 27.30Ki ± 0% -67.95% (n=150)
geomean ² -42.81%
│ old-1bb1f2bf0c │ freegc-8ba7421-ps16 │
│ allocs/op │ allocs/op vs base │
AppendInt/count=1-4 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=150)
AppendInt/count=4-4 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=150)
AppendInt/count=5-4 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=150)
AppendInt/count=10-4 2.000 ± 0% 1.000 ± 0% -50.00% (n=150)
AppendInt/count=20-4 3.000 ± 0% 1.000 ± 0% -66.67% (n=150)
AppendInt/count=50-4 4.000 ± 0% 1.000 ± 0% -75.00% (n=150)
AppendInt/count=100-4 5.000 ± 0% 1.000 ± 0% -80.00% (n=150)
AppendInt/count=200-4 6.000 ± 0% 1.000 ± 0% -83.33% (n=150)
AppendInt/count=400-4 7.000 ± 0% 1.000 ± 0% -85.71% (n=150)
AppendInt/count=1000-4 9.000 ± 0% 1.000 ± 0% -88.89% (n=150)
AppendInt/count=2000-4 11.000 ± 0% 1.000 ± 0% -90.91% (n=150)
AppendInt/count=3000-4 12.000 ± 0% 1.000 ± 0% -91.67% (n=150)
geomean ² -72.76% ²
Of course, these are just microbenchmarks, but likely indicate
there are some opportunities here.
The immediately following CL 712422 tackles inlining and is able to get
runtime.freegc working automatically with iterators such as used by
slices.Collect, which becomes able to automatically free the
intermediate memory from its repeated appends (which earlier
in this work required a temporary hand edit to the slices package).
For now, we only use the NoAlias version for element types without
pointers while waiting on additional runtime support in CL 698515.
Updates #74299
Change-Id: I1b9d286aa97c170dcc2e203ec0f8ca72d84e8221
Reviewed-on: https://go-review.googlesource.com/c/go/+/710015
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Don't use the move2heap optimization if the move2heap is inside
a loop deeper than the declaration of the slice. We really only want
to do the move2heap operation once.
Change-Id: I4a68d01609c2c9d4e0abe4580839e70059393a81
Reviewed-on: https://go-review.googlesource.com/c/go/+/722440
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Introduce a new MemEq SSA operation for runtime.memequal. The operation
is initially implemented for arm64. The change adds opt rules (following
existing rules for call to runtime.memequal), working with MemEq, and a
later op version LoweredMemEq which may be lowered differently for more
constant size cases in future (for other targets as well as for arm64).
The new MemEq SSA operation does not have memory result, allowing cse of
loads operations around it.
Code size difference (for arm64 linux):
Executable Old .text New .text Change
-------------------------------------------------------
asm 1970420 1969668 -0.04%
cgo 1741220 1740212 -0.06%
compile 8956756 8959428 +0.03%
cover 1879332 1878772 -0.03%
link 2574116 2572660 -0.06%
preprofile 867124 866820 -0.04%
vet 2890404 2888596 -0.06%
Change-Id: I6ab507929b861884d17d5818cfbd152cf7879751
Reviewed-on: https://go-review.googlesource.com/c/go/+/686655
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Gets rid of some sign extensions, like arm64.
Change-Id: I9fc37e15a82718bfcf53db8cab0c4e7baaa0a747
Reviewed-on: https://go-review.googlesource.com/c/go/+/721522
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: Mark Freeman <markfreeman@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>