620 commits

Author SHA1 Message Date
Youlin Feng
cc571dab91 cmd/compile: deduplicate instructions when rewriting func results
After CL 628075, do not rely on the memory arg of an OpLocalAddr.

Fixes #74788

Change-Id: I4e893241e3949bb8f2d93c8b88cc102e155b725d
Reviewed-on: https://go-review.googlesource.com/c/go/+/691275
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Mark Freeman <mark@golang.org>
2025-07-30 09:38:10 -07:00
Cuong Manh Le
bd94ae8903 cmd/compile: use unsigned power-of-two detector for unsigned mod
Same as CL 689815, but for modulus instead of division.
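
As a rough illustration of why an unsigned detector is wanted (the compiler's own helper and rewrite rule are not shown in this message, so the function names below are hypothetical): for unsigned x and a power-of-two constant c, x % c equals x & (c-1). This holds even for c = 1<<63, which would look negative if the constant were treated as signed.

```go
package main

import "fmt"

// modPow2 computes x % c for unsigned x and power-of-two c using a mask.
// It illustrates the identity only; it is not the compiler's rewrite rule.
func modPow2(x, c uint64) uint64 {
	return x & (c - 1)
}

// isUnsignedPow2 reports whether c, viewed as an unsigned value, has
// exactly one bit set. The name is hypothetical.
func isUnsignedPow2(c uint64) bool {
	return c != 0 && c&(c-1) == 0
}

func main() {
	const top = uint64(1) << 63 // would be negative if reinterpreted as int64
	for _, x := range []uint64{0, 1, 7, 8, 1<<63 + 5, ^uint64(0)} {
		fmt.Println(x%8 == modPow2(x, 8), x%top == modPow2(x, top))
	}
	fmt.Println(isUnsignedPow2(top), isUnsignedPow2(12))
}
```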

Updates #74485

Change-Id: I73000231c886a987a1093669ff207fd9117a8160
Reviewed-on: https://go-review.googlesource.com/c/go/+/689895
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-07-29 16:22:40 -07:00
Cuong Manh Le
f3582fc80e cmd/compile: add unsigned power-of-two detector
Fixes #74485

Change-Id: Ia22a58ac43bdc36c8414d555672a3a3eafc749ca
Reviewed-on: https://go-review.googlesource.com/c/go/+/689815
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2025-07-29 16:22:37 -07:00
Michael Munday
46b5839231 test/codegen: fix failing condmove wasm tests
These recently added tests failed when using the -all_codegen flag.

Fixes #74770

Change-Id: Idea1ea02af2bd9f45c7d0a28d633c7442328e6df
Reviewed-on: https://go-review.googlesource.com/c/go/+/690715
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
Run-TryBot: Michael Munday <mikemndy@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Mark Freeman <mark@golang.org>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
TryBot-Bypass: Michael Knyszek <mknyszek@google.com>
2025-07-28 11:01:53 -07:00
Jorropo
ce05ad448f cmd/compile: rewrite condselects into doublings and halvings
For performance see CL 685676.

This allows something like:
  if y { x *= 2 }

To be compiled to:
  SHLXQ BX, AX, AX

Instead of:
  MOVQ    AX, CX
  SHLQ    $1, CX
  MOVBLZX BL, DX
  TESTQ   DX, DX
  CMOVQNE CX, AX

During ./make.bash, uniqued per LOC, there are 2 doublings and 4 halvings.
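
A minimal sketch of the source-level equivalence behind this rewrite, assuming a hypothetical b2i helper that maps a bool to 0 or 1. The halving case is shown for unsigned values only, since signed division rounds toward zero and is not a plain shift.

```go
package main

import "fmt"

// b2i converts a bool to 0 or 1 (hypothetical helper for illustration).
func b2i(b bool) uint {
	if b {
		return 1
	}
	return 0
}

// doubleIfBranch is the source form from the commit message.
func doubleIfBranch(x int64, y bool) int64 {
	if y {
		x *= 2
	}
	return x
}

// doubleIfShift is the branch-free form: shift left by 0 or 1.
func doubleIfShift(x int64, y bool) int64 {
	return x << b2i(y)
}

// halveIfBranch conditionally halves an unsigned value.
func halveIfBranch(x uint64, y bool) uint64 {
	if y {
		x /= 2
	}
	return x
}

// halveIfShift is the branch-free form: shift right by 0 or 1.
func halveIfShift(x uint64, y bool) uint64 {
	return x >> b2i(y)
}

func main() {
	for _, y := range []bool{false, true} {
		fmt.Println(doubleIfBranch(21, y) == doubleIfShift(21, y),
			halveIfBranch(43, y) == halveIfShift(43, y))
	}
}
```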

Change-Id: Ic0727cbf429528a2dbf17cbfc3b0121db8387444
Reviewed-on: https://go-review.googlesource.com/c/go/+/685695
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-07-24 14:42:15 -07:00
Jorropo
fcd28070fe cmd/compile: add opt branchelim to rewrite some CondSelect into math
This allows something like:
  if y { x++ }

To be compiled to:
  MOVBLZX BX, CX
  ADDQ CX, AX

Instead of:
  LEAQ    1(AX), CX
  MOVBLZX BL, DX
  TESTQ   DX, DX
  CMOVQNE CX, AX

During ./make.bash, uniqued per LOC, there are 100 additions and 75 subtractions.
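
The same idea at source level, as a sketch only: a conditional increment or decrement equals adding or subtracting the condition converted to 0 or 1. The b2i helper and function names are hypothetical.

```go
package main

import "fmt"

// b2i converts a bool to 0 or 1 (hypothetical helper for illustration).
func b2i(b bool) int64 {
	if b {
		return 1
	}
	return 0
}

// incIfBranch is the source form from the commit message.
func incIfBranch(x int64, y bool) int64 {
	if y {
		x++
	}
	return x
}

// incIfMath is the "math form": add the condition as 0 or 1.
func incIfMath(x int64, y bool) int64 {
	return x + b2i(y)
}

// decIfBranch conditionally decrements.
func decIfBranch(x int64, y bool) int64 {
	if y {
		x--
	}
	return x
}

// decIfMath subtracts the condition as 0 or 1.
func decIfMath(x int64, y bool) int64 {
	return x - b2i(y)
}

func main() {
	for _, y := range []bool{false, true} {
		fmt.Println(incIfBranch(41, y) == incIfMath(41, y),
			decIfBranch(41, y) == decIfMath(41, y))
	}
}
```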

See benchmark here: https://go.dev/play/p/DJf5COjwhd_s

Either it's a performance no-op or it is faster:

  goos: linux
  goarch: amd64
  cpu: AMD Ryzen 5 3600 6-Core Processor
                                          │ /tmp/old.logs │            /tmp/new.logs             │
                                          │    sec/op     │    sec/op     vs base                │
  CmovInlineConditionAddLatency-12           0.5443n ± 5%   0.5339n ± 3%   -1.90% (p=0.004 n=10)
  CmovInlineConditionAddThroughputBy6-12      1.492n ± 1%    1.494n ± 1%        ~ (p=0.955 n=10)
  CmovInlineConditionSubLatency-12           0.5419n ± 3%   0.5282n ± 3%   -2.52% (p=0.019 n=10)
  CmovInlineConditionSubThroughputBy6-12      1.587n ± 1%    1.584n ± 2%        ~ (p=0.492 n=10)
  CmovOutlineConditionAddLatency-12          0.5223n ± 1%   0.2639n ± 4%  -49.47% (p=0.000 n=10)
  CmovOutlineConditionAddThroughputBy6-12     1.159n ± 1%    1.097n ± 2%   -5.35% (p=0.000 n=10)
  CmovOutlineConditionSubLatency-12          0.5271n ± 3%   0.2654n ± 2%  -49.66% (p=0.000 n=10)
  CmovOutlineConditionSubThroughputBy6-12     1.053n ± 1%    1.050n ± 1%        ~ (p=1.000 n=10)
  geomean

There are other benefits not tested by this benchmark:
- the math form is usually a couple bytes shorter (ICACHE)
- the math form is usually 0~2 uops shorter (UCACHE)
- the math form usually has less register pressure*
- the math form can sometimes be optimized further

*regalloc rarely finds how it can use fewer registers

As far as pass ordering goes there are many possible options,
I've decided to reorder branchelim before late opt since:
- unlike running exclusively the CondSelect rules after branchelim,
  some extra optimizations might trigger on the adds or subs.
- I don't want to maintain a second generic.rules file containing only the
  rules that can trigger after branchelim.
- rerunning all of opt a third time increases compilation time for little gain.

By elimination, moving branchelim seems fine.

Change-Id: I869adf57e4d109948ee157cfc47144445146bafd
Reviewed-on: https://go-review.googlesource.com/c/go/+/685676
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-07-24 14:42:10 -07:00
Alexander Musman
bd80f74bc1 cmd/compile: fold shift through AND for slice operations
Fold a shift through AND when the AND gets an all-zeros-or-all-ones operand (e.g.
from arithmetic shift by 63 of a 64-bit value) for a common case with
slice operations:

    ASR     $63, R2, R2
    AND     R3<<3, R2, R2
    ADD     R2, R0, R2

As the operands are 64-bit, we can transform it to:

    AND     R2->63, R3, R2
    ADD     R2<<3, R0, R2
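
A small check of the arithmetic behind this fold, written in Go rather than ARM64 assembly (function names are illustrative). An arithmetic shift right by 63 yields 0 or -1 (all ones), and ANDing with such a mask commutes with a left shift of the other operand.

```go
package main

import "fmt"

// before mirrors the original sequence: mask & (y << 3).
func before(x, y int64) int64 {
	mask := x >> 63 // 0 if x >= 0, -1 (all ones) if x < 0
	return mask & (y << 3)
}

// after applies the AND first and the shift second: (mask & y) << 3.
func after(x, y int64) int64 {
	mask := x >> 63
	return (mask & y) << 3
}

func main() {
	for _, x := range []int64{-7, -1, 0, 1, 99} {
		for _, y := range []int64{0, 1, 5, -3, 1 << 61} {
			fmt.Println(x, y, before(x, y) == after(x, y))
		}
	}
}
```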

Code size improvement:
compile: .text:     9088004 ->  9086292 (-0.02%)
etcd:    .text:    10500276 -> 10498964 (-0.01%)

Change-Id: Ibcd5e67173da39b77ceff77ca67812fb8be5a7b5
Reviewed-on: https://go-review.googlesource.com/c/go/+/679895
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Mark Freeman <mark@golang.org>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-07-24 13:47:20 -07:00
Alexander Musman
dcb479c2f9 cmd/compile: optimize slice bounds checking with SUB/SUBconst comparisons
Optimize ARM64 code generation for slice bounds checking by recognizing
patterns where comparisons to zero involve SUB or SUBconst operations.
This change adds SSA opt rules to simplify:
 (CMPconst [0] (SUB x y)) => (CMP x y)

The optimizations apply to EQ, NE, ULE, and UGT comparisons, enabling
more efficient bounds checking for slice operations.
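
The message does not spell out which condition each rule emits, so the following only checks the underlying arithmetic facts, with illustrative function names: with wrapping unsigned subtraction, x-y is zero exactly when x == y, and an unsigned "greater than zero" test on x-y is the same as x != y.

```go
package main

import "fmt"

// eqViaSub compares the difference against zero.
func eqViaSub(x, y uint64) bool { return x-y == 0 }

// eqDirect compares the operands directly.
func eqDirect(x, y uint64) bool { return x == y }

// ugtViaSub tests the difference with an unsigned "> 0", i.e. "!= 0".
func ugtViaSub(x, y uint64) bool { return x-y > 0 }

// neDirect compares the operands directly.
func neDirect(x, y uint64) bool { return x != y }

func main() {
	vals := []uint64{0, 1, 2, 1 << 63, ^uint64(0)}
	for _, x := range vals {
		for _, y := range vals {
			fmt.Println(eqViaSub(x, y) == eqDirect(x, y),
				ugtViaSub(x, y) == neDirect(x, y))
		}
	}
}
```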

Code size improvement:
compile: .text:    9088004  ->  9065988 (-0.24%)
etcd:    .text:    10500276 -> 10497092 (-0.03%)

Change-Id: I467cb27674351652bcacc52b87e1f19677bd46a8
Reviewed-on: https://go-review.googlesource.com/c/go/+/679915
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
2025-07-24 12:39:53 -07:00
Paul Murphy
ee7bfbdbcc cmd/compile/internal/ssa: fix PPC64 merging of (AND (S[RL]Dconst ...)
CL 622236 forgot to check the mask was also a 32-bit rotate mask. Add
a modified version of isPPC64WordRotateMask which validates that the mask is
contiguous and fits inside a uint32.

I don't think this is possible when merging SRDconst, the first check should
always reject such combines. But, be extra careful and do it there
too.

Fixes #73153

Change-Id: Ie95f74ec5e7d89dc761511126db814f886a7a435
Reviewed-on: https://go-review.googlesource.com/c/go/+/679775
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-06-09 20:33:27 -07:00
Jake Bailey
27ff0f249c cmd/compile/internal/ssa: eliminate string copies for calls to unique.Make
unique.Make always copies strings passed into it, so it's safe to not
copy byte slices converted to strings either. Handle this just like map
accesses with string(b) as keys.

This CL only handles unique.Make(string(b)), not nested cases like
unique.Make([2]string{string(b1), string(b2)}); this could be done in a
followup CL but the map lookup code in walk is sufficiently different
than the call handling code that I didn't attempt it. (SSA is much
easier).
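
A sketch of the call shape this CL targets, assuming Go 1.23 or later for the unique package; the intern helper is a made-up name. Because unique.Make keeps its own copy, mutating the byte slice afterwards should not change the interned value.

```go
package main

import (
	"fmt"
	"unique"
)

// intern converts b to a string and interns it. Per the commit message, the
// compiler may skip the temporary string copy for this call shape.
func intern(b []byte) unique.Handle[string] {
	return unique.Make(string(b))
}

func main() {
	b := []byte("hello")
	h := intern(b)
	b[0] = 'j' // mutate the source bytes after interning
	fmt.Println(h.Value(), h == unique.Make("hello"))
}
```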

Fixes #71926

Change-Id: Ic2f82f2f91963d563b4ddb1282bd49fc40da8b85
Reviewed-on: https://go-review.googlesource.com/c/go/+/672135
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-21 20:20:31 -07:00
thepudds
f4de2ecffb cmd/compile/internal/walk: convert composite literals to interfaces without allocating
Today, this interface conversion causes the struct literal
to be heap allocated:

    var sink any

    func example1() {
        sink = S{1, 1}
    }

For basic literals like integers that are directly used in
an interface conversion that would otherwise allocate, the compiler
is able to use read-only global storage (see #18704).

This CL extends that to struct and array literals as well by creating
read-only global storage that is able to represent for example S{1, 1},
and then using a pointer to that storage in the interface
when the interface conversion happens.

A more challenging example is:

    func example2() {
        v := S{1, 1}
        sink = v
    }

In this case, the struct literal is not directly part of the
interface conversion, but is instead assigned to a local variable.

To still avoid heap allocation in cases like this, in walk we
construct a cache that maps from expressions used in interface
conversions to earlier expressions that can be used to represent the
same value (via ir.ReassignOracle.StaticValue). This is somewhat
analogous to how we avoided heap allocation for basic literals in
CL 649077 earlier in our stack, though here we also need to do a
little more work to create the read-only global.
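
One way to observe the effect of both examples, as a sketch: measure allocations with testing.AllocsPerRun. The struct definition S is an assumption (the message does not show it), and the counts printed depend on the toolchain version, so no expected output is given.

```go
package main

import (
	"fmt"
	"testing"
)

// S is assumed to be a small struct of two ints; the commit message does
// not show its definition.
type S struct{ a, b int }

var sink any

// example1 converts a struct literal directly to an interface.
func example1() {
	sink = S{1, 1}
}

// example2 assigns the literal to a local first, then converts it.
func example2() {
	v := S{1, 1}
	sink = v
}

func main() {
	// Average heap allocations per call; results vary by Go version.
	fmt.Println("example1:", testing.AllocsPerRun(1000, example1))
	fmt.Println("example2:", testing.AllocsPerRun(1000, example2))
}
```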

CL 649076 (also earlier in our stack) added most of the tests
along with debug diagnostics in convert.go to make it easier
to test this change.

See the writeup in #71359 for details.

Fixes #71359
Fixes #71323
Updates #62653
Updates #53465
Updates #8618

Change-Id: I8924f0c69ff738ea33439bd6af7b4066af493b90
Reviewed-on: https://go-review.googlesource.com/c/go/+/649555
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-05-21 12:23:26 -07:00
Junyang Shao
d6c29c7156 cmd/compile: fix offset calculation error in memcombine
Fixes #73812

Change-Id: If7a6e103ae9e1442a2cf4a3c6b1270b6a1887196
Reviewed-on: https://go-review.googlesource.com/c/go/+/675175
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-21 12:17:08 -07:00
Xiaolin Zhao
4ce1c8e9e1 cmd/compile: add rules about ORN and ANDN
Reduce the number of go toolchain instructions on loong64 as follows.

    file      before    after     Δ       %
    addr2line 279880    279776  -104   -0.0372%
    asm       556638    556410  -228   -0.0410%
    buildid   272272    272072  -200   -0.0735%
    cgo       481522    481318  -204   -0.0424%
    compile   2457788   2457580 -208   -0.0085%
    covdata   323384    323280  -104   -0.0322%
    cover     518450    518234  -216   -0.0417%
    dist      340790    340686  -104   -0.0305%
    distpack  282456    282252  -204   -0.0722%
    doc       789932    789688  -244   -0.0309%
    fix       324332    324228  -104   -0.0321%
    link      704622    704390  -232   -0.0329%
    nm        277132    277028  -104   -0.0375%
    objdump   507862    507758  -104   -0.0205%
    pack      221774    221674  -100   -0.0451%
    pprof     1469816   1469552 -264   -0.0180%
    test2json 254836    254732  -104   -0.0408%
    trace     1100002   1099738 -264   -0.0240%
    vet       781078    780874  -204   -0.0261%
    go        1529116   1528848 -268   -0.0175%
    gofmt     318556    318448  -108   -0.0339%
    total     13792238 13788566 -3672  -0.0266%

Change-Id: I23fb3ebd41309252c7075e57ea7094e79f8c4fef
Reviewed-on: https://go-review.googlesource.com/c/go/+/674335
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
2025-05-21 08:28:37 -07:00
Xiaolin Zhao
d37a1bdd48 cmd/compile: fix the implementation of NORconst on loong64
In the loong64 instruction set, there is no NORI instruction,
so the immediate value in NORconst needs to be stored in a register
and then used with the three-register NOR instruction.

Change-Id: I5ef697450619317218cb3ef47fc07e238bdc2139
Reviewed-on: https://go-review.googlesource.com/c/go/+/673836
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-20 20:24:09 -07:00
Junyang Shao
113b25774e cmd/compile: memcombine different size stores
This CL implements the TODO in combineStores to allow combining
stores of different sizes, as long as the total size aligns to
2, 4, 8.
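
A hypothetical example of the kind of code this is aimed at: two 1-byte stores followed by one 2-byte store that together cover 4 contiguous bytes. Whether the compiler combines this exact pattern is not confirmed here; the sketch only shows that the mixed-size stores write the same bytes as a single 4-byte store.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// putMixed writes x in little-endian order using stores of different sizes.
func putMixed(b []byte, x uint32) {
	_ = b[3] // one bounds check up front
	b[0] = byte(x)
	b[1] = byte(x >> 8)
	binary.LittleEndian.PutUint16(b[2:], uint16(x>>16))
}

// putWide writes the same 4 bytes with a single 4-byte store.
func putWide(b []byte, x uint32) {
	binary.LittleEndian.PutUint32(b, x)
}

func main() {
	var m, w [4]byte
	putMixed(m[:], 0x11223344)
	putWide(w[:], 0x11223344)
	fmt.Println(m == w)
}
```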

Fixes #72832.

Change-Id: I6d1d471335da90d851ad8f3b5a0cf10bdcfa17c4
Reviewed-on: https://go-review.googlesource.com/c/go/+/661855
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-20 13:00:16 -07:00
Julian Zhu
dfebef1c04 cmd/compile: fold negation into addition/subtraction on arm64
Fold negation into addition/subtraction and avoid double negation.
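
The identities being exploited, checked in Go as a sketch (function names are illustrative; the actual arm64 rules are not reproduced). All three hold under wrapping two's-complement arithmetic, including for the minimum int64.

```go
package main

import (
	"fmt"
	"math"
)

// addNeg has the same value as a - b.
func addNeg(a, b int64) int64 { return a + (-b) }

// subNeg has the same value as a + b.
func subNeg(a, b int64) int64 { return a - (-b) }

// negNeg has the same value as a.
func negNeg(a int64) int64 { return -(-a) }

func main() {
	vals := []int64{0, 1, -1, 42, math.MinInt64, math.MaxInt64}
	for _, a := range vals {
		for _, b := range vals {
			fmt.Println(addNeg(a, b) == a-b, subNeg(a, b) == a+b, negNeg(a) == a)
		}
	}
}
```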

platform: linux/arm64

file      before    after     Δ       %
addr2line 3628108   3628116   +8      +0.000%
asm       6208353   6207857   -496    -0.008%
buildid   3460682   3460418   -264    -0.008%
cgo       5572988   5572492   -496    -0.009%
compile   26042159  26041039  -1120   -0.004%
cover     6304328   6303472   -856    -0.014%
dist      4139330   4139098   -232    -0.006%
doc       9429305   9428065   -1240   -0.013%
fix       3997189   3996733   -456    -0.011%
link      8212128   8210280   -1848   -0.023%
nm        3620056   3619696   -360    -0.010%
objdump   5920289   5919233   -1056   -0.018%
pack      2892250   2891778   -472    -0.016%
pprof     17094569  17092745  -1824   -0.011%
test2json 3335825   3335529   -296    -0.009%
trace     15842080  15841456  -624    -0.004%
vet       9472194   9471106   -1088   -0.011%
go        19081541  19081509  -32     -0.000%
total     154253374 154240622 -12752  -0.008%

platform: darwin/arm64

file    before    after     Δ       %
compile 27152002  27135490  -16512  -0.061%
link    8372914   8356402   -16512  -0.197%
go      19154802  19154778  -24     -0.000%
total   157734180 157701132 -33048  -0.021%

Change-Id: I15a349bfbaf7333ec3e4a62ae4d06f3f371dfb1d
Reviewed-on: https://go-review.googlesource.com/c/go/+/673715
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-20 11:08:28 -07:00
Keith Randall
3baf53aec6 cmd/compile: derive bounds on signed %N for N a power of 2
-N+1 <= x % N <= N-1

This is useful for cases like:

func setBit(b []byte, i int) {
    b[i/8] |= 1<<(i%8)
}

The shift does not need protection against larger-than-7 cases.
(It does still need protection against <0 cases.)
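
A quick check of the stated bound, as a sketch: remRange is a hypothetical helper that scans a range of signed values and records the smallest and largest remainder (min and max need Go 1.21 or later). setBit is the function from the message.

```go
package main

import "fmt"

// setBit sets bit i of the bit vector b (from the commit message).
func setBit(b []byte, i int) {
	b[i/8] |= 1 << (i % 8)
}

// remRange returns the smallest and largest value of i % n seen for
// i in [-4n, 4n]. Hypothetical helper for illustration.
func remRange(n int) (lo, hi int) {
	for i := -4 * n; i <= 4*n; i++ {
		r := i % n
		lo, hi = min(lo, r), max(hi, r)
	}
	return lo, hi
}

func main() {
	lo, hi := remRange(8)
	fmt.Println(lo, hi) // bounds are -N+1 and N-1 for N = 8

	b := make([]byte, 2)
	setBit(b, 10) // byte 1, bit 2
	fmt.Println(b)
}
```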

Change-Id: Idf83101386af538548bfeb6e2928cea855610ce2
Reviewed-on: https://go-review.googlesource.com/c/go/+/672995
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-05-19 15:21:54 -07:00
Julian Zhu
d52679006c cmd/compile: fold negation into addition/subtraction on mipsx
Fold negation into addition/subtraction and avoid double negation.

file      before    after     Δ       %
addr2line 3742022   3741986   -36     -0.001%
asm       6668616   6668628   +12     +0.000%
buildid   3583786   3583630   -156    -0.004%
cgo       6020370   6019634   -736    -0.012%
compile   29416016  29417336  +1320   +0.004%
cover     6801903   6801675   -228    -0.003%
dist      4485916   4485816   -100    -0.002%
doc       10652787  10652251  -536    -0.005%
fix       4115988   4115560   -428    -0.010%
link      9002328   9001616   -712    -0.008%
nm        3733148   3732780   -368    -0.010%
objdump   6163292   6163068   -224    -0.004%
pack      2944768   2944604   -164    -0.006%
pprof     18909973  18908773  -1200   -0.006%
test2json 3394662   3394778   +116    +0.003%
trace     17350911  17349751  -1160   -0.007%
vet       10077727  10077527  -200    -0.002%
go        19118769  19118609  -160    -0.001%
total     166182982 166178022 -4960   -0.003%

Change-Id: Id55698800fd70f3cb2ff48393584456b87208921
Reviewed-on: https://go-review.googlesource.com/c/go/+/673556
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-05-19 11:27:35 -07:00
Julian Zhu
8097cf14d2 cmd/compile: fold negation into addition/subtraction on mips64x
Fold negation into addition/subtraction and avoid double negation.

file      before    after     Δ       %
addr2line 4007310   4007470   +160    +0.004%
asm       7007636   7007436   -200    -0.003%
buildid   3839268   3838972   -296    -0.008%
cgo       6353466   6352738   -728    -0.011%
compile   30426920  30426896  -24     -0.000%
cover     7005408   7004744   -664    -0.009%
dist      4651192   4650872   -320    -0.007%
doc       10606050  10606034  -16     -0.000%
fix       4446414   4446390   -24     -0.001%
link      9237736   9237024   -712    -0.008%
nm        3999107   3999323   +216    +0.005%
objdump   6762424   6762144   -280    -0.004%
pack      3270757   3270493   -264    -0.008%
pprof     19428299  19361939  -66360  -0.342%
test2json 3717345   3717217   -128    -0.003%
trace     17382273  17381657  -616    -0.004%
vet       10689481  10688985  -496    -0.005%
go        19118769  19118609  -160    -0.001%
total     171949855 171878943 -70912  -0.041%

Change-Id: I35c1f264d216c214ea3f56252a9ddab8ea850fa6
Reviewed-on: https://go-review.googlesource.com/c/go/+/673555
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-05-16 11:06:06 -07:00
Keith Randall
d681270714 cmd/compile: allow load-op merging in additional situations
x += *p

We want to do this with a single load+add operation on amd64.
The tricky part is that we don't want to combine if there are
other uses of x after this instruction.

Implement a simple detector that seems to capture a common situation -
x += *p is in a loop, and the other use of x is after loop exit.
In that case, it does not hurt to do the load+add combo.

Change-Id: I466174cce212e78bde83f908cc1f2752b560c49c
Reviewed-on: https://go-review.googlesource.com/c/go/+/672957
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-15 15:21:36 -07:00
Keith Randall
19f05770b0 cmd/compile: schedule induction variable increments late
for ..; ..; i++ {
 ...
}

We want to schedule the i++ late in the block, so that all other
uses of i in the block are scheduled first. That way, i++ can
happen in place in a register instead of requiring a temporary register.

Change-Id: Id777407c7e67a5ddbd8e58251099b0488138c0df
Reviewed-on: https://go-review.googlesource.com/c/go/+/672998
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-05-15 14:06:41 -07:00
Xiaolin Zhao
c31a5c571f cmd/compile: fold negation into addition/subtraction on loong64
This change also avoids double negation, and adds loong64 codegen for arithmetic tests.
Reduce the number of go toolchain instructions on loong64 as follows.

    file      before    after     Δ       %
    addr2line 279972    279896  -76    -0.0271%
    asm       556390    556310  -80    -0.0144%
    buildid   272376    272300  -76    -0.0279%
    cgo       481534    481550  +16    +0.0033%
    compile   2457992   2457396 -596   -0.0242%
    covdata   323488    323404  -84    -0.0260%
    cover     518630    518490  -140   -0.0270%
    dist      340894    340814  -80    -0.0235%
    distpack  282568    282484  -84    -0.0297%
    doc       790224    789984  -240   -0.0304%
    fix       324408    324348  -60    -0.0185%
    link      704910    704666  -244   -0.0346%
    nm        277220    277144  -76    -0.0274%
    objdump   508026    507878  -148   -0.0291%
    pack      221810    221786  -24    -0.0108%
    pprof     1470284   1469880 -404   -0.0275%
    test2json 254896    254852  -44    -0.0173%
    trace     1100390   1100074 -316   -0.0287%
    vet       781398    781142  -256   -0.0328%
    go        1529668   1529128 -540   -0.0353%
    gofmt     318668    318568  -100   -0.0314%
    total     13795746 13792094 -3652  -0.0265%

Change-Id: I88d1f12cfc4be0e92687c48e06a57213aa484aca
Reviewed-on: https://go-review.googlesource.com/c/go/+/672555
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-05-14 17:46:58 -07:00
Jakub Ciolek
c9d0fad5cb cmd/compile: add 2 phiopt cases
Add 2 more cases:

if a { x = value } else { x = a } => x = a && value
if a { x = a } else { x = value } => x = a || value
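
The two cases written out as Go functions, to check the boolean identities over the full truth table. This is a sketch with illustrative names, not the phiopt implementation.

```go
package main

import "fmt"

// andBranch is the first pattern: if a { x = value } else { x = a }.
func andBranch(a, value bool) bool {
	var x bool
	if a {
		x = value
	} else {
		x = a
	}
	return x
}

// orBranch is the second pattern: if a { x = a } else { x = value }.
func orBranch(a, value bool) bool {
	var x bool
	if a {
		x = a
	} else {
		x = value
	}
	return x
}

func main() {
	for _, a := range []bool{false, true} {
		for _, v := range []bool{false, true} {
			fmt.Println(a, v, andBranch(a, v) == (a && v), orBranch(a, v) == (a || v))
		}
	}
}
```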

AND case goes from:

00006 (8)	TESTB	AX, AX
00007 (8)	JNE	9
00008 (13)	MOVL	AX, BX
00009 (13)	MOVL	BX, AX
00010 (13)	RET

to:

00006 (13)	ANDL	BX, AX
00007 (13)	RET

OR goes from:

00006 (19)	TESTB	AX, AX
00007 (19)	JNE	9
00008 (24)	MOVL	BX, AX
00009 (24)	RET

to:

00006 (24)	ORL	BX, AX
00007 (24)	RET

compilecmp linux/amd64:

runtime
runtime.lock2 847 -> 869  (+2.60%)
runtime.addspecial 542 -> 517  (-4.61%)
runtime.tracebackPCs changed
runtime.scanstack changed
runtime.mallocinit changed
runtime.traceback2 2238 -> 2206  (-1.43%)

runtime [cmd/compile]
runtime.lock2 860 -> 882  (+2.56%)
runtime.scanstack changed
runtime.addspecial 542 -> 517  (-4.61%)
runtime.traceback2 2238 -> 2206  (-1.43%)
runtime.lockWithRank 870 -> 890  (+2.30%)
runtime.tracebackPCs changed
runtime.mallocinit changed

strconv
strconv.ryuFtoaFixed32 changed
strconv.ryuFtoaFixed64 639 -> 638  (-0.16%)
strconv.readFloat changed
strconv.ryuFtoaShortest changed

strings
strings.(*Replacer).build changed

strconv [cmd/compile]
strconv.readFloat changed
strconv.ryuFtoaFixed64 639 -> 638  (-0.16%)
strconv.ryuFtoaFixed32 changed
strconv.ryuFtoaShortest changed

strings [cmd/compile]
strings.(*Replacer).build changed

regexp
regexp.makeOnePass.func1 changed

regexp [cmd/compile]
regexp.makeOnePass.func1 changed

encoding/json
encoding/json.indirect changed

database/sql
database/sql.driverArgsConnLocked changed

vendor/golang.org/x/text/unicode/norm
vendor/golang.org/x/text/unicode/norm.Form.transform changed

go/doc/comment
go/doc/comment.parseSpans changed

internal/diff
internal/diff.tgs changed

log/slog
log/slog.(*handleState).appendNonBuiltIns 1898 -> 1877  (-1.11%)

testing/fstest
testing/fstest.(*fsTester).checkGlob changed

runtime/pprof
runtime/pprof.(*profileBuilder).build changed

cmd/internal/dwarf
cmd/internal/dwarf.isEmptyInlinedCall 254 -> 244  (-3.94%)

go/printer
go/printer.keepTypeColumn 302 -> 270  (-10.60%)
go/printer.(*printer).binaryExpr changed

cmd/compile/internal/syntax
cmd/compile/internal/syntax.(*scanner).rune changed
cmd/compile/internal/syntax.(*scanner).number 2137 -> 2153  (+0.75%)

Change-Id: I7f95f54b03a35d0b616c40f38b415a7feb71be73
Reviewed-on: https://go-review.googlesource.com/c/go/+/666835
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Jakub Ciolek <jakub@ciolek.dev>
TryBot-Bypass: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-08 10:18:37 -07:00
Keith Randall
12110c3f7e cmd/compile: improve multiplication strength reduction
Use an automatic algorithm to generate strength reduction code.
You give it all the linear combination (a*x+b*y) instructions in your
architecture, it figures out the rest.

Just amd64 and arm64 for now.
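
To make "linear combination instructions" concrete, here is a hand-written sketch. The lea function models an a + k*b instruction with k in {1, 2, 4, 8}, loosely like an amd64 LEA; its name and the chosen decompositions are illustrative and are not output of the algorithm in this CL.

```go
package main

import "fmt"

// lea models one linear-combination instruction: a + k*b.
// On real hardware k would be limited to 1, 2, 4 or 8.
func lea(a, b, k int64) int64 { return a + k*b }

// mul5 computes 5x as x + 4x.
func mul5(x int64) int64 { return lea(x, x, 4) }

// mul9 computes 9x as x + 8x.
func mul9(x int64) int64 { return lea(x, x, 8) }

// mul11 computes 11x as x + 2*(5x).
func mul11(x int64) int64 { return lea(x, mul5(x), 2) }

// mul45 computes 45x as 5*(9x).
func mul45(x int64) int64 { return mul5(mul9(x)) }

func main() {
	for _, x := range []int64{-3, 0, 1, 7} {
		fmt.Println(mul5(x) == 5*x, mul9(x) == 9*x, mul11(x) == 11*x, mul45(x) == 45*x)
	}
}
```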

Fixes #67575

Change-Id: I35c69382bebb1d2abf4bb4e7c43fd8548c6c59a1
Reviewed-on: https://go-review.googlesource.com/c/go/+/626998
Reviewed-by: Jakub Ciolek <jakub@ciolek.dev>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-01 09:33:31 -07:00
Joel Sing
4d10d4ad84 cmd/compile,internal/cpu,runtime: intrinsify math/bits.OnesCount on riscv64
For riscv64/rva22u64 and above, we can intrinsify math/bits.OnesCount
using the CPOP/CPOPW machine instructions. Since the native Go
implementation of OnesCount is relatively expensive, it is also
worth emitting a check for Zbb support when compiled for rva20u64.

On a Banana Pi F3, with GORISCV64=rva22u64:

              │     oc.1     │                oc.2                 │
              │    sec/op    │   sec/op     vs base                │
OnesCount-8     16.930n ± 0%   4.389n ± 0%  -74.08% (p=0.000 n=10)
OnesCount8-8     5.642n ± 0%   5.016n ± 0%  -11.10% (p=0.000 n=10)
OnesCount16-8    9.404n ± 0%   5.015n ± 0%  -46.67% (p=0.000 n=10)
OnesCount32-8   13.165n ± 0%   4.388n ± 0%  -66.67% (p=0.000 n=10)
OnesCount64-8   16.300n ± 0%   4.388n ± 0%  -73.08% (p=0.000 n=10)
geomean          11.40n        4.629n       -59.40%

On a Banana Pi F3, compiled with GORISCV64=rva20u64 and with Zbb
detection enabled:

              │     oc.3     │                oc.4                 │
              │    sec/op    │   sec/op     vs base                │
OnesCount-8     16.930n ± 0%   5.643n ± 0%  -66.67% (p=0.000 n=10)
OnesCount8-8     5.642n ± 0%   5.642n ± 0%        ~ (p=0.447 n=10)
OnesCount16-8   10.030n ± 0%   6.896n ± 0%  -31.25% (p=0.000 n=10)
OnesCount32-8   13.170n ± 0%   5.642n ± 0%  -57.16% (p=0.000 n=10)
OnesCount64-8   16.300n ± 0%   5.642n ± 0%  -65.39% (p=0.000 n=10)
geomean          11.55n        5.873n       -49.16%

On a Banana Pi F3, compiled with GORISCV64=rva20u64 but with Zbb
detection disabled:

              │    oc.3     │                oc.5                 │
              │   sec/op    │   sec/op     vs base                │
OnesCount-8     16.93n ± 0%   29.47n ± 0%  +74.07% (p=0.000 n=10)
OnesCount8-8    5.642n ± 0%   5.643n ± 0%        ~ (p=0.191 n=10)
OnesCount16-8   10.03n ± 0%   15.05n ± 0%  +50.05% (p=0.000 n=10)
OnesCount32-8   13.17n ± 0%   18.18n ± 0%  +38.04% (p=0.000 n=10)
OnesCount64-8   16.30n ± 0%   21.94n ± 0%  +34.60% (p=0.000 n=10)
geomean         11.55n        15.84n       +37.16%

For hardware without Zbb, this adds ~5ns overhead, while for hardware
with Zbb we achieve a performance gain of up to 11ns. It is worth
noting that OnesCount8 is cheap enough that it is preferable to stick
with the generic version in this case.
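
A rough model of the guarded form described above, written in portable Go. The hasZbb variable is a hypothetical stand-in for the runtime feature check, and bits.OnesCount64 stands in for a single CPOP instruction; the real intrinsic is emitted by the compiler, not written like this.

```go
package main

import (
	"fmt"
	"math/bits"
)

// hasZbb is a hypothetical stand-in for a runtime CPU feature flag.
var hasZbb = false

// popcountGeneric is a simple software population count.
func popcountGeneric(x uint64) int {
	n := 0
	for x != 0 {
		x &= x - 1 // clear the lowest set bit
		n++
	}
	return n
}

// popcount takes the fast path when the feature is present and falls
// back to the generic version otherwise.
func popcount(x uint64) int {
	if hasZbb {
		return bits.OnesCount64(x) // stands in for CPOP
	}
	return popcountGeneric(x)
}

func main() {
	for _, zbb := range []bool{false, true} {
		hasZbb = zbb
		fmt.Println(zbb, popcount(0xF0F0), popcount(^uint64(0)))
	}
}
```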

Change-Id: Id657e40e0dd1b1ab8cc0fe0f8a68df4c9f2d7da5
Reviewed-on: https://go-review.googlesource.com/c/go/+/660856
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-01 05:57:41 -07:00
Joel Sing
90e8b8cdae cmd/compile: intrinsify math/bits.Bswap on riscv64
For riscv64/rva22u64 and above, we can intrinsify math/bits.Bswap
using the REV8 machine instruction.
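
For reference, the operation being intrinsified is exposed as the bits.ReverseBytes family in math/bits. The sketch below compares bits.ReverseBytes64 with a hand-written byte swap; bswap64 is an illustrative name.

```go
package main

import (
	"fmt"
	"math/bits"
)

// bswap64 reverses the byte order of x one byte at a time.
func bswap64(x uint64) uint64 {
	var r uint64
	for i := 0; i < 8; i++ {
		r = r<<8 | x&0xff // append the current low byte of x
		x >>= 8
	}
	return r
}

func main() {
	x := uint64(0x0102030405060708)
	fmt.Printf("%#016x %v\n", bswap64(x), bswap64(x) == bits.ReverseBytes64(x))
}
```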

On a StarFive VisionFive 2 with GORISCV64=rva22u64:

                 │     rb.1     │                rb.2                 │
                 │    sec/op    │   sec/op     vs base                │
ReverseBytes-4     18.790n ± 0%   4.026n ± 0%  -78.57% (p=0.000 n=10)
ReverseBytes16-4    6.710n ± 0%   5.368n ± 0%  -20.00% (p=0.000 n=10)
ReverseBytes32-4   13.420n ± 0%   5.368n ± 0%  -60.00% (p=0.000 n=10)
ReverseBytes64-4   17.450n ± 0%   4.026n ± 0%  -76.93% (p=0.000 n=10)
geomean             13.11n        4.649n       -64.54%

Change-Id: I26eee34270b1721f7304bb1cddb0fda129b20ece
Reviewed-on: https://go-review.googlesource.com/c/go/+/660855
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
2025-05-01 05:57:13 -07:00
Keith Randall
7d0cb2a2ad cmd/compile: constant fold 128-bit multiplies
The full 64x64->128 multiply comes up when using bits.Mul64.
The 64x64->64+overflow multiply comes up in unsafe.Slice when using
a constant length.
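
Two small call sites of the kind mentioned, as a sketch with illustrative function names. With constant operands, bits.Mul64 has a result that can be computed at compile time; unsafe.Slice with a constant length involves an internal size computation that is not visible in the source.

```go
package main

import (
	"fmt"
	"math/bits"
	"unsafe"
)

// mulConst multiplies two constants with the full 64x64->128 multiply.
// 2^63 * 4 = 2^65, so the high word is 2 and the low word is 0.
func mulConst() (hi, lo uint64) {
	return bits.Mul64(1<<63, 4)
}

// firstFour builds a slice of constant length 4 over the array.
func firstFour(arr *[8]int32) []int32 {
	return unsafe.Slice(&arr[0], 4)
}

func main() {
	hi, lo := mulConst()
	fmt.Println(hi, lo)

	var arr [8]int32
	fmt.Println(len(firstFour(&arr)))
}
```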

Change-Id: I298515162ca07d804b2d699d03bc957ca30a4ebc
Reviewed-on: https://go-review.googlesource.com/c/go/+/667175
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-22 10:24:18 -07:00
Keith Randall
8af32240c6 cmd/compile: don't evaluate side effects of range over array
If the thing we're ranging over is an array or ptr to array, and
it doesn't have a function call or channel receive in it, then we
shouldn't evaluate it.

Typecheck the ranged-over value as a constant in that case.
That makes the unified exporter replace the range expression
with a constant int.
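
A source-level illustration of the language rule involved, as a sketch: when ranging over a pointer to an array with at most one iteration variable, the length is a constant and the range expression is not evaluated, so even a nil pointer is fine. The function name is illustrative.

```go
package main

import "fmt"

// countIters ranges over a pointer to an array using only the index.
// The pointer is never dereferenced, so p may be nil.
func countIters(p *[4]int) int {
	n := 0
	for i := range p {
		n += i - i + 1 // use i so the loop has one iteration variable
	}
	return n
}

func main() {
	var p *[4]int // nil
	fmt.Println(countIters(p))
}
```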

Change-Id: I0d4ea081de70d20cf6d1fa8d25ef6cb021975554
Reviewed-on: https://go-review.googlesource.com/c/go/+/659317
Reviewed-by: Junyang Shao <shaojunyang@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Robert Griesemer <gri@google.com>
2025-04-21 15:50:43 -07:00
limeidan
09d76e59d2 cmd/compile: set unalignedOK to make memcombine work properly on loong64
goos: linux
goarch: loong64
pkg: unicode/utf8
cpu: Loongson-3A6000-HV @ 2500.00MHz
                            │     old     │                 new                 │
                            │   sec/op    │   sec/op     vs base                │
ValidTenASCIIChars            7.604n ± 0%   6.805n ± 0%  -10.51% (p=0.000 n=10)
Valid100KASCIIChars           37.41µ ± 0%   16.58µ ± 0%  -55.67% (p=0.000 n=10)
ValidTenJapaneseChars         60.84n ± 0%   58.62n ± 0%   -3.64% (p=0.000 n=10)
ValidLongMostlyASCII          113.5µ ± 0%   113.5µ ± 0%        ~ (p=0.303 n=10)
ValidLongJapanese             204.6µ ± 0%   206.8µ ± 0%   +1.07% (p=0.000 n=10)
ValidStringTenASCIIChars      7.604n ± 0%   6.803n ± 0%  -10.53% (p=0.000 n=10)
ValidString100KASCIIChars     38.05µ ± 0%   17.14µ ± 0%  -54.97% (p=0.000 n=10)
ValidStringTenJapaneseChars   60.58n ± 0%   59.48n ± 0%   -1.82% (p=0.000 n=10)
ValidStringLongMostlyASCII    113.5µ ± 0%   113.4µ ± 0%   -0.10% (p=0.000 n=10)
ValidStringLongJapanese       205.9µ ± 0%   207.3µ ± 0%   +0.67% (p=0.000 n=10)
geomean                       3.324µ        2.756µ       -17.08%

Change-Id: Id43b6e2e41907bd4b92f421dacde31f048db47d6
Reviewed-on: https://go-review.googlesource.com/c/go/+/662495
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Keith Randall <khr@google.com>
2025-04-09 09:18:20 -07:00
Alexander Musman
16a6b71f18 cmd/compile: improve store-to-load forwarding with compatible types
Improve the compiler's store-to-load forwarding optimization by relaxing the
type comparison condition. Instead of requiring exact type equality (CMPeq),
we now use copyCompatibleType which allows forwarding between compatible
types where safe.

Fix several size comparison bugs in the nested store patterns. Previously,
we were comparing the size of the outer store with the load type,
rather than comparing with the size of the actual store being forwarded
from.

Skip OpConvert in dead store elimination to help get rid of dead stores such
as zeroing slices. OpConvert, like OpInlMark, doesn't really use the memory.

This optimization is particularly beneficial for code that creates slices with
computed pointers, such as the runtime's heapBitsSlice function, where
intermediate calculations were previously causing the compiler to miss
store-to-load forwarding opportunities.

Local sweet run result on an x86_64 laptop:

                       │  Orig.res   │              Hopt.res              │
                       │   sec/op    │   sec/op     vs base               │
BiogoIgor-8               5.303 ± 1%    5.322 ± 1%       ~ (p=0.190 n=10)
BiogoKrishna-8            7.894 ± 1%    7.828 ± 2%       ~ (p=0.190 n=10)
BleveIndexBatch100-8      2.257 ± 1%    2.248 ± 2%       ~ (p=0.529 n=10)
EtcdPut-8                30.12m ± 1%   30.03m ± 1%       ~ (p=0.796 n=10)
EtcdSTM-8                127.1m ± 1%   126.2m ± 0%  -0.74% (p=0.023 n=10)
GoBuildKubelet-8          52.21 ± 0%    52.05 ± 1%       ~ (p=0.063 n=10)
GoBuildKubeletLink-8      4.342 ± 1%    4.305 ± 0%  -0.85% (p=0.000 n=10)
GoBuildIstioctl-8         43.33 ± 0%    43.24 ± 0%  -0.22% (p=0.015 n=10)
GoBuildIstioctlLink-8     4.604 ± 1%    4.598 ± 0%       ~ (p=0.063 n=10)
GoBuildFrontend-8         15.33 ± 0%    15.29 ± 0%       ~ (p=0.143 n=10)
GoBuildFrontendLink-8    740.0m ± 1%   737.7m ± 1%       ~ (p=0.912 n=10)
GopherLuaKNucleotide-8    9.590 ± 1%    9.656 ± 1%       ~ (p=0.165 n=10)
MarkdownRenderXHTML-8    96.97m ± 1%   97.26m ± 2%       ~ (p=0.105 n=10)
Tile38QueryLoad-8        335.9µ ± 1%   335.6µ ± 1%       ~ (p=0.481 n=10)
geomean                   1.336         1.333       -0.22%

Change-Id: I031552623e6d5a3b1b5be8325e6314706e45534f
Reviewed-on: https://go-review.googlesource.com/c/go/+/662075
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Carlos Amedee <carlos@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-04-04 08:25:47 -07:00
Joel Sing
e6c2e12c63 cmd/compile/internal/ssa: optimise more branches with zero on riscv64
Optimise more branches with zero on riscv64. In particular, BLTU with
zero occurs with IsInBounds checks for index zero. This currently results
in two instructions and requires an additional register:

   li      t2, 0
   bltu    t2, t1, 0x174b4

This is equivalent to checking whether the bound is non-zero. With
this change:

   bnez    t1, 0x174c0

This removes more than 500 instructions from the Go binary on riscv64.
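
The equivalence behind this rewrite can be sketched in Go. The function names are hypothetical, and this only illustrates the arithmetic identity, not the SSA rule itself: for unsigned values, 0 < n holds exactly when n != 0.

```go
package main

import "fmt"

// zeroBelow is the unsigned comparison 0 < n, the shape that BLTU with a
// zero operand implements.
func zeroBelow(n uint64) bool { return 0 < n }

// nonZero is the equivalent test that a single BNEZ can implement.
func nonZero(n uint64) bool { return n != 0 }

func main() {
	for _, n := range []uint64{0, 1, 2, 1 << 63, ^uint64(0)} {
		fmt.Println(n, zeroBelow(n), nonZero(n))
	}
}
```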

Change-Id: I6cd861d853e3ef270bd46dacecdfaa205b1c4644
Reviewed-on: https://go-review.googlesource.com/c/go/+/606715
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2025-03-28 01:27:22 -07:00
Mark Freeman
6722c008c1 cmd/compile: rename some test packages in codegen
All other files here use the codegen package.

Change-Id: I714162941b9fa9051dacc29643e905fe60b9304b
Reviewed-on: https://go-review.googlesource.com/c/go/+/661135
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
2025-03-27 13:54:37 -07:00
Joel Sing
6bf95d40bb test/codegen: add combined conversion and shift tests
This adds tests for type conversion and shifts, detailing various
cases of poor code generation that currently exist for riscv64. These
will be addressed in future CLs.

Change-Id: Ie1d366dfe878832df691600f8500ef383da92848
Reviewed-on: https://go-review.googlesource.com/c/go/+/615678
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
2025-03-25 06:53:49 -07:00
Joel Sing
b70244ff7a cmd/compile: intrinsify math/bits.Len on riscv64
For riscv64/rva22u64 and above, we can intrinsify math/bits.Len using the
CLZ/CLZW machine instructions.

On a StarFive VisionFive 2 with GORISCV64=rva22u64:

                 │   clz.b.1   │               clz.b.2               │
                 │   sec/op    │   sec/op     vs base                │
LeadingZeros-4     28.89n ± 0%   12.08n ± 0%  -58.19% (p=0.000 n=10)
LeadingZeros8-4    18.79n ± 0%   14.76n ± 0%  -21.45% (p=0.000 n=10)
LeadingZeros16-4   25.27n ± 0%   14.76n ± 0%  -41.59% (p=0.000 n=10)
LeadingZeros32-4   25.12n ± 0%   12.08n ± 0%  -51.92% (p=0.000 n=10)
LeadingZeros64-4   25.89n ± 0%   12.08n ± 0%  -53.35% (p=0.000 n=10)
geomean            24.55n        13.09n       -46.70%

Change-Id: I0dda684713dbdf5336af393f5ccbdae861c4f694
Reviewed-on: https://go-review.googlesource.com/c/go/+/652321
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-03-21 18:21:44 -07:00
Joel Sing
6fb7bdc96d cmd/compile: intrinsify math/bits.TrailingZeros on riscv64
For riscv64/rva22u64 and above, we can intrinsify math/bits.TrailingZeros
using the CTZ/CTZW machine instructions.

On a StarFive VisionFive 2 with GORISCV64=rva22u64:

                  │   ctz.b.1    │               ctz.b.2               │
                  │    sec/op    │   sec/op     vs base                │
TrailingZeros-4     25.500n ± 0%   8.052n ± 0%  -68.42% (p=0.000 n=10)
TrailingZeros8-4     14.76n ± 0%   10.74n ± 0%  -27.24% (p=0.000 n=10)
TrailingZeros16-4    26.84n ± 0%   10.74n ± 0%  -59.99% (p=0.000 n=10)
TrailingZeros32-4   25.500n ± 0%   8.052n ± 0%  -68.42% (p=0.000 n=10)
TrailingZeros64-4   25.500n ± 0%   8.052n ± 0%  -68.42% (p=0.000 n=10)
geomean              23.09n        9.035n       -60.88%

Change-Id: I71edf2b988acb7a68e797afda4ee66d7a57d587e
Reviewed-on: https://go-review.googlesource.com/c/go/+/652320
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
2025-03-15 19:07:53 -07:00
Joel Sing
21417518a9 cmd/compile: combine negation and word sign extension on riscv64
Use NEGW to produce a negated and sign extended word, rather than doing
the same via two instructions:

   neg     t0, t0
   sext.w  a0, t0

Becomes:

   negw    t0, t0

Change-Id: I824ab25001bd3304bdbd435e7b244fcc036ef212
Reviewed-on: https://go-review.googlesource.com/c/go/+/652319
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-03-15 06:05:16 -07:00
Joel Sing
10d070668c cmd/compile/internal/ssa: remove double negation with addition on riscv64
On riscv64, subtraction from a constant is typically implemented as an
ADDI with the negative constant, followed by a negation. However this can
lead to multiple NEG/ADDI/NEG sequences that can be optimised out.

For example, runtime.(*_panic).nextDefer currently contains:

   lbu     t0, 0(t0)
   addi    t0, t0, -8
   neg     t0, t0
   addi    t0, t0, -7
   neg     t0, t0

Which is now optimised to:

   lbu     t0, 0(t0)
   addi    t0, t0, -1
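
The arithmetic behind this simplification can be checked with a small sketch. The expression 7 - (8 - x) is an assumed source shape that would produce the ADDI/NEG/ADDI/NEG sequence above; it is not copied from runtime.(*_panic).nextDefer, and the function names are hypothetical.

```go
package main

import "fmt"

// nested subtracts from a constant twice: 8 - x, then 7 - (8 - x).
// Each subtraction from a constant maps to an ADDI followed by a NEG.
func nested(x int64) int64 { return 7 - (8 - x) }

// folded is the simplified form: the two negations cancel and the
// constants combine into a single addition of -1.
func folded(x int64) int64 { return x - 1 }

func main() {
	for _, x := range []int64{-3, 0, 1, 8, 255} {
		fmt.Println(x, nested(x), folded(x))
	}
}
```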

Change-Id: Idf5815e6db2e3705cc4a4811ca9130a064ae3d80
Reviewed-on: https://go-review.googlesource.com/c/go/+/652318
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: David Chase <drchase@google.com>
2025-03-15 06:04:28 -07:00
Joel Sing
a8f2e63f2f test/codegen: add a test for negation and conversion to int32
Codify the current code generation used on riscv64 in this case.

Change-Id: If4152e3652fc19d0aa28b79dba08abee2486d5ae
Reviewed-on: https://go-review.googlesource.com/c/go/+/652317
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-03-15 06:02:57 -07:00
Joel Sing
e1f9013a58 test/codegen: add riscv64 codegen for arithmetic tests
Codify the current riscv64 code generation for various subtract from
constant and addition/subtraction tests.

Change-Id: I54ad923280a0578a338bc4431fa5bdc0644c4729
Reviewed-on: https://go-review.googlesource.com/c/go/+/652316
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: David Chase <drchase@google.com>
2025-03-15 06:02:27 -07:00
Joel Sing
c01fa0cc21 test/codegen: add riscv64/rva23u64 specifiers to existing tests
Tests that exist for riscv64/rva22u64 should also be applied to
riscv64/rva23u64.

Change-Id: Ia529fdf0ac55b8bcb3dcd24fa80efef2351f3842
Reviewed-on: https://go-review.googlesource.com/c/go/+/652315
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
2025-03-15 05:58:43 -07:00
Joel Sing
c1c7e5902f test/codegen: tighten the TrailingZeros64 test on 386
Make the TrailingZeros64 code generation check more specific for 386.
Just checking for BSFL will match both the generic 64 bit decomposition
and the custom 386 lowering.

Change-Id: I62076f1889af0ef1f29704cba01ab419cae0c6e3
Reviewed-on: https://go-review.googlesource.com/c/go/+/656996
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-03-14 15:04:38 -07:00
Joel Sing
af92bb594d test/codegen: remove plan9/amd64 specific array zeroing/copying tests
The compiler previously avoided the use of MOVUPS on plan9/amd64. This
was changed in CL 655875, however the codegen tests were not updated
and now fail (seemingly the full codegen tests do not run anywhere,
not even on the longtest builders).

Change-Id: I388b60e7b0911048d4949c5029347f9801c018a9
Reviewed-on: https://go-review.googlesource.com/c/go/+/656997
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Auto-Submit: Keith Randall <khr@google.com>
2025-03-13 05:19:13 -07:00
Xiaolin Zhao
b143c98169 cmd/compile: simplify bounded shift on loong64
Use the shiftIsBounded function to generate more efficient shift instructions.
This change also optimizes shift ops when the shift amount is v&63 or v&31.
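
A minimal sketch of what a bounded shift looks like at the source level, with hypothetical function names. In Go, shifting a 64-bit value by 64 or more yields zero, so a general shift needs extra handling on hardware that only uses the low bits of the shift amount; masking with 63 proves the amount is in range and makes that handling unnecessary.

```go
package main

import "fmt"

// shiftGeneral must follow Go semantics for any s: amounts of 64 or
// more produce zero.
func shiftGeneral(x uint64, s uint) uint64 { return x << s }

// shiftMasked has a shift amount known to be in [0, 63], so it is a
// bounded shift.
func shiftMasked(x uint64, s uint) uint64 { return x << (s & 63) }

func main() {
	fmt.Println(shiftGeneral(1, 3), shiftMasked(1, 3))
	fmt.Println(shiftGeneral(1, 64), shiftMasked(1, 64))
}
```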

goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000-HV @ 2500.00MHz
                |  CL 627855   |               this CL                |
                |    sec/op    |    sec/op     vs base                |
LeadingZeros      1.1005n ± 0%   0.8425n ± 1%  -23.44% (p=0.000 n=10)
LeadingZeros8      1.502n ± 0%    1.501n ± 0%   -0.07% (p=0.001 n=10)
LeadingZeros16     1.502n ± 0%    1.501n ± 0%   -0.07% (p=0.000 n=10)
LeadingZeros32    0.9511n ± 0%   0.8050n ± 0%  -15.36% (p=0.000 n=10)
LeadingZeros64    1.1195n ± 0%   0.8423n ± 0%  -24.76% (p=0.000 n=10)
TrailingZeros     0.8086n ± 0%   0.8005n ± 0%   -1.00% (p=0.000 n=10)
TrailingZeros8     1.031n ± 1%    1.035n ± 1%        ~ (p=0.136 n=10)
TrailingZeros16   0.8114n ± 0%   0.8254n ± 1%   +1.73% (p=0.000 n=10)
TrailingZeros32   0.8090n ± 0%   0.8005n ± 0%   -1.05% (p=0.000 n=10)
TrailingZeros64   0.8089n ± 1%   0.8005n ± 0%   -1.04% (p=0.000 n=10)
OnesCount         0.8677n ± 0%   1.2010n ± 0%  +38.41% (p=0.000 n=10)
OnesCount8        0.8009n ± 0%   0.8004n ± 0%   -0.06% (p=0.000 n=10)
OnesCount16       0.9344n ± 0%   1.2010n ± 0%  +28.53% (p=0.000 n=10)
OnesCount32       0.8677n ± 0%   1.2010n ± 0%  +38.41% (p=0.000 n=10)
OnesCount64       1.2010n ± 0%   0.8671n ± 0%  -27.80% (p=0.000 n=10)
RotateLeft        0.8009n ± 0%   0.6671n ± 0%  -16.71% (p=0.000 n=10)
RotateLeft8        1.202n ± 0%    1.327n ± 0%  +10.40% (p=0.000 n=10)
RotateLeft16      0.8036n ± 0%   0.8218n ± 0%   +2.26% (p=0.000 n=10)
RotateLeft32      0.6674n ± 0%   0.8004n ± 0%  +19.94% (p=0.000 n=10)
RotateLeft64      0.6674n ± 0%   0.8004n ± 0%  +19.94% (p=0.000 n=10)
Reverse           0.4067n ± 1%   0.4122n ± 1%   +1.38% (p=0.001 n=10)
Reverse8          0.8009n ± 0%   0.8004n ± 0%   -0.06% (p=0.000 n=10)
Reverse16         0.8009n ± 0%   0.8005n ± 0%   -0.05% (p=0.000 n=10)
Reverse32         0.8009n ± 0%   0.8004n ± 0%   -0.06% (p=0.001 n=10)
Reverse64         0.8009n ± 0%   0.8004n ± 0%   -0.06% (p=0.008 n=10)
ReverseBytes      0.4057n ± 1%   0.4133n ± 1%   +1.90% (p=0.000 n=10)
ReverseBytes16    0.8009n ± 0%   0.8004n ± 0%   -0.07% (p=0.000 n=10)
ReverseBytes32    0.8009n ± 0%   0.8005n ± 0%   -0.05% (p=0.000 n=10)
ReverseBytes64    0.8009n ± 0%   0.8004n ± 0%   -0.06% (p=0.000 n=10)
Add                1.201n ± 0%    1.201n ± 0%        ~ (p=1.000 n=10)
Add32              1.201n ± 0%    1.201n ± 0%        ~ (p=0.474 n=10)
Add64              1.201n ± 0%    1.201n ± 0%        ~ (p=1.000 n=10)
Add64multiple      1.832n ± 0%    1.828n ± 0%   -0.22% (p=0.001 n=10)
Sub                1.201n ± 0%    1.201n ± 0%        ~ (p=1.000 n=10)
Sub32              1.602n ± 0%    1.601n ± 0%   -0.06% (p=0.000 n=10)
Sub64              1.201n ± 0%    1.201n ± 0%        ~ (p=0.474 n=10)
Sub64multiple      2.402n ± 0%    2.400n ± 0%   -0.10% (p=0.000 n=10)
Mul               0.8009n ± 0%   0.8004n ± 0%   -0.06% (p=0.000 n=10)
Mul32             0.8009n ± 0%   0.8004n ± 0%   -0.06% (p=0.000 n=10)
Mul64             0.8008n ± 0%   0.8004n ± 0%   -0.05% (p=0.000 n=10)
Div                9.083n ± 0%    7.638n ± 0%  -15.91% (p=0.000 n=10)
Div32              4.011n ± 0%    4.009n ± 0%   -0.05% (p=0.000 n=10)
Div64              9.711n ± 0%    8.204n ± 0%  -15.51% (p=0.000 n=10)
geomean            1.083n         1.078n        -0.40%

goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000 @ 2500.00MHz
                |  CL 627855   |               this CL                |
                |    sec/op    |    sec/op     vs base                |
LeadingZeros       1.341n ± 4%    1.331n ± 2%   -0.71% (p=0.008 n=10)
LeadingZeros8      1.781n ± 0%    1.766n ± 1%   -0.84% (p=0.011 n=10)
LeadingZeros16     1.782n ± 0%    1.767n ± 0%   -0.79% (p=0.001 n=10)
LeadingZeros32     1.341n ± 1%    1.333n ± 0%   -0.52% (p=0.001 n=10)
LeadingZeros64     1.338n ± 0%    1.333n ± 0%   -0.37% (p=0.008 n=10)
TrailingZeros     0.9025n ± 0%   0.8077n ± 0%  -10.50% (p=0.000 n=10)
TrailingZeros8     1.056n ± 0%    1.089n ± 1%   +3.17% (p=0.001 n=10)
TrailingZeros16    1.101n ± 0%    1.102n ± 0%   +0.09% (p=0.011 n=10)
TrailingZeros32   0.9024n ± 1%   0.8083n ± 0%  -10.43% (p=0.000 n=10)
TrailingZeros64   0.9028n ± 1%   0.8087n ± 0%  -10.43% (p=0.000 n=10)
OnesCount          1.482n ± 1%    1.302n ± 0%  -12.15% (p=0.000 n=10)
OnesCount8         1.206n ± 0%    1.207n ± 2%   +0.12% (p=0.000 n=10)
OnesCount16        1.534n ± 0%    1.402n ± 0%   -8.58% (p=0.000 n=10)
OnesCount32        1.531n ± 1%    1.302n ± 0%  -14.99% (p=0.000 n=10)
OnesCount64        1.302n ± 0%    1.538n ± 1%  +18.16% (p=0.000 n=10)
RotateLeft        0.8083n ± 0%   0.8087n ± 1%        ~ (p=0.579 n=10)
RotateLeft8        1.310n ± 0%    1.323n ± 0%   +0.95% (p=0.001 n=10)
RotateLeft16       1.149n ± 0%    1.165n ± 1%   +1.35% (p=0.001 n=10)
RotateLeft32      0.8093n ± 0%   0.8105n ± 0%        ~ (p=0.393 n=10)
RotateLeft64      0.8088n ± 0%   0.8090n ± 0%        ~ (p=0.739 n=10)
Reverse           0.5109n ± 0%   0.5172n ± 1%   +1.25% (p=0.000 n=10)
Reverse8          0.8010n ± 0%   0.8011n ± 0%   +0.01% (p=0.000 n=10)
Reverse16         0.8010n ± 0%   0.8011n ± 0%   +0.01% (p=0.002 n=10)
Reverse32         0.8010n ± 0%   0.8011n ± 0%   +0.01% (p=0.000 n=10)
Reverse64         0.8010n ± 0%   0.8011n ± 0%   +0.01% (p=0.005 n=10)
ReverseBytes      0.5122n ± 2%   0.5182n ± 1%        ~ (p=0.060 n=10)
ReverseBytes16    0.8010n ± 0%   0.8011n ± 0%   +0.01% (p=0.005 n=10)
ReverseBytes32    0.8010n ± 0%   0.8011n ± 0%   +0.01% (p=0.005 n=10)
ReverseBytes64    0.8010n ± 0%   0.8011n ± 0%   +0.01% (p=0.001 n=10)
Add                1.201n ± 4%    1.202n ± 0%   +0.08% (p=0.028 n=10)
Add32              1.201n ± 0%    1.202n ± 2%   +0.08% (p=0.014 n=10)
Add64              1.201n ± 1%    1.202n ± 0%   +0.08% (p=0.025 n=10)
Add64multiple      1.902n ± 0%    1.913n ± 0%   +0.55% (p=0.004 n=10)
Sub                1.201n ± 0%    1.202n ± 3%   +0.08% (p=0.001 n=10)
Sub32              1.654n ± 0%    1.656n ± 1%        ~ (p=0.117 n=10)
Sub64              1.201n ± 0%    1.202n ± 0%   +0.08% (p=0.001 n=10)
Sub64multiple      2.180n ± 4%    2.159n ± 1%   -0.96% (p=0.006 n=10)
Mul               0.9345n ± 0%   0.9346n ± 0%   +0.01% (p=0.000 n=10)
Mul32              1.030n ± 0%    1.050n ± 1%   +1.94% (p=0.000 n=10)
Mul64             0.9345n ± 0%   0.9346n ± 1%   +0.01% (p=0.000 n=10)
Div                11.57n ± 1%    11.12n ± 0%   -3.85% (p=0.000 n=10)
Div32              4.337n ± 1%    4.341n ± 1%        ~ (p=0.286 n=10)
Div64              12.76n ± 0%    12.02n ± 3%   -5.80% (p=0.000 n=10)
geomean            1.252n         1.235n        -1.32%

Change-Id: Iec4cfd2b83bb0f946068c1d657369ff081d95b04
Reviewed-on: https://go-review.googlesource.com/c/go/+/628575
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: David Chase <drchase@google.com>
2025-03-12 18:18:03 -07:00
Xiaolin Zhao
2a772a2fe7 cmd/compile: optimize shifts of int32 and uint32 on loong64
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000-HV @ 2500.00MHz
                |  bench.old   |              bench.new               |
                |    sec/op    |    sec/op     vs base                |
LeadingZeros       1.100n ± 1%    1.101n ± 0%        ~ (p=0.566 n=10)
LeadingZeros8      1.501n ± 0%    1.502n ± 0%   +0.07% (p=0.000 n=10)
LeadingZeros16     1.501n ± 0%    1.502n ± 0%   +0.07% (p=0.000 n=10)
LeadingZeros32    1.2010n ± 0%   0.9511n ± 0%  -20.81% (p=0.000 n=10)
LeadingZeros64     1.104n ± 1%    1.119n ± 0%   +1.40% (p=0.000 n=10)
TrailingZeros     0.8137n ± 0%   0.8086n ± 0%   -0.63% (p=0.001 n=10)
TrailingZeros8     1.031n ± 1%    1.031n ± 1%        ~ (p=0.956 n=10)
TrailingZeros16   0.8204n ± 1%   0.8114n ± 0%   -1.11% (p=0.000 n=10)
TrailingZeros32   0.8145n ± 0%   0.8090n ± 0%   -0.68% (p=0.000 n=10)
TrailingZeros64   0.8159n ± 0%   0.8089n ± 1%   -0.86% (p=0.000 n=10)
OnesCount         0.8672n ± 0%   0.8677n ± 0%   +0.06% (p=0.000 n=10)
OnesCount8        0.8005n ± 0%   0.8009n ± 0%   +0.06% (p=0.000 n=10)
OnesCount16       0.9339n ± 0%   0.9344n ± 0%   +0.05% (p=0.000 n=10)
OnesCount32       0.8672n ± 0%   0.8677n ± 0%   +0.06% (p=0.000 n=10)
OnesCount64        1.201n ± 0%    1.201n ± 0%        ~ (p=0.474 n=10)
RotateLeft        0.8005n ± 0%   0.8009n ± 0%   +0.05% (p=0.000 n=10)
RotateLeft8        1.202n ± 0%    1.202n ± 0%        ~ (p=0.210 n=10)
RotateLeft16      0.8050n ± 0%   0.8036n ± 0%   -0.17% (p=0.002 n=10)
RotateLeft32      0.6674n ± 0%   0.6674n ± 0%        ~ (p=1.000 n=10)
RotateLeft64      0.6673n ± 0%   0.6674n ± 0%        ~ (p=0.072 n=10)
Reverse           0.4123n ± 0%   0.4067n ± 1%   -1.37% (p=0.000 n=10)
Reverse8          0.8005n ± 0%   0.8009n ± 0%   +0.05% (p=0.000 n=10)
Reverse16         0.8004n ± 0%   0.8009n ± 0%   +0.06% (p=0.000 n=10)
Reverse32         0.8004n ± 0%   0.8009n ± 0%   +0.06% (p=0.000 n=10)
Reverse64         0.8004n ± 0%   0.8009n ± 0%   +0.06% (p=0.001 n=10)
ReverseBytes      0.4100n ± 1%   0.4057n ± 1%   -1.06% (p=0.002 n=10)
ReverseBytes16    0.8004n ± 0%   0.8009n ± 0%   +0.07% (p=0.000 n=10)
ReverseBytes32    0.8005n ± 0%   0.8009n ± 0%   +0.05% (p=0.000 n=10)
ReverseBytes64    0.8005n ± 0%   0.8009n ± 0%   +0.05% (p=0.000 n=10)
Add                1.201n ± 0%    1.201n ± 0%        ~ (p=1.000 n=10)
Add32              1.201n ± 0%    1.201n ± 0%        ~ (p=0.474 n=10)
Add64              1.201n ± 0%    1.201n ± 0%        ~ (p=1.000 n=10)
Add64multiple      1.831n ± 0%    1.832n ± 0%        ~ (p=1.000 n=10)
Sub                1.201n ± 0%    1.201n ± 0%        ~ (p=1.000 n=10)
Sub32              1.601n ± 0%    1.602n ± 0%   +0.06% (p=0.000 n=10)
Sub64              1.201n ± 0%    1.201n ± 0%        ~ (p=0.474 n=10)
Sub64multiple      2.400n ± 0%    2.402n ± 0%   +0.10% (p=0.000 n=10)
Mul               0.8005n ± 0%   0.8009n ± 0%   +0.05% (p=0.000 n=10)
Mul32             0.8005n ± 0%   0.8009n ± 0%   +0.05% (p=0.000 n=10)
Mul64             0.8004n ± 0%   0.8008n ± 0%   +0.05% (p=0.000 n=10)
Div                9.107n ± 0%    9.083n ± 0%        ~ (p=0.255 n=10)
Div32              4.009n ± 0%    4.011n ± 0%   +0.05% (p=0.000 n=10)
Div64              9.705n ± 0%    9.711n ± 0%   +0.06% (p=0.000 n=10)
geomean            1.089n         1.083n        -0.62%

goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000 @ 2500.00MHz
                |  bench.old   |              bench.new               |
                |    sec/op    |    sec/op     vs base                |
LeadingZeros       1.352n ± 0%    1.341n ± 4%   -0.81% (p=0.024 n=10)
LeadingZeros8      1.766n ± 0%    1.781n ± 0%   +0.88% (p=0.000 n=10)
LeadingZeros16     1.766n ± 0%    1.782n ± 0%   +0.88% (p=0.000 n=10)
LeadingZeros32     1.536n ± 0%    1.341n ± 1%  -12.73% (p=0.000 n=10)
LeadingZeros64     1.351n ± 1%    1.338n ± 0%   -0.96% (p=0.000 n=10)
TrailingZeros     0.9037n ± 0%   0.9025n ± 0%   -0.12% (p=0.020 n=10)
TrailingZeros8     1.087n ± 3%    1.056n ± 0%        ~ (p=0.060 n=10)
TrailingZeros16    1.101n ± 0%    1.101n ± 0%        ~ (p=0.211 n=10)
TrailingZeros32   0.9040n ± 0%   0.9024n ± 1%   -0.18% (p=0.017 n=10)
TrailingZeros64   0.9043n ± 0%   0.9028n ± 1%        ~ (p=0.118 n=10)
OnesCount          1.503n ± 2%    1.482n ± 1%   -1.43% (p=0.001 n=10)
OnesCount8         1.207n ± 0%    1.206n ± 0%   -0.12% (p=0.000 n=10)
OnesCount16        1.501n ± 0%    1.534n ± 0%   +2.13% (p=0.000 n=10)
OnesCount32        1.483n ± 1%    1.531n ± 1%   +3.27% (p=0.000 n=10)
OnesCount64        1.301n ± 0%    1.302n ± 0%   +0.08% (p=0.000 n=10)
RotateLeft        0.8136n ± 4%   0.8083n ± 0%   -0.66% (p=0.002 n=10)
RotateLeft8        1.311n ± 0%    1.310n ± 0%        ~ (p=0.786 n=10)
RotateLeft16       1.165n ± 0%    1.149n ± 0%   -1.33% (p=0.001 n=10)
RotateLeft32      0.8138n ± 1%   0.8093n ± 0%   -0.57% (p=0.017 n=10)
RotateLeft64      0.8149n ± 1%   0.8088n ± 0%   -0.74% (p=0.000 n=10)
Reverse           0.5195n ± 1%   0.5109n ± 0%   -1.67% (p=0.000 n=10)
Reverse8          0.8007n ± 0%   0.8010n ± 0%   +0.04% (p=0.000 n=10)
Reverse16         0.8007n ± 0%   0.8010n ± 0%   +0.04% (p=0.000 n=10)
Reverse32         0.8007n ± 0%   0.8010n ± 0%   +0.04% (p=0.012 n=10)
Reverse64         0.8007n ± 0%   0.8010n ± 0%   +0.04% (p=0.010 n=10)
ReverseBytes      0.5120n ± 1%   0.5122n ± 2%        ~ (p=0.306 n=10)
ReverseBytes16    0.8007n ± 0%   0.8010n ± 0%   +0.04% (p=0.000 n=10)
ReverseBytes32    0.8007n ± 0%   0.8010n ± 0%   +0.04% (p=0.000 n=10)
ReverseBytes64    0.8007n ± 0%   0.8010n ± 0%   +0.04% (p=0.000 n=10)
Add                1.201n ± 0%    1.201n ± 4%        ~ (p=0.334 n=10)
Add32              1.201n ± 0%    1.201n ± 0%        ~ (p=0.563 n=10)
Add64              1.201n ± 0%    1.201n ± 1%        ~ (p=0.652 n=10)
Add64multiple      1.909n ± 0%    1.902n ± 0%        ~ (p=0.126 n=10)
Sub                1.201n ± 0%    1.201n ± 0%        ~ (p=1.000 n=10)
Sub32              1.655n ± 0%    1.654n ± 0%        ~ (p=0.589 n=10)
Sub64              1.201n ± 0%    1.201n ± 0%        ~ (p=1.000 n=10)
Sub64multiple      2.150n ± 0%    2.180n ± 4%   +1.37% (p=0.000 n=10)
Mul               0.9341n ± 0%   0.9345n ± 0%   +0.04% (p=0.011 n=10)
Mul32              1.053n ± 0%    1.030n ± 0%   -2.23% (p=0.000 n=10)
Mul64             0.9341n ± 0%   0.9345n ± 0%   +0.04% (p=0.018 n=10)
Div                11.59n ± 0%    11.57n ± 1%        ~ (p=0.091 n=10)
Div32              4.337n ± 0%    4.337n ± 1%        ~ (p=0.783 n=10)
Div64              12.81n ± 0%    12.76n ± 0%   -0.39% (p=0.001 n=10)
geomean            1.257n         1.252n        -0.46%

Change-Id: I9e93ea49736760c19dc6b6463d2aa95878121b7b
Reviewed-on: https://go-review.googlesource.com/c/go/+/627855
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
2025-03-10 17:55:10 -07:00
Joel Sing
927fdb7843 cmd/compile: simplify intrinsification of TrailingZeros16 and TrailingZeros8
Decompose Ctz16 and Ctz8 within the SSA rules for LOONG64, MIPS, PPC64
and S390X, rather than having a custom intrinsic. Note that for PPC64 this
actually allows the existing Ctz16 and Ctz8 rules to be used.

Change-Id: I27a5e978f852b9d75396d2a80f5d7dfcb5ef7dd4
Reviewed-on: https://go-review.googlesource.com/c/go/+/651816
Reviewed-by: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2025-02-27 03:45:44 -08:00
Mateusz Poliwczak
43e6525986 cmd/compile: load properly constant values from itabs
While looking at the SSA of the following code, I noticed
that these rules do not work properly: the types
are loaded indirectly through an itab instead of statically.

type M interface{ M() }
type A interface{ A() }

type Impl struct{}
func (*Impl) M() {}
func (*Impl) A() {}

func main() {
        var a M = &Impl{}
        a.(A).A()
}

Change-Id: Ia275993f81a2e7302102d4ff87ac28586023d13c
GitHub-Last-Rev: 4bfc901917
GitHub-Pull-Request: golang/go#71784
Reviewed-on: https://go-review.googlesource.com/c/go/+/649500
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2025-02-19 13:39:00 -08:00
Jakub Ciolek
d524e1eccd cmd/compile: on AMD64, turn x < 128 into x <= 127
x < 128 -> x <= 127
x >= 128 -> x > 127

This allows for shorter encoding as 127 fits into
a single-byte immediate.
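
The rewrite relies on the two comparisons being equivalent for integers. A minimal sketch with hypothetical function names follows; 127 is the largest value that fits in a signed 8-bit immediate, while 128 does not fit.

```go
package main

import "fmt"

// below128 is the original comparison against 128.
func below128(r rune) bool { return r < 128 }

// atMost127 is the rewritten comparison against 127.
func atMost127(r rune) bool { return r <= 127 }

func main() {
	for _, r := range []rune{-1, 0, 'a', 127, 128, 0x10FFFF} {
		fmt.Println(r, below128(r), atMost127(r))
	}
}
```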

archive/tar benchmark (Alder Lake 12600K)

name              old time/op    new time/op    delta
/Writer/USTAR-16    1.46µs ± 0%    1.32µs ± 0%  -9.43%  (p=0.008 n=5+5)
/Writer/GNU-16      1.85µs ± 1%    1.79µs ± 1%  -3.47%  (p=0.008 n=5+5)
/Writer/PAX-16      3.21µs ± 0%    3.11µs ± 2%  -2.96%  (p=0.008 n=5+5)
/Reader/USTAR-16    1.38µs ± 1%    1.37µs ± 0%    ~     (p=0.127 n=5+4)
/Reader/GNU-16       798ns ± 1%     800ns ± 2%    ~     (p=0.548 n=5+5)
/Reader/PAX-16      3.07µs ± 1%    3.00µs ± 0%  -2.35%  (p=0.008 n=5+5)
[Geo mean]          1.76µs         1.70µs       -3.15%

compilecmp:

hash/maphash
hash/maphash.(*Hash).Write 517 -> 510  (-1.35%)

runtime
runtime.traceReadCPU 1626 -> 1615  (-0.68%)

runtime [cmd/compile]
runtime.traceReadCPU 1626 -> 1615  (-0.68%)

math/rand/v2
type:.eq.[128]float32 65 -> 59  (-9.23%)

bytes
bytes.trimLeftUnicode 378 -> 373  (-1.32%)
bytes.IndexAny 1189 -> 1157  (-2.69%)
bytes.LastIndexAny 1256 -> 1239  (-1.35%)
bytes.lastIndexFunc 263 -> 261  (-0.76%)

strings
strings.FieldsFuncSeq.func1 411 -> 399  (-2.92%)
strings.EqualFold 625 -> 624  (-0.16%)
strings.trimLeftUnicode 248 -> 231  (-6.85%)

math/rand
type:.eq.[128]float32 65 -> 59  (-9.23%)

bytes [cmd/compile]
bytes.LastIndexAny 1256 -> 1239  (-1.35%)
bytes.lastIndexFunc 263 -> 261  (-0.76%)
bytes.trimLeftUnicode 378 -> 373  (-1.32%)
bytes.IndexAny 1189 -> 1157  (-2.69%)

regexp/syntax
regexp/syntax.(*parser).parseEscape 1113 -> 1102  (-0.99%)

math/rand/v2 [cmd/compile]
type:.eq.[128]float32 65 -> 59  (-9.23%)

strings [cmd/compile]
strings.EqualFold 625 -> 624  (-0.16%)
strings.FieldsFuncSeq.func1 411 -> 399  (-2.92%)
strings.trimLeftUnicode 248 -> 231  (-6.85%)

math/rand [cmd/compile]
type:.eq.[128]float32 65 -> 59  (-9.23%)

regexp
regexp.(*inputString).context 198 -> 197  (-0.51%)
regexp.(*inputBytes).context 221 -> 212  (-4.07%)

image/jpeg
image/jpeg.(*decoder).processDQT 500 -> 491  (-1.80%)

regexp/syntax [cmd/compile]
regexp/syntax.(*parser).parseEscape 1113 -> 1102  (-0.99%)

regexp [cmd/compile]
regexp.(*inputString).context 198 -> 197  (-0.51%)
regexp.(*inputBytes).context 221 -> 212  (-4.07%)

encoding/csv
encoding/csv.(*Writer).fieldNeedsQuotes 269 -> 266  (-1.12%)

cmd/vendor/golang.org/x/sys/unix
type:.eq.[131]struct 855 -> 823  (-3.74%)

vendor/golang.org/x/text/unicode/norm
vendor/golang.org/x/text/unicode/norm.nextDecomposed 4831 -> 4826  (-0.10%)
vendor/golang.org/x/text/unicode/norm.(*Iter).returnSlice 281 -> 275  (-2.14%)

vendor/golang.org/x/text/secure/bidirule
vendor/golang.org/x/text/secure/bidirule.init.0 85 -> 83  (-2.35%)

go/scanner
go/scanner.isDigit 100 -> 98  (-2.00%)
go/scanner.(*Scanner).next 431 -> 422  (-2.09%)
go/scanner.isLetter 142 -> 124  (-12.68%)

encoding/asn1
encoding/asn1.parseTagAndLength 1189 -> 1182  (-0.59%)
encoding/asn1.makeField 3481 -> 3463  (-0.52%)

text/scanner
text/scanner.(*Scanner).next 1242 -> 1236  (-0.48%)

archive/tar
archive/tar.isASCII 133 -> 127  (-4.51%)
archive/tar.(*Writer).writeRawFile 1206 -> 1198  (-0.66%)
archive/tar.(*Reader).readHeader.func1 9 -> 7  (-22.22%)
archive/tar.toASCII 393 -> 383  (-2.54%)
archive/tar.splitUSTARPath 405 -> 396  (-2.22%)
archive/tar.(*Writer).writePAXHeader.func1 627 -> 620  (-1.12%)

text/template
text/template.jsIsSpecial 59 -> 57  (-3.39%)

go/doc
go/doc.assumedPackageName 714 -> 701  (-1.82%)

vendor/golang.org/x/net/http/httpguts
vendor/golang.org/x/net/http/httpguts.headerValueContainsToken 965 -> 952  (-1.35%)
vendor/golang.org/x/net/http/httpguts.tokenEqual 280 -> 269  (-3.93%)
vendor/golang.org/x/net/http/httpguts.IsTokenRune 28 -> 26  (-7.14%)

net/mail
net/mail.isVchar 26 -> 24  (-7.69%)
net/mail.isAtext 106 -> 104  (-1.89%)
net/mail.(*Address).String 1084 -> 1052  (-2.95%)
net/mail.isQtext 39 -> 37  (-5.13%)
net/mail.isMultibyte 9 -> 7  (-22.22%)
net/mail.isDtext 45 -> 43  (-4.44%)
net/mail.(*addrParser).consumeQuotedString 1050 -> 1029  (-2.00%)
net/mail.quoteString 741 -> 714  (-3.64%)

cmd/internal/obj/s390x
cmd/internal/obj/s390x.preprocess 6405 -> 6393  (-0.19%)

cmd/internal/obj/x86
cmd/internal/obj/x86.toDisp8 303 -> 301  (-0.66%)

fmt [cmd/compile]
fmt.Fprintf 4726 -> 4662  (-1.35%)

go/scanner [cmd/compile]
go/scanner.(*Scanner).next 431 -> 422  (-2.09%)
go/scanner.isLetter 142 -> 124  (-12.68%)
go/scanner.isDigit 100 -> 98  (-2.00%)

cmd/compile/internal/syntax
cmd/compile/internal/syntax.(*source).nextch 879 -> 847  (-3.64%)

cmd/vendor/golang.org/x/mod/module
cmd/vendor/golang.org/x/mod/module.checkElem 1253 -> 1235  (-1.44%)
cmd/vendor/golang.org/x/mod/module.escapeString 519 -> 517  (-0.39%)

go/doc [cmd/compile]
go/doc.assumedPackageName 714 -> 701  (-1.82%)

cmd/compile/internal/syntax [cmd/compile]
cmd/compile/internal/syntax.(*scanner).escape 1965 -> 1933  (-1.63%)
cmd/compile/internal/syntax.(*scanner).next 8975 -> 8847  (-1.43%)

cmd/internal/obj/s390x [cmd/compile]
cmd/internal/obj/s390x.preprocess 6405 -> 6393  (-0.19%)

cmd/internal/obj/x86 [cmd/compile]
cmd/internal/obj/x86.toDisp8 303 -> 301  (-0.66%)

cmd/internal/gcprog
cmd/internal/gcprog.(*Writer).Repeat 688 -> 677  (-1.60%)
cmd/internal/gcprog.(*Writer).varint 442 -> 439  (-0.68%)

cmd/compile/internal/ir
cmd/compile/internal/ir.splitPkg 331 -> 325  (-1.81%)

cmd/compile/internal/ir [cmd/compile]
cmd/compile/internal/ir.splitPkg 331 -> 325  (-1.81%)

net/http
net/http.containsDotDot.FieldsFuncSeq.func1 411 -> 399  (-2.92%)
net/http.isNotToken 33 -> 30  (-9.09%)
net/http.containsDotDot 606 -> 588  (-2.97%)
net/http.isCookieNameValid 197 -> 191  (-3.05%)
net/http.parsePattern 4330 -> 4317  (-0.30%)
net/http.ParseCookie 1099 -> 1096  (-0.27%)
net/http.validMethod 197 -> 187  (-5.08%)

cmd/vendor/golang.org/x/text/unicode/norm
cmd/vendor/golang.org/x/text/unicode/norm.(*Iter).returnSlice 281 -> 275  (-2.14%)
cmd/vendor/golang.org/x/text/unicode/norm.nextDecomposed 4831 -> 4826  (-0.10%)

net/http/cookiejar
net/http/cookiejar.encode 1936 -> 1918  (-0.93%)

expvar
expvar.appendJSONQuote 972 -> 965  (-0.72%)

cmd/cgo/internal/test
cmd/cgo/internal/test.stack128 116 -> 114  (-1.72%)

cmd/vendor/rsc.io/markdown
cmd/vendor/rsc.io/markdown.newATXHeading 1637 -> 1628  (-0.55%)
cmd/vendor/rsc.io/markdown.isUnicodePunct 197 -> 179  (-9.14%)

Change-Id: I578bdf42ef229d687d526e378d697ced51e1880c
Reviewed-on: https://go-review.googlesource.com/c/go/+/639935
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-02-16 07:23:13 -08:00
Keith Randall
beac2f7d3b cmd/compile: fix sign extension of paired 32-bit loads on arm64
Fixes #71759

Change-Id: Iab05294ac933cc9972949158d3fe2bdc3073df5e
Reviewed-on: https://go-review.googlesource.com/c/go/+/649895
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2025-02-15 07:53:28 -08:00
Keith Randall
187fd2698d cmd/compile: make write barrier code amenable to paired loads/stores
It currently isn't because it does load/store/load/store/...
Rework to do overwrite processing in pairs so it is instead
load/load/store/store/...

Change-Id: If7be629bc4048da5f2386dafb8f05759b79e9e2b
Reviewed-on: https://go-review.googlesource.com/c/go/+/631495
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-02-13 14:08:14 -08:00
Keith Randall
a0029e95e5 cmd/compile: regalloc: handle desired registers of 2-output insns
Particularly with 2-word load instructions, this becomes important.
Classic example is:

    func f(p *string) string {
        return *p
    }

We want the two loads to put the return values directly into
the two ABI return registers.

At this point in the stack, cmd/go is 1.1% smaller.

Change-Id: I51fd1710238e81d15aab2bfb816d73c8e7c207b1
Reviewed-on: https://go-review.googlesource.com/c/go/+/631137
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-02-13 14:08:07 -08:00